Author: Heather Holt
Abstract
This paper examines the relationship between health and labour force participation using data from the first three waves of the Survey of Family, Income and Employment (SoFIE) (2002/05). Using various health measures, the results show that health is significantly related to labour force participation, even after accounting for certain types of endogeneity.
The results of the standard regression models including individual chronic diseases indicate that five out of the nine chronic diseases considered have a significant negative relationship with labour force participation once other factors are controlled for. These diseases are: psychiatric conditions (depression, manic depression or schizophrenia); stroke; heart disease; diabetes; and high blood pressure. For psychiatric conditions, stroke and diabetes the negative relationship with full-time work is larger than that for part-time work (ie, the chance of working full-time rather than being inactive is reduced more than the chance of working part-time rather than being inactive). This suggests that the presence of these diseases is associated not only with lower participation but also with working fewer hours.
Various modelling techniques and a more general measure of overall health (self-rated health) are then used to account for possible endogeneity. The results of these models indicate that poorer self-rated health is associated with a reduced chance of participating in the labour force. The relationship between self-rated health and labour market participation is found to be significant even when time-constant unobserved variables are controlled for and when self-rated health is adjusted to account for possible rationalisation of labour force participation using self-rated health. More specifically, a health shock (measured using adjusted or unadjusted self-rated health) was found to be associated with a reduction in the chance of participating. While the results from all models are in a similar direction, they have different strengths and the preferred estimates are those from the fixed effects model.
Using various assumptions, the model results were used to estimate the impact at the economy level. The point estimates from these models indicate that if there was an improvement in health (ie, no negative health shocks and/or everyone had excellent average health) an additional 12,700 to 66,800 people may participate; that represents a 0.7% to 3.6% increase in the total number of people participating. Based on the limitations of the models discussed in the paper it is more sensible to assume that, if there was an improvement in health, the additional number of people who may participate is likely to be between 5,300 and 38,700; that is, a 0.3% to 2.1% increase in the total number of people participating.
The results do not control for unobserved variables that vary over time. They also do not allow for the “feedback effect”; that is, that participation could influence health. As such, the results do not address causality but only establish relationships between health and participation. Feasible instruments were explored to try to instrument health, thus making it possible to take into account both unobserved variables that change over time and causality, but no suitable instrument was found.
Acknowledgements
Thank you to Dean Hyslop, Steve Stillman and Katy Henderson for their advice and guidance throughout. Thanks also to Kristie Carter, Tony Burton, Grant Scobie, Gerald Minnee, Ken Richardson and Martin Tobias for providing useful comments.
The Health Research Council of New Zealand, and Health Inequalities Research Programme of the University of Otago, Wellington, are acknowledged for funding and establishing the SoFIE-Health data utilised in this publication.
Disclaimer
The views, opinions, findings and conclusions or recommendations expressed in this Working Paper are strictly those of the authors. They do not necessarily reflect the views of the New Zealand Treasury. The Treasury takes no responsibility for any errors or omissions in, or for the correctness of, the information contained in these Working Papers. The paper is presented not as policy, but with a view to inform and stimulate wider debate.
1 Introduction
Health is a key factor in a person's ability to develop their skills and knowledge. The mix of skills, knowledge and capabilities that a person possesses (their human capital) is positively related to their productivity and the demand for their labour. If poor health is a barrier to developing or using skills, then improving health could raise labour force participation and economic output. In addition, if poor health reduces the number of hours worked, or lowers productivity when at work, then further output could be lost. The costs of treating poor health and the value of lost output are measures of the economic cost of ill health. A better understanding of the relationships between health and labour market participation is a first step towards estimating these costs.
Chronic diseases are of particular health interest as they are a major component of ill health and deaths in New Zealand, and could place even greater burdens on the health system over time. Furthermore, the incidence of chronic disease is partly driven by lifestyle-related risk factors, such as unhealthy diet and tobacco consumption, that can potentially be modified. In 2005, around three-quarters of deaths in New Zealand resulted from chronic diseases, a proportion that has been rising in recent years.[1] In countries such as New Zealand that have an ageing population, understanding this relationship becomes even more important as more people reach the life stage at which their health tends to deteriorate and affect their labour market behaviour (Currie and Madrian, 1999), although there are different views about how ageing might affect morbidity as longevity rises. If both prevalence of, and deaths from, chronic disease continue to rise, there may be significant long-term negative economic impacts arising from increased health care costs and lower labour market participation.
This paper assesses the relationship between health and labour market participation for working age adults in New Zealand. Limited data means there has been little research into the effect of health on labour market participation in New Zealand. However, the inclusion of a detailed health module in the third wave of the longitudinal Survey of Family, Income and Employment (SoFIE) has allowed such analysis to be undertaken.
Section 2 of this paper summarises other work done in this area, while Section 3 describes the data used in the paper. Section 4 summarises the methods used, Section 5 reviews the results of the relationship between chronic diseases and labour market participation, Section 6 summarises the results of the relationship between self-rated health and labour market participation and Section 7 concludes. Section 8 presents estimates of the potential impact at the population level, based on the individual level results. Full details of the variables used, methods and the model results can be found in the appendices.
The paper is not a review of current health policy or spending; the focus is identifying relationships (if any) between health and labour force participation. Where any relationships are established, the paper does not attempt to assess how changes in current health policies may interact with these relationships. For example, the case for investing more resources in managing particular chronic diseases to improve labour market participation would require evidence on: how far such investments might reduce the incidence and prevalence of that disease; and how that, in turn, might affect labour market behaviour. This paper does not address such evidence.
Notes
 [1]Figure based on data from the New Zealand Health Information Service.
2 Previous studies
Previous research in New Zealand has identified extensive interactions between health and human capital development (Biddulph, F., Biddulph, J. and Biddulph, C., 2003). However, most work has focused on the impact of poor health on the human capital development of young people, rather than the impact of poor health in later life. One health-related measure is the presence of a disability. A recent paper using the New Zealand Disability Survey found that all of the six disabilities considered had a negative impact on employment.[2] In addition, for all disabilities other than hearing, increased severity of the disability was found to reduce the rate of employment (Jensen et al, 2005). This work also found that the impact of disability on full-time employment was much larger than for total employment (full-time and part-time).
Another health-related measure is injury. A paper using Statistics New Zealand's Linked Employer-Employee Database (LEED) estimated the effects of injuries on employment (Crichton, Stillman and Hyslop, 2007). Crichton et al found that injuries resulting in more than three months of earnings compensation have negative effects on future labour market outcomes, with the magnitude of these effects increasing with injury duration. While disability and injury are possible indicators of health, more direct measures, such as the presence of chronic disease, are better measures of poor health. No New Zealand studies examining the impact of chronic diseases on labour market participation were found.
Interest in the relationship between health and labour market participation is not confined to New Zealand. Literature reviews (Currie and Madrian, 1999; Chirikos, 1993, in Currie and Madrian, 1999) have identified considerable evidence linking health and labour market activity, but wide disagreement on the magnitude of the effect. Numerous papers using US data suggest a strong link between health and labour market participation. Stern (1989) found that health problems limiting the amount of work that can be done, and poor self-rated health, reduced the probability of labour market participation. Looking at the relationship between health and retirement in the later part of working life, Bound, Schoenbaum, Stinebrickner and Waidmann (1999) found that poorer health led many older workers to withdraw from the labour force.
Evidence from the US on the relationship between labour force participation and health is not directly applicable to New Zealand. For instance, those with poorer health in the US may be motivated to participate in the labour force as health insurance is often tied to employment (Cai and Kalb, 2006). As such, a better comparator may be Australia or the UK. A few recent papers using the Australian equivalent of SoFIE (the Household, Income and Labour Dynamics in Australia (HILDA) survey) have examined the relationship between health and participation. Using data from HILDA, Cai and Kalb (2006) examined the effect of self-rated health on labour force participation for men and women of working age. They found that health was positively associated with participation for four groups (younger males, younger females, older males and older females) even after controlling for the fact that labour force participation may in turn affect health. Further work by Cai (2007) confirmed these findings.
Work by the Australian Productivity Commission examined the impact of chronic diseases on labour market participation (Laplagne, Glover and Shomos, 2007). The chronic diseases considered were cancer, cardiovascular disease, mental/nervous condition, major injury, diabetes and arthritis. They found that absence of chronic diseases can result in substantially greater labour force participation for those affected, again even after using different methods to allow for unobserved variables that may affect labour force participation and to allow for the fact that participating in the labour market may in turn affect health. Of the six health conditions considered, mental health or a nervous condition had the largest impact on labour market participation.
Turning to evidence from Britain, work by the Institute for Fiscal Studies, using the British Household Panel Survey, examined the role of ill health in retirement decisions (Disney, Emmerson and Wakefield, 2003). They found that deterioration in an individual's self-reported health was strongly associated with movements out of work.
Notes
 [2]The disabilities considered included vision; hearing; restricted mobility; restricted coordination; learning/memory; and psychological disabilities.
3 Data
3.1 Survey methodology
The Survey of Family, Income and Employment (SoFIE) is the main data source analysed in this paper. SoFIE is a survey of a nationally representative sample of New Zealand permanent residents in private households. It is conducted by Statistics New Zealand. The core SoFIE survey modules include questions on: demographics; dependent children; labour force involvement; education; family; and income. All respondents in the original sample are followed over time, even if their household or family circumstances change, forming a longitudinal sample. The survey commenced in 2002 and will continue until 2010. When the present study was undertaken, there were three waves of data available for analysis (SoFIE Waves 1-3 Version 4). Further information on the survey methodology can be found in Appendix B.
3.2 Population and sample of interest
The analysis is based on those people who remain eligible and respond in Waves 1-3 and who are aged 15 and over at the end of the reference period in Wave 1, as this is the group that was asked the health module in Wave 3. The results are therefore representative of the usual adult resident population of New Zealand who lived in private dwellings on the main islands of New Zealand in 2002/03 and who remain alive and are non-institutionalised by 2004/05. Those over working age or who are full-time students in each wave are excluded from the analysis.
As with all surveys, not all those approached to take part agree to participate. In addition, those who initially respond may choose not to respond in subsequent waves of the survey (attrition). While the response rates are good compared with similar surveys, longitudinal response rates were lower for those of fair or poor health compared with those of better health. Statistics New Zealand provides a standard longitudinal weight that accounts for non-response and aligns the composition of the sample with that of the New Zealand population in October 2002 in terms of age, gender and Māori. However, the weights do not completely restore the distribution of people across the health states.
For these reasons the results in this paper reflect the SoFIE population, who are likely to be somewhat healthier than both the population it aims to represent and the New Zealand population more generally. More specifically, those with the most severe health conditions considered may die or be institutionalised, and so are not covered by the survey results used in this analysis. Therefore, the impact of the health conditions considered in this study on labour force participation may be higher than the results based on SoFIE suggest. Further information on the limitations and strengths of SoFIE more generally can be found in Appendix B.
4 Measurement and methods
4.1 Measurement of labour market activity
Labour market activity at the household interview date is used for this analysis. Two breakdowns of labour market activity are used: labour market participation and labour market outcome.
The main focus of the report will be on labour market participation; that is:
 participating (working full-time or part-time, including unpaid work, or being unemployed; that is, not working but actively looking for work)
 not participating (that is, not working and not looking for work so that the person is economically inactive).[3]
Labour market outcome is also briefly considered; that is:
 full-time paid or unpaid work (30 hours or more on average in a week)
 part-time paid or unpaid work (less than 30 hours on average in a week)
 unemployed
 inactive.
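The two breakdowns above amount to a simple classification rule, which can be sketched as follows. This is an illustration only: the function and argument names (`labour_market_outcome`, `weekly_hours`, `looking_for_work`) are hypothetical and are not SoFIE variable names. The 30-hour threshold and the treatment of unpaid work as participation follow the definitions given in this section.

```python
FULL_TIME_HOURS = 30  # threshold from the text: 30 or more hours a week is full-time


def labour_market_outcome(weekly_hours, looking_for_work):
    """Return one of the four labour market outcomes described in the text.

    weekly_hours: average weekly hours of paid or unpaid work (0 if none).
    looking_for_work: True if the person is actively looking for work.
    Both argument names are hypothetical, not SoFIE variables.
    """
    if weekly_hours >= FULL_TIME_HOURS:
        return "full-time"
    if weekly_hours > 0:
        return "part-time"
    if looking_for_work:
        return "unemployed"
    return "inactive"


def is_participating(outcome):
    """Participation covers every outcome except economic inactivity."""
    return outcome != "inactive"


# Hypothetical respondents: (weekly hours, actively looking for work)
for hours, looking in [(40, False), (12, False), (0, True), (0, False)]:
    outcome = labour_market_outcome(hours, looking)
    print(hours, looking, outcome, is_participating(outcome))
```

Under this rule the unemployed count as participating, so the binary participation measure is a collapsed version of the four-way outcome measure.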
4.2 Measurement of health
In Wave 3 of the survey respondents were asked a detailed set of health questions. Hence a respondent's health status could be linked to their current and previous labour market outcomes to see what relationships could be established. Two measures of health are available in all three waves of the survey: the presence of chronic diseases (derived from Wave 3 responses); and self-rated health. Neither provides a perfect measure of ill health (the subsections below discuss the problems with each health measure). In a review of the literature, Currie and Madrian (1999) concluded that the effects of health on labour supply are sensitive to the way health is measured, so a range of health measures needs to be considered to properly understand the impact of health on labour market status. For these reasons this paper summarises and compares results using each of the available health measures in turn.
4.2.1 Chronic diseases
The health module asked respondents if, before the interview date, they have ever been told by a doctor that they have any of the following eight health conditions:
 asthma
 high blood pressure
 high cholesterol
 heart disease
 diabetes (other than during pregnancy for women)
 stroke
 migraines
 psychiatric conditions (depression, manic depression or schizophrenia).
The inclusion of these eight health conditions on the survey defined the conditions to be considered in this report (with the addition of cancer). They are loosely termed “chronic diseases”, a term that has been used by others to refer to similar groups of diseases (DeVol and Bedroussian, 2007). Chronic diseases represent a diverse mix of health conditions. For example, the characteristics of migraines, which are a series of often infrequent brief, acute episodes separated by long periods with no functional loss, are very different from those of cancer. And even cancer covers a large mix of disease characteristics. Some chronic conditions, such as high blood pressure and high cholesterol, are in fact risk factors for diseases. This should be borne in mind when interpreting the results.
As well as the detailed information on each individual disease, a summary variable that indicates the presence of one or more chronic diseases is also used. For people who reported having a particular disease, the age at diagnosis was asked for diseases other than psychiatric conditions. This age at diagnosis was used to estimate the number of years since a disease was diagnosed. The presence of chronic diseases is only asked in Wave 3. For all diseases other than psychiatric conditions, the derived number of years since diagnosis was used to measure the presence of each disease in Waves 1 and 2. Mental illnesses (other than depression) almost always have their onset in childhood or adolescence. After analysis of the group who reported a psychiatric condition in Wave 3, all these respondents were assumed to have had the condition in Waves 1 and 2. While this may not be the case for all respondents, the assumption is likely to hold for the majority.
The number of years since diagnosis was also used in combination with the presence of chronic disease information to break those with a disease into two groups. Using asthma as an example, this resulted in a variable with the following categories:
 No diagnosis of asthma
 Asthma diagnosed in the last 5 years
 Asthma diagnosed more than 5 years ago.
While the age of diagnosis variable is useful for estimating the time since the onset of each health condition there are likely to be issues with respondents being able to accurately recall this information, especially if this was some years in the past. This should be borne in mind when assessing the results. This is one of the reasons that the time since diagnosis variables were not disaggregated further.
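The derivation of the time-since-diagnosis categories, and their use to carry disease presence back to Waves 1 and 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual derivation: the function names are hypothetical, waves are assumed to be one year apart, ages are in whole years, and a diagnosis exactly five years ago is assumed to fall in the "last 5 years" group.

```python
def years_since_diagnosis(age_at_wave3, age_at_diagnosis, wave):
    """Years between diagnosis and the given wave (1, 2 or 3).

    Assumes waves are one year apart, so age at wave w is the Wave 3 age
    minus (3 - w). Returns None if the disease was never diagnosed, or was
    diagnosed only after that wave.
    """
    if age_at_diagnosis is None:
        return None
    years = (age_at_wave3 - (3 - wave)) - age_at_diagnosis
    return years if years >= 0 else None


def diagnosis_category(years):
    """Map years since diagnosis to the three categories used in the text."""
    if years is None:
        return "no diagnosis"
    if years <= 5:  # assumption: exactly 5 years counts as "last 5 years"
        return "diagnosed in the last 5 years"
    return "diagnosed more than 5 years ago"


# Hypothetical respondent: aged 40 at Wave 3, diagnosed with asthma at age 34.
for wave in (1, 2, 3):
    years = years_since_diagnosis(40, 34, wave)
    print(wave, years, diagnosis_category(years))
```

Because the categories rest on recalled age at diagnosis, a recall error of a year or two can move a respondent across the five-year boundary, which is one reason a coarse two-group split is the safer choice.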
An additional disease of interest not covered in the SoFIE questionnaire is cancer. SoFIE respondents were asked to give permission for their data to be linked to information on cancer registrations held by the New Zealand Health Information Service. For those respondents who agreed to the data linkage (and were successfully matched), it was possible to construct the same presence and years since diagnosis variables in each wave as for the other chronic diseases covered by SoFIE. These variables will only be available for those in the linked data and are only available back to 1990 so the proportion of the population who have had a cancer diagnosis will be an underestimate. The linked sample is used for descriptive statistics that relate to cancer only.[4] In the models a “cancer unknown” category was included so the sample size available for analysis was not reduced.
Finally, using diagnosis of a chronic disease is an incomplete indicator of health status, which does not capture the relative severity of respondents' conditions. At best, this indicator focuses on a particular set of chronic diseases, and is not an encompassing measure of current health. SoFIE respondents are asked if they have ever been told by a doctor that they had the disease (or if they have ever had a cancer registration). A person may have had a disease diagnosis but no longer suffer symptoms. An example would be asthma or migraines, from which respondents may have suffered in their youth, but be symptom free by adulthood. On the other hand, a person may have the disease but not have been diagnosed by a doctor. Hence, this indicator of the disease diagnosis gives no indication of severity, and may not capture all those with a disease. An indication of the severity of such diseases, in terms of the functional losses or activity limitations, would allow better analysis of the relationship between health and labour market participation.
4.2.2 Self-rated health
An alternative health measure available in all three waves is self-rated health. Respondents are asked “In general how would you rate your health - excellent, very good, good, fair or poor?” Self-rated health is potentially a more encompassing measure of current health state than presence of chronic diseases as it can include other illnesses as well as chronic diseases and is collected for all respondents. As a result of this wider coverage, there is potential for more changes in health to be observed during the survey period. While this may be a more current and inclusive measure of health, allowing for the fact that a respondent may no longer suffer from symptoms of a chronic disease and including other health factors such as injury and illness, it is more subjective and, as such, may be subject to potential bias.
Firstly, self-rated health may not be entirely comparable between respondents. Some respondents may be consistently more optimistic in their health rating and others consistently more pessimistic. Secondly, with only three waves of data, most respondents are unlikely to experience many dramatic health status changes over this short period; and reported changes may not be true changes (Mathiowetz and Laird, 1994 in Bound et al, 1999). In addition, the subjective health baseline respondents use as a comparator when answering this question is ill-defined and may change over time. For example, the SoFIE question on self-rated health does not ask respondents to rate their health relative to health of other people of the same age. Some respondents may compare their health to that of others, but others may compare their current health to their past health.[5] Given that there are only three waves of data, and that this report focuses on those of working age, this ageing effect appears to be small and is therefore not considered further in this work. Finally, even for the same person, self-rated health may be dependent on labour market status. This is considered in detail later in this paper.
Notes
 [3]This definition differs from the more standard definition of labour force participation as unpaid workers here are defined to be participating rather than not participating.
 [4]Where only the linked sample was used, adjusted weights were used to realign the sample with the population (adjusted longitudinal weight) as opposed to the weights provided by Statistics New Zealand (standard longitudinal weights).
 [5]In fact, data for all longitudinal respondents indicates a fall in the proportion of those who rate their health as excellent between Wave 1 and Wave 3 of around 5 percentage points and an increase in other health states, possibly indicating the ageing SoFIE population. This occurs despite the fact that those respondents who are most unwell are likely to die or move into institutions.
4.3 Modelling the health effect
4.3.1 Modelling methods and issues
Standard logistic regressions were the starting point for this analysis. Binomial and multinomial logistic regression models were fitted to the data to quantify the relationship between: the presence of different chronic diseases and labour force status; and self-rated health and labour force status (while holding all other variables constant). The binomial and multinomial models use the available characteristics of people to predict the chance of being in each labour market state. All other characteristics can then be held constant to determine the impact of a small change in one characteristic on the chance of participating. In this cross-sectional analysis, responses in each wave were combined together (pooled) so that each respondent had up to three responses in the data. Standard binomial or multinomial logistic regressions were then fitted to this pooled data (these models are hereafter referred to as pooled logistic regressions). This “pooling” maximises the data available for analysis. The correlation between the error terms for the same respondent in each wave was allowed for by identifying the people as clusters. Full details of the model and methods used in this paper can be found in Appendix C.
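The pooling step described above can be pictured as stacking each respondent's wave-level responses into one long dataset, with a person identifier retained so that rows from the same respondent can be treated as a cluster. The sketch below shows only this data arrangement, using made-up records and hypothetical names (`respondents`, `pool_waves`); it does not fit the regression or compute cluster-adjusted standard errors.

```python
# Hypothetical records: participation indicator (1 = participating) by wave.
# Respondent "B" has no Wave 3 row, for example because they were a
# full-time student in that wave and so excluded from the analysis.
respondents = {
    "A": {1: 1, 2: 1, 3: 0},
    "B": {1: 0, 2: 0},
    "C": {1: 1, 2: 1, 3: 1},
}


def pool_waves(records):
    """Stack wave-level responses into one row per person-wave."""
    rows = []
    for person_id, waves in records.items():
        for wave, participating in sorted(waves.items()):
            rows.append(
                {"cluster": person_id, "wave": wave, "participating": participating}
            )
    return rows


pooled = pool_waves(respondents)
print(len(pooled), "person-wave rows")

# Rows per cluster: each respondent contributes up to three observations.
rows_per_cluster = {}
for row in pooled:
    rows_per_cluster[row["cluster"]] = rows_per_cluster.get(row["cluster"], 0) + 1
print(rows_per_cluster)
```

Treating the person as the cluster matters because the three rows for one respondent are not independent observations; ignoring this would tend to understate the standard errors.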
The results of binomial logistic regressions can be presented in two main ways:
 Probability - This is the chance that a respondent with certain characteristics participates in the labour market. In a logit model a marginal effect is the relationship between a small change in a variable and the change in the probability of the outcome. As an example, where the characteristic of interest is a binary variable (such as disease present/not present), the difference between the probabilities of the outcome (participating) for two groups (which share all the same characteristics other than for the binary variable) is known as the marginal effect.
 Odds ratio - This is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.[6] For example, the ratio of the odds of participating for those with chronic diseases to the odds of participating for those with no chronic diseases. The odds ratios are equal to the exponential of the coefficient when all other factors are held constant. An odds ratio greater than one indicates a positive effect, whilst one between zero and one indicates a negative effect. It is important to remember that a relative change in odds is not the same thing as a relative change in probabilities. In general, the magnitude of the odds ratios will be larger than that of the marginal effects because they are summarising the results in different ways.
The relationship between probabilities, odds, odds ratios and marginal effects in a binomial logistic regression model can be seen in Figure 1, where the results from the first model described in Section 5.1 are presented. The benefit of using odds ratios is that all other variables can be held constant but a value for these variables does not have to be specified. This is not the case for probabilities (or marginal effects) where the values of the other variables need to be specified (these are usually set at their mean value for the whole sample).[7] However, the interpretation of marginal effects is more intuitive. For these reasons, both odds ratios and marginal effects are presented here.
Figure 1 - Relationship between results from binomial logistic regression - numeric example
When all other variables are fixed at their mean value the probability of participating in the labour force for people:
 with a chronic disease = P_{1} = 0.865
 without a chronic disease = P_{2} = 0.903.
The odds of participating in the labour force for people:
 with a chronic disease = [P_{1}/(1 - P_{1})] = [0.865/(1 - 0.865)] = 6.40
 without a chronic disease = [P_{2}/(1 - P_{2})] = [0.903/(1 - 0.903)] = 9.34.
That means that people with chronic diseases are 6.4 times more likely to participate in the labour force than not participate, while people without chronic diseases are 9.34 times more likely to participate in the labour force than not.
The odds ratio for those with chronic diseases is the ratio of the odds of participating for those with chronic diseases to those without chronic diseases. If this value is less than 1 then the odds of participating are lower for those with chronic diseases compared to those without a chronic disease:
 Odds ratio = [P_{1}/(1 - P_{1})] / [P_{2}/(1 - P_{2})] = 6.40/9.34 = 0.685
 Percentage change in odds = (0.685 - 1)*100 = -31.5%.
The marginal effect is the difference in the probability of participating for those with chronic diseases compared to those without chronic diseases:
 Marginal effect = P_{1} - P_{2} = 0.865 - 0.903 = -0.038
 Percentage point (ppts) change in probability = -0.038*100 = -3.8ppts
 Percentage change in probability = (-0.038/0.903)*100 = -4.3%.
This leads to the following conclusions:
1. The odds of participating (relative to not participating) are 31.5% lower for people with a chronic disease compared to people without a chronic disease.
2. The probability of participating in the labour force is 3.8 percentage points lower for people with a chronic disease compared to people without a chronic disease.
3. The probability of participating in the labour force is 4.3% lower for people with a chronic disease compared to people without a chronic disease.
Note: These results are derived from Appendix Tables D1 and D2. Probabilities are calculated using the formula outlined in Appendix Figure C1.
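The arithmetic in Figure 1 can be retraced with a few lines of code. The sketch below starts from the two rounded probabilities shown in the figure (0.865 and 0.903); the variable names are illustrative. Because the inputs are rounded to three decimal places, the computed odds, odds ratio and percentage changes may differ slightly from the published figures, which are presumably based on unrounded model output.

```python
import math

p_with = 0.865     # P_1: probability of participating, with a chronic disease
p_without = 0.903  # P_2: probability of participating, without a chronic disease


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


odds_with = odds(p_with)
odds_without = odds(p_without)
odds_ratio = odds_with / odds_without

# In a logit model the odds ratio is exp(coefficient), so the coefficient
# implied by this odds ratio is its natural logarithm.
implied_coefficient = math.log(odds_ratio)

marginal_effect = p_with - p_without
pct_change_odds = (odds_ratio - 1) * 100
ppt_change_prob = marginal_effect * 100
pct_change_prob = marginal_effect / p_without * 100

print(f"odds with disease:            {odds_with:.2f}")
print(f"odds without disease:         {odds_without:.2f}")
print(f"odds ratio:                   {odds_ratio:.3f}")
print(f"implied logit coefficient:    {implied_coefficient:.3f}")
print(f"percentage change in odds:    {pct_change_odds:.1f}%")
print(f"marginal effect:              {marginal_effect:.3f}")
print(f"percentage point change:      {ppt_change_prob:.1f} ppts")
print(f"percentage change in prob.:   {pct_change_prob:.1f}%")
```

The output makes the contrast in the text concrete: the same pair of probabilities yields a change in odds of roughly 30% but a change in probability of only a few percentage points.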
While a binomial logistic regression model predicts the chance of participating, multinomial models predict the chance of multiple states (ie, working full-time, part-time, being unemployed or being inactive). As with the binomial logistic regression the results from the multinomial logistic regression can be presented in various ways, including probabilities/marginal effects or odds ratios. However, there is a slight difference in how these are interpreted for the multinomial model which is important to understand. The interpretation of the results is explained below and a numeric example, based on the first multinomial model discussed in Section 5.2, can be found in Figure 2.
 Probability - This is the chance that a respondent with certain characteristics is in each labour market state: that is, full-time; part-time; unemployed; or inactive. Each respondent has a probability of being in each of the four labour market outcomes (although the probability for any state can be zero). These four probabilities always sum to one, as a person has to be in one of the four states. The marginal effect is the relationship between a small change in a variable and the change in the probabilities of being in each of the four labour market outcomes. As an example, where the characteristic of interest is a binary variable (disease present/no disease present), the differences between the probabilities of being in each labour market outcome (full-time/part-time/unemployed/inactive) for two groups (which share all the same characteristics other than for the binary variable) are known as the marginal effects. The marginal effects sum to zero for each respondent. So if the chance of being in three of the four labour market states increases, then the chance of being in the fourth labour market state must decrease by the same total amount. Unlike the odds ratios, the marginal effects are not interpreted relative to a particular labour market category, but need to be interpreted across the labour market states.
 Odds ratio - This is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group. The odds ratios are equal to the exponential of the coefficient when all other factors are held constant. In these results the reference labour market outcome is inactive. Taking part-time as an example, the odds ratio for those with chronic diseases is the ratio of the odds of working part-time (rather than being inactive) for those with one or more chronic diseases to the same odds for those without chronic diseases.[8] As with the binomial models an odds ratio greater than one indicates a positive effect, whilst one between zero and one indicates a negative effect.
Owing to the differences as to what odds ratios and marginal effects measure, and therefore the different magnitudes of the two measures, it is perfectly plausible for the odds ratio for a specific category to be significantly different from the reference category, but for the marginal effect for the same group to not be significant. When calculating the odds ratio, the baseline odds (the ratio of the probability of an event occurring to the probability of it not occurring) drop out, so the magnitude of the probability is not important in the odds ratio calculation. The test for significance indicates whether the odds ratio (which is not dependent on the baseline odds) is different from one. However, the magnitude of the probabilities is important in testing the significance of a marginal effect. The test here is whether the marginal effect significantly changes the baseline probability. If the base probability for the sample is very small or very large then small marginal effects may not be significant. Another way of thinking about this is that a big sounding odds ratio can easily correspond to a very small sounding difference in marginal effect.
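The role of the baseline probability can be illustrated with a short calculation. The sketch below is illustrative only: it uses the simpler two-outcome case, and the odds ratio of 0.5 and the three baseline probabilities are hypothetical values rather than estimates from SoFIE. The function name prob_after_odds_ratio is introduced here purely for illustration.

```python
def prob_after_odds_ratio(base_prob, odds_ratio):
    """Probability implied by applying an odds ratio to a baseline probability."""
    base_odds = base_prob / (1.0 - base_prob)
    new_odds = base_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 0.5  # hypothetical value, not an estimate from the paper

for base_prob in (0.50, 0.90, 0.99):  # hypothetical baseline probabilities
    new_prob = prob_after_odds_ratio(base_prob, ODDS_RATIO)
    change_ppts = (new_prob - base_prob) * 100
    print(f"baseline {base_prob:.2f} -> {new_prob:.3f} ({change_ppts:+.1f} ppts)")
```

The same odds ratio moves the probability by many percentage points when the baseline is near 0.5, but by only about one percentage point when the baseline is 0.99.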
Notes
 [6]Where the odds is the ratio of the probability of an event occurring to the probability of it not occurring within a group; so the probability of participating to the probability of not participating.
 [7]The marginal effects presented here use this method. Alternative methods include using the means for certain groups (ie, those with chronic diseases) or calculating the personspecific marginal effects and averaging them over the groups of interest. These methods were considered here but, as the differences in the resulting marginal effects using these methods were small, the mean for the whole sample was used.
 [8]So the odds are the probability of working parttime to the probability of being inactive.
Figure 2  Relationship between results from multinomial logit model  numeric example
When all other variables are fixed at their mean value the probability of being in each labour force state for people:
 with a chronic disease are:
 P_{1Fulltime} = 0.663
 P_{1Parttime} = 0.165
 P_{1Unemployed} = 0.023
 P_{1Inactive} = 0.149
 without a chronic disease are:
 P_{2Fulltime} = 0.715
 P_{2Parttime} = 0.160
 P_{2Unemployed}= 0.019
 P_{2Inactive}= 0.106.
Focusing on fulltime, the odds of working fulltime relative to being inactive for people:
 with a chronic disease = P_{1Fulltime} / P_{1Inactive} = [0.663/0.149] = 4.45
 without a chronic disease = P_{2Fulltime} / P_{2Inactive} = [0.715/0.106] = 6.75.
That means people with chronic diseases are 4.45 times as likely to work fulltime as to be inactive, while people without chronic diseases are 6.75 times as likely to work fulltime as to be inactive.
The odds ratio for those with chronic diseases is the ratio of the odds of working fulltime (relative to inactive) for those with chronic diseases to the same odds for those without chronic diseases. If this value is less than 1 then the odds of working fulltime (rather than being inactive) are lower for those with chronic diseases than for those without chronic diseases:
 Odds ratio = [P_{1Fulltime} / P_{1Inactive}] / [P_{2Fulltime} / P_{2Inactive}]
= 4.45/6.75 = 0.659
 Percentage change in odds = (0.659 - 1)*100 = -34.1%.
The marginal effect is the difference in the probability of working fulltime for those with chronic diseases compared to those without chronic diseases. However, the probabilities for each labour market state are not independent; each person must be in one of the four labour market states, so the probabilities across each group must sum to one; that means the marginal effects across each state must sum to zero:
 Marginal effect:
 Fulltime = P_{1Fulltime} - P_{2Fulltime} = 0.663 - 0.715 = -0.052
 Parttime = P_{1Parttime} - P_{2Parttime} = 0.165 - 0.160 = 0.005
 Unemployed = P_{1Unemployed} - P_{2Unemployed} = 0.023 - 0.019 = 0.004
 Inactive = P_{1Inactive} - P_{2Inactive} = 0.149 - 0.106 = 0.043
 Percentage point (ppts) change in probability of working fulltime
= -0.052*100 = -5.2 ppts
 Percentage change in probability of working fulltime
= (-0.052/0.715)*100 = -7.3%.
This leads to the following conclusions:

 The odds of working fulltime relative to being inactive are 34.1% lower for people with a chronic disease compared to people without a chronic disease.
 The probability of people with a chronic disease working fulltime in the labour force is 5.2 ppts lower than for those without chronic diseases. Comparing the same groups, the probability of working parttime is 0.5 ppts higher, being unemployed is 0.4 ppts higher and being inactive is 4.3 ppts higher.
 The probability of working fulltime in the labour force is 7.3% lower for people with a chronic disease.
Note: These results are derived from Appendix Tables D1 and D4. As described in Appendix C, probabilities are calculated using a variation of the formula outlined in Appendix Figure C1.
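The arithmetic in Figure 2 can be reproduced in a few lines. The sketch below takes the eight probabilities from Figure 2 as given; the variable and function names are illustrative assumptions. Because the odds are computed here from the unrounded ratios rather than from 4.45 and 6.75, the odds ratio may differ from 0.659 in the third decimal place.

```python
STATES = ("fulltime", "parttime", "unemployed", "inactive")

# Probabilities from Figure 2 (all other variables fixed at their means).
p_with = {"fulltime": 0.663, "parttime": 0.165, "unemployed": 0.023, "inactive": 0.149}
p_without = {"fulltime": 0.715, "parttime": 0.160, "unemployed": 0.019, "inactive": 0.106}


def odds_vs_reference(probs, state, reference="inactive"):
    """Odds of being in `state` rather than in the reference state."""
    return probs[state] / probs[reference]


odds_with = odds_vs_reference(p_with, "fulltime")
odds_without = odds_vs_reference(p_without, "fulltime")
odds_ratio = odds_with / odds_without

# Marginal effects: difference in probability for each state (with minus without).
marginal_effects = {s: p_with[s] - p_without[s] for s in STATES}

print(f"odds with disease:    {odds_with:.2f}")
print(f"odds without disease: {odds_without:.2f}")
print(f"odds ratio:           {odds_ratio:.3f}")
for state in STATES:
    print(f"marginal effect, {state}: {marginal_effects[state] * 100:+.1f} ppts")
print(f"sum of marginal effects: {sum(marginal_effects.values()):.3f}")
```

The marginal effects sum to zero because both sets of probabilities sum to one, whereas the odds ratio depends only on the fulltime and inactive probabilities.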
One of the common problems encountered when trying to estimate the effect of a variable on a particular outcome is endogeneity. Endogeneity occurs if the value of one of the explanatory variables (for example, health status) is dependent on the value of other unobserved variables or on the outcome variable (in this case, labour market participation). In other words, the explanatory variables are not exogenous; true exogenous variables are not affected by the outcome variable or by other unobserved characteristics. One of the assumptions of the standard logistic regression model is that the explanatory variables are exogenous. If endogeneity is present standard logistic regression models can produce inconsistent and possibly biased (incorrect) regression coefficients. While giving an initial indication of possible relationships between labour force participation and health, the standard logistic regression models cannot account for endogeneity. Endogeneity is likely to be an issue when trying to estimate the impact of health on participation for the following reasons:
 Previous studies have shown that for some groups, as well as affecting labour force participation, health may in turn be influenced by labour force participation; or labour force participation and health may be simultaneously determined (eg, Cai and Kalb, 2006). For example, being inactive may lead some people to be depressed, while being employed in a stressful role may lead to high blood pressure. Therefore the fact that a model may indicate a relationship between the dependent and explanatory variables does not necessarily mean the explanatory variables cause the outcome (Tabachnick and Fidell, Using Multivariate Statistics 4^{th} Edition, 2001). These problems are referred to in the literature as “reverse causality and simultaneity”.
 Other factors that are not observed in the data may influence labour force participation, health, or both. An example would be average motivation (Laplagne et al, 2007).[9] Someone who is less motivated to participate in the labour force may also be less motivated to take the steps to stay healthy (for example, undertaking exercise). Differences in these unobservables between respondents may explain variation in both health and labour force participation. If they are excluded from the model, the variation in labour force participation will appear to be owing to variation in health and therefore the estimated health effect will be biased. This is a particular kind of “unobserved individual heterogeneity”.[10]
 The way health variables are reported may reflect the respondent's labour force participation. For example, respondents may report their health state to justify their labour market state (eg, someone who is not participating in the labour force may report that their health is poorer than they would report if they were participating). This is referred to as “rationalisation bias or endogeneity”.
The longitudinal design of SoFIE allows more complex modelling techniques to try to account for the types of endogeneity outlined above. In addition to the standard logistic regression models for selfrated health the following methods were considered:
 Fixed and correlated random effects panel logistic regression  This technique examines the impact of changes in actual selfrated health on participation taking into account unobserved time constant variables that will vary between people and may influence labour force participation and/or health (time constant unobserved heterogeneity). By looking at how changes in participation relate to changes in other variables between waves the time constant unobserved variables are removed when fixed and random effects models are used.[11]
 Standard pooled binomial and multinomial models and fixed and correlated random effects panel logistic regression with an adjusted health measure  These models adjust health for potential rationalisation bias and account for unobserved factors that do not change over time. First, selfrated health was modelled based on a set of more objective health measures and a set of other healthrelated variables. An adjusted measure of health stock was then predicted using these models. This adjusted measure of health was then included in all of the previous models.
 Instrumental variables/simultaneous equations These techniques can account for unobserved variables that do and do not change over time, and for reverse causality. Although considered in depth, no successful instrument was found.
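The way time constant unobserved heterogeneity can bias pooled estimates, and how a within-person comparison removes it, can be sketched with simulated data. The example below is a deliberately simplified linear analogue and not the fixed effects logistic model used in this paper: the data are simulated, the "motivation" term and all parameter values are hypothetical, and the function names are introduced for illustration only.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_people=2000, n_waves=3, true_effect=0.5, seed=0):
    """Return (pooled slope, within-person slope) for simulated panel data."""
    rng = random.Random(seed)
    health, outcome = [], []
    health_within, outcome_within = [], []
    for _ in range(n_people):
        # Unobserved, time constant trait that affects both health and outcome.
        motivation = rng.gauss(0, 1)
        h = [0.8 * motivation + rng.gauss(0, 1) for _ in range(n_waves)]
        y = [true_effect * h_t + motivation + rng.gauss(0, 1) for h_t in h]
        h_bar, y_bar = sum(h) / n_waves, sum(y) / n_waves
        health += h
        outcome += y
        # Subtracting each person's own mean removes anything fixed over time.
        health_within += [h_t - h_bar for h_t in h]
        outcome_within += [y_t - y_bar for y_t in y]
    return slope(health, outcome), slope(health_within, outcome_within)


pooled, within = simulate()
print(f"true effect 0.50, pooled estimate {pooled:.2f}, within estimate {within:.2f}")
```

In this simulation the pooled estimate overstates the health effect because motivation is omitted, while the within-person estimate is close to the true value. The within estimate uses only changes between waves, which is also why slow changing variables such as chronic disease diagnoses are hard to analyse this way.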
More discussion of these modelling methods can be found in Appendix C. Ideally these techniques would also have been applied to individual chronic diseases which (like selfrated health) could suffer from endogeneity. Not all of these diseases will be open to all three types of endogeneity identified above and some diseases are more susceptible to certain types of endogeneity than others. Some literature (Cai and Kalb, 2006; Laplagne et al, 2007) suggests that rationalisation endogeneity is less likely for the chronic diseases considered given they are less subjective as they depend on a doctor’s diagnosis. However, doctors’ diagnoses of diseases may in turn affect labour force participation decisions, even when the symptoms of the disease are mild. Applying the techniques to control for possible endogeneity to individual chronic diseases proved problematic for several reasons. These include: the relatively small numbers of people with each chronic disease; the fact that the presence of chronic disease is slow changing (making it hard to compare changes in participation and disease diagnosis within respondents); and only three waves of SoFIE data were available at the time of the analysis. This means that trying to use panel models to account for possible endogeneity would not be especially effective for chronic diseases until more waves of data are available. Standard logistic regression models are therefore the only models considered for individual chronic diseases despite the possibility of endogeneity bias. For selfrated health, results for standard logistic regression models are reported before the more advanced panel model results to compare with both the models for individual chronic diseases and to the panel models as a way of demonstrating possible endogeneity bias.
The chronic disease questions are considered to be more objective than selfreported health (Bound et al, 1999), suggesting that such measures are less likely to suffer from rationalisation bias. However, these more objective measures may not always be good predictors of overall health and the ability to work. As noted in Section 4.1.1, a person may no longer suffer from symptoms of a previously diagnosed disease, while others may suffer from a disease but be undiagnosed. Further, modelling difficulties may emerge as the presence of some of these diseases is likely to be collinear to some degree (owing to comorbidity or secondary diseases), making the coefficients more difficult to interpret (Bound et al, 1999). As an example, diabetes is associated with an increased risk of developing heart disease; as such, heart disease may be a secondary disease. These interactions are complex and therefore difficult to include in the analysis. As a result they are not considered further in this report.
Note that throughout the remainder of this paper, words such as “impact” and “effect” are used to describe relationships but do not denote causation. This should be borne in mind when reading the results. Further, where results of the standard logistic regression models are discussed in this paper potential endogeneity bias should be remembered.
Notes
 [9]Motivation is not totally fixed over time as, even with a short period, motivation can vary. However, average motivation will be fixed within a person and is likely to vary across individuals.
 [10]Another form of unobserved heterogeneity occurs when the unobserved variables are not related to the other explanatory variables although they do explain a certain amount of variation in labour force participation. Note that this form of unobserved heterogeneity would not bias the coefficient on health.
 [11]Only binomial panel models were considered. This work could be extended in future to consider multinomial panel models.
4.3.2 Model variables
The decisions on which variables to include in the models were made based on reviews of the literature and best practice. The following variables were included in the standard logistic regression (crosssectional models):[12]
 gender
 region
 age (and whether aged 50 or above)[13]
 highest qualification
 study status
 marital status
 place of birth
 ethnicity
 presence of children
 household income less personal income
 years in paid employment
 the unemployment rate at the time of the interview.[14]
Those variables included in the crosssectional models that were slow or little changing or that could be directly impacted by changes in health were excluded from the fixed and random effects models (longitudinal models) leaving the following variables: gender (random effects only); region; age (and whether aged 50 or above); marital status; place of birth (random effects only); children; household income less personal income; and the unemployment rate at the time of the interview.
In addition to the variables for the crosssectional models, the model creating the adjusted health measure included the following variables: total household income (as opposed to household income less personal income); health benefit receipt; housing tenure; and whether a respondent has ever smoked. All these variables are defined in Appendix A, Tables A1, A2 and A3.
Notes
 [12]Wealth of the respondent and the labour force state of any parents the respondent lived with at age 10 were also considered for inclusion. Wealth was not available in all three waves and the labour force state of parents was not significant in the models once other variables were included.
 [13]Unadjusted age was included. The aged 50 and over indicator was included to pick up a change in participation habits that appeared to occur for men and women around the age of 50.
 [14]The unemployment rate for the time of the interview was included to reflect the rolling interview period throughout the year.
5 Chronic diseases
This section explores the relationship between different types of chronic disease and labour market participation. It begins by reporting basic descriptive statistics and then summarises the results from the logistic regression models. The analysis in this section is based on pooled crosssectional data analysis. As previously mentioned, it should be remembered that words such as “impact” and “effect” are used to describe relationships but do not attempt to denote causation and that the results of the standard logistic regression models are subject to potential endogeneity bias. Full tables of results from the main models, including unweighted means and standard deviations for the variables, can be found in Appendix D where the reference categories are labelled.
5.1 Chronic disease and labour market participation
Table 1 shows the proportion of the sample with various disease diagnoses. The results indicate that around half of the sample has been diagnosed with one or more chronic diseases.[15] Table 1 indicates that the most common disease is asthma with 18.5% of respondents having been diagnosed with this disease at some point. The rarest disease is a stroke with only 1% of respondents having been diagnosed with a stroke. This small disease prevalence is not surprising given that strokes are likely to be quite rare for those of working age, the group being analysed. Further, stroke is one disease that is more likely to result in death for this group. In other words, for some diseases the prevalence is higher than others as a result of being more likely to survive with the disease (survivor bias).
| Disease | Disease prevalence (%) |
|---|---|
| Any chronic disease | 49.5 |
| Asthma | 18.5 |
| High blood pressure | 14.9 |
| High cholesterol | 13.4 |
| Heart disease | 2.9 |
| Diabetes | 3.0 |
| Stroke | 1.0 |
| Migraine | 13.4 |
| Psychiatric conditions | 9.5 |
| Cancer* | 3.5 |
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights (*adjusted longitudinal weights), Statistics New Zealand.
Note: Results are for those aged 15-64 and who are not fulltime students. Data for all three waves is pooled together to create an average rate.
Table 2 shows the labour market participation rates by disease presence. The observed labour market participation rates are considerably lower for those with a disease diagnosis compared to the overall participation rate. Participation is lowest for those who have suffered from a stroke. About half (54%) of people with a diagnosed stroke participate in the labour market, compared to the average participation rate of 83%, a reduction in the likelihood of participation of 35% (29 percentage points). However, this estimate is subject to a larger error given it is based on a relatively small group. Only 1% of the sample reported ever being told by a doctor they had suffered a stroke.
| Disease | Average number participating over 3 waves (count) | Participation rate (%) |
|---|---|---|
| Total | 1,835,000 | 82.6 |
| No chronic disease | 958,600 | 85.5 |
| Any chronic disease | 876,500 | 79.7 |
| Asthma | 327,500 | 80.0 |
| High blood pressure | 251,800 | 76.0 |
| High cholesterol | 237,500 | 80.6 |
| Heart disease | 40,500 | 64.0 |
| Diabetes | 41,900 | 63.7 |
| Stroke | 11,700 | 53.8 |
| Migraine | 234,200 | 78.4 |
| Psychiatric conditions | 146,700 | 69.0 |
| Cancer* | 59,000 | 76.4 |
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights (*adjusted longitudinal weights), Statistics New Zealand
Notes:
1. See note on Table 1.
2. This is just a crude participation rate. It has not been age standardised.
3. Counts may not sum to totals owing to rounding.
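The percentage point and relative gaps quoted for Table 2 can be checked directly from the participation rates. The sketch below uses the rates from Table 2; the function name gap and the choice of the overall rate as the default baseline are illustrative assumptions.

```python
TOTAL_RATE = 82.6  # overall participation rate (%), from Table 2

# Crude participation rates (%) by disease, from Table 2.
rates = {
    "No chronic disease": 85.5,
    "Any chronic disease": 79.7,
    "Asthma": 80.0,
    "High blood pressure": 76.0,
    "High cholesterol": 80.6,
    "Heart disease": 64.0,
    "Diabetes": 63.7,
    "Stroke": 53.8,
    "Migraine": 78.4,
    "Psychiatric conditions": 69.0,
    "Cancer": 76.4,
}


def gap(rate, baseline=TOTAL_RATE):
    """Return (percentage point gap, relative gap in %) against a baseline rate."""
    ppts = rate - baseline
    return ppts, ppts / baseline * 100


for disease, rate in rates.items():
    ppts, relative = gap(rate)
    print(f"{disease:<24} {ppts:+6.1f} ppts  {relative:+6.1f}%")

# Gap between those with and without any chronic disease.
disease_gap, _ = gap(rates["Any chronic disease"], rates["No chronic disease"])
print(f"Any vs no chronic disease: {disease_gap:+.1f} ppts")
```

These are crude gaps only; they do not control for age or any other characteristic.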
The bivariate analysis in Table 2 above, while interesting, does not control for other factors that may be related to participation. Pooled crosssectional logistic regressions were used to determine the relationship between disease presence and participation when some other factors were controlled for.
Initially a basic model was conducted including a summary chronic disease indicator (rather than the individual chronic diseases) to determine the overall impact of having chronic disease on participation. Results show that, even after controlling for other variables, the relationship between chronic disease presence and participation is significant (Appendix Table D1). Figure 3 shows that the odds of participating in the labour force are reduced by 31.5% for those with any chronic disease(s).[16] When all variables are fixed at their mean value, the probability of participating is 0.885. This is above the unconditional mean participation rate of 0.827, perhaps because of the more rapid decline in participation for those over 50 years of age which reduces the unconditional average. For those with no chronic diseases, the estimated probability of participating is 0.903, while for those with a chronic disease the estimated probability is reduced to 0.865; a marginal effect of 0.038 (Table 3). This suggests that for an average person, having chronic diseases reduces labour market participation by 3.8 percentage points on average, or 4.3% in a relative sense.
By contrast, the bivariate analysis in Table 2 indicated a difference of 5.8 percentage points. This suggests that other differences in characteristics are important in explaining the lower participation rate of those diagnosed with a chronic disease (Table 2). For example, the odds of participating are lower for: females with young children (this is associated with a reduction in the odds of participating of 90%); those with nonworking partners or no partner (75% and 65% reduction respectively); and for females (22% reduction).
Next, models were considered that included variables for each individual disease, rather than a summary variable indicating disease presence. Figure 3 shows the estimated ratio of the odds of labour market participation for those with each disease to the odds for those without each disease. An odds ratio greater than one indicates a positive effect, whilst one between zero and one indicates a negative effect on the odds of participation for those with each disease. If the vertical line for each bar, showing the 95% confidence interval for the odds ratio, crosses one (indicated by the horizontal 95% significance line), then the chance of participation for those with the disease is not significantly different from those without the disease at the 95% level (once other factors are controlled for). Therefore there was insufficient evidence that those with an asthma, high cholesterol, migraine or cancer diagnosis were any less likely to be participating in the labour market than those without these diseases, once other factors were controlled for. For asthma, migraine and high cholesterol this may be a result of such diseases typically being manageable once identified and therefore not inhibiting labour market participation in many cases.
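The reading of the confidence intervals described above can be sketched numerically. The example below assumes a conventional Wald-type interval, formed by exponentiating the coefficient plus or minus 1.96 standard errors; the paper does not state how its intervals were computed, so this is an assumption. The two coefficients and standard errors are hypothetical and are not taken from Appendix Table D3.

```python
import math

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def odds_ratio_interval(coef, std_err, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) from a logit coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )


def differs_from_one(lower, upper):
    """True when the interval excludes one, ie, the odds ratio is significant."""
    return not (lower <= 1.0 <= upper)


# Hypothetical (coefficient, standard error) pairs for two conditions.
examples = {"condition A": (-0.17, 0.07), "condition B": (-0.08, 0.06)}

for name, (coef, std_err) in examples.items():
    ratio, lower, upper = odds_ratio_interval(coef, std_err)
    verdict = "significant" if differs_from_one(lower, upper) else "not significant"
    print(f"{name}: OR {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f}), {verdict}")
```

In this illustration the interval for condition A lies wholly below one, while the interval for condition B crosses one.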
Having been diagnosed with any of the following diseases (in order of impact from highest to lowest) is associated with significantly reduced odds of labour market participation compared to someone without the disease, once other factors are controlled for:
 psychiatric conditions (70% reduction for males and 40% for females);
 stroke (59% reduction);
 heart disease (48% reduction);
 diabetes (42% reduction);
 high blood pressure (16% reduction).
For some of these, the presence of the particular reported condition may not itself be associated with lower odds of participating. Rather, other secondary diseases related to the primary disease may be causing the association. For example, high blood pressure may not be associated with reduced odds of participating, but kidney failure resulting from high blood pressure may. Further, collinearity between these health conditions is not formally investigated here.
Notes
 [15]The true proportion is likely to be slightly higher than this as those for whom the presence of cancer is unknown and who have no other chronic diseases have been assumed to have no chronic diseases.
 [16]The odds of participating for those with one or more chronic diseases are 6.4:1, without disease are 9.3:1, giving an odds ratio of 0.685 = 6.4/9.3.
 Figure 3  Estimated odds ratios of participating in the labour force  pooled logistic regression  grouped and individual diseases: 2002/03 to 2004/05

 Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. The odds ratios for the summary chronic disease indicator and for individual diseases are derived from different models. Odds ratios for summary chronic disease indicator are derived from Appendix Table D1, while those for individual chronic diseases are derived from Appendix Table D3. The footnotes from those tables apply to this chart.
2. The following factors were held constant: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
The impact of a disease diagnosis on labour market participation did not differ significantly by gender other than for psychiatric conditions. The presence of these conditions was associated with a 70% reduction in the odds of participating in the labour market for men, and 40% for women, and the 95% confidence intervals do not overlap. This substantial and significant difference is in line with work done by the Australian Productivity Commission (Laplagne et al, 2007).
The results of the model indicate a reduction in the odds of participation of 9% for males with psychiatric conditions relative to females with psychiatric conditions (with an odds ratio of 0.91).[17] This difference by gender may in part be owing to compositional differences between the kinds of men and women who go to the doctor and are diagnosed with psychiatric conditions. A higher proportion of women have been told by a doctor that they suffer from psychiatric conditions (12.8% compared to 6.2%), suggesting that the threshold for men seeking psychiatric help may be higher. Tests indicate that the impact of psychiatric conditions for men is significantly higher than that of heart disease and diabetes, but not significantly different from that for a stroke.
As an illustration of the impact on the probability of participating in the labour force, Table 3 shows the marginal effects on labour market participation as a result of moving from not having a disease to having a disease when all other variables are held at their mean. The probabilities the marginal effects are based on are derived from Appendix Tables D1, D2 and D3. For instance, when all other variables are fixed at the mean values, the probability of a person participating in the labour market given they have no diabetes diagnosis is 0.890 (which is similar to the average participation probability for all respondents from the model of 0.888). Given a diagnosis of diabetes, the probability is lower at 0.823, giving a marginal effect on participation of 0.067 (shown in Table 3).
| Disease | Marginal effects |
|---|---|
| Any chronic disease | 0.038*** |
| Asthma | 0.009 |
| High blood pressure | 0.018** |
| High cholesterol | 0.008 |
| Heart disease | 0.083*** |
| Diabetes | 0.067*** |
| Stroke | 0.123*** |
| Migraine | 0.004 |
| Psychiatric conditions - male | 0.132*** |
| Psychiatric conditions - female | 0.065*** |
| Cancer | 0.007 |
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. The marginal effects for the summary chronic disease indicator and for individual diseases are derived from different models. Marginal effects for summary chronic disease indicators are derived from Appendix Table D1, while those for individual chronic diseases are derived from Appendix Table D3. All marginal effects are calculated holding all other variables at their mean. The footnotes from those tables apply to this table.
2. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. These marginal effects are the actual differences in probabilities compared to those without each condition.
The analysis of marginal effects indicates that, in terms of magnitude, the impact of psychiatric conditions is much lower than suggested by the odds ratios. Holding all other values at their mean, the probability of a male with psychiatric conditions participating is 0.797, compared to the probability for a male without psychiatric conditions of 0.929, giving a marginal effect of 0.132. In other words, the labour force participation rate for men with psychiatric conditions is 13.2 percentage points below that for men without psychiatric conditions on average. Similarly, the probability of a female with psychiatric conditions participating in the labour market is 0.812 compared to a probability of 0.878 for females without psychiatric conditions on average, giving a marginal effect of around 0.065. The marginal effect between males with psychiatric conditions and females with psychiatric conditions is 0.015.
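The contrast between the odds ratios and the marginal effects for psychiatric conditions can be verified from the four probabilities quoted above. The sketch below takes those probabilities as given; the function names are illustrative assumptions, and small differences from the published figures (for example 0.066 rather than 0.065 for females) arise because the inputs are rounded to three decimal places.

```python
def odds(prob):
    """Odds of participating: probability of participating over not participating."""
    return prob / (1.0 - prob)


def compare(p_first, p_second):
    """Return (difference in probability, odds ratio) of the first group vs the second."""
    return p_first - p_second, odds(p_first) / odds(p_second)


# Predicted probabilities quoted in the text (other variables held at their means).
p_male_with, p_male_without = 0.797, 0.929
p_female_with, p_female_without = 0.812, 0.878

male_me, male_or = compare(p_male_with, p_male_without)
female_me, female_or = compare(p_female_with, p_female_without)
gender_me, gender_or = compare(p_male_with, p_female_with)

results = (
    ("Males, with vs without", male_me, male_or),
    ("Females, with vs without", female_me, female_or),
    ("With conditions, males vs females", gender_me, gender_or),
)
for label, effect, ratio in results:
    print(f"{label}: {effect * 100:+.1f} ppts, odds ratio {ratio:.2f}")
```

An odds ratio of about 0.30 for males (a 70% reduction in the odds) therefore corresponds to a fall of 13.2 percentage points in the probability of participating, because the baseline probability of 0.929 is high.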
The coefficient for cancer indicated a negative relationship with participation, but this relationship was not significant.[18] This may reflect the nature of cancer treatment, which is very intensive over a compressed period, or that those with the most severe cases of cancer die. Having cancer diagnosed may result in people of working age taking sick leave for cancer treatment rather than leaving the labour force completely. The result may not hold if full cancer information were available, as those diagnosed with cancer before 1990 are not identifiable but they may have poorer health than those diagnosed later. Interestingly, the coefficient for those respondents who did not agree for their data to be linked to the cancer information (and so were coded as unknown for cancer presence) indicated that they were significantly less likely to participate in the labour market than those without cancer. This may indicate potential differences in the unobserved characteristics of those who do and do not consent.
Interestingly, the impact of a disease diagnosis on labour market participation did not vary significantly by age. In other words, the reduction in the chance of labour market participation for those with a disease diagnosis was no higher if the respondent was young compared to if they were old.
The nonhealth related variables indicate that, when all other explanatory factors are held constant, the following groups have lower chance of participating in the labour market: females; those born outside of New Zealand; those who are older; those with no qualifications; those undertaking some form of study; those with nonworking partners; and those with higher other household income (relative to the reference categories). Additional years of paid employment is associated with an increase in the chance of participation.[19] For males, having no partner is associated with a reduced chance of participation. This is also true for females but to a lesser extent. Men who have young children are more likely to work than those without children, while men with older children are less likely to work than those without children. For women, having children of any age is associated with a reduction in the chance of participating, with the chance of participating being reduced by the most for those with young children.
Finally, the model of individual diseases was then developed to include, where possible, a variable summarising the presence of the disease and the years since diagnosis. This was done to determine whether more recent diagnoses are associated with higher or lower labour market participation.[20][21] Of the diseases found to be significantly negatively related to participation (other than psychiatric conditions for which this durational breakdown is not possible), the impact of a more recent diagnosis (in the last five years) of high blood pressure, heart disease or stroke appeared more detrimental than an older diagnosis. For example, the odds of participating for those who have had a stroke in the last five years are reduced by 62%. This compares to a 57% reduction for those who had a stroke five or more years ago. This difference may in part be because the further from the point of diagnosis, the more a person may have recovered. It also may be because the person may no longer be undergoing intensive treatments that prevent them from working, or have learnt how to manage their conditions. Conversely, the difference may also reflect the fact that those who suffer more severe strokes die within five years of being diagnosed and are therefore not included in the data (survivorship bias).
The effect was reversed for diabetes, with a less recent diagnosis being associated with a larger reduction in participation than a more recent diagnosis. The odds of participating for a respondent who had been diagnosed with diabetes in the last five years were reduced by 29% (which was not significantly different from the odds for those with no diabetes diagnosis), while the odds of participating for those diagnosed with diabetes five or more years ago were 52% lower than for those with no diabetes, possibly indicating the progressive nature of diabetes. While providing a possible indication of direction, the coefficients for the two periods of diagnosis were only found to be significantly different from each other for diabetes and heart disease when tested for equality (using a Wald test).
When all those with high cholesterol were considered together, there was insufficient evidence to suggest this group were less likely to participate in the labour force than those without high cholesterol. However, when the period since diagnosis was interacted with high cholesterol, there was a significant reduction in the chance of participating for those who had been diagnosed with high cholesterol five years ago or more, compared to those without high cholesterol. Again, this is possibly owing to the progressive nature of high cholesterol risk.
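The Wald test of equality mentioned above for the two diagnosis periods can be sketched as follows. This is only an illustration: the log odds ratios are those implied by the 29% and 52% reductions quoted for diabetes, but the standard errors and covariance are hypothetical (the full model results are not reproduced here), so the printed statistic is not a result from the paper.

```python
from math import erfc, log, sqrt

def wald_equality_test(b1, b2, se1, se2, cov=0.0):
    """Wald test of H0: b1 == b2. Returns (statistic, p-value); chi-square, 1 df."""
    variance_of_difference = se1 ** 2 + se2 ** 2 - 2.0 * cov
    statistic = (b1 - b2) ** 2 / variance_of_difference
    # For one degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value

# Log odds ratios implied by the 29% and 52% reductions reported for diabetes.
b_recent = log(1 - 0.29)
b_older = log(1 - 0.52)

# HYPOTHETICAL standard errors and covariance, for illustration only.
stat, p = wald_equality_test(b_recent, b_older, se1=0.15, se2=0.12, cov=0.0)
print(f"Wald statistic = {stat:.2f}, p-value = {p:.3f}")
```

The test compares the squared difference between the two coefficients with the variance of that difference, so two coefficients that look quite different can still fail the test when they are imprecisely estimated.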
Notes
 [17]These figures are not presented in the chart. They can be derived using the information in Appendix Tables D2 and D3.
 [18]In part this may be owing to the larger error around the estimate, as cancer information is only known for a restricted sample. Interestingly, when those over working age were included in the model, cancer was found to be significantly related to a reduction in participation.
 [19]Over the relevant range (the quadratic peaks in the mid1980s).
 [20]Again, the following variables were held constant: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
 [21]Full model results are available on request.
5.2 Chronic disease and labour market outcome
Not only do those with certain chronic diseases participate less in the labour market, Table 4 shows that those who do participate seem more likely to work parttime than those who have not been diagnosed with a chronic disease. The largest difference is for those who have had a stroke. About a third (33%) of those people who have had a stroke and who are participating in the labour market work parttime, compared to only 19% of all participating respondents. The previous analysis was therefore developed to examine the impact of chronic disease on level of participation, once other factors are controlled for.
Labour market outcome (%)  

Disease  Fulltime employment  Parttime employment  Unemployment  Total participating 
Total  78.4  19.0  2.7  100.0 
Any chronic disease  76.3  20.9  2.8  100.0 
Asthma  77.6  19.4  3.1  100.0 
High blood pressure  76.8  20.7  2.5  100.0 
High cholesterol  79.2  18.6  2.2  100.0 
Heart disease  76.6  21.6  1.9  100.0 
Diabetes  71.7  22.8  5.4  100.0 
Stroke  63.7  32.6  3.7  100.0 
Migraine  71.5  25.1  3.4  100.0 
Psychiatric conditions  68.0  26.9  5.1  100.0 
Cancer*  71.5  26.7  1.9  100.0 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights (*adjusted longitudinal weights), Statistics New Zealand
Note: See footnote on Table 1.
Table 5 summarises the odds ratios from the model. The model with an indicator of any chronic disease presence indicates that, even after controlling for other factors, having a chronic disease is also associated with a larger reduction in the odds of working fulltime (relative to being inactive) compared to parttime (relative to being inactive). The odds of a person with one or more chronic diseases working fulltime (relative to being inactive) are around 34% lower than those for a person without a chronic disease; however, the odds of a person with one or more chronic diseases working parttime (relative to being inactive) are around 27% lower than those for a person without a chronic disease.
The results of the model including each individual chronic disease indicate that, even after controlling for other factors, the presence of diabetes, stroke or a psychiatric condition (each of which is associated with a significant reduction in the odds of participation) is also associated with a larger reduction in the odds of working fulltime (relative to being inactive) than in the odds of working parttime (relative to being inactive).[22] As an example, the odds of a person with a stroke working fulltime (relative to being inactive) are around 67% lower than those for a person without a stroke. However, the odds of a person with a stroke working parttime (relative to being inactive) are only around 39% lower than those of someone without a stroke. The effect for high blood pressure and heart disease (the other two diseases which were found to be significantly related to participation) is the reverse, with the impact on working fulltime (relative to being inactive) being less than the impact on working parttime (again relative to being inactive). However, the differences between the effects for fulltime and parttime for high blood pressure were not found to be significant at the 95% level.
For those with asthma, high cholesterol, migraine or cancer the odds of being in each of the employment states are not significantly different from those without these diseases (relative to being inactive).[23]
Odds ratios  

Disease  Fulltime employment  Parttime employment  Unemployment 
Any chronic disease (base=no known chronic disease)  0.659***  0.733***  0.878 
Asthma (base=no asthma)  0.923  0.892*  0.962 
High blood pressure (base=no high blood pressure)  0.849**  0.833**  0.842 
High cholesterol (base=no high cholesterol)  0.916  0.923  0.997 
Heart disease (base=no heart disease)  0.539***  0.530***  0.417*** 
Diabetes (base=no diabetes)  0.497***  0.697***  0.985 
Stroke (base=no stroke)  0.327***  0.612**  0.446** 
Migraine (base=no migraine)  0.923  0.989  1.257* 
Psychiatric conditions - male (base=male no psychiatric conditions)  0.265***  0.472***  0.550*** 
Psychiatric conditions - female (base=female no psychiatric conditions)  0.531***  0.679  0.958 
Cancer (base=no cancer)  0.945  0.935  0.828 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. The odds ratios for the summary chronic disease indicator and for individual diseases are derived from different models. Odds ratios for the summary chronic disease indicator are derived from Appendix Table D4, while those for individual chronic diseases are derived from Appendix Table D5. The footnotes from those tables apply to this table.
2. The following factors were held constant: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
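The percentage reductions quoted in the text follow directly from the odds ratios in Table 5: an odds ratio below one corresponds to a change in the odds of (odds ratio - 1) x 100%. A minimal sketch of that conversion, using a few of the published ratios (the function name is illustrative only):

```python
def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio (negative = reduction)."""
    return (odds_ratio - 1.0) * 100.0

# Odds ratios copied from Table 5 (each relative to being inactive).
table5 = {
    "Any chronic disease": {"fulltime": 0.659, "parttime": 0.733},
    "Stroke": {"fulltime": 0.327, "parttime": 0.612},
    "Diabetes": {"fulltime": 0.497, "parttime": 0.697},
}

for disease, ratios in table5.items():
    for state, ratio in ratios.items():
        change = pct_change_in_odds(ratio)
        print(f"{disease}, {state}: {change:+.0f}% change in odds")
```

For stroke this gives reductions of about 67% (fulltime) and 39% (parttime), matching the figures quoted in the text. Note that these are changes in odds relative to being inactive, not changes in probability.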
Table 6 summarises the marginal effects for each disease. The result for grouped chronic diseases indicates that an average person with chronic diseases is 5.2 percentage points less likely to be fulltime, 0.5 percentage points more likely to be parttime, 0.4 percentage points more likely to be unemployed and 4.3 percentage points more likely to be inactive, than an average person with no chronic diseases.
The model including the individual chronic diseases shows that, for an average person, having heart disease, diabetes, a stroke or a psychiatric condition is highly significant in reducing the chance of working fulltime and increasing the chance of being inactive. For example, for an average person, having a stroke is associated with a 19.9 percentage point decrease in the chance of working fulltime, a 5.5 percentage point increase in the chance of working parttime and a 14.5 percentage point increase in the chance of being inactive. So while the odds of working parttime rather than being inactive for those with a stroke are lower than the odds for those without a stroke, the chance of working parttime for those with a stroke is higher than for those without a stroke (ie, some of those with a stroke who do not work fulltime work parttime instead).
Marginal effects  

Disease  Fulltime employment  Parttime employment  Unemployment  Inactive 
Any chronic disease  -0.052***  0.005  0.004**  0.043*** 
Asthma  -0.004  -0.006  0.001  0.009 
High blood pressure  -0.012  -0.006  -0.001  0.019** 
High cholesterol  -0.010  -0.001  0.002  0.009 
Heart disease  -0.061**  -0.017  -0.006  0.084*** 
Diabetes  -0.120***  0.026  0.013*  0.081*** 
Stroke  -0.199***  0.055*  -0.001  0.145*** 
Migraine  -0.020*  0.007  0.007**  0.006 
Psychiatric conditions - male  -0.185***  0.030*  0.014  0.141*** 
Psychiatric conditions - female  -0.099***  0.017  0.008  0.074*** 
Cancer  -0.002  -0.002  -0.003  0.007 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. The marginal effects for the summary chronic disease indicator and for individual diseases are derived from different models. Marginal effects for summary chronic disease indicators are derived from Appendix Table D4, while those for individual chronic diseases are derived from Appendix Table D5. The footnotes from those tables apply to this table.
2. The following variables were held at the mean value for the whole sample: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
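The stroke example above, where the odds of working parttime (relative to being inactive) fall yet the probability of working parttime rises, can be reproduced with a short calculation. The sketch below applies the Table 5 odds ratios for stroke to a baseline set of probabilities. The baseline is an assumption: it uses the raw population shares from Tables 4 and 7 rather than the model's "average person", so the magnitudes will not match Table 6; only the direction of the changes is of interest.

```python
def apply_odds_ratios(baseline, odds_ratios, base="inactive"):
    """Scale each outcome's odds (relative to `base`) and return new probabilities."""
    odds = {k: p / baseline[base] for k, p in baseline.items()}
    for outcome, ratio in odds_ratios.items():
        odds[outcome] *= ratio
    total = sum(odds.values())
    return {k: v / total for k, v in odds.items()}

# ASSUMPTION: raw population shares stand in for the model's "average person".
participation = 0.827  # total participation rate, Table 7
raw = {
    "fulltime": participation * 0.784,    # Table 4, total row
    "parttime": participation * 0.190,
    "unemployed": participation * 0.027,
    "inactive": 1.0 - participation,
}
scale = sum(raw.values())  # published shares are rounded, so renormalise
baseline = {k: v / scale for k, v in raw.items()}

# Odds ratios for stroke from Table 5 (inactive is the base outcome).
stroke_or = {"fulltime": 0.327, "parttime": 0.612, "unemployed": 0.446}
with_stroke = apply_odds_ratios(baseline, stroke_or)

change = {k: with_stroke[k] - baseline[k] for k in baseline}
for outcome, delta in change.items():
    print(f"{outcome}: {delta * 100:+.1f} percentage points")
```

Because the four probabilities must always sum to one, the changes sum to zero: the large fall in fulltime work is redistributed across the other states, which is why parttime work can gain share even though its odds relative to inactivity fall.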
6 Selfrated health and labour market participation
This section explores the relationship between selfrated health and labour market participation. It begins by revisiting the reasons for considering selfrated health and for using the various modelling approaches. Basic descriptive statistics related to selfrated health are then presented, before the results from the corresponding pooled (crosssectional) models, as considered in the previous section, are summarised. The results of the fixed and correlated random effects (longitudinal) logistic regression models, and the equivalent models using an adjusted measure of selfrated health, are then discussed. Again, words such as “impact” and “effect” are used to describe relationships but do not denote causation. Full tables of results from the main models, including unweighted means and standard deviations for the selfrated health variable, can be found in Appendices E, F and G where the reference categories are labelled.
6.1 Models used
As outlined in Section 4, two measures of health are available in all three waves of SoFIE: chronic diseases; and selfrated health. Given the issues with both of these health measures, and the conclusion of an earlier literature review in the area, it is preferable to consider the relationships between both of these measures and labour force participation. This section therefore begins by reporting basic descriptive statistics related to selfrated health and then summarises the results from pooled (crosssectional) models corresponding to those presented in the previous section. The results of the pooled logistic regression model are presented to enable comparison both with the equivalent models for chronic diseases and with the subsequent panel models for selfrated health. Where results of the standard logistic regression models are discussed in this paper, the potential for endogeneity bias should be remembered (as explained in Section 4.3.1).
The results of the fixed and correlated random effects (longitudinal) logistic regression models and the equivalent models using an adjusted measure of selfrated health are then presented. These models make use of the longitudinal nature of the data and aim to resolve some of the endogeneity issues identified in Section 4. Ideally these models would also have been applied to the models including individual chronic diseases, but this was not possible owing to small numbers in some groups and because the diagnosis of chronic diseases is slow changing. Unlike the standard logistic regression results (for which the assumptions may not be satisfied owing to endogeneity, thus possibly resulting in inconsistent (and biased) regression coefficients), the panel models account for some forms of endogeneity, and thus should produce estimates that are consistent and unbiased if the model assumptions are satisfied.
In addition to the above, the health coefficients from the standard pooled regression, the fixed effects and the correlated random effects models are interpreted differently. The coefficients from the pooled regressions indicate how health levels are related to the chance of participation for a crosssection, while the health coefficients from the fixed and correlated random effects models use longitudinal data to indicate how health shocks are related to participation (although health level is also estimated in the latter model). The fixed effects model attempts to explain variation within (rather than between) respondents over time, making direct comparison of the odds ratios with those from the standard and random effects models problematic.
All three types of models identify a highly significant relationship between health and labour force participation; however, no model is perfect. The best model is found to be the fixed effects model. However, this model is not without its drawbacks. By definition a fixed effects model excludes all those for whom participation does not change over the period from the analysis, meaning there is no estimate of the relationship between health and participation for those continually inactive. Also the fixed effects model focuses on variation in participation for each respondent. This means that only within (rather than within and between) person variation is considered. Finally, there may be other types of endogeneity present that it is not possible to account for using a fixed effects model; for example, unobserved variables that change over time and are related to the explanatory variables. Assuming that this is not the case, the fixed effects model should produce estimates that are consistent (and unbiased).
The crosssectional pooled regression considers the relationship between health state and participation for all respondents but does not consider within person variation. It is also not possible to control for any types of endogeneity so the results are likely to be biased. The correlated random effects model considers within and between person variation and includes an estimate of the average health level for respondents as well as looking at health shocks. However, if the assumption that the only correlation between health shocks and the unobserved variables that are fixed over time is through average health is not valid, or if average health is itself correlated with unobserved variables, then the coefficients from this model may be biased. Further, other types of endogeneity such as unobserved variables that change over time cannot be accounted for. Owing to the pros and cons of each of the models, and to allow comparisons between the models to be seen, all of the model results are presented in this section to illustrate the different types of relationships identified between health and labour force participation.
6.2 Unadjusted selfrated health
First, basic descriptives are considered. Table 7 shows the distribution of selfrated health across the population. Around threequarters of the people consider themselves to be in excellent or very good health. A further 18% feel they are in good health. The remaining 6% feel they are in fair or poor health.
Health status  Distribution (%)  Participation rate (%) 
Excellent health  41.3  87.7 
Very good health  33.9  85.6 
Good health  18.4  77.0 
Fair health  5.0  56.9 
Poor health  1.4  29.1 
Total  100.0  82.7 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Note: Results are for those aged 15-64 who are not fulltime students. Data for all three waves are pooled together to create an average rate.
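As a consistency check on Table 7, the total participation rate should equal the average of the participation rates for each health state, weighted by the share of people in that state. A short sketch of that arithmetic, using only the published figures:

```python
# (distribution %, participation rate %) by selfrated health, from Table 7
table7 = {
    "Excellent": (41.3, 87.7),
    "Very good": (33.9, 85.6),
    "Good": (18.4, 77.0),
    "Fair": (5.0, 56.9),
    "Poor": (1.4, 29.1),
}

total_share = sum(share for share, _ in table7.values())
overall = sum(share * rate for share, rate in table7.values()) / total_share

print(f"Share-weighted participation rate: {overall:.1f}%")
```

The weighted average agrees with the published total of 82.7% to within rounding, which also shows how little the small fair and poor health groups contribute to the overall rate despite their much lower participation.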
Table 7 also shows that, as with the individual diseases, participation decreases as health declines. Around 88% of those in excellent health participate in the labour market, compared to just 29% of those in poor health.
6.2.1 Standard pooled regression
The odds ratios for the pooled logistic regression model where the chronic disease variables have been replaced by the selfrated health variable are shown in Figure 4. The participation rates for those in excellent health appear to be above those for people in very good health. However, the odds of participating for those of very good health are not significantly different from those of excellent health, once other factors are controlled for. Being in good, fair or poor selfrated health is associated with a reduction in the odds of participating compared to those of excellent selfrated health, by 46%, 76% or 92% respectively.
The equivalent marginal effects indicate that being of good, fair or poor health reduces the probability of participating by 6, 22 and 50 percentage points respectively (see Table 13).[24] Not only is the impact of each of these health states significantly different from that of excellent health, but the impacts of the health states are also significantly different from one another (ie, the magnitude of the relationship between being in fair health and participation is less than that between poor health and participation). The R^{2} for the selfrated health model is slightly higher than that for the individual diseases (0.3227 compared to 0.3090), suggesting selfrated health explains slightly more of the variation. An alternative statistic for comparing the models is the area under the Receiver Operating Characteristic (ROC) curve.[25] As with the R^{2}, this diagnostic indicates that the model including selfrated health performs slightly better than the model including individual diseases, with areas under the ROC curve of 0.871 and 0.864 respectively.
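For readers unfamiliar with the area under the ROC curve, it can be read as the probability that a randomly chosen participant is given a higher predicted probability of participating than a randomly chosen nonparticipant (ties counting as half). The sketch below computes it that way; the outcomes and predicted probabilities are hypothetical and are not taken from the models in this paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # participant ranked above nonparticipant
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# HYPOTHETICAL outcomes (1 = participating) and predicted probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to a model that ranks people no better than chance and a value of 1 to perfect ranking, so areas of 0.871 and 0.864 indicate that both models discriminate well and differ only marginally.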
The only other variable that has odds of participating in the labour force of a similar magnitude to those for fair or poor health is having a young child for females (a reduction in the odds of participating of around 90%). This indicates the relative magnitude of the relationship between fair/poor health and participation.
 Figure 4 - Estimated odds ratios of participating in the labour force - pooled logistic regression - selfrated health: 2002/03 to 2004/05

 Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Odds ratios are derived from Appendix Table E2 and are relative to excellent health. The footnotes from that table apply to this chart.
2. The following factors were held constant: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
Table 8 indicates that around 18% of those in excellent health are participating in parttime work compared to 31% of those in poor health. As selfrated health decreases, the likelihood of working fulltime appears to fall and the likelihood of working parttime to increase. This is consistent with the earlier observation that those who have been diagnosed with a chronic disease are relatively more likely to work parttime.
Labour market outcome (%)  

Health status  Fulltime employment  Parttime employment  Unemployment  Total participating 
Total  78.4  19.0  2.7  100.0 
Excellent health  80.4  17.5  2.1  100.0 
Very good health  78.6  19.1  2.3  100.0 
Good health  75.9  20.1  3.9  100.0 
Fair health  64.5  28.7  6.8  100.0 
Poor health  58.2  31.1  10.6  100.0 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Note: See footnotes Table 5.
Table 9 shows the odds ratios from a multinomial logistic regression when other factors are controlled for. Even when other factors are held constant, being of good, fair or poor health is associated with a larger reduction in the odds of working fulltime (relative to being inactive) as opposed to parttime (relative to being inactive).[26] For example, being in fair health rather than excellent is associated with an 83% reduction in the odds of working fulltime (relative to being inactive), compared to a 61% reduction in working parttime (relative to being inactive).
Odds ratios  

Health status  Fulltime employment  Parttime employment  Unemployment 
Very good health  0.925  0.974  1.037 
Good health  0.514***  0.626***  0.965 
Fair health  0.174***  0.389***  0.537*** 
Poor health  0.054***  0.139***  0.291*** 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. These odds are derived from the data in Appendix Table E3. For full footnotes see that table.
2. The following factors were held constant: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
Table 10 shows the marginal effects from the same model. The results show that, for an average person, being in any health state other than excellent is associated with a reduced chance of working fulltime. For the majority of health states (other than poor health) this reduction in the chance of working fulltime is balanced by increases (both significant and not significant) in the chance of working parttime, being unemployed or being inactive. For those in poor health the chance of working parttime is also reduced compared to someone of excellent health. An average person in poor health, compared to an average person in excellent health, is 49.1 percentage points less likely to work fulltime, 4.0 percentage points less likely to work parttime and 51.9 percentage points more likely to be inactive.
Marginal effects  

Health status  Fulltime employment  Parttime employment  Unemployment  Inactive 
Very good health  -0.014*  0.005  0.002  0.007 
Good health  -0.095***  0.009  0.012***  0.074*** 
Fair health  -0.308***  0.043***  0.015***  0.250*** 
Poor health  -0.491***  -0.040**  0.012  0.519*** 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. These marginal effects are derived from the data in Appendix Table E3. For full footnotes see that table.
2. The following factors were held at the mean value for the whole sample: gender; region; age (and whether 50 years of age or above); highest qualification; study status; marital status; place of birth; ethnicity; children; household income less personal income; years in paid employment; and unemployment rate at the time of the interview.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
Notes
 [24]In order for the marginal effects to be comparable to those from the fixed and random effects model they are calculated as if the health states are independent. This means the marginal effects are slightly higher than if independence had not been assumed.
 [25]This curve looks at the tradeoff between false negative and false positive rates for the model at various cutoff points; in other words, the ROC curve is the representation of the tradeoffs between sensitivity and specificity. The larger the area (with the maximum being one) the better the diagnostic test.
 [26]The coefficients for fulltime and parttime were significantly different from each other at the 95% level.
6.2.2 Fixed and correlated random effects panel models
The standard pooled logit model considered the impact of the selfrated health state at a given point in time but, unlike the panel models, it cannot adjust for any of the types of endogeneity that might exist. The panel models estimate the health effect in a slightly different way from standard crosssectional logistic regressions, by considering changes in health (health shocks) over time. Table 11 shows transitions between selfrated health states across two consecutive waves. The results indicate that, while the majority of respondents do not change health state between waves, there is some movement both to better health and to poorer health. For example, while around twothirds of those in excellent health in one wave remain there in the following wave, the remaining third move to poorer health.
Health status in following wave (t+1)  

Health status in wave t  Excellent  Very good  Good  Fair  Poor 
Excellent  67.5  24.7  6.6  1.0  0.1 
Very good  27.9  50.1  19.3  2.4  0.3 
Good  12.5  31.8  44.8  9.4  1.4 
Fair  3.4  13.7  34.2  39.5  9.3 
Poor  3.0  5.2  17.2  30.7  43.6 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Note: Results are for those aged 15-64 who are not fulltime students. Data for changes between Wave 1 and Wave 2 and between Wave 2 and Wave 3 are pooled together to create an average rate.
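The statements about movement between health states can be read straight off the rows of Table 11. The sketch below splits each row into the share staying in the same state, the share moving to better health and the share moving to poorer health; the figures are those published in the table, and the helper name is illustrative only.

```python
states = ["Excellent", "Very good", "Good", "Fair", "Poor"]

# Rows: health status in wave t; columns: health status in wave t+1 (%), Table 11.
transitions = [
    [67.5, 24.7, 6.6, 1.0, 0.1],
    [27.9, 50.1, 19.3, 2.4, 0.3],
    [12.5, 31.8, 44.8, 9.4, 1.4],
    [3.4, 13.7, 34.2, 39.5, 9.3],
    [3.0, 5.2, 17.2, 30.7, 43.6],
]

def summarise_row(i, row):
    """Split one row into shares moving to better, the same, or poorer health."""
    return {"better": sum(row[:i]), "same": row[i], "worse": sum(row[i + 1:])}

summary = {state: summarise_row(i, transitions[i]) for i, state in enumerate(states)}
for state, parts in summary.items():
    print(f"{state}: better {parts['better']:.1f}%, same {parts['same']:.1f}%, "
          f"worse {parts['worse']:.1f}%")
```

For those in excellent health, 67.5% stay and 32.4% move to poorer health, which is the "twothirds" and "remaining third" described in the text. Rows do not sum to exactly 100 because the published percentages are rounded.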
The fixed effects model looks at how changes in the explanatory variables are related to changes in labour force participation when unobserved time constant variables, such as genetics, are controlled for. Table 12 shows how changes in participation compare with changes in selfrated health between two consecutive waves. The first part of the table is based on those who are participating in the first wave of a consecutive pair, that is, in Wave 1 or Wave 2 (21,610).[27] The percentages indicate the proportion of these who are not participating in the following wave (Wave 2 or Wave 3 respectively). So around 4% of those who report their health to be excellent in both waves of a pair move from participating to not participating. The proportion moving out of participation is generally higher for those who experience a decline in selfrated health. For example, 16% of those who report their health to be excellent in one wave but fair or poor in the next move from participating to not participating. The second part of the table shows the reverse of this; that is, it is based on those who are not participating in Wave 1 or Wave 2 (4,975).[28] Of these, those who experience negative changes to selfrated health are less likely to move into participation. For example, 40% of those who report being in excellent health in two consecutive waves, and who are not participating in the first of them, move into participation in the second. For those who report their health changing from excellent to fair/poor between waves, only 34.1% move into participation.
Participating in wave t: proportion not participating in wave t+1 (%)
Health status in following wave (t+1)  

Health status in wave t  Excellent  Very good  Good  Fair or poor  
Excellent  3.8  5.5  6.5  16.2 
Very good  4.3  3.9  6.1  11.5 
Good  6.4  5.3  6.7  14.3 
Fair or poor  S  13.5  8.2  15.1 

Not participating in wave t: proportion participating in wave t+1 (%)
Health status in following wave (t+1)  

Health status in wave t  Excellent  Very good  Good  Fair or poor  
Excellent  40.2  39.1  41.7  34.1 
Very good  36.3  34.5  28.6  20.5 
Good  38.2  33.3  20.2  15.9 
Fair or poor  44.4  40.4  21.4  9.6 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Notes:
1. Results are for those aged 15-64 who are not fulltime students. Data for changes between Wave 1 and Wave 2 and between Wave 2 and Wave 3 are pooled together to create an average rate.
2. Fair and poor are combined owing to small numbers in some of the categories.
3. S  This cell is suppressed as it is subject to sample error too great for most practical purposes.
Of the longitudinal working age nonstudent respondents in the survey period, around 14% (5,710) experience a change in participation status and have nonmissing data in two consecutive waves for the variables of interest. This is the group that is used for analysis in the fixed effects logistic model. Around 20% of these experience a change in selfrated health between two consecutive waves.
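The reason only respondents whose participation changes are used can be seen from the conditional likelihood that underlies a fixed effects logistic model. The sketch below is a simplified illustration with a single covariate and hypothetical data, not the estimation code used for this paper. It computes one respondent's contribution to the conditional log-likelihood, which conditions on the number of waves in which that respondent participates.

```python
from itertools import combinations
from math import exp, log

def conditional_loglik(x, y, beta):
    """Log conditional-likelihood contribution of one respondent.

    x: covariate value in each wave (eg, 1 = fair/poor health, 0 = otherwise)
    y: participation in each wave (1 = participating, 0 = not)
    """
    k = sum(y)  # number of waves in which the respondent participates
    observed = exp(beta * sum(xt for xt, yt in zip(x, y) if yt))
    # Every way of placing k participating waves among the available waves.
    all_patterns = sum(
        exp(beta * sum(x[t] for t in waves))
        for waves in combinations(range(len(y)), k)
    )
    return log(observed / all_patterns)

health = [0, 1, 1]      # HYPOTHETICAL: health worsens after the first wave
always_in = [1, 1, 1]   # participates in every wave
never_in = [0, 0, 0]    # inactive in every wave
leaves = [1, 0, 0]      # stops participating after the first wave

for beta in (-1.5, 0.0, 2.0):
    print(beta,
          conditional_loglik(health, always_in, beta),
          conditional_loglik(health, never_in, beta),
          conditional_loglik(health, leaves, beta))
```

Respondents who always or never participate contribute zero whatever the value of the coefficient, so they carry no information about it, whereas the contribution of the respondent who leaves the labour force varies with the coefficient. This is why the continually participating and continually inactive drop out of the fixed effects analysis.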
In the fixed effects model, the effect of any variable that does not vary over the survey period cannot be estimated. In this case the effects of gender and place of birth on labour force participation are not estimated. Also, following best practice, variables that change little or slowly (eg, ethnicity and highest qualification), or that could themselves be affected by health changes (eg, study status and years in paid employment), are excluded from both the fixed and random effects models. Full results are presented in Appendix Table F1. The results for the nonhealth variables indicate that a movement to the South Island from Auckland is associated with a significant reduction in the chance of participating. A change to having a partner who does not work reduces the chance of participation, possibly indicating couples taking early retirement together. For females, having a child is associated with an 88% decrease in the odds of participating.
Figure 5 shows the odds ratios for the selfrated health categories from the fixed effects regression model. The results indicate that there is not a significant relationship between a move into very good or good health from excellent health and the chance of participating. However, a move to fair or poor health from excellent health is associated with a 43% or 78% reduction in the odds of participating respectively for each person (equivalent to the odds ratios of 0.57 and 0.22). It should be remembered that the fixed effects model attempts to explain variation in participation for each respondent; that is, only within, rather than within and between, person variation is considered.
 Figure 5 - Estimated odds ratios of participating in the labour force - fixed effects model - selfrated health: 2002/03 to 2004/05

 Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Note: Odds ratios are derived from Appendix Table F1. The footnotes from that table apply to this chart. They are the odds within people, as between respondent variation is not considered. As such, they are not directly comparable to the odds from the pooled or random effects models. The following factors were held constant: region; age (and whether 50 years of age or above); marital status; children; household income less personal income; and unemployment rate at the time of the interview.
Finally, the correlated random effects model was estimated. A standard random effects model allows for unobserved variables that are fixed over time, but assumes that they are uncorrelated with the explanatory variables in the model. The concern here is that health is correlated with the unobservables; if this were not the case, the coefficients for health would not be biased. The correlated random effects model assumes that the only correlation between health and the unobservables is through average health, and therefore includes a variable indicating average health in a standard random effects model. Full information on the model, including the equation and the assumed relationship, can be found in Appendix C.
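The idea of adding average health to a random effects model can be illustrated by how the explanatory variables might be constructed. The sketch below is an assumption about the general approach (sometimes called a Mundlak-type specification), not the specification in Appendix C: for each respondent and wave it builds an indicator of the current health state and the share of that respondent's waves spent in each state. The panel data and variable names are hypothetical.

```python
from collections import defaultdict

STATES = ["excellent", "very good", "good", "fair", "poor"]

# HYPOTHETICAL panel: (person id, wave, selfrated health)
panel = [
    (1, 1, "excellent"), (1, 2, "fair"), (1, 3, "excellent"),
    (2, 1, "good"), (2, 2, "good"), (2, 3, "good"),
]

def add_average_health(rows):
    """Attach current-state indicators and person-average shares to each record."""
    history_by_person = defaultdict(list)
    for person, _, health in rows:
        history_by_person[person].append(health)

    records = []
    for person, wave, health in rows:
        history = history_by_person[person]
        record = {"person": person, "wave": wave}
        for state in STATES[1:]:  # excellent health is the reference category
            key = state.replace(" ", "_")
            record[f"now_{key}"] = int(health == state)                   # current state
            record[f"avg_{key}"] = history.count(state) / len(history)    # average health
        records.append(record)
    return records

records = add_average_health(panel)
for record in records:
    print(record)
```

With the average included, the coefficient on the current state reflects departures from a person's usual health (a health shock), while the coefficient on the average reflects differences between people in their usual health. In this example person 1 has a single wave of fair health against an average of one third, whereas person 2 is in good health throughout and so shows no shock.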
Figure 6 summarises the odds ratios for the health shock variables from the correlated random effects model. Full results can be found in Appendix Table F2. The health shock results indicate that, as in the fixed effects model, only a shock from excellent to fair or poor health is significantly related to participation, reducing the odds of participating by 34% and 65% respectively (slightly smaller reductions than the within person estimates of 43% and 78% from the fixed effects model). More influential is the average amount of time a person spends in a health state. Spending more time in good, fair or poor health significantly reduces the odds of participating relative to being in excellent health. Being in good, fair or poor health for all three waves reduces the odds of participating by 80%, 97% and 99% respectively. The model summary statistics indicate that 59% of the total variation is contributed by the panellevel variance component.
Figure 6 - Estimated odds ratios of participating in the labour force - correlated random effects model - self-rated health: 2002/03 to 2004/05

Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Note: Odds ratios are derived from Appendix Table F2. The footnotes from that table apply to this chart. The following factors were held constant: gender; region; age (and whether 50 years of age or above); marital status; place of birth; children; household income less personal income; and unemployment rate at the time of the interview.
6.2.3 Model comparisons
An alternative way to look at the results is to calculate the marginal effects. The odds ratios for the fixed effects model are not directly comparable with the odds from the pooled or random effects regression because the variation is coming from the variation within individuals. However, an average marginal effect can be computed for the fixed effects model to enable relative comparisons between groups of people with different covariates. These are shown in Table 13. The results of the fixed and correlated random effects models indicate that even after controlling for time-invariant unobserved variables, poorer health is still associated with a reduction in the chance of participating. The marginal effects are lower for the panel models than for the pooled model, which is consistent with the results from the odds ratios (where a higher ratio indicates a smaller reduction in the chance of participating). This possibly indicates that there are time-constant unobserved variables that should have been included in the standard pooled regression that are positively correlated with health and participation (eg, motivation), and hence the coefficients in the pooled model are systematically overestimated. Further, while the magnitude of the impact of health shocks is lower, they are still significant in reducing the chance of participating when average health state over the period is allowed for (shown by the results of the correlated random effects model).
Health status                     Marginal effects
                                  Pooled regression  Fixed effects model  Random effects model
Very good health                  0.006              0.006                0.000
Good health                       0.065***           0.018                0.003
Fair health                       0.222***           0.127***             0.019***
Poor health                       0.496***           0.340***             0.065***
Average time in very good health                                          0.006
Average time in good health                                               0.062***
Average time in fair health                                               0.127***
Average time in poor health                                               0.201***
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Marginal effects are derived from Appendix Tables E2, F1 and F2 holding all other factors at the mean value for the whole sample. The footnotes from those tables apply to this table.
2. For the pooled regression the effect is of being in the health state rather than being in excellent health. For the fixed and random effects models the marginal effects for each health state are the effect of a health shock from excellent into that health state. The final marginal effects for the random effects model are the effect of spending all waves in a health state rather than all waves in excellent health.
3. The marginal effects for the fixed effects model are pseudo marginal effects calculated based on the overall sample mean of the predicted probability of a positive outcome.
4. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
5. These marginal effects assume that the health states are independent. Accounting for the fact that the health states aren’t independent reduces the marginal effects slightly.
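The two kinds of marginal effect referred to in the notes above can be sketched as follows. This is one plausible reading rather than the paper's own calculation: the discrete change assumes a logit model with the other covariates held at their means, and the "pseudo" effect assumes the usual derivative approximation evaluated at a mean predicted probability. Function and argument names are illustrative.

```python
import math


def logistic(z):
    """Logistic function mapping a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def discrete_marginal_effect(baseline_index, coefficient):
    """Change in predicted probability when a health dummy moves 0 -> 1.

    `baseline_index` is the linear index with all other covariates at
    their sample means and the dummy set to zero (an assumed input).
    """
    return logistic(baseline_index + coefficient) - logistic(baseline_index)


def pseudo_marginal_effect(mean_probability, coefficient):
    """Derivative-based approximation p(1 - p) * beta at a mean probability."""
    return mean_probability * (1.0 - mean_probability) * coefficient
```

A negative coefficient gives a negative value from both functions, which would be read as a reduction in the chance of participating.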
The results for all three types of models identify a highly significant relationship between health and labour force participation; however, no model is perfect and the type and magnitude of the impact estimated varies. Tests to determine which model is preferred were carried out. A likelihood-ratio test indicated that the proportion of variation from the panel component of the random effects model was significantly different from zero and as such the panel element of the data should not be ignored (thus the standard logistic regression results are likely to be biased). This means that the panel models are preferable to the pooled estimator.
A Hausman test comparing the fixed effects model with the uncorrelated random effects model indicated correlation between the unobserved individual level effects and the other covariates, hence the use of the correlated random effects model (making the assumption that the correlation between the unobserved individual level effects and other covariates is only with health and only through average health).
However, a significant Hausman test comparing the fixed effects and correlated random effects model indicates that the unobserved individual level effects are still correlated with the covariates in the correlated random effects model, even after controlling for the correlation between these unobserved variables and health. This may be: correlation between the unobserved variables and non-health covariates; correlation between the unobserved covariates and health shocks, if the expected value of the unobservables is not equal to a linear function of the average time spent in each health state (which was assumed for the correlated random effects model); or correlation between the average health level variable and the unobserved variables.
This correlation means that the health coefficients from the correlated random effects model (for the health level, the health shocks, or both) may be biased. Further, other types of endogeneity such as unobserved variables that change over time cannot be accounted for. This indicates that the preferred model is the fixed effects model.
However, this model is not without its drawbacks. By definition a fixed effects model excludes all those for whom participation does not change over the period from the analysis, meaning there is no estimate of the relationship between health and participation for those continually inactive. It seems theoretically sensible that some people will be in consistently poor health over the periods considered and not participate as a result of this. These people will not be included in any estimates of impact from this model. Also the fixed effects model focuses on variation in participation for each respondent. This means that only within (rather than within and between) person variation is considered. Finally, there may be other types of endogeneity present that it is not possible to account for using a fixed effects model; for example, unobserved variables that change over time and are related to the explanatory variables. Assuming that this is not the case, the fixed effects model should produce estimates that are consistent (and unbiased).
As the models look at the relationship between health and labour force participation in different ways, all the results are informative in their own way. The key result is that a significant relationship between health and participation was identified in all of the models.
6.3 Adjusted selfrated health
In the previous section it was found that there was a significant relationship between health and participation even after accounting for unobserved variables. However, these results may occur owing to respondents using their health status to rationalise their participation; that is, reporting their health to be worse than it actually is to justify the fact that they are not participating. In previous studies, for example Disney et al (2003), one approach to try to remove this rationalisation bias from health measures has been to model selfrated health using more objective health related variables. Estimates from such a model have then been standardised and included in models to estimate the relationship between health and labour force participation in place of selfrated health. This approach was therefore used to try to rid the selfrated measure of health in SoFIE of its potential rationalisation bias. Full details of how the adjusted health measure was calculated and used in the models can be found in Appendix C. These results complement the findings in the previous section. The key finding is that, even when selfrated health is adjusted to account for potential rationalisation bias, a highly significant relationship is still found between health and labour force participation. This approach also leads to the fixed effects model being identified as the preferred model. The results strengthen the conclusions made in the previous section, in that it seems that the relationship identified between health and labour force participation is not owing to rationalisation bias.
6.3.1 Calculation of adjusted health measure
The following measures are available for each respondent in every wave of SoFIE: whether a respondent has ever smoked; the presence of each individual chronic disease; and the receipt of a health or illness related benefit.[29][30] Table 14 shows for each health state the proportion of people who report each health related measure. For example, 38% of those in excellent health have been diagnosed with one or more chronic diseases, compared to 84.8% of those in poor health. It shows that all three measures are correlated to some extent with health. These three measures are also more objective than selfrated health. These variables will therefore be termed “objective health measures”.
Table 14 shows that only 4%, 7.2% and 14.8% of those who consider their health to be excellent, very good and good, respectively, receive a health related benefit. However, it is important to remember that these groups account for a large proportion of those who receive health related benefits once the relative size of these health states is considered. Table 7 shows that around 94% of the population consider themselves to be in excellent, very good or good health. Combining the figures from Table 14 and Table 7 indicates that around seven-tenths of those receiving a health related benefit consider themselves to be in good, very good or excellent health. This is well below the level for the population as a whole but it is still higher than may have been expected. This highlights the possible issues with the survey questions that measure self-rated health (discussed in Section 4.2.2) and the mismatch between health and disability; for example, a person who is blind may be eligible for a disability benefit (included here within health related benefit) but may consider themselves to be in excellent health. This finding is along similar lines to international evidence that suggests that on average one in three qualified recipients of a disability related benefit claim to have no subjectively perceived disability that limits their daily activity (OECD, 2003).
                          Self-rated health
Objective health measure  Excellent  Very good  Good  Fair  Poor
Any chronic disease       38.0       51.4       61.7  77.1  84.8
Asthma                    14.6       19.6       21.7  27.4  34.1
High blood pressure       8.2        15.3       22.9  33.3  39.3
High cholesterol          8.8        13.8       18.2  24.6  33.5
Heart disease             0.8        2.2        4.9   11.6  23.6
Diabetes                  0.6        2.1        5.9   12.4  21.0
Stroke                    0.3        0.7        1.6   4.3   7.8
Migraine                  10.2       13.7       17.2  22.0  28.7
Psychiatric conditions    5.2        8.8        14.4  25.6  36.6
Cancer*                   2.2        3.8        4.3   7.7   7.9
Smoked                    38.5       48.5       55.9  60.3  66.2
Health related benefit    4.0        7.2        14.8  36.5  64.2
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights (* adjusted longitudinal weight), Statistics New Zealand
Notes:
1. The figures in each cell are the proportion in a certain health state that report the health related measure.
2. See footnotes Table 5.
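The combination of Table 14 and Table 7 described in the text amounts to weighting each benefit receipt rate by the size of the corresponding health state. A minimal sketch of that arithmetic is given below. The receipt rates are those in the table above; the population shares must be supplied by the reader (for example from Table 7, which is not reproduced here), and the function name is an assumption for illustration.

```python
# Percentage of each self-rated health state receiving a health related
# benefit, taken from the "Health related benefit" row of the table above.
BENEFIT_RATE = {
    "excellent": 4.0,
    "very good": 7.2,
    "good": 14.8,
    "fair": 36.5,
    "poor": 64.2,
}


def recipient_share(population_share, states=("excellent", "very good", "good")):
    """Share of benefit recipients whose self-rated health is in `states`.

    `population_share` maps each health state to its share of the
    population (values summing to one); it is an input the caller must
    provide, not a figure taken from this section.
    """
    recipients = {
        state: population_share[state] * rate / 100.0
        for state, rate in BENEFIT_RATE.items()
    }
    return sum(recipients[s] for s in states) / sum(recipients.values())
```

Because the better health states are much larger groups, their low receipt rates can still translate into a large share of all recipients.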
It was shown previously that chronic disease presence is correlated with participation. There is also correlation between health benefit receipt and labour market participation and a weak correlation between whether a person has ever smoked and labour market participation. However, it is sensible to assume that if true health was measured correctly these objective health measures should only affect participation through this health measure once other factors are controlled for.
Given this relationship, these objective health measures (along with a set of other health related variables) were used to model selfrated health for each year. The results of these models can be found in Appendix Table G1.
The model results indicate that all of the objective health measures are highly significant in explaining self-rated health. However, overall, the models only explain around 11% of the variation in the data. In terms of interpreting the model results, a higher value of self-reported health means poorer health. This means that positive coefficients on the objective health measures, for example 0.418 for those who have cancer in Wave 1, are associated with an increase in the predicted probability that an individual will be in poor health and a decrease in the predicted probability that they will be in excellent health. With this in mind the largest health impact is seen from those receiving health related benefits (a coefficient of 1.286 in Wave 1) while the most influential health condition is diabetes (a coefficient of 1.086). The least influential health condition is high cholesterol (a coefficient of 0.177). The non-health coefficients indicate that health is generally predicted to be poorer for those outside Auckland; those born outside of New Zealand; those of non-NZ/European ethnicity; older respondents; those with no qualifications; and those with no partner relative to the reference categories. Health is generally predicted to be better for females; those with tertiary education; those who are undertaking some form of study; and those with higher household income.
The results of these models were used to create an adjusted health stock. The probability of being in poor health was predicted for each person. This probability was then standardised across all respondents to give a continuous measure of adjusted health status (or adjusted health stock). For all respondents (including those over working age) this adjusted measure therefore had a mean of zero and a standard deviation of one. As with self-rated health, a higher adjusted health stock indicates poorer health. This is illustrated in Table 15 where the mean and standard deviation of the adjusted health stock are presented for each self-rated health state. For those of interest (working age non-students) the mean of this health stock is less than zero and the standard deviation is less than one. This is because those with generally poorer health (respondents aged 65 and over) are included in the model to create the adjusted health stock, but are excluded from the analysis to determine the relationship between health and participation. As with unadjusted self-rated health, this was done to ensure the total distribution of the adjusted health measure reflected that of health in the total population.
Health status  Mean  Standard deviation 

Excellent  0.306  0.171 
Very good health  0.230  0.329 
Good health  0.049  0.708 
Fair health  0.522  1.590 
Poor health  1.556  2.523 
Total  0.167  0.652 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Notes:
1. The health measure is derived based on the standardised probabilities of poor health for all longitudinal respondents from the data in Appendix Table G1. For full footnotes see that table. These means and standard deviations are for those of working age who aren't students.
2. The total figures are for the sample from the pooled and random effects regression. For the fixed effects regression the mean was 0.062 and standard deviation 0.750 indicating those who change participation status during the period are slightly less healthy than those who do not change.
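The standardisation step described in this section can be sketched as below: predicted probabilities of poor health are converted to z-scores over all respondents, and summary statistics are then taken over the analysis subgroup only. The use of the population standard deviation and the function names are assumptions for the example; the paper's own procedure is documented in Appendix C.

```python
import statistics


def standardise(probabilities):
    """Convert predicted probabilities of poor health to z-scores.

    The mean and standard deviation are taken over ALL respondents
    passed in, so the result has mean zero and standard deviation one.
    """
    mean = statistics.fmean(probabilities)
    sd = statistics.pstdev(probabilities)
    return [(p - mean) / sd for p in probabilities]


def subgroup_summary(scores, keep):
    """Mean and standard deviation of the scores where `keep` is true."""
    kept = [s for s, flag in zip(scores, keep) if flag]
    return statistics.fmean(kept), statistics.pstdev(kept)
```

If the excluded respondents have higher predicted probabilities of poor health, the retained subgroup ends up with a mean below zero, which is the pattern the text describes for working age non-students.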
6.3.2 Standard pooled regression
The adjusted health stock was then included in the standard pooled logistic regression in place of individual chronic diseases or self-rated health. Full results can be found in Appendix Table G2. This model explains a similar amount of variation in the data as the model including the unadjusted self-rated health and the model including the individual chronic diseases (32% compared to 32.3% and 30.9% respectively).
The coefficients of the nonhealth variables in this model are little changed from those in the pooled models, including chronic diseases or unadjusted selfrated health. Health is still highly significant in affecting participation even after attempting to adjust for possible incorrect measurement of selfrated health. The coefficient for adjusted health indicates that a one unit increase in the level of health (a move to poorer health) is associated with a 57% reduction in the odds of participating. The adjustment of selfrated health results in difficulties interpreting what a unit change in this measure actually means in the real world. To give an indication of the dispersion of the adjusted health measure for the sample used in analysis, the average adjusted health level was 0.167. The standard deviation was 0.652 indicating that, while a one unit increase in health reduces the odds of participating by around 57%, many respondents will not experience a one unit change in adjusted health. It is therefore more sensible to consider a one standard deviation increase in adjusted health; this is associated with a 42% reduction in the odds of participating. While the categories of selfrated health are subjective and have no definite boundaries, it is easier to relate to a change from excellent to poor health than to a one unit change in the adjusted health stock. However, the fact that this health measure is still significant in impacting on participation illustrates that health is significantly related to participation even allowing for possible rationalisation.
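The move from a one unit change to a one standard deviation change follows from the fact that, in a logit model, the odds ratio for a change of k units is the per-unit odds ratio raised to the power k. The short sketch below reproduces the arithmetic using the rounded figures quoted in the text (a 57% reduction per unit and a standard deviation of 0.652), so the result is approximate; the function names are illustrative.

```python
def rescale_odds_ratio(odds_ratio_per_unit, change):
    """Odds ratio for a change of `change` units: exp(beta * change)."""
    return odds_ratio_per_unit ** change


def percent_reduction(odds_ratio):
    """Percentage reduction in the odds implied by an odds ratio below one."""
    return (1.0 - odds_ratio) * 100.0


per_unit = 1.0 - 0.57                      # a 57% reduction per unit
per_sd = rescale_odds_ratio(per_unit, 0.652)
print(round(percent_reduction(per_sd)))    # prints 42
```

The same rescaling applies to any other change in the adjusted health stock that a reader considers realistic.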
6.3.3 Fixed and correlated random effects panel models
The adjusted health measure was then included in the fixed and correlated random effects models. The results can be found in Appendix Tables G3 and G4 respectively. The coefficients for the non-health variables were similar to the models for unadjusted self-rated health (Appendix Tables F1 and F2).
The key thing to note from the fixed effects model (Appendix Table G3) is that a one standard deviation increase in adjusted health stock (so a poorer health shock) is associated with a 31% increase in the odds of not participating. This is in line with what was found when comparing the pooled and fixed effects model using unadjusted selfrated health (again the odds are not directly comparable as the fixed effects model only considers within person variation).
Turning to the correlated random effects model, both the health shocks (a change in adjusted health) and the average level of adjusted health are significantly related to participation. A one standard deviation increase in adjusted health is associated with a 31% reduction in the odds of participating. Further, the higher the average adjusted health state over a period is (ie, the poorer a person's longer term health) the less chance there is they will participate and this impact is larger than that for a health shock (a one standard deviation increase in the average adjusted health stock is associated with a 52% reduction in the odds a person will participate). Again these results are similar to what was found in the correlated random effects model including unadjusted selfrated health. This illustrates that health is significantly related to participation even allowing for possible rationalisation.[31]
As with the unadjusted health models, a likelihood-ratio test for the random effects model indicates that the panel variation is significant and thus a panel model is preferred. A significant Hausman test, comparing the fixed effects and uncorrelated random effects model, indicated that the fixed effects estimator should be used instead of the random effects as the unobserved individual level effects were correlated with the other covariates. This correlation remains even after the correlated random effects model is used. This indicates that the preferred model is the fixed effects model.
Notes
 [29]The latter assumes that people receiving a health related benefit are less healthy than people who don’t. Also note that some illness benefits included are joint income tested so this variable is likely to have a lower correlation with health for those wealthier households.
 [30]These variables are defined in Appendix Tables A1, A2 and A3.
 [31]Based on the arguments given by Bound et al (1999) it may be expected that lagged health might affect current behaviour because transitions may take time. A lagged adjusted health variable was also included in the fixed and correlated random effects model, along with current health, using just two waves of the data to see if a health shock in a previous period was significantly related to participation. However, unlike in Bound et al the lagged effect was not found to be significant on top of current health. It should be noted that this relationship might exist but that, with only three waves of data, it may be hard to estimate.
7 Conclusion
This paper has examined the relationship between health and labour force participation. It found that health was significantly related to participation, using various health measures and even after accounting for certain types of endogeneity. Table 16 summarises the marginal effects from all the models considered.
Health status                     Marginal effects
                                  Pooled regression  Fixed effects model  Random effects model
Any chronic disease               0.038***
Asthma                            0.009
High blood pressure               0.018***
High cholesterol                  0.008
Heart disease                     0.083***
Diabetes                          0.067***
Stroke                            0.123***
Migraine                          0.004
Psychiatric conditions - male     0.132***
Psychiatric conditions - female   0.065***
Cancer                            0.007
Very good health                  0.006              0.006                0.000
Good health                       0.065***           0.018                0.003
Fair health                       0.222***           0.127***             0.019***
Poor health                       0.496***           0.340***             0.065***
Average time in very good health                                          0.006
Average time in good health                                               0.062***
Average time in fair health                                               0.127***
Average time in poor health                                               0.201***
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. All other variables in the models are fixed at the mean value for the whole sample.
2. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. For the pooled regression the effect is of being in the health state rather than being in excellent health. For the fixed and random effects models the marginal effects for each health state are the effects of a health shock from excellent into that health state. The final marginal effects for the random effects model are the effects of spending all waves in a health state rather than all waves in excellent health.
Results of the standard pooled regression models that included individual chronic diseases indicated that there was insufficient evidence that those with asthma, high cholesterol, migraine or cancer were any less likely to be participating in the labour market than those without these diseases, once other factors were controlled for. In contrast, psychiatric conditions, stroke, heart disease, diabetes and high blood pressure were all associated with significant decreases in participation once other factors were held constant. Further, for psychiatric conditions, stroke and diabetes, the negative relationship with full-time work was larger than that for part-time work (ie, the chance of working full-time was reduced more than the reduction in the chance of working part-time), suggesting that not only is the presence of these diseases associated with lower participation but it is also associated with working fewer hours.
Psychiatric conditions for males were associated with the largest reduction in the chance of participation. This was the only disease where the relationship with labour force participation was significantly different by gender. When all other variables were fixed at their mean value, being a male with psychiatric conditions was associated with labour market participation 13.2 percentage points lower than that for males without psychiatric conditions, and being a female with psychiatric conditions was associated with participation 6.5 percentage points lower than that for females without psychiatric conditions. Under the same conditions, being a male with psychiatric conditions was associated with a 1.5 percentage point reduction in participation compared to a female with psychiatric conditions. Following psychiatric conditions, for males the diseases that were associated with the largest fall in participation were stroke (a 12.3 percentage point reduction in labour market participation on average), heart disease (8.3 percentage point reduction), diabetes (6.7 percentage point reduction) and high blood pressure (1.8 percentage point reduction). The effect of the presence of disease did not differ significantly by gender, other than for psychiatric conditions.
These pooled regressions did not allow for possible endogeneity and as a result the coefficients may be biased. As the number of chronic diseases of interest diagnosed during the three waves of data available for analysis is relatively small, the paper moved to consider selfrated health. Fixed and correlated random effects models were used to allow for unobserved variables and an adjusted health measure was constructed to allow for possible rationalisation.
Results of the standard pooled regression models for self-rated health indicated that those in good, fair or poor health are significantly less likely to participate than those in excellent health. Being in good, fair or poor health was associated with a reduction in the chance of participating of 6.5, 22.2 and 49.6 percentage points respectively compared to being in excellent health. The only other variable for which the reduction in the chance of participating in the labour force is of a similar magnitude to that for fair or poor health is having a young child for females. This indicates the relative magnitude of the relationship between fair/poor health and participation. As with the individual chronic diseases, being in good, fair or poor health was associated with a larger reduction in the chance of working full-time than that for working part-time.
The fixed and correlated random effects panel models indicated that a negative health shock significantly reduced the chance of participation even when unobserved timeconstant factors were controlled for. The coefficients for the fixed and correlated random effects model are higher (therefore the reduction in the chance of participation lower) than the pooled regression, suggesting possible unobserved variables that are correlated with health and participation. In the fixed effects model only a fair or poor health shock was associated with a significant reduction in participation; reducing the chance of participating by 12.7 and 34 percentage points respectively. The coefficients for the correlated random effects model indicate that a health shock to fair or poor health from excellent health significantly impacted on participation, reducing the chance of participating by 1.9 and 6.5 percentage points respectively. Further, even after controlling for the average time spent in each health state, health shocks were still found to be significantly related to participation. Spending all three waves in good, fair or poor health was associated with a 6.2, 12.7 and 20.1 percentage point reduction in the chance of participating.
All models indicate a significant relationship between health and labour force participation; as such the results complement each other. Tests suggested that the preferred model was the fixed effects model. If it is assumed that there are no unobserved variables that vary over time that are correlated with the explanatory variables, then estimates from this model are consistent (and unbiased). However, this model also had weaknesses and, owing to the slightly different things being estimated in the different models, results from all three models including self-rated health are informative.
An attempt was then made to remove possible rationalisation from the selfrated health variable. Results of the pooled, fixed and correlated random effects regression models using the adjusted health measure complement those from the unadjusted selfrated health models; that is, they indicate a significant relationship between adjusted health and participation above that from possible rationalisation. As with the longitudinal models that use unadjusted health, the impact of adjusted health on participation is reduced when unobserved timeconstant variables are taken into account but remains significant.
The results do not control for unobserved variables that change over time. They also do not allow for the “feedback effect”; that is, that participation could influence health. As such, the results do not address causality but simply establish relationships between health and participation. An exploration of feasible instruments was conducted in order to try to instrument health thus making it possible to take into account variables that vary over time and causality, but no suitable instruments were found.
8 Discussion
8.1.1 Impact on the labour force
The results so far have considered the relationship between health and labour force participation at an individual level. For policy purposes, it is helpful to understand the potential impact of these relationships at the population level. While the magnitude of relationship between health and labour force participation is larger for those of poorer health, if the number in poorer health in the population is small then the estimated impact at the population level may not be large. It is important to remember again that, in this section, words such as “impact” and “effect” are used to describe relationships but do not attempt to explain causation.
Table 17 presents the estimated impact of different diseases and health states. These estimates are based on the marginal effects reported in Tables 3 and 13 and the estimated number of working age nonstudents in each group.[32] They therefore provide an indicator of the workforce impact of poor health. The marginal effects were estimated with the other variables set at the whole sample mean; that is, the figures estimate the additional number of people who may participate in the absence of poor health, if they have average values for the remaining characteristics.[33] The error margin around the estimated impact figures only considers error in the marginal effect. The proportion figures in the table express each count as a proportion of the number of participating working age nonstudents, which is estimated to be 1.84 million on average over the three waves of SoFIE.
Figures for all diseases and selfrated health states/shocks are reported even if they are not significant. Where the confidence interval for the number impacted, or for the proportion, crosses zero, the impact is not statistically significant. For the level of health this means that there was insufficient evidence to suggest that the chance of labour force participation for those in this health state was statistically different from those in the “best” health state. For the health shocks this means there is insufficient evidence to suggest that a negative health shock into this health state would significantly affect the chance of labour force participation. For those diseases or health states/shocks that are significant, the asterisks indicate the level of significance of the marginal effect. The categories that are not significant are excluded where totals are calculated. This is justifiable for the purpose of estimating the potential change in labour force participation that can be associated with the movement of those in these categories to better health, or the prevention of a health shock, as the models found insufficient evidence that there would be one.
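The scaling from an individual marginal effect to a population count described above can be sketched as follows. This is a minimal illustration and not the paper's own code: the function name, the normal-approximation 95% interval and all input values are assumptions chosen for illustration, and none of the inputs are taken from Tables 3 or 13.

```python
def population_impact(marginal_effect, std_error, group_size,
                      participating_total, z=1.96):
    """Scale an individual-level marginal effect to a population count.

    marginal_effect is the change in the probability of participating
    associated with the health state (negative if it lowers participation).
    The sign is flipped so the result reads as additional participants in
    the absence of that state. Only the error in the marginal effect feeds
    the interval; the group size is treated as fixed.
    """
    point = -marginal_effect * group_size
    half_width = z * std_error * group_size
    lower, upper = point - half_width, point + half_width

    def to_pct(count):
        # Share of all participating working age nonstudents, in percent.
        return 100.0 * count / participating_total

    return {
        "count": (point, lower, upper),
        "pct": (to_pct(point), to_pct(lower), to_pct(upper)),
    }


# Hypothetical inputs for illustration only.
result = population_impact(
    marginal_effect=-0.05,   # assumed marginal effect
    std_error=0.01,          # assumed standard error
    group_size=100_000,      # assumed number of people in the group
    participating_total=1_840_000,
)
print(result)
```

An interval that includes zero corresponds to the "not statistically significant" rows of the table.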
As discussed in Section 6, the preferred model is the fixed effects model (for which results are assumed to be unbiased). Results from the standard logistic regression models may be biased owing to possible endogeneity. While the correlated random effects model attempts to account for some types of endogeneity, the results of this model may also be subject to bias. Despite this, impact figures are presented from all of these models to allow comparison of the model results and because the fixed effects model does not allow an estimate of the relationship between a constant health level and labour force participation.
The grouped chronic disease indicator from the pooled regression model indicates that, if this group no longer had chronic diseases, an additional 42,200 people may participate. This represents a 2.3% increase in the total number of people participating.
Moving on to individual chronic diseases, the table shows that the largest increase in the number of additional participants is for females with psychiatric conditions. This is despite the fact that the odds ratios and marginal effects are estimated to be of greater magnitude for stroke, heart disease, diabetes and males with psychiatric conditions. This illustrates the importance of the size of the group of interest when relating the results to the population as a whole. If no females suffered from psychiatric conditions it is estimated that an additional 9,500 people may participate, which represents a 0.5% increase in the total number of people participating. It should be remembered that, as the disease groups are not independent (ie, a person may have diabetes as well as heart disease), the number impacted cannot be summed across all diseases.
The results of the pooled logistic regression for selfrated health in Table 17 illustrate the estimated additional number of people who would participate if they had excellent health, as opposed to the health state listed. So if all those people with good health had excellent health an additional 26,400 people may participate, which represents a 1.4% increase in the total number of people participating. Again, this illustrates that, while the marginal effects and odds ratios are higher for those in fair or poor health, the biggest potential increase in participation comes from those in good health. Overall, an additional 66,800 people may participate if they had excellent health, a 3.6% increase in the total number of people participating.
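The arithmetic behind these shares can be reproduced from the point estimates in Table 17 and the 1.84 million participating working age nonstudents quoted earlier. The sketch below is illustrative only; the variable and function names are assumptions, and the total sums only the significant categories, as the table notes describe.

```python
PARTICIPATING = 1_840_000  # average number participating, as quoted in the text


def share_of_participants(count, base=PARTICIPATING):
    """Express a count as a percentage of participating working age nonstudents."""
    return 100.0 * count / base


# Pooled regression point estimates for selfrated health from Table 17,
# paired with whether the estimate is marked as significant.
pooled_selfrated = {
    "very good": (4_700, False),
    "good": (26_400, True),
    "fair": (24_900, True),
    "poor": (15_500, True),
}

# Totals exclude categories that are not significant.
total = sum(count for count, significant in pooled_selfrated.values() if significant)

print(total, round(share_of_participants(total), 1))
print(round(share_of_participants(26_400), 1))   # good health only
print(round(share_of_participants(42_200), 1))   # any chronic disease
```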
As explained previously, there may be unobserved variables that impact labour force participation and/or health. The logistic regressions do not account for this. Despite this, the estimates from the pooled models give an indication of the possible impact of health on participation. To try to control for unobserved timeconstant variables, panel models were used. Their interpretation is slightly different from the pooled models. The results for the fixed effects model for selfrated health in Table 17 illustrate the additional number of people who may participate in the absence of negative health shocks.[34] That is, if during an annual period there were no negative health shocks, an additional 12,700 people may participate, which represents a 0.7% increase in the total number of people participating. While the coefficients and odds ratio from this model reported earlier were those for a health shock from excellent to a lower health state, other health shocks are possible and these health shocks are accounted for in these figures. For example, if there were no health shocks into poor health (from any of the higher health states) then an additional 5,200 people may participate.
Notes
[32] While these figures are themselves estimates from SoFIE, and therefore subject to error, they were taken to be fixed in the calculation of the estimated impact.
[33] The marginal effects estimated using group means rather than whole sample means were broadly the same.
[34] It should be remembered that the fixed effects model only considers within, rather than within and between, person variation. Despite this, in order to estimate the impact at the population level, the results of the model are assumed to be the same as for the population as a whole.
Table 17  Estimated impact of different diseases and health states on labour force participation
Health status  Count: point estimate  Count: 95% CI (lower; upper)  %: point estimate  %: 95% CI (lower; upper)
Grouped chronic diseases - pooled regression
Any chronic disease  42,200***  (32,200; 52,200)  2.30  (1.75; 2.84)
Individual chronic diseases - pooled regression
Asthma  3,700  (-800; 8,300)  0.20  (-0.04; 0.45)
High blood pressure  5,800***  (1,300; 10,300)  0.32  (0.07; 0.56)
High cholesterol  2,400  (-1,800; 6,600)  0.13  (-0.10; 0.36)
Heart disease  5,300***  (3,000; 7,600)  0.29  (0.16; 0.41)
Diabetes  4,400***  (2,200; 6,600)  0.24  (0.12; 0.36)
Stroke  2,700***  (1,300; 4,000)  0.15  (0.07; 0.22)
Migraine  1,300  (-2,500; 5,100)  0.07  (-0.14; 0.28)
Psychiatric conditions - male  8,900***  (6,600; 11,600)  0.49  (0.36; 0.63)
Psychiatric conditions - female  9,500***  (2,500; 18,100)  0.52  (0.14; 0.99)
Cancer  500  (-1,500; 2,600)  0.03  (-0.08; 0.14)
Selfrated health - pooled regression
Very good health  4,700  (-1,900; 11,300)  0.26  (-0.10; 0.62)
Good health  26,400***  (21,000; 31,800)  1.44  (1.14; 1.73)
Fair health  24,900***  (21,200; 28,600)  1.36  (1.16; 1.56)
Poor health  15,500***  (13,500; 17,400)  0.84  (0.74; 0.95)
Total (exc. insignificant)  66,800  (55,800; 77,800)  3.64  (3.04; 4.24)
Selfrated health - fixed effects
Very good health shock  -1,500  (-10,800; 7,800)  -0.08  (-0.59; 0.43)
Good health shock  4,500  (-7,100; 16,200)  0.25  (0.03; 0.47)
Fair health shock  7,600***  (2,500; 12,600)  0.41  (0.31; 0.52)
Poor health shock  5,200***  (2,700; 7,600)  0.28  (0.22; 0.34)
Total (exc. insignificant)  12,700  (5,300; 20,200)  0.69  (0.29; 1.10)
Selfrated health - random effects
Very good health shock  0  (-1,300; 1,300)  0.00  (-0.07; 0.07)
Good health shock  600  (-700; 1,800)  0.03  (-0.04; 0.10)
Fair health shock  1,100***  (300; 1,800)  0.06  (0.02; 0.10)
Poor health shock  900***  (300; 1,500)  0.05  (0.02; 0.08)
Average time in very good health  4,500  (-2,500; 11,400)  0.25  (-0.14; 0.62)
Average time in good health  25,300***  (20,900; 29,600)  1.38  (1.14; 1.61)
Average time in fair health  13,900***  (12,000; 15,700)  0.76  (0.65; 0.86)
Average time in poor health  6,000***  (5,200; 6,900)  0.33  (0.28; 0.38)
Total (exc. insignificant)  47,100  (38,700; 55,500)  2.57  (2.11; 3.02)
Source: SoFIE Waves 1-3 Version 4, Statistics New Zealand
Notes:
1. These estimates are calculated using the marginal effects in Tables 3 and 13 from unweighted models and the weighted count participation estimates. Standard longitudinal weights are used other than for cancer, where the adjusted weights are used. Data are for the 2002/05 period but the estimate of impact is for the annual average over this period.
2. Groups may not sum to totals owing to rounding.
3. The totals include only significant estimates.
4. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
5. For the pooled regression the impact is the number of additional people participating if all participants had excellent health/no chronic disease(s). For the fixed and random effects models the impact is the number of additional people participating if there were no negative health shocks in a year (this is from any higher health state into the health state mentioned). The final marginal effects listed for the random effects model illustrate the impact of having excellent health in all waves rather than a proportion of time in the lower selfrated health state listed.
The impact figures for the fixed effects models are much lower than those estimated from the pooled models. This is owing to differences in what is being estimated: the relationship between labour market participation and health shocks rather than the level of health. Using the fixed effects model it is not possible to estimate the impact of the current level of health; only the impact for those whose health deteriorates can be estimated.[35] Given that in an annual period not everyone experiences a health shock, the number impacted is smaller.
The estimates for the health shocks for the random effects model are calculated in the same way as for the fixed effects model; that is, the numbers represent the increase in the number of people participating if they had not had a negative health shock to the state listed. However, this model also takes into account the average level of health. These estimates illustrate the additional number of people who may participate if their average level of health had been excellent in the last three years rather than being at the stated level for some period in recent years. That is, if those who had spent some time in good health in recent years had instead been in excellent health, an additional 25,300 people may participate. These results illustrate that the level of health, rather than health shocks, is much more influential in the relationship with labour force participation. In total, the results of the random effects model for selfrated health indicate that if there were no negative health shocks and the average level of health in previous periods had been excellent then an additional 47,100 people may participate.[36]
Tests of the models indicated that the fixed effects model was preferred in a statistical sense. This was because the logistic models did not account for timeconstant unobserved variables, and as such the results are likely to be biased, and the correlated random effects model indicated that, even after including the average health level over the period, there still may be correlation between the unobserved variables and health, again meaning the results may be biased. However, it is important to remember that while the fixed effects model appeared to be the best model, the lack of inclusion of an estimate for the impact of health level is a large drawback. From a theoretical perspective it seems sensible that the level of health will be related to labour force participation. The fixed effects model indicates that a health shock in a specific period is significantly negatively related to participation. In the same period some people will be continuously inactive owing to poor health. It is not possible to estimate the significance or impact of this from the fixed effects models. Some of this group will have experienced a health shock at some point but, as they did not have their health shock in the period of consideration, this cannot be accounted for. While the results of the other models may be biased, they, along with the results of other international research, suggest the impact of the level of health may be nonzero. Given this, the lower confidence interval for the estimated impact figures from the fixed effects model could be seen as a lower bound for the estimate of the true impact of overall health (shocks and level) on labour force participation. It is known that the pooled regression results are likely to be biased as they do not account for unobserved variables that may explain variations in health. The impact estimates for this model are therefore likely to be too high. 
While the estimates from the random effects model may still be biased, they provide an intermediate model, which is an improvement on the pooled model but not on the fixed effects model. Owing to the potential bias, and owing to the fact that the relationship between the average health state and health shocks for the same individual is not accounted for in these impact estimates, it seems sensible to take the lower confidence interval of the random effects model as the upper bound for the impact of health on participation.
The point estimates from these models indicate that if there was an improvement in health (ie, no negative health shocks and everyone had excellent average health) an additional 12,700 to 47,100 people may participate; that represents a 0.7% to 2.6% increase in the total number of people participating. Based on the discussion above it is more sensible to assume that, if there was an improvement in health, the additional number of people who may participate is likely to be between 5,300 and 38,700; that is, a 0.3% to 2.1% increase in the total number of people participating.
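The two ranges quoted above follow directly from the "Total" rows of Table 17: the point-estimate range uses the fixed effects and random effects totals, while the more cautious range takes the lower confidence limit of each. A short sketch of that arithmetic, with illustrative variable names that are assumptions rather than anything from the paper:

```python
PARTICIPATING = 1_840_000  # average number participating, as quoted in the text

# (point estimate, lower 95% limit, upper 95% limit) from the Total rows of Table 17
fixed_effects_total = (12_700, 5_300, 20_200)
random_effects_total = (47_100, 38_700, 55_500)


def pct(count):
    """Percentage of participating working age nonstudents, to one decimal place."""
    return round(100.0 * count / PARTICIPATING, 1)


# Range based on point estimates.
point_range = (fixed_effects_total[0], random_effects_total[0])

# More cautious range: lower limit of the fixed effects total as the lower bound,
# lower limit of the random effects total as the upper bound.
bounded_range = (fixed_effects_total[1], random_effects_total[1])

print(point_range, tuple(pct(c) for c in point_range))
print(bounded_range, tuple(pct(c) for c in bounded_range))
```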
It is important to remember that all of these impact figures are likely to be an underestimate of the impact of health on labour force participation for the population of New Zealand as a whole. One reason for this is that the SoFIE population is healthier than the population it aims to represent owing to those of poorer health being less likely to respond over time (see Section 3.2 for further explanation). Another reason is that the estimates are for those of working age only. They therefore do not account for the fact that improvements in health may result in those over working age participating in the labour force for longer. Further, reduced labour force participation is unlikely to be the only factor related to poorer health. Poor health will also result in lost output owing to people being away from work ill (absenteeism) and owing to lower productivity when at work (presenteeism). Health also impacts on educational development and skill usage. These “costs” are not considered here.
Table 18 provides the same estimates of impact as Table 17 but for the multinomial models. These estimates are based on the marginal effects from Tables 6 and 10 along with the number estimated to be in these groups. The results illustrate where the increase in fulltime working comes from. In the main, the increase is a result of a decrease in inactive people, but there is often a reduction in the number who work parttime or who are unemployed. As an example, consider the chronic disease indicator. If the group with one or more chronic diseases no longer had these diseases it is estimated that an additional 57,600 people may work fulltime. The majority (47,700) of these people move from being inactive to working fulltime. However, 5,500 of these people move from parttime employment to working fulltime and 4,400 move from being unemployed. It is estimated that, on average over the three waves of SoFIE, 1,436,800 people of working age worked fulltime; 348,700 worked parttime; 49,500 were unemployed; and 385,400 were inactive. The increase in the number of people who may work fulltime in the absence of chronic disease therefore represents a 4% increase in the number of people who work fulltime, with falls of 1.6% in the number of people working parttime, 8.9% in the number who are unemployed and 12.4% in the number who are inactive.
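The percentage changes in this worked example can be checked against the counts given in the paragraph. The sketch below uses only figures quoted in the text; the dictionary names are illustrative assumptions, and moves out of a status are written as negative numbers.

```python
# Average number of working age people in each status over the three waves.
base = {
    "fulltime": 1_436_800,
    "parttime": 348_700,
    "unemployed": 49_500,
    "inactive": 385_400,
}

# Estimated change in each status if the group with one or more chronic
# diseases no longer had them (negative values are moves out of that status).
change = {
    "fulltime": 57_600,
    "parttime": -5_500,
    "unemployed": -4_400,
    "inactive": -47_700,
}

pct_change = {status: round(100.0 * change[status] / base[status], 1) for status in base}
net = sum(change.values())  # moves between statuses should net to zero

print(pct_change)
print(net)
```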
Table 18  Estimated impact of different diseases and health states on labour market status (multinomial models)
Disease  Fulltime employment  Parttime employment  Unemployment  Inactive
Grouped chronic diseases - pooled regression
Any chronic disease  57,600***  -5,500  -4,400**  -47,700***
Individual chronic diseases - pooled regression
Asthma  1,600  2,500  -400  -3,700
High blood pressure  4,000  2,000  300  -6,300**
High cholesterol  2,900  300  -600  -2,700
Heart disease  3,900**  1,100  400  -5,300***
Diabetes  7,900***  -1,700  -900*  -5,300***
Stroke  4,300***  -1,200*  0  -3,100***
Migraine  6,000*  -2,100  -2,100**  -1,800
Psychiatric conditions - male  12,500***  -2,000*  -1,000  -9,600***
Psychiatric conditions - female  14,400***  -2,500  -1,200  -10,700***
Cancer  200  200  200  -500
Selfrated health - pooled regression
Very good health  10,600*  -3,800  -1,500  -5,300
Good health  38,700***  -3,700  -4,900***  -30,100***
Fair health  34,500***  -4,800***  -1,700***  -28,000***
Poor health  15,300***  1,200**  -400  -16,200***
Total  99,100  -11,000  -8,400  -79,600
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. These estimates are based on the marginal effects from Tables 6 and 9 from unweighted models and the weighted count participation estimates. Standard longitudinal weights are used other than for cancer, where the adjusted weights are used. Data are for the 2002/05 period but the estimate of impact is for the annual average over this period.
2. Asterisks indicate the impact is significantly different from zero when all other variables are evaluated at the mean for the sample. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. Counts may not sum to totals and rows may not sum to zero owing to rounding.
4. The totals include only significant estimates.
Notes
[35] The model takes into account changes in health but only negative health changes are considered here.
[36] It should be noted that these estimates do not take into account the relationship between health shocks and average health. Average health is thought of as average health in the period before the health shock period.
8.1.2 Concluding remarks
In drawing conclusions from these results it should again be remembered that it was not possible to identify whether health had impacted on labour market status, or vice versa. Further, it is known that there are already interventions to help people with some of these conditions and the efficacy of further intervention may be limited. Nevertheless, there are a number of tentative conclusions that can be drawn from these results.
Firstly, in considering whether tackling chronic diseases would increase participation in the labour market, it is psychiatric illnesses where there is potential to have the greatest impact. As shown in Table 17, an additional 18,400 people may participate in the labour market in the absence of psychiatric conditions, which represents a 1% increase in the total number of people participating. Considering the point estimates, this is over three times more than the potential increase from any of the other conditions considered. Again, it must be remembered that these results are based on basic models which do not determine causality and do not control for unobserved factors that may explain some of the variation in labour force participation that is attributed to these conditions.
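The comparison in this paragraph can be retraced from the individual chronic disease point estimates in Table 17. The sketch below is only an arithmetic check under that reading; the dictionary keys are illustrative labels, and the male and female psychiatric estimates are added together because those two groups do not overlap.

```python
PARTICIPATING = 1_840_000  # average number participating, as quoted in the text

# Point estimates for individual chronic diseases from Table 17 (pooled regression).
impact = {
    "asthma": 3_700,
    "high blood pressure": 5_800,
    "high cholesterol": 2_400,
    "heart disease": 5_300,
    "diabetes": 4_400,
    "stroke": 2_700,
    "migraine": 1_300,
    "psychiatric (male)": 8_900,
    "psychiatric (female)": 9_500,
    "cancer": 500,
}

psychiatric = impact["psychiatric (male)"] + impact["psychiatric (female)"]
others = {name: count for name, count in impact.items() if not name.startswith("psychiatric")}
largest_other = max(others, key=others.get)
ratio = psychiatric / others[largest_other]

print(psychiatric, round(100.0 * psychiatric / PARTICIPATING, 1))
print(largest_other, round(ratio, 2))
```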
Secondly, by far the greatest impact on numbers in the labour market appears to come from people being in good or fair health rather than excellent health. In other words, interventions for basically healthy people may have a greater impact on labour market participation than attempts to help those with the poorest health. However, in terms of health shocks, the only significant potential impacts are in the absence of health shocks from excellent health into fair or poor health.
Finally, in the absence of ill health, by far the greatest change in status is from not working to working fulltime. While, in the main, the results indicate some movement from working parttime to working fulltime with improvements in health, this movement is often not found to be significant. This is a particularly striking result when the change is from good or very good to excellent health, and suggests that parttime work may not be a common substitution for fulltime work for those in these health categories.
Another interesting finding briefly noted in the report is around health related benefits. While, as would be expected, the proportion of those receiving a health related benefit increases with decreasing selfrated health, around seventenths of those receiving a health related benefit consider themselves to be in excellent, very good or good health. Owing to the survey question (as discussed in Section 4.2.2) and the differences between health and disability (for example, a person who is blind may be eligible for a benefit but consider themselves to be in excellent health), it is perfectly plausible for a person to be eligible for a health related benefit and to selfrate their health to be good or above. However, this does highlight an area where further work could be undertaken to better understand the reasons for this result.
References
Biddulph, F., Biddulph, J. and Biddulph, C. et al (2003), The complexity of community and family influences on children's achievement in New Zealand: Best evidence synthesis. (Wellington: Ministry of Education).
Bound, J., Schoenbaum, M., Stinebrickner, T. R. and Waidmann, T. (1999), “The dynamic effects of health on the labour force of older workers”, Labour Economics, No 6: 179202.
Cai, L., (2007), The relationship between health and labour force participation: Evidence from a panel data simultaneous equation model, Melbourne Institute Working Paper Series Working Paper No. 1/07, February.
Cai, L. and Kalb, G., 2006, Health status and labour force participation: Evidence from Australia, Health Economics, Vol 15: 241261.
Crichton, S., Stillman, S. and Hyslop, D. (2007), Returning to work from injury: Longitudinal evidence on employment and earnings, Statistics New Zealand, December.
Currie, J. and Madrian, B. C. (1999), Health, health insurance and the labour market, In Handbook of labour economics, vol. 3, Ashenfelter O., Card D. (eds) Elsevier Science BV: Amsterdam, 1999: 33103415.
DeVol, R. and Bedroussian, A. (2007), An unhealthy America: The economic burden of chronic disease, Milken Institute, October.
Disney, R., Emmerson, C. and Wakefield, M. (2003), Ill health and retirement in Britain: A panel data based analysis, IFS.
Freese, J. and Scott Long, J. (2006), Regression models for categorical dependent variables using stata: Second edition. Strata Press.
Jensen, J., Sathiyandra, S., Rochford, M., Jones, Davina, Krishnan, V. and McLeod, K. (2005), Disability and work participation in New Zealand: Outcomes relating to paid employment and benefit receipt, Ministry of Social Development, June.
Laplagne, P., Glover, M. and Shomos, A. (2007), The effects of health and education on labour force participation. Australian Productivity Commission.
OECD (2003), Transforming disability into ability: policies to promote work and income security for disabled people. Campus Verlag.
Stata (2007), Stata statistical software: Release 10, StatCorp LP, College Station, Texas.
Stern, S. (1989), Measuring the effect of disability on labour force participation, The Journal of Human Resources, Vol 24, No. 3: 361395.
Tabachnick, B.G. and Fidell L. S. (2001), Using multivariate statistics, 4^{th} Edition.
Wooldridge, J. M. (2006), Introductory econometrics: A modern approach, 3^{rd} edition.
Bibliography
Cai, L. and Kalb, G. (2004), Health status and labour force participation: Evidence from the HILDA data, Melbourne Institute Working Paper No. 4/04, March.
Carter, K., Hayward, M. and Richardson, K. (2008), SoFIEHealth baseline report. University of Otago, Wellington.
Carter, K., Hayward, M. and Richardson, K. (2008), SoFIEHealth data and processing systems. University of Otago, Wellington.
Davis, K. et al (2005), Health and productivity among U.S. workers, The Commonwealth Fund, August.
Department of Labour (2006), 45 Plus: Choices in the labour market, November.
Hsiao, C. (2003), Analysis of panel data: Second edition.
MBF Foundation (2007), The high price of pain: The economic impact of persistent pain in Australia, November.
Richardson K., Carter K., and Hayward M. (2008) SoFIEHealth data and processing systems. SoFIEHealth report 1. University of Otago, Wellington.
Statistics New Zealand (2005), Survey of family, income and expenditure estimation specifications, November. Statistics New Zealand.
Statistics New Zealand (2005), Survey of family, income and expenditure imputation specifications, November. Statistics New Zealand.
Statistics New Zealand (2005). Survey of family, income and expenditure questionnaires. Statistics New Zealand.
Statistics New Zealand (2008), Data laboratory output guide, March. Access Economics, Sydney.
The Treasury (2008), Social mobility: Annexes for Treasury report, June. Treasury
Wooldridge, J. M. (2002), Econometric analysis of cross section and panel data. MIT Press
Appendix A
Variable name  Variable categories  Notes
Labour market participation    Labour force participation at the household interview date.
Labour market outcome    Labour market activity at the household interview date. Hours are the average weekly hours a respondent worked whilst employed in the annual reference period. Fulltime hours are 30 hours or more. Unemployed is not employed but actively looking for work. Inactive is not employed and not looking for work.
Gender
Region of residence
Born in New Zealand
Ethnicity    Respondents could report more than one ethnicity. Where this occurred, respondents were assigned to a prioritised ethnicity in this order: Māori, Pacific Islander, Other, NZ/European.
Age at interview date    Continuous variable.
Aged 50 and over    Age is at the interview date.
Highest qualification    Some respondents reported a fall in qualification level between waves. Where this occurred the highest level of qualification was taken in later waves.
Chronic disease indicator (grouped chronic disease)    An indicator of chronic disease presence. Those for whom cancer is unknown who do not report any other chronic disease are assumed to not have a chronic disease.
For each chronic disease    The eight chronic diseases covered in SoFIE are asthma, high blood pressure, high cholesterol, heart disease, diabetes, stroke, migraine and psychiatric conditions (depression, manic depression or schizophrenia). The question on presence of diseases is only asked in Wave 3. Other than for psychiatric conditions and for a small number of cases where data was missing, disease presence in earlier waves was derived using the presence of disease and the age at diagnosis. As the question was on the age of diagnosis rather than the year, the variables created are not exact. The age at diagnosis for psychiatric conditions is unknown. As disease diagnosis in the survey period is likely to be small, after preliminary analysis of this group it was decided to assume that those with psychiatric conditions in Wave 3 have psychiatric conditions in Waves 1 and 2.
For each chronic disease (excluding psychiatric conditions)    Derived using chronic disease diagnosis, the age at the household interview date and, where present, the age of disease diagnosis. This is a proxy for the number of years since diagnosis, as we only know the age at diagnosis not the actual date. The age of diagnosis is not asked for psychiatric conditions (depression, schizophrenia, manic depression) so this variable is not available for this disease.
Selfrated health
Studying    Each respondent is defined to have undertaken study if they report one month or more in which they have studied fulltime or parttime towards a formal qualification in the reference period. If a respondent was still at school; reported that they were economically inactive as a result of being a student or studied fulltime for nine or more months, they are classified as students and excluded from the analysis.
Partner
Children    A dependent child is one who is under 18 years and not in fulltime employment.
Benefit    These include ACC, student allowance payment, IRD payment, Veteran Pension Fund and WINZ benefit payment. It also includes the small number of respondents under 65 who receive NZ Superannuation payments.
Number of years in employment    Variable to note number of years in paid employment. Derived from the number of weeks in paid employment in the wave and the number of years reported to be in paid work before the first interview (this is assumed to be before the beginning of the annual reference period). If a respondent has at least one week in paid employment in the wave they are counted as having an additional year in paid employment. 
Household income less personal income    Continuous variable which is the log of the consumer price adjusted household income less the consumer price adjusted personal income. Personal income is removed owing to its correlation with labour force participation. There was a small number of respondents with negative personal/household income. This is possible if selfemployment income is negative. As the number with negative income was very small, these were imputed to be zero. One was added to all values to enable logs to be taken. Income was not adjusted to reflect family size/composition. 
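The transformation described for "Household income less personal income" can be sketched as below. This is one possible reading and not the author's code: the function name is hypothetical, inputs are assumed to be consumer price adjusted already, negative incomes are assumed to be set to zero before subtracting, and flooring a negative difference at zero is an additional assumption made so the log is always defined.

```python
import math


def log_other_household_income(household_income, personal_income):
    """Log of household income net of the respondent's own income.

    Assumes both inputs are already consumer price adjusted.
    """
    # Negative incomes (possible with selfemployment losses) are imputed to zero.
    household = max(household_income, 0.0)
    personal = max(personal_income, 0.0)

    # Assumption: a negative difference is also floored at zero.
    other_income = max(household - personal, 0.0)

    # One is added to all values so that zero income has a defined log.
    return math.log(other_income + 1.0)


print(log_other_household_income(60_000, 20_000))
print(log_other_household_income(-500, -500))
```

Adding one before taking logs maps a zero income to a log value of zero, so respondents with no other household income stay in the sample.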
Source  Variable name  Variable categories  Notes 

Household Labour Force Survey  Unemployment rate    Variable to denote national unemployment rate at the month of the household interview given the continuous interviewing method used in SoFIE. 
Cancer registration data  Cancer    This variable indicates a cancer registration prior to the interview date; determined by the age at the registration compared with the age at the interview date. Cancer information is unknown for nonconsenters and nonmatched consenters.
Cancer registration data  Cancer by age diagnosed    Derived using the age at diagnosis from the cancer registration data and the age at the household interview date from SoFIE.
Variable name  Variable categories  Notes 

Total household income    Continuous variable which is the log of the consumer price adjusted household income. There was a small number of respondents with negative household income. This is possible if selfemployment income is negative. As the number with negative income was very small, these were imputed to be zero. One was added to all values to enable logs to be taken. Household income was not adjusted to reflect family size/composition.
Health benefit    This includes any ACC payments, sickness benefit, incapacity benefit and disability benefit.
Smoked    Estimated from whether a respondent currently smokes and, if not, whether they ever have.
Tenure    Derived from variable indicating ownership status of home.
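The chronic disease rows in Appendix A describe deriving disease presence in earlier waves, and a proxy for years since diagnosis, from the Wave 3 report and the age at diagnosis. A rough sketch of such a rule is given below. The function names and the exact comparison (treating the disease as present once the interview age reaches the diagnosis age) are assumptions; the source only states that the derived variables are not exact because age rather than year of diagnosis was collected.

```python
def had_disease_at_wave(has_disease_wave3, age_at_diagnosis, age_at_interview):
    """Proxy for whether a disease reported in Wave 3 was present at an earlier wave.

    Returns None when the age at diagnosis is missing and presence cannot be derived.
    """
    if not has_disease_wave3:
        return False
    if age_at_diagnosis is None:
        return None
    # Assumed rule: present once the respondent has reached the diagnosis age.
    return age_at_interview >= age_at_diagnosis


def years_since_diagnosis(age_at_diagnosis, age_at_interview):
    """Proxy for years since diagnosis, based on ages rather than dates."""
    if age_at_diagnosis is None or age_at_interview < age_at_diagnosis:
        return None
    return age_at_interview - age_at_diagnosis


# Hypothetical respondent diagnosed at age 40 and interviewed at age 42.
print(had_disease_at_wave(True, 40, 42), years_since_diagnosis(40, 42))
```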
Appendix B
Survey methodology
When SoFIE commenced in 2002 a total of 15,000 households were approached, of whom around 11,500 (77%) agreed to participate. In the initial interview, data was collected from around 22,000 individuals aged 15 and over. All respondents in the original sample (original sample members) are followed over time, even if their household or family circumstances change, forming a longitudinal sample. In later waves new cohabitants of the sample members are interviewed but asked only a reduced set of questions. These additional sample members are not followed if in future waves they no longer live with the original sample member. For these reasons, only original sample members are included in this analysis. All SoFIE interviews are carried out face to face using computer assisted interviewing.[37][38]
Statistics New Zealand provides a longitudinal weight which accounts for nonresponse and aligns the composition of the sample with that of the New Zealand population in October 2002. SoFIE interviews were conducted throughout the year with the sample spread evenly over the 12month wave period. Each respondent is asked about the previous 12 months (their annual reference period). As a result of this continuous interviewing, there are 12 reference periods in each wave. Some variables collected in each wave of SoFIE, such as age, can be measured at the household interview date or at a point in the reference period. Figure B1 shows the relationship between these dates for a hypothetical SoFIE respondent.
At the end of the SoFIE health module respondents were asked to give permission for their data to be linked to information on hospitalisations and cancer registrations held by the New Zealand Health Information Service back to 1990. For those respondents who agreed to the data linkage, and were successfully matched, it was possible to identify those respondents who are listed on the Cancer Register as having been diagnosed with cancer.[39] As the linked information only goes back to 1990 this is only a measure of recent cancer diagnosis. Where descriptive (prevalence) statistics are based only on the linked sample, adjusted weights were used to realign the sample with the population (adjusted longitudinal weight) as opposed to the weights provided by Statistics New Zealand (standard longitudinal weights).[40]
Population and sample of interest
The questionnaire is administered only to those aged 15 and over. To ensure there is full information on respondents in all waves, the analysis is focused on those aged 15 and over at the end of the reference period in Wave 1 who remain eligible and respond in all three waves of the survey (adult longitudinal respondents). This is the balanced panel, made up of 17,615 respondents in Waves 1 to 3, which corresponds to an unadjusted attrition rate of 20.5%. Once those who move out of the scope of the survey or die are removed, the adjusted attrition rate is 17.2%. Those over working age or who are full-time students in each wave are excluded from the analysis. The results are therefore representative of the usual adult resident population of New Zealand who lived in private dwellings on the main islands of New Zealand in 2002/03 and who are working-age non-students. Around three-quarters of the 17,615 adult longitudinal respondents are working-age non-students in Waves 1, 2 or 3.[41][42]
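The attrition figures quoted above can be related back to the size of the starting sample with a little arithmetic. The sketch below assumes that attrition is defined as one minus the share of the starting sample retained; that definition, and the variable names, are illustrative assumptions rather than the definitions used by Statistics New Zealand.

```python
# Figures quoted in the text.
balanced_panel = 17_615        # adult longitudinal respondents in Waves 1 to 3
unadjusted_attrition = 0.205   # quoted unadjusted attrition rate
adjusted_attrition = 0.172     # quoted rate after removing out-of-scope cases and deaths

# Assumption: attrition = 1 - (respondents retained / starting sample).
implied_starting_sample = balanced_panel / (1 - unadjusted_attrition)
implied_in_scope_sample = balanced_panel / (1 - adjusted_attrition)

# Under the same assumption, the gap is the number who left scope or died.
implied_out_of_scope = implied_starting_sample - implied_in_scope_sample

print(round(implied_starting_sample))
print(round(implied_in_scope_sample))
print(round(implied_out_of_scope))
```

Under this assumed definition the implied starting sample is close to the "around 22,000 individuals" reported for the initial interview.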
Figure B1: SoFIE wave structure
Example: household is selected for interview in January 2003, within Wave 1 (October 2002 to September 2003)
Wave 1: household interview date usually a day in January 2003*; annual reference period January 2002 to December 2002
Wave 2: household interview date usually a day in January 2004*; annual reference period January 2003 to December 2003
Wave 3: household interview date usually a day in January 2005*; annual reference period January 2004 to December 2004
* This date could be later if there are problems contacting the respondent or arranging an interview; however, even if this moves into February or March the reference period will not change.
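The timing rule in Figure B1 can be sketched in a few lines of code. The function below is a hypothetical helper, not part of SoFIE: it assumes that the annual reference period is the 12 calendar months ending in the month before the scheduled interview month, which is what the example in the figure suggests.

```python
from datetime import date


def annual_reference_period(scheduled_interview):
    """Return (first_month, last_month) of the assumed annual reference period.

    Months are represented by the first day of the month. The scheduled
    interview month is used, because a late interview does not move the
    reference period.
    """
    year, month = scheduled_interview.year, scheduled_interview.month
    # The last reference month is the month before the scheduled interview month.
    if month == 1:
        last = date(year - 1, 12, 1)
    else:
        last = date(year, month - 1, 1)
    # The first reference month is 11 months before the last one.
    month_index = last.year * 12 + (last.month - 1) - 11
    first = date(month_index // 12, month_index % 12 + 1, 1)
    return first, last


# Hypothetical respondent from Figure B1, interviewed in January of each year.
for wave, interview in enumerate([date(2003, 1, 15), date(2004, 1, 15), date(2005, 1, 15)], start=1):
    first, last = annual_reference_period(interview)
    print(f"Wave {wave}: {first:%B %Y} to {last:%B %Y}")
```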
Limitations and strengths of SoFIE
The SoFIE data has a few limitations. As with all surveys, there is potential for non-response error; that is, errors that arise because not all potential respondents take part in the survey. Unlike in cross-sectional surveys, non-response in longitudinal surveys has a second element, as respondents can also choose whether to respond in each wave. If this non-response (known as attrition) is non-random (that is, the characteristics of those who do respond are systematically different from those who do not) then any inferences based on analyses of the data may be biased. In addition, where longitudinal data is linked to other sources, information is only observed for part of the sample (those who agree to the linkage) and these differences could also be non-random and potentially bias results. While there are differences in the response, consent and matching rates in SoFIE, there are no groups of interest that do not contain any respondents. The weights (both the standard weights provided by Statistics New Zealand and adjusted weights to take account of non-consenters) go some way to restoring the distribution of respondents over the variables of interest, and any resulting bias should be small when making inferences about the population as a whole.[43] However, it should be remembered that, in a longitudinal survey, those who are most unhealthy may die or move into institutions where they may not be able to be traced, meaning that the SoFIE population is likely to be healthier than the wider New Zealand population it represents.
A further limitation is that not all variables are available in all waves. An indicator for psychiatric conditions is only available in Wave 3, and an indicator for cancer is only available for the subset of respondents who agreed for their data to be matched to the Cancer Registrations database and were successfully linked. This potentially reduces the sample size considerably if only Wave 3 matched consenters are considered. Making an assumption about the presence of psychiatric conditions for Waves 1 and 2, and coding the non-consenters' cancer status as “unknown” rather than missing, goes some way to countering this problem, allowing analysis to be undertaken on all three waves rather than the restricted sample.
While SoFIE is a longitudinal survey, there are only currently three waves of information. While this provides a wealth of information for variables that do not change very frequently, such as diagnosis of new diseases, modelling the impact of these variables with such a short span of data is difficult.
Lastly, if dependants of respondents have ill health or chronic diseases this may also affect the respondent's labour market participation. The SoFIE questionnaire does not allow “carers” to be identified except when the ill health of a family member is given as a reason for inactivity. In addition, when people do report the ill health of a family member as a reason for inactivity the cause of ill health cannot be identified or attributed to a specific chronic disease or illness. The effect of this on labour market participation is therefore not explored in this analysis.
Despite its limitations, SoFIE collects a wealth of information on respondents over time. This allows a range of labour market transitions, durations and repeat occurrences of respondents to be analysed. It allows comparison of labour market activity and disease presence at more than one point in time. Further, attempts to account for the presence of unobserved variables can be made given that the same respondent is being monitored over time. The linking of SoFIE data to cancer and hospitalisation information adds further depth to the SoFIE data and this additional information is subject to less reporting error than additional questioning of respondents.
While there are differences in response and consent rates by respondent characteristics, for a longitudinal survey of this kind the response and consent rates are high by international standards.
Notes
 [37]Full details of the sampling design for SoFIE can be found here: http://www2.stats.govt.nz/domino/external/pasfull/pasfull.nsf/84bf91b1a7b5d7204c256809000460a4/4c2567ef00247c6acc256fab0082e7fc?OpenDocument. There was no formal oversampling of specific groups; however, stratification was used in the first stage of the sample selection to try to ensure sufficient representation in the survey from specific groups. The strata were defined according to region; urban/rural; high/low Māori population density and other socioeconomic variables derived from the most recent census.
 [38]The full SoFIE questionnaire can be found here: http://www2.stats.govt.nz/domino/external/quest/sddquest.nsf/12df43879eb9b25e4c256809001ee0fe/14d945bb95ab2bbbcc256fb70077b3bb?OpenDocument.
 [39]Around 80% of all SoFIE respondents agreed for their data to be linked. Of these, 97% were linked successfully.
 [40]More information on the adjusted weight is available from the author.
 [41]Those respondents with a missing value for any of the variables of interest in a particular wave are excluded from the models for data based on that wave. The number of missing values is small and analysis indicates they appear to be random.
 [42]Respondents can change status with regard to being a student or moving out of working age over the survey period. Therefore there are not always three responses for each respondent in the analysis even though the balanced panel is the starting point for the analysis (ie, the student/working age values criteria make the panel unbalanced).
 [43]More information on sample attrition and consent in SoFIE and the adjusted weights is available from the author.
Appendix C
Methods
Pooled logistic regressions
Initially, binomial logistic regression models were fitted to the data to quantify the relationship between the presence of different chronic diseases and labour force participation, and between self-rated health and labour force participation, while holding all other variables constant. In the standard pooled regression models, responses in each wave were pooled together to form one large sample. Therefore each respondent had up to three responses in the sample. The fact that observations from the same person in different waves were not independent of each other, and therefore the error terms in the model were likely to be correlated, was accounted for by treating people as clusters.
A binomial logistic regression model is suitable as the dependent variable (L) is a binary response variable equal to one for those respondents who are participating and zero for those who are not participating (the latter was the reference category when a binomial logistic regression was carried out). The form of the equation can be seen in Figure C1. The unemployment rate at the time of the interview was included to reflect the possible differences in participation owing to the economic climate at the interview date. Maximum likelihood estimation was used to estimate the regression coefficients.[44]
A multinomial logistic regression was then fitted to the data to quantify the impact of the presence of diseases on the chance of being in each of the four labour market outcomes while holding all other variables constant. This aimed to determine whether the impact of each disease was consistent across labour market outcomes. As there are more than two response categories in the dependent variable, there is now more than one logistic regression model. Each model is the same as that in Figure C1 with the L indicator replaced with indicators for full-time, part-time and unemployed (L_{FTi}, L_{PTi} and L_{Ui} respectively), with the reference category being those who are inactive. The formula for the probability of success in each case is similar to that for the binomial logistic regression, but with the denominator being one plus the sum of the odds of success across the three response categories (excluding the reference category).
The main limitation of standard binomial and multinomial logistic regressions is that they do not allow for endogeneity. In other words, they assume that the explanatory variables are exogenous; that is, their values are not affected by labour force participation or by other unobserved characteristics. However, this assumption may not be strictly true for any generic health measure (H_{i}), and the failure to account for endogeneity means that any significant relationships that are established are associations and do not imply causality; for instance, the fact that the model may establish a relationship between the dependent and predictor variables does not mean that the predictor variables caused the outcome (Tabachnick and Fidell, 2001).
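The probability formula described above can be illustrated with a short sketch. The linear predictor values below are made up for illustration and are not estimates from SoFIE; the function name is likewise hypothetical. Each non-reference outcome has its own odds relative to being inactive, and every probability shares the denominator of one plus the sum of those odds.

```python
import math


def multinomial_logit_probabilities(linear_predictors):
    """Convert linear predictors for the non-reference outcomes into probabilities.

    linear_predictors maps each non-reference outcome to its x'beta value.
    The reference outcome ("inactive") has a linear predictor of zero, so its
    odds relative to itself are one.
    """
    odds = {outcome: math.exp(value) for outcome, value in linear_predictors.items()}
    denominator = 1.0 + sum(odds.values())
    probabilities = {outcome: value / denominator for outcome, value in odds.items()}
    probabilities["inactive"] = 1.0 / denominator
    return probabilities


# Hypothetical linear predictors for one person (illustration only).
example = multinomial_logit_probabilities(
    {"fulltime": 1.2, "parttime": 0.3, "unemployed": -1.0}
)
for outcome, probability in example.items():
    print(f"{outcome}: {probability:.3f}")
```

The ratio of any outcome's probability to the inactive probability recovers that outcome's odds, which is why the coefficients are read relative to the inactive reference category.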
Figure C1: Form of binomial logistic regression model
L_{i} = 1(β_{1}'CD_{i} + β_{2}'X_{i} + u_{i} > 0)
where:
L_{i} = a binary response variable for participation for the ith person, equal to one if participating and zero otherwise
1(.) = an indicator function that takes the value one or zero according to whether the value in parentheses is true or false
β_{1}, β_{2} = vectors of regression coefficients
CD_{i} = a vector of chronic disease indicators
X_{i} = a vector of explanatory variables
u_{i} = error term associated with person i
exp(β_{1}'CD_{i} + β_{2}'X_{i}) = odds of success
Note: The relationship between the responses for each person in the different waves (ie, time = 1, 2 or 3) is accounted for by identifying people as clusters.
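As a sketch of how an indicator-function model of this kind leads to the odds of success, suppose the error term follows a standard logistic distribution (the assumption that defines a logit model) and write z_i for the linear combination of the explanatory variables and their coefficients. The symbols z_i and Λ are labels introduced here for illustration.

```latex
\begin{aligned}
\Pr(L_i = 1) &= \Pr(z_i + u_i > 0) = \Pr(u_i > -z_i) \\
             &= 1 - \Lambda(-z_i) = \Lambda(z_i) = \frac{\exp(z_i)}{1 + \exp(z_i)}, \\
\frac{\Pr(L_i = 1)}{\Pr(L_i = 0)} &= \exp(z_i), \qquad
\log\!\left(\frac{\Pr(L_i = 1)}{1 - \Pr(L_i = 1)}\right) = z_i .
\end{aligned}
```

Here Λ is the logistic distribution function, and the second line uses its symmetry about zero. The last expression shows why each coefficient can be read as the change in the log odds of participating for a one-unit change in the corresponding variable.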
Fixed and random effects panel logistic regression[45]
While there were a number of control variables included in the standard pooled regressions, there may be some important individual characteristics that were not observed. The unobserved variables may significantly influence participation; they may influence (or be correlated with) ill health; or they may influence both of these. When the omitted variables are correlated with health, the estimates of the relationship between health and participation from the pooled regression model will be biased because the error term in the model will be correlated with the health variable (that is, health is endogenous, not exogenous, therefore violating an assumption of the logistic regression analysis).
One advantage of SoFIE is its panel aspect; that is, there are up to three observations per person. This opens up the prospect of fixed or random effects panel models to allow for time-constant unobserved heterogeneity. A fixed effects model exploits the panel nature of the data to determine how health shocks (changes in health) over time relate to changes in labour force participation, allowing for time-invariant omitted variables that may be correlated with the explanatory variables (ie, the endogenous health). The fixed effects model is derived from the starting equation in Figure C2. The error term from the standard pooled regression model, u_{i}, now has a time dimension and is made up of two components. These are α_{i}, the time-constant unobserved variables for the ith person, which may or may not be correlated with H_{it}, and the error term ε_{it}, which includes the true error and any unobserved variables that are time-varying. It is assumed that the time-variant unobserved variables are not correlated with the explanatory variables so that the error term, ε_{it}, is not correlated with L_{it} or H_{it}.
Conditional logistic analysis differs from regular logistic regression in that data are grouped (with those who exhibit no changes in the outcome variable over the periods considered dropped) and the likelihood is calculated relative to each group; that is, a conditional likelihood is used. The conditional likelihoods do not involve α_{i}, so these do not need to be estimated (Stata, 2007). The model compares changes in the covariates with a change in the dependent variable. The coefficients indicate the relationship between a change in that covariate and the chance of participating. One drawback of the fixed effects model is that it removes from the model all explanatory variables that are time-invariant; for example, gender.[46] It also drops all respondents for whom the dependent variable (labour force participation) did not change over time. This significantly reduced the sample available for analysis.
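The way the conditional likelihood removes the person-specific term, and why people whose participation never changes drop out, can be seen in a small sketch. This is an illustrative implementation of the standard conditional logit contribution for one person, not the Stata routine used in the paper; the function name and the numbers are hypothetical.

```python
import math
from itertools import combinations


def conditional_log_likelihood(outcomes, indices):
    """Conditional logit log-likelihood contribution for one person.

    outcomes: 0/1 participation in each wave.
    indices:  the linear index for each wave (any person-specific constant
              may be included or left out; it cancels).
    The contribution conditions on the number of waves in which the person
    participated.
    """
    n_waves = len(outcomes)
    n_participating = sum(outcomes)
    if n_participating in (0, n_waves):
        # No change in the outcome: the conditional probability is one, so the
        # person adds nothing to the likelihood and is effectively dropped.
        return 0.0
    numerator = sum(y * z for y, z in zip(outcomes, indices))
    # Sum over every way of allocating the same number of participating waves.
    denominator = sum(
        math.exp(sum(indices[t] for t in chosen))
        for chosen in combinations(range(n_waves), n_participating)
    )
    return numerator - math.log(denominator)


# Hypothetical person: participates in Waves 1 and 3 only.
outcomes = [1, 0, 1]
indices = [0.5, -0.2, 0.1]
alpha = 3.0  # arbitrary person-specific effect
without_alpha = conditional_log_likelihood(outcomes, indices)
with_alpha = conditional_log_likelihood(outcomes, [z + alpha for z in indices])
print(without_alpha, with_alpha)
```

Adding the same constant to every wave's index leaves the contribution unchanged, which is the sense in which the person-specific term does not need to be estimated.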
Figure C2: Initial form of the fixed and standard random effects logistic panel model
L_{it} = 1(β_{1}'H_{it} + β_{2}'X_{it} + α_{i} + ε_{it} > 0)
where:
L_{it} = a binary response variable for participation for the ith person at time t
1(.) = an indicator function that takes the value one or zero according to whether the value in parentheses is true or false
β_{1}, β_{2} = vectors of regression coefficients
H_{it} = a vector of variables to indicate self-rated health
X_{it} = a vector of explanatory variables
α_{i} = unobserved time-invariant variables
ε_{it} = idiosyncratic error representing unobserved factors that change over time and affect L_{it} (Note: α_{i} + ε_{it} = u_{it})
Fixed effects model: α_{i} may be correlated with H_{it}
Random effects model: α_{i} is assumed not to be correlated with H_{it}; that is, Cov(H_{it}, α_{i}) = 0
An alternative way to control for unobserved time-invariant variables is to use a random effects model. The starting form of this model is the same as that presented in Figure C2; however, this time the assumption is that while the unobserved variables influence the dependent variable (labour force participation) they are not correlated with health. This means that the coefficient estimates from the standard pooled regression will not suffer from omitted variable bias, but that the error terms in the model will be serially correlated. The random effects model subtracts from each variable a fraction of its time-averaged value, where the fraction depends on the variation of the unobserved variables, the variation of the idiosyncratic error and the number of time periods (for more explanation, see Wooldridge, 2006). The advantage of the method is that it retains explanatory variables that are constant over time and respondents whose dependent variable is constant over time. This means that the sample size available for analysis is not reduced as with the fixed effects model and that estimates of the effect of time-constant variables are provided. However, the assumption that the omitted variables are not correlated with health is a disadvantage, given that it is the unobserved variables that are correlated with health that are of concern. One way to use the random effects model where some of the unobserved time-constant variables are thought to be correlated with health is to make an assumption about the relationship between health and the unobserved time-invariant variables. This is the correlated random effects model. More specifically, as shown in Figure C3, it can be assumed that the expected value of the unobserved variables is equal to a linear function of the average time spent in each health state over the three waves, together with a random term representing the unobserved time-invariant variables that are not correlated with health. Substituting this expected value into the starting equation for the fixed effects model results in the remaining unobserved time-invariant variables being uncorrelated with health. A random effects model can therefore be used.
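The time-averaged health terms that the correlated random effects model adds for each person can be sketched as follows. The helper name and the example health histories are hypothetical; the coding of health states from 1 (excellent) to 5 (poor) follows Figure C3.

```python
from collections import Counter

HEALTH_STATES = (1, 2, 3, 4, 5)  # 1 = excellent, ..., 5 = poor


def proportion_of_time_in_each_state(health_by_wave):
    """Share of a person's observed waves spent in each self-rated health state."""
    counts = Counter(health_by_wave)
    n_waves = len(health_by_wave)
    return {state: counts.get(state, 0) / n_waves for state in HEALTH_STATES}


# Hypothetical person reporting state 2 in Waves 1 and 2 and state 4 in Wave 3.
shares = proportion_of_time_in_each_state([2, 2, 4])
print(shares)
```

These shares are constant within a person, so they enter the model alongside the wave-specific health indicators. Because the five shares sum to one, one of them would normally be omitted as the reference category when they are used as regressors.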
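Figure C3: Equations used in the correlated random effects logistic regression panel model
From Figure C2 the starting form of the fixed effects equation is:
L_{it} = 1(β_{1}'H_{it} + β_{2}'X_{it} + α_{i} + ε_{it} > 0)   (1)
where:
i = person = 1, ..., n
t = time = 1, 2, 3
It is assumed that:
α_{i} = \sum_{j} δ_{j}\bar{H}_{ij} + η_{i}   (2)
where:
j = health state = 1 (excellent), ..., 5 (poor)
H_{it} = a vector of variables to indicate self-rated health
\bar{H}_{ij} = for each health state j, the proportion of time person i spends in that health state
δ_{j} = the coefficient on the proportion of time in health state j
η_{i} = for each person, the unobserved time-invariant variables that are not correlated with health
and Cov(H_{it}, η_{i}) = 0
Combining equations (1) and (2) gives the standard form of the random effects model:
L_{it} = 1(β_{1}'H_{it} + β_{2}'X_{it} + \sum_{j} δ_{j}\bar{H}_{ij} + η_{i} + ε_{it} > 0)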
Results for both the fixed and correlated random effects models are presented in this paper. While the fixed and correlated random effects panel models go further than the standard pooled regression, there are drawbacks. Firstly, the models only account for omitted variables that are time-constant, so any time-variant unobserved effects are in the error term. The assumption is that these time-varying omitted variables are uncorrelated with participation or with any of the explanatory variables. Secondly, while using fixed or correlated random effects models to look at how health changes are related to participation changes within respondents does control for the subjective nature of the self-rated health question (in the sense that some people will consistently be more optimistic in their health rating and some consistently more pessimistic), these models do not control for the other health measurement issues with self-rated health outlined in Section 4.2.2. Thirdly, these models do not allow the feedback effect to be estimated. Finally, an issue with the fixed effects model is that it only looks at how changes in health relate to changes in participation. It does not include estimates of the effect of poor health which possibly prevents a person working in the first place. This average health effect for the three waves is picked up in part in the correlated random effects model. However, if the assumption for the correlated random effects model, that the expectation of the correlated unobserved time-invariant variables is a linear function of the average time in a health state, is incorrect, this model will be flawed.
Notes
 [44]Fitting models separately for each gender was considered. However, for all chronic diseases other than psychiatric conditions the relationship between chronic disease and participation was in the same direction and of the same magnitude irrespective of gender. Further, for each disease the confidence intervals for the coefficients overlapped for male and female. For this reason, and owing to the relatively small numbers with certain diseases such as cancer, it was decided to fit the model for combined genders with interactions included for parameter estimates that appeared to differ by gender. These were psychiatric conditions, social marital status and the presence of children. This approach was continued when considering self-rated health to aid comparability.
 [45]This section draws heavily on unpublished lecture notes by Dean Hyslop.
 [46]Further, it is considered best practice to remove from the model specification all variables that may change over time, but are more or less fixed in reality.
Standard pooled, fixed and correlated random effects logistic regression with adjusted health measure
While self-reported health may be a more encompassing measure of current health than considering previous diagnosis of individual chronic diseases from which the respondent may no longer suffer symptoms, it is also a more subjective measure and open to bias. Despite the possibility that self-rated health may not completely reflect true health, it is still widely used where no alternative measures exist.
Of the problems with self-reported health reported in Section 4.2.2, the one of main concern here is rationalisation bias. This is where individuals who are inactive may report worse-than-actual health to justify their inactive labour market state. Disney et al (2003) point out that this may be for self-esteem if nothing else. This bias, if it exists, will cause self-reported health to be correlated with the error term in the labour market participation regression models if unadjusted self-rated health is used as an explanatory variable, and will result in the relationship between health and participation being overestimated.
One approach to attempt to remove this problem (suggested by Bound et al, 1999 and used by Disney et al, 2003) is to construct an adjusted health measure using personal characteristics and more objective health measures. The relationships between true health and measured self-rated health are shown in Figure C4. This method aims to purge self-rated health of its rationalisation bias and better reflect true health. The adjusted health variable, which is a standardised index derived from equation 3 in Figure C4, is then included in a second model to assess the impact of adjusted health on participation. Using this adjusted health measure means that, unlike when unadjusted health is used, the health measure should no longer be correlated with the error term in the participation equation, as the rationalisation bias is left in the error term of equation 3 in Figure C4.
Figure C4: Relationship between true health and measured health in each wave
Assume that at time t a person's true health, H*_{i}, can be modelled using the following equation:
H*_{i} = γ_{1}'Z_{i} + γ_{2}'Y_{i} + ε_{i}   (1)
where:
γ_{1}, γ_{2} = vectors of regression coefficients
Z_{i} = a vector of objective health indicators
Y_{i} = a vector of explanatory personal characteristics that may affect health, some of which overlap with the explanatory variables in X_{i} in the participation equation
ε_{i} = error term associated with person i
Corr(Z_{i}, ε_{i}) = 0 and Corr(Y_{i}, ε_{i}) = 0
However, health may be measured with error:
H_{i} = H*_{i} + v_{i}   (2)
where v_{i} = reporting errors
If H_{i} is subject to rationalisation bias then including this in the participation equation will result in biased estimates as v_{i} will not be random and Corr(v_{i}, L_{i}) ≠ 0
Assuming that Corr(v_{i}, ε_{i}) = 0, combining equations (1) and (2) gives:
H_{i} = γ_{1}'Z_{i} + γ_{2}'Y_{i} + u_{i}   (3)
where:
u_{i} = v_{i} + ε_{i}
Using a standardised form of the predicted value of H_{i} from (3) to estimate an adjusted health measure should purge health of any rationalisation bias as this bias should be contained in the error term for the model.
To construct the adjusted health measure, each wave of SoFIE was taken in turn and all adult longitudinal respondents were considered (ie, even respondents who are over 64 or full-time students). An ordered logit model was used to predict self-rated health using a vector of personal characteristics (some of which overlap with the personal characteristics used in the participation equation) and a vector of objective health measures. The objective health measures were the presence of various chronic diseases; whether the respondent has ever been a regular smoker; and whether the respondent received any health-related benefits in the reference period (which, if the benefit system is effective, should be an indicator of the severity of health problems). The form of this model is similar to that described in Figure C1 but with self-reported health as the dependent variable. As the dependent variable now has five ordered outcomes (the five self-reported health states) rather than two, an ordered logit model is used to predict health.
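How an ordered logit model turns a linear predictor into probabilities for the five health states can be sketched as follows. The cutpoints and linear predictor values are hypothetical, and the sketch assumes the common convention in which a higher latent value corresponds to a worse health state.

```python
import math


def logistic_cdf(x):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probabilities(linear_predictor, cutpoints):
    """Probabilities of each ordered category given increasing cutpoints.

    P(category <= j) = F(cutpoint_j - linear_predictor); each category's
    probability is the difference between successive cumulative values.
    """
    cumulative = [logistic_cdf(c - linear_predictor) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Four hypothetical cutpoints separate five states: 1 (excellent) to 5 (poor).
cutpoints = [-1.0, 0.5, 2.0, 3.5]
healthier = ordered_logit_probabilities(0.0, cutpoints)
less_healthy = ordered_logit_probabilities(1.5, cutpoints)
print([round(p, 3) for p in healthier])
print([round(p, 3) for p in less_healthy])
```

The final element of each list is the predicted probability of being in poor health, which rises as the linear predictor increases.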
The probability of being in poor health was then predicted for each person using the model results. As in the IFS paper (Disney et al, 2003), these probabilities were then standardised in relation to the average health for that year to form the adjusted health measure (so that, for all longitudinal respondents, the mean for each year was zero and the standard deviation one). This process is conducted independently for each year and results in a health measure for each person relative to that year's average. This adjusted health measure is then included in the standard pooled logit regression and in the fixed and correlated random effects models in place of self-rated health to determine the relationship between this adjusted measure and labour force participation.[47]
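The standardisation step can be sketched as follows. The predicted probabilities are made-up values, the function name is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption made here for simplicity.

```python
from statistics import mean, pstdev


def standardise_within_year(predicted_by_year):
    """Standardise predicted probabilities of poor health separately for each year.

    predicted_by_year maps a year to the list of predicted probabilities for
    all longitudinal respondents in that year.
    """
    adjusted = {}
    for year, values in predicted_by_year.items():
        centre = mean(values)
        spread = pstdev(values)  # assumption: population standard deviation
        adjusted[year] = [(value - centre) / spread for value in values]
    return adjusted


# Hypothetical predicted probabilities of poor health for four people.
predicted = {
    1: [0.02, 0.05, 0.10, 0.30],
    2: [0.03, 0.04, 0.12, 0.25],
}
adjusted_health = standardise_within_year(predicted)
print(adjusted_health[1])
```

Because each year is treated on its own, a value of zero always means average predicted health for that year, and positive values indicate a higher-than-average predicted probability of poor health.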
This method is similar to an instrumental variable or two-stage approach; however, its aim is just to purge self-rated health of potential bias rather than to use the instruments to account for the unobserved heterogeneity of health. One drawback of using this adjusted health measure in the second model rather than unadjusted health is that interpreting what a unit change in the adjusted health measure equates to in the real world is less intuitive than, say, a change from excellent to poor health when the self-rated health measure is used. However, using this method is worthwhile to see if any relationships between health and participation remain when an adjusted health measure is used.
Instrumental variables/two-stage approach and simultaneous equations
An alternative method of controlling for unobserved heterogeneity is to use an instrumental variable approach (also called two-stage regression). This approach enables both time-variant and time-invariant unobserved variables that are correlated with health to be controlled for by instrumenting the endogenous variable, health. In the first stage, health is regressed against all the exogenous variables in the participation equation along with the instrument(s).[48] The second stage uses the predicted values of health in the labour force participation equation. While this approach seems attractive, given that it controls for time-variant and time-invariant unobserved variables, it is very difficult to find suitable instruments. For a variable to be a valid instrument it should be correlated with health, but should not affect participation other than through health (ie, it should not belong in the labour force participation equation once health is included) (Wooldridge, 2006). When a valid instrument(s) is found, an equation is said to be identified. A literature review by Currie and Madrian (1999) concluded that relatively little research has been devoted to assessing the empirical importance of potential endogeneity bias; however, for those studies that attempt to deal with endogeneity of health using instrumental variables, it is difficult to find compelling sources of identification. The majority of the studies they reviewed relied on arbitrary exclusion restrictions and the resulting estimates were very sensitive to these identification assumptions.[49]
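The logic of the two-stage approach can be illustrated with a deliberately simple sketch. It uses a linear outcome equation rather than a logit, a single instrument and no other covariates, and the data are constructed by hand so that the instrument is exactly uncorrelated with the unobserved confounder; all names and numbers are hypothetical and this is not the specification considered in the paper.

```python
def slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    return covariance / variance


# Constructed data: the instrument and the confounder are orthogonal.
instrument = [-1.0, 1.0, -1.0, 1.0]
confounder = [-1.0, -1.0, 1.0, 1.0]   # unobserved in practice
health = [z + c for z, c in zip(instrument, confounder)]
TRUE_EFFECT = 1.0
outcome = [TRUE_EFFECT * h + 2.0 * c for h, c in zip(health, confounder)]

# Naive regression of the outcome on health picks up the confounder as well.
naive_estimate = slope(health, outcome)

# First stage: regress health on the instrument and form predicted health.
first_stage_slope = slope(instrument, health)
mean_health = sum(health) / len(health)
mean_instrument = sum(instrument) / len(instrument)
predicted_health = [
    mean_health + first_stage_slope * (z - mean_instrument) for z in instrument
]

# Second stage: regress the outcome on predicted health.
two_stage_estimate = slope(predicted_health, outcome)

print(naive_estimate, two_stage_estimate)
```

In this constructed example the naive estimate is twice the true effect, while the two-stage estimate recovers it. That result depends entirely on the instrument being unrelated to the confounder and affecting the outcome only through health, which is the condition that is hard to satisfy in practice.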
An effort was made to find instruments for self-rated health. Possible candidates available in all three waves were whether a respondent has ever smoked (making an assumption that very few people will have started smoking in the survey period) and whether they had any chronic condition. Both of these would be expected to affect participation only through health, and both are correlated with self-rated health. If self-rated health was a perfect measure of health then these variables may have been considered to be valid instruments. However, as there are problems with how self-rated health is measured, smoking and chronic disease presence could justifiably be associated with participation outside of self-rated health; perhaps as proxies for health-related aspects not measured accurately by self-rated health. While not a valid test of an instrument, basic models indicated that there appeared to be some correlation between both smoking and chronic disease presence and participation over and above self-rated health. After much consideration it was decided that these were not feasible instruments.[50]
In addition to possible unobserved heterogeneity, the link between health and participation is not necessarily one way. If working affects people's health then there is a feedback effect. This feedback effect could be positive or negative. Working long hours in a stressful environment may lead to poorer health, or participation may lead to a higher sense of personal and economic security and thus better health (Laplagne et al, 2007). The leading method for estimating simultaneous equations is instrumental variables (Wooldridge, 2006). If it had been possible to identify the health equation using an instrument then this feedback effect could have been assessed using simultaneous equations. The first equation would be the identified health equation inclusive of participation, and the second the identified participation equation including the health variable.
There are numerous examples of research that has aimed to assess the feedback effect. For example, Stern (1989) used a simultaneous equations approach, using a list of symptoms to instrument self-rated health or the presence of a health condition that limits the work that can be undertaken. In that paper he used the presence of different chronic conditions to identify his health equation. The indicator variable of whether any chronic disease was present was not significant in his model on top of self-reported health, so these chronic conditions could identify the health equation. In the case of the SoFIE data the same is not true. In contrast to other literature, Stern found a feedback effect that was significant but not large (Currie and Madrian, 1999). In any case, the impact of participation on health is not clearly in any one direction.
Because there do not appear to be any compelling instruments to identify the health equation for all three waves of SoFIE and because, in any case, conclusions can be very sensitive to the instruments chosen, this work does not attempt to adjust for unobserved heterogeneity that is time-variant or to assess the feedback effect.
General model information
For all of the models in this paper the logit model was used as opposed to the probit model; however, this choice is not critical as the two have been shown to give very similar results (Freese and Scott Long et al, 2006).[51] The models were fitted using Stata Version 9. Analysis was undertaken at the Statistics New Zealand Datalab. All variables and variable categories were included in the model even if they were found not to be significant at the 95% level. This was done for completeness and to aid comparability between models. Residual plots of the models were examined but are not presented as Statistics New Zealand does not release them. Significance in this report is reported at the 95% level unless stated.
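The closeness of the logit and probit models can be checked numerically. The sketch below compares the standard normal distribution function with a rescaled logistic distribution function over a grid of values; the scaling factor of 1.702 is a commonly quoted approximation and is used here as an assumption, not a value taken from the paper.

```python
import math


def logistic_cdf(x):
    """Distribution function underlying the logit model."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Distribution function underlying the probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


SCALE = 1.702  # assumed rescaling of the index between the two models

grid = [step / 100 for step in range(-500, 501)]
largest_gap = max(abs(logistic_cdf(SCALE * x) - normal_cdf(x)) for x in grid)
print(f"largest difference in predicted probability: {largest_gap:.4f}")
```

Once the index is rescaled, the two curves differ by less than one percentage point everywhere on the grid, which is consistent with the coefficients being roughly proportional and the predicted probabilities being very similar.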
All descriptive figures presented in the report are based on weighted data; this is to ensure the figures are representative of the population. While Stata allows sampling weights to be accounted for in basic logit models it is not always possible to allow for these in the more advanced models. Further, while the survey command in Stata enables the sampling design to be taken into account, this survey command cannot be used with more advanced models. In any case, owing to confidentiality, not all the information on the sampling scheme is available which would allow full adjustment.[52] For this reason all models were carried out unweighted and without adjusting for the sampling design. This is likely to make little difference to the magnitude of resulting estimates and lead to the same conclusions.[53]
Notes
 [47]The correlated random effects model included an average health stock measure for each person across all waves, as discussed in the methodology section for the random effects model using unadjusted self-rated health.
 [48]If there is only one instrument this is equivalent to regressing health against the instrument.
 [49]In previous studies factors such as physical activity, whether a person has ever smoked, whether the respondent has a health condition or whether they are a heavy drinker have been used to instrument selfrated health.
 [50]Note these variables could have been included in the original participation equation; however, as they are correlated with self-rated health they were excluded. Further, even if they are proxies for time-invariant unobserved variables, these will be removed in the panel models. In any case, including them in the pooled regressions only marginally increases the R^{2} and the coefficients for the health variables are largely unchanged.
 [51]The differences in the coefficients from the logit and probit models are owing to different assumptions about the distribution of errors. The magnitude of the coefficients from the logit and probit models is proportional and there is little or no difference in the predicted probabilities.
 [52]Statistics New Zealand provides information that allows the identification of the primary sampling units (PSUs) (geographical areas), secondary sampling units (SSUs) (households) and strata. However, as the total number of PSUs in each stratum and the total number of SSUs in each PSU are not currently available to SoFIE users, the survey command in Stata would assume that the PSUs were sampled with replacement from the strata, resulting in the secondary sampling stages being ignored. This means that in the pooled logistic model the fact that the responses for the same person are not independent could not be accounted for.
 [53]The impact of not adjusting for the sampling weight or the survey design is likely to be small. Using the pooled logistic regression model, models were run with and without the weights, and with and without accounting for the survey design (the SSU and the relationship between the responses of the same person in the different waves could not be accounted for as explained in footnote 39), to get an idea of the impact of not accounting for these factors. There was little difference in the conclusions reached using weighted or unweighted data or data adjusted or unadjusted for the sampling design. Therefore all model results presented in this paper are based on unweighted data to aid comparability. Allowing for the sampling weights affects both the estimated coefficients and the estimated standard errors (SEs). The weights result in coefficients that are slightly lower than the estimates that do not allow for the sampling weights. However, the differences are small and lead to the same conclusions about which variables are and are not significant. Accounting for the survey design affects the SEs of the estimates rather than the estimates themselves. As would be expected, not accounting for the strata results in SEs that are higher than they otherwise would be. Conversely, not accounting for the PSU clusters results in SEs that are smaller than they otherwise would be. Not accounting for both the strata and the PSU clusters results in SEs that are only very slightly smaller than if these had been adjusted for.
Appendix D
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  0.247**  0.104  0.017  0.452  0.043 
Region (base=Auckland)  
Waikato  0.134  0.088  0.128  0.038  0.306 
Wellington  0.004  0.076  0.960  0.144  0.152 
Rest of North Island  0.024  0.067  0.723  0.155  0.108 
Canterbury  0.066  0.075  0.381  0.081  0.213 
Rest of South Island  0.068  0.077  0.378  0.220  0.083 
Born in New Zealand (base=yes)  
No  0.111*  0.067  0.096  0.243  0.020 
Ethnicity (base=NZ/European)  
Māori  0.135**  0.066  0.042  0.265  0.005 
Pacific Islander  0.032  0.109  0.771  0.244  0.181 
Other  0.170*  0.101  0.093  0.368  0.028 
Age at interview date  0.108***  0.005  0.000  0.118  0.098 
Aged 50 and over (base=15-49)  
Aged 50 and over  6.510***  0.670  0.000  5.198  7.823 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.190***  0.057  0.001  0.079  0.301 
Degree or higher  0.840***  0.078  0.000  0.687  0.992 
No qualification  0.414***  0.063  0.000  0.537  0.291 
Chronic disease presence (base=no (or u/k) chronic diseases)  
One or more known chronic diseases  0.378***  0.045  0.000  0.467  0.289 
Studying (base=no studying)  0.357***  0.056  0.000  0.466  0.248 
Other household income  0.007  0.006  0.241  0.018  0.005 
Partner (base=working partner)  
Nonworking partner  1.396***  0.102  0.000  1.596  1.196 
No partner  1.053***  0.102  0.000  1.253  0.854 
Children (base=no children)  
Child(ren) minimum age 0  0.634***  0.158  0.000  0.325  0.943 
Child(ren) minimum age 5-17  0.239**  0.107  0.025  0.448  0.029 
Years paid employment  0.182***  0.009  0.000  0.165  0.199 
Years paid employment squared  0.001***  0.000  0.000  0.001  0.001 
Unemployment rate  0.124***  0.031  0.000  0.185  0.063 
Interactions  
Female*Child(ren) minimum age 0  2.925***  0.166  0.000  3.251  2.599 
Female*Child(ren) minimum age 5-17  0.254**  0.123  0.038  0.495  0.014 
Female*Nonworking partner  0.084  0.147  0.567  0.204  0.372 
Female*No partner  0.369***  0.115  0.001  0.144  0.594 
Aged 50 and over*Age  0.130***  0.013  0.000  0.155  0.106 
Constant  5.024***  0.224  0.000  4.586  5.462 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  3,401.23 
Loglikelihood  13,178.71 
Pseudo R^{2}  0.2968 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Responses in each wave are included in the model separately. The relationship between person responses in each wave was accounted for by defining the people as clusters. The number of observations in each wave is not equal owing to the small number of missing values for variables of interest in certain waves or owing to student/retirement status changing between waves. All variables were included in the model and significant and insignificant variables or variable categories are kept in for completeness.
2. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. Psychiatric conditions include depression, manic depression and schizophrenia.
4. The likelihood of labour market participation was modelled. Not participating is the base category.
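As a reading aid for the table above, the sketch below shows how a logit coefficient and its standard error relate to the reported 95% confidence bounds and to an odds ratio. It uses the "Degree or higher" row (coefficient 0.840, standard error 0.078); the normal critical value 1.96 and the odds-ratio interpretation are standard conventions assumed here rather than steps described in the paper.

```python
import math

# "Degree or higher" row of the pooled participation model.
coef = 0.840
se = 0.078

# Assumed normal-approximation critical value for a 95% interval.
Z_95 = 1.96

lower = coef - Z_95 * se
upper = coef + Z_95 * se

# Exponentiating a logit coefficient gives the odds ratio relative
# to the base category (school qualification).
odds_ratio = math.exp(coef)

print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"odds ratio: {odds_ratio:.2f}")
```

The computed bounds agree with the tabulated 0.687 and 0.992 to within rounding, and the odds ratio indicates that the odds of participating are a little over twice those of the base category, other variables held constant.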
Variable  Mean  Standard deviation
Labour force participation (participation=1, not participating=0)  0.816  0.387 
Labour market outcome (fulltime=0, parttime=1, unemployed=2, inactive=3)  0.758  1.153 
Gender (male=0, female=1)  0.538  0.499 
Region (base=Auckland)  
Waikato (=1)  0.089  0.285 
Wellington (=1)  0.135  0.342 
Rest of North Island (=1)  0.217  0.412 
Canterbury (=1)  0.162  0.368 
Rest of South Island (=1)  0.143  0.350 
Born in NZ (yes=1, no=0)  0.198  0.398 
Ethnicity (base=NZ/European)  
Māori (=1)  0.117  0.321 
Pacific Islander (=1)  0.046  0.209 
Other (=1)  0.067  0.250 
Age at interview date  42.250  12.242 
Age 50 and over (15-49=0, 50 and over=1)  0.311  0.463 
Highest Qualification (base=school qualification)  
Postschool vocational qualification (=1)  0.371  0.483 
Degree or higher (=1)  0.161  0.367 
No qualification (=1)  0.212  0.409 
Asthma (asthma=1, no asthma=0)  0.186  0.389 
High blood pressure (High blood pressure=1, no high blood pressure=0)  0.163  0.370 
High cholesterol (High cholesterol=1, no high cholesterol=0)  0.140  0.347 
Heart disease (Heart disease=1, no heart disease=0)  0.032  0.177 
Diabetes (diabetes=1, no diabetes=0)  0.033  0.177 
Stroke (stroke=1, no stroke=0)  0.011  0.105 
Migraine (migraine=1, no migraine=0)  0.140  0.347 
Psychiatric conditions (Psychiatric conditions=1, no psychiatric conditions=0)  0.103  0.304 
Cancer (base=no cancer)  
Cancer (=1)  0.029  0.169 
Unknown (=1)  0.235  0.424 
Studying (no studying in reference period=0, studying in reference period=1)  0.119  0.323 
Other household income  8.398  4.083 
Partner (base=working partner)  
Nonworking partner (=1)  0.113  0.317 
No partner (=1)  0.308  0.462 
Children (base=no children)  
Child(ren) minimum age 0  0.161  0.367 
Child(ren) minimum age 5-17  0.272  0.445 
Years paid employment  22.116  12.399 
Unemployment rate  4.174  0.531 
Number of observations  39,310 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Data for all three waves is pooled together to create an average rate.
2. Psychiatric conditions include depression, manic depression and schizophrenia.
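For the 0/1 indicator variables in the table above, the standard deviation follows directly from the mean p as sqrt(p(1 - p)). The short check below, offered as an illustrative sketch rather than part of the original analysis, applies this to three rows; the tolerance of 0.002 is an assumption that allows for rounding of the published figures.

```python
import math

def binary_sd(p):
    """Standard deviation of a 0/1 variable with mean p."""
    return math.sqrt(p * (1.0 - p))

# (mean, reported standard deviation) taken from the descriptive table.
reported = {
    "Labour force participation": (0.816, 0.387),
    "Gender": (0.538, 0.499),
    "Maori": (0.117, 0.321),
}

gaps = {}
for name, (mean, sd) in reported.items():
    gaps[name] = abs(binary_sd(mean) - sd)
    print(f"{name}: implied SD {binary_sd(mean):.3f}, reported {sd:.3f}")
```

The implied and reported values agree to within rounding, so for the indicator variables the standard deviation column carries no information beyond the mean.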
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  0.383***  0.106  0.000  0.592  0.174 
Region (base=Auckland)  
Waikato  0.135  0.088  0.127  0.038  0.308 
Wellington  0.018  0.076  0.813  0.130  0.166 
Rest of North Island  0.015  0.068  0.827  0.147  0.118 
Canterbury  0.103  0.076  0.173  0.045  0.252 
Rest of South Island  0.077  0.078  0.323  0.229  0.075 
Born in New Zealand (base=yes)  
No  0.118*  0.067  0.081  0.250  0.014 
Ethnicity (base=NZ/European)  
Māori  0.139**  0.067  0.038  0.270  0.008 
Pacific Islander  0.031  0.109  0.774  0.182  0.245 
Other  0.149  0.102  0.142  0.348  0.050 
Age at interview date  0.101***  0.005  0.000  0.111  0.091 
Aged 50 and over (base=15-49)  
Aged 50 and over  6.669***  0.676  0.000  5.343  7.994 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.194***  0.057  0.001  0.083  0.305 
Degree or higher  0.812***  0.078  0.000  0.660  0.965 
No qualification  0.392***  0.063  0.000  0.515  0.268 
Asthma (base=no asthma)  0.090  0.055  0.102  0.198  0.018 
High blood pressure (base=no high blood pressure)  0.169***  0.064  0.008  0.294  0.045 
High cholesterol (base=no high cholesterol)  0.080  0.071  0.257  0.218  0.058 
Heart disease (base=no heart disease)  0.662***  0.120  0.000  0.898  0.426 
Diabetes (base=no diabetes)  0.553***  0.117  0.000  0.782  0.324 
Stroke (base=no stroke)  0.897***  0.181  0.000  1.253  0.541 
Migraine (base=no migraine)  0.043  0.064  0.501  0.168  0.082 
Psychiatric conditions (base=no psychiatric conditions)  1.207***  0.115  0.000  1.433  0.981 
Cancer (base=no cancer)  
Cancer  0.068  0.129  0.598  0.321  0.185 
Unknown  0.129**  0.053  0.016  0.234  0.024 
Studying (base=no studying)  0.355***  0.056  0.000  0.464  0.245 
Other household income  0.010*  0.006  0.090  0.021  0.002 
Partner (base=working partner)  
Nonworking partner  1.384***  0.105  0.000  1.589  1.179 
No partner  1.015***  0.103  0.000  1.216  0.813 
Children (base=no children)  
Child(ren) minimum age 0  0.589***  0.160  0.000  0.276  0.902 
Child(ren) minimum age 5-17  0.304***  0.107  0.004  0.513  0.095 
Years paid employment  0.179***  0.009  0.000  0.162  0.196 
Years paid employment squared  0.001***  0.000  0.000  0.001  0.001 
Unemployment rate  0.129***  0.031  0.000  0.190  0.068 
Interactions  
Female*Psychiatric conditions  0.701***  0.137  0.000  0.432  0.971 
Female*Child(ren) minimum age 0  2.863***  0.169  0.000  3.193  2.533 
Female*Child(ren) minimum age 5-17  0.203  0.123  0.101  0.444  0.039 
Female*Nonworking partner  0.085  0.149  0.567  0.206  0.377 
Female*No partner  0.367***  0.115  0.001  0.141  0.593 
Aged 50 and over*Age  0.134***  0.013  0.000  0.159  0.109 
Constant  4.965***  0.225  0.000  4.523  5.406 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  3,543.77 
Loglikelihood  12,949.58 
Pseudo R^{2}  0.309 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Note: See footnotes on Table D1.
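The pseudo R^{2} values reported for the two pooled participation models in Appendix D can be related to their log-likelihoods. The sketch below assumes the statistic is McFadden's pseudo R^{2}, that is 1 - LL/LL0, where LL0 is the log-likelihood of a constant-only model. The log-likelihoods are entered as negative values, since a logit log-likelihood cannot be positive; LL0 itself is not reported in the tables and is only implied here.

```python
# (log-likelihood, pseudo R^2) for the two pooled participation models.
# Signs of the log-likelihoods are an assumption: they must be <= 0.
models = {
    "chronic disease indicator": (-13178.71, 0.2968),
    "individual chronic diseases": (-12949.58, 0.309),
}

# McFadden: R2 = 1 - LL / LL0, so LL0 = LL / (1 - R2).
implied_null = {
    name: ll / (1.0 - r2) for name, (ll, r2) in models.items()
}

for name, ll0 in implied_null.items():
    print(f"{name}: implied constant-only log-likelihood {ll0:.1f}")
```

Both models imply nearly the same constant-only log-likelihood, as would be expected when they are fitted to the same 39,310 observations.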
Variable  Fulltime coefficient  Parttime coefficient  Unemployed coefficient
Sex (base=male)  
Female  0.503***  0.856***  0.608*** 
Region (base=Auckland)  
Waikato  0.128  0.116  0.347** 
Wellington  0.014  0.008  0.314** 
Rest of North Island  0.098  0.065  0.389*** 
Canterbury  0.015  0.214**  0.211 
Rest of South Island  0.122  0.064  0.234 
Born in New Zealand (base=yes)  0.125*  0.107  0.248* 
Ethnicity (base=NZ/European)  
Māori  0.105  0.308***  0.441*** 
Pacific Islander  0.123  0.365***  0.202 
Other  0.184*  0.208*  0.376** 
Age at interview date  0.145***  0.065***  0.052*** 
Aged 50 and over (base=15-49)  
Aged 50 and over  7.442***  4.150***  7.693*** 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.205***  0.132**  0.306*** 
Degree or higher  1.052***  0.516***  0.298* 
No qualification  0.481***  0.434***  0.304** 
Chronic disease presence (base=no (or u/k) chronic diseases)  
One or more known chronic diseases  0.417***  0.310***  0.130 
Studying (base=no studying)  0.423***  0.243***  0.113 
Other household income  0.014**  0.009  0.018 
Partner (base=working partner)  
Nonworking partner  1.417***  1.120***  1.102*** 
No partner  1.227***  0.534***  0.264 
Children (base=no children)  
Child(ren) minimum age 0  0.505***  0.502**  0.415 
Child(ren) minimum age 5-17  0.362***  0.201  0.114 
Years paid employment  0.235***  0.137***  0.048*** 
Years paid employment squared  0.001***  0.001***  0.000 
Unemployment rate  0.203***  0.005  0.096 
Interactions  
Female*Child(ren) minimum age 0  3.482***  1.591***  2.324*** 
Female*Child(ren) minimum age 5-17  0.407***  0.360**  0.388* 
Female*Nonworking partner  0.0569  0.197  0.078 
Female*No partner  0.566***  0.232  0.333 
Aged 50 and over*Age  0.148***  0.083***  0.149*** 
Constant  6.059***  0.595**  0.494 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  5,433.48 
Loglikelihood  29,708.28 
Pseudo R^{2}  0.2301 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. See footnotes 1-3 of Table D1.
2. The likelihood of different labour market outcomes was modelled. Inactive is the base category.
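With inactive as the base category, the three columns of coefficients in a multinomial logit translate into outcome probabilities as sketched below. The linear predictor values are hypothetical numbers chosen for illustration and do not correspond to any respondent or to the tabulated coefficients.

```python
import math

def multinomial_probs(linear_predictors):
    """Outcome probabilities when the base category has predictor 0.

    linear_predictors maps each non-base outcome to its x'beta value.
    """
    exps = {k: math.exp(v) for k, v in linear_predictors.items()}
    denom = 1.0 + sum(exps.values())  # the 1.0 is exp(0) for the base
    probs = {k: e / denom for k, e in exps.items()}
    probs["inactive"] = 1.0 / denom
    return probs

# Hypothetical x'beta values for one person (not from the paper).
example = {"fulltime": 1.2, "parttime": 0.3, "unemployed": -1.0}
probs = multinomial_probs(example)

for outcome, p in probs.items():
    print(f"{outcome}: {p:.3f}")
```

Each coefficient therefore describes the log of the probability of an outcome relative to being inactive, which is how the fulltime and parttime columns can be compared for the same variable.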
Variable  Fulltime coefficient  Parttime coefficient  Unemployed coefficient
Sex (base=male)  
Female  0.641***  0.749***  0.754*** 
Region (base=Auckland)  
Waikato  0.128  0.120  0.323* 
Wellington  0.003  0.011  0.320** 
Rest of North Island  0.094  0.077  0.392*** 
Canterbury  0.022  0.250***  0.217 
Rest of South Island  0.134  0.058  0.237 
Born in New Zealand (base=yes)  0.129**  0.115  0.230* 
Ethnicity (base=NZ/European)  
Māori  0.109  0.311***  0.462*** 
Pacific Islander  0.201*  0.322**  0.301 
Other  0.154  0.206*  0.433** 
Age at interview date  0.138***  0.06***  0.048*** 
Aged 50 and over (base=15-49)  
Aged 50 and over  7.698***  4.226***  7.655*** 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.21***  0.136**  0.307*** 
Degree or higher  1.025***  0.498***  0.28* 
No qualification  0.451***  0.421***  0.312** 
Asthma (base=no asthma)  0.080  0.114*  0.039 
High blood pressure (base=no high blood pressure)  0.164**  0.183**  0.172 
High cholesterol (base=no high cholesterol)  0.088  0.080  0.003 
Heart disease (base=no heart disease)  0.619***  0.635***  0.875*** 
Diabetes (base=no diabetes)  0.700***  0.361***  0.015 
Stroke (base=no stroke)  1.119***  0.492**  0.808** 
Migraine (base=no migraine)  0.080  0.011  0.229* 
Psychiatric conditions (base=no psychiatric conditions)  1.328***  0.751***  0.597*** 
Cancer (base=no cancer)  
Cancer  0.056  0.068  0.188 
Unknown  0.186***  0.017  0.237** 
Studying (base=no studying)  0.420***  0.241***  0.119 
Other household income  0.017***  0.007  0.020* 
Partner (base=working partner)  
Nonworking partner  1.411***  1.106***  1.082*** 
No partner  1.196***  0.513***  0.243 
Children (base=no children)  
Child(ren) minimum age 0  0.451***  0.460**  0.367 
Child(ren) minimum age 5-17  0.434***  0.253*  0.048 
Years paid employment  0.232***  0.136***  0.046*** 
Years paid employment squared  0.001***  0.001***  0.000 
Unemployment rate  0.210***  0.008  0.089 
Interactions  
Female*Psychiatric conditions  0.694***  0.364**  0.554** 
Female*Child(ren) minimum age 0  3.416***  1.539***  2.283*** 
Female*Child(ren) minimum age 5-17  0.355***  0.399***  0.331 
Female*Nonworking partner  0.065  0.200  0.073 
Female*No partner  0.580***  0.226  0.318 
Aged 50 and over*Age  0.154***  0.085***  0.149*** 
Constant  6.017***  0.538**  0.438 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  5,601.24 
Loglikelihood  29,422.78 
Pseudo R^{2}  0.2375 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. See footnotes 1-3 of Table D1.
2. The likelihood of different labour market outcomes was modelled. Inactive is the base category.
Appendix E
Variable  Mean  Standard deviation
Selfrated health (base=excellent)  
Very good  0.341  0.474 
Good  0.196  0.397 
Fair  0.055  0.229 
Poor  0.015  0.123 
Number of observations  39,310 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Data for all three waves is pooled together to create an average rate. The sample is restricted so it is the same as that considered in the models with individual chronic diseases (ie, those with missing indicators of chronic diseases are excluded from this analysis).
2. The means and standard deviations for the nonhealth variables are the same as those from the individual disease models (Table D2).
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  0.407***  0.104  0.000  0.611  0.204 
Region (base=Auckland)  
Waikato  0.163*  0.090  0.068  0.012  0.339 
Wellington  0.006  0.075  0.935  0.141  0.153 
Rest of North Island  0.013  0.068  0.843  0.120  0.147 
Canterbury  0.088  0.075  0.245  0.060  0.236 
Rest of South Island  0.007  0.078  0.927  0.146  0.161 
Born in New Zealand (base=yes)  
No  0.082  0.068  0.224  0.215  0.050 
Ethnicity (base=NZ/European)  
Māori  0.079  0.068  0.247  0.212  0.054 
Pacific Islander  0.064  0.108  0.555  0.148  0.276 
Other  0.062  0.100  0.536  0.259  0.135 
Age at interview date  0.097***  0.005  0.000  0.108  0.087 
Aged 50 and over (base=15-49)  
Aged 50 and over  6.922***  0.671  0.000  5.607  8.238 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.197***  0.057  0.001  0.085  0.309 
Degree or higher  0.757***  0.078  0.000  0.605  0.910 
No qualification  0.312***  0.064  0.000  0.437  0.187 
Selfrated health (base=excellent)  
Very good  0.064  0.045  0.156  0.152  0.024 
Good  0.578***  0.052  0.000  0.681  0.475 
Fair  1.440***  0.078  0.000  1.594  1.287 
Poor  2.545***  0.135  0.000  2.809  2.280 
Studying (base=no studying)  0.365***  0.057  0.000  0.477  0.253 
Other household income  0.012**  0.006  0.048  0.023  0.000 
Partner (base=working partner)  
Nonworking partner  1.337***  0.104  0.000  1.540  1.133 
No partner  1.016***  0.102  0.000  1.217  0.815 
Children (base=no children)  
Child(ren) minimum age 0  0.560***  0.161  0.000  0.245  0.875 
Child(ren) minimum age 5-17  0.284***  0.107  0.008  0.494  0.074 
Years paid employment  0.178***  0.009  0.000  0.160  0.195 
Years paid employment squared  0.001***  0.000  0.000  0.001  0.001 
Unemployment rate  0.139***  0.032  0.000  0.201  0.078 
Interactions  
Female*Child(ren) minimum age 0  2.858***  0.170  0.000  3.191  2.526 
Female*Child(ren) minimum age 5-17  0.209*  0.124  0.091  0.451  0.033 
Female*Nonworking partner  0.082  0.149  0.581  0.209  0.373 
Female*No partner  0.434***  0.115  0.000  0.209  0.660 
Aged 50 and over*Age  0.139***  0.013  0.000  0.164  0.114 
Constant  4.945***  0.227  0.000  4.500  5.389 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  3,749.79 
Loglikelihood  12,691.30 
Pseudo R^{2}  0.3227 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Responses in each wave are included in the model separately. The relationship between person responses in each wave was accounted for by defining the people as clusters. The number of observations in each wave is not equal owing to the small number of missing values for variables of interest in certain waves or owing to student/retirement status changing between waves. All variables were included in the model and significant and insignificant variables or variable categories are kept in for completeness.
2. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. The sample is restricted so it is the same as that considered in the models with individual chronic diseases (ie, those with missing indicators of chronic diseases are excluded from this analysis).
4. The likelihood of labour market participation was modelled. Not participating is the base category.
Variable  Fulltime coefficient  Parttime coefficient  Unemployed coefficient
Sex (base=male)  
Female  0.698***  0.703***  0.727*** 
Region (base=Auckland)  
Waikato  0.163*  0.136  0.339** 
Wellington  0.008  0.007  0.301** 
Rest of North Island  0.055  0.093  0.389*** 
Canterbury  0.011  0.229***  0.206 
Rest of South Island  0.038  0.126  0.214 
Born in New Zealand (base=yes)  0.096  0.083  0.260* 
Ethnicity (base=NZ/European)  
Māori  0.042  0.264***  0.456*** 
Pacific Islander  0.238*  0.286**  0.251 
Other  0.061  0.120  0.411** 
Age at interview date  0.133***  0.057***  0.048*** 
Aged 50 and over (base=15-49)  
Aged 50 and over  7.929***  4.588***  7.971*** 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.217***  0.137**  0.307*** 
Degree or higher  0.963***  0.453***  0.267 
No qualification  0.367***  0.356***  0.338*** 
Selfrated health (base=excellent)  
Very good  0.078  0.026  0.036 
Good  0.665***  0.468***  0.035 
Fair  1.750***  0.944***  0.621*** 
Poor  2.914***  1.975***  1.235*** 
Studying (base=no studying)  0.436***  0.250***  0.115 
Other household income  0.019***  0.005  0.021* 
Partner (base=working partner)  
Nonworking partner  1.368***  1.068***  1.069*** 
No partner  1.208***  0.522***  0.271 
Children (base=no children)  
Child(ren) minimum age 0  0.424**  0.428**  0.347 
Child(ren) minimum age 5-17  0.414***  0.248*  0.063 
Years paid employment  0.23***  0.135***  0.049*** 
Years paid employment squared  0.001***  0.001***  0.000 
Unemployment rate  0.225***  0.019  0.089 
Interactions  
Female*Child(ren) minimum age 0  3.417***  1.533***  2.267*** 
Female*Child(ren) minimum age 5-17  0.365***  0.399***  0.349 
Female*Nonworking partner  0.070  0.204  0.074 
Female*No partner  0.656***  0.163  0.398* 
Aged 50 and over*Age  0.158***  0.092***  0.155*** 
Constant  6.019***  0.577**  0.503 
Model summary statistics  Value
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  5,869.14 
Loglikelihood  29,136.51 
Pseudo R^{2}  0.2449 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. See footnotes 1-3 of Table E2.
2. The likelihood of different labour market outcomes was modelled. Inactive is the base category.
Appendix F
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Region (base=Auckland)  
Waikato  0.509  0.411  0.216  1.314  0.297 
Wellington  0.627  0.408  0.125  1.428  0.173 
Rest of North Island  0.118  0.353  0.738  0.809  0.573 
Canterbury  1.023**  0.502  0.041  2.006  0.039 
Rest of South Island  1.075**  0.485  0.027  2.026  0.125 
Age at interview date  0.193**  0.080  0.015  0.037  0.350 
Aged 50 and over (base=15-49)  
Aged 50 and over  20.605***  3.405  0.000  13.931  27.279 
Selfrated health (base=excellent)  
Very good  0.028  0.088  0.750  0.144  0.200 
Good  0.078  0.105  0.459  0.284  0.128 
Fair  0.563***  0.153  0.000  0.863  0.263 
Poor  1.503***  0.258  0.000  2.008  0.999 
Other household income  0.010  0.013  0.405  0.035  0.014 
Partner (base=working partner)  
Nonworking partner  1.479***  0.257  0.000  1.983  0.976 
No partner  0.373  0.310  0.228  0.980  0.234 
Children (base=no children)  
Child(ren) minimum age 0  0.163  0.382  0.670  0.912  0.586 
Child(ren) minimum age 5-17  0.424  0.282  0.133  0.977  0.129 
Unemployment rate  0.054  0.148  0.716  0.344  0.236 
Interactions  
Female*Child(ren) minimum age 0  1.925***  0.434  0.000  2.775  1.075 
Female*Child(ren) minimum age 5-17  0.409  0.353  0.246  1.101  0.282 
Female*Nonworking partner  0.222  0.338  0.512  0.441  0.885 
Female*No partner  0.055  0.353  0.877  0.748  0.638 
Aged 50 and over*Age  0.427***  0.068  0.000  0.560  0.293 
Model summary statistics  Value
Number of observations  5,710 
Number of unique respondents (clusters)  1,970 
Chisquared  329.44 
Loglikelihood  1,918.34 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Variables that do not change over time (ie, gender and place of birth), that are little or slow changing (eg, ethnicity and highest qualification) or that could be impacted on by health changes (ie, studying status and years in paid employment) are excluded from these models. Significant and insignificant variables or variable categories are kept in for completeness.
2. The relationship between changes in selfrated health and participation was modelled.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
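The table above is based on 5,710 observations from 1,970 respondents, far fewer than the 39,310 observations and 13,940 respondents in the pooled models. This is consistent with a conditional fixed effects logit, in which only respondents whose participation status changes across waves contribute to the estimates. The sketch below illustrates that selection on a toy panel; the respondent identifiers and outcomes are invented, and the assumption that the paper's fixed effects model is of this conditional form is not confirmed by the table itself.

```python
# Toy panel: participation (1) or not (0) in each of three waves.
# Identifiers and outcomes are invented for illustration.
panel = {
    "A": [1, 1, 1],  # always participating: no within-person variation
    "B": [1, 0, 1],
    "C": [0, 0, 0],  # never participating: no within-person variation
    "D": [0, 1, 1],
}

# Keep only people whose outcome varies over the waves.
informative = {
    pid: ys for pid, ys in panel.items() if 0 < sum(ys) < len(ys)
}

n_people = len(informative)
n_obs = sum(len(ys) for ys in informative.values())
print(f"{n_people} respondents, {n_obs} observations retained")
```

Respondents who participate in every wave, or in none, carry no information about within-person change, which is why the estimation sample shrinks so sharply.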
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  2.034***  0.119  0.000  2.268  1.800 
Region (base=Auckland)  
Waikato  0.167  0.117  0.153  0.062  0.395 
Wellington  0.044  0.100  0.659  0.152  0.240 
Rest of North Island  0.012  0.088  0.890  0.184  0.160 
Canterbury  0.198**  0.097  0.040  0.009  0.388 
Rest of South Island  0.112  0.101  0.267  0.086  0.311 
Born in New Zealand (base=yes)  0.422***  0.077  0.000  0.573  0.270 
Age at interview date  0.051***  0.004  0.000  0.043  0.060 
Aged 50 and over (base=15-49)  
Aged 50 and over  13.620***  0.695  0.000  12.257  14.983 
Selfrated health (base=excellent)  
Very good  0.003  0.075  0.970  0.143  0.149 
Good  0.069  0.090  0.446  0.246  0.108 
Fair  0.419***  0.130  0.001  0.673  0.165 
Poor  1.058***  0.214  0.000  1.477  0.639 
Average time in health state (base=excellent health)  
Very good  0.157  0.126  0.211  0.404  0.089 
Good  1.648***  0.139  0.000  1.921  1.375 
Fair  3.380***  0.209  0.000  3.790  2.970 
Poor  5.357***  0.365  0.000  6.074  4.641 
Other household income  0.022***  0.007  0.003  0.037  0.008 
Partner (base=working partner)  
Nonworking partner  1.876***  0.130  0.000  2.130  1.621 
No partner  2.122***  0.130  0.000  2.377  1.866 
Children (base=no children)  
Child(ren) minimum age 0  0.250  0.181  0.166  0.104  0.604 
Child(ren) minimum age 5-17  0.744***  0.131  0.000  1.000  0.487 
Unemployment rate  0.203***  0.042  0.000  0.285  0.122 
Interactions  
Female*Child(ren) minimum age 0  3.407***  0.197  0.000  3.793  3.021 
Female*Child(ren) minimum age 5-17  0.324**  0.151  0.032  0.621  0.028 
Female*Nonworking partner  0.258  0.178  0.147  0.607  0.091 
Female*No partner  1.289***  0.144  0.000  1.008  1.570 
Aged 50 and over*Age  0.280***  0.013  0.000  0.305  0.255 
Constant  5.728***  0.293  0.000  5.153  6.303 
Model summary statistics  Estimate  Standard error  95% CI lower  95% CI upper
ln(σ_u^{2})  1.561  0.023  1.515  1.606
σ_u  2.182  0.025  2.133  2.232
ρ  0.591  0.006  0.580  0.602
Number of observations  39,310 
Number of unique respondents (clusters)  13,940 
Chisquared  3,909.77 
Loglikelihood  11,994.97 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Fulltime students and those 65 years of age and over are excluded. Variables that are little or slow changing (eg, ethnicity and highest qualification) or that could be impacted on by health changes (ie, studying status and years in paid employment) are excluded from these models. Significant and insignificant variables or variable categories are kept in for completeness.
2. The relationship between changes and stocks of selfrated health and participation was modelled.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
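The three variance-related figures in the random effects summary (1.561, 2.182 and ρ = 0.591) appear consistent with the quantities Stata's xtlogit reports: the log of the person-level variance, the corresponding standard deviation, and the share of total latent variance attributable to the person effect. The sketch below checks that reading, which is an assumption; it uses the logistic error variance π²/3.

```python
import math

# First figure, read as ln(sigma_u^2) (an assumption).
ln_sigma2_u = 1.561

# Standard deviation of the person-specific random effect.
sigma_u = math.exp(ln_sigma2_u / 2.0)

# Share of latent variance due to the person effect; the logistic
# error term has variance pi^2 / 3.
rho = sigma_u ** 2 / (sigma_u ** 2 + math.pi ** 2 / 3.0)

print(f"sigma_u = {sigma_u:.3f}")
print(f"rho = {rho:.3f}")
```

Under this reading, roughly 59% of the unexplained variation in the latent propensity to participate is attributable to persistent differences between respondents.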
Appendix G
Variable  Wave 1 coefficient  Wave 2 coefficient  Wave 3 coefficient
Sex (base=male)  
Female  0.133***  0.016  0.124*** 
Region (base=Auckland)  
Waikato  0.271***  0.197***  0.124*** 
Wellington  0.078  0.049  0.310*** 
Rest of North Island  0.193***  0.080  0.182*** 
Canterbury  0.084  0.062  0.167*** 
Rest of South Island  0.279***  0.337***  0.055 
Born in New Zealand (base=yes)  0.147***  0.121**  0.581*** 
Ethnicity (base=NZ/European)  
Māori  0.350***  0.344***  0.090 
Pacific Islander  0.162*  0.230**  0.234*** 
Other  0.312***  0.459***  0.135 
Age at interview date  0.026***  0.130***  0.021*** 
Aged 50 and over (base=15-49)  
Aged 50 and over  0.294  0.181  0.008*** 
Highest qualification (base=school qualification)  
Postschool vocational qualification  0.018  0.090**  0.347*** 
Degree or higher  0.342***  0.363***  0.070* 
No qualification  0.317***  0.296***  0.316*** 
Asthma (base=no asthma)  0.485***  0.340***  0.440*** 
High blood pressure (base=no high blood pressure)  0.500***  0.406***  0.522*** 
High cholesterol (base=no high cholesterol)  0.177***  0.507***  0.156*** 
Heart disease (base=no heart disease)  0.917***  0.154***  0.915*** 
Diabetes (base=no diabetes)  1.086***  0.884***  0.975*** 
Stroke (base=no stroke)  0.736***  0.859***  0.519*** 
Migraine (base=no migraine)  0.368***  0.594***  0.386*** 
Psychiatric conditions (base=no psychiatric conditions)  0.953***  0.381***  0.986*** 
Cancer (base=no cancer)  
Cancer  0.418***  0.873***  0.582*** 
Unknown  0.095**  0.460***  0.043 
Studying (base=no studying)  0.170***  0.129**  0.257*** 
Total household income  0.059***  0.109***  0.014 
Partner (base=working partner)  
Nonworking partner  0.158***  0.264***  0.055 
No partner  0.128***  0.100**  0.139*** 
Children (base=no children)  
Child(ren) minimum age 0  0.182***  0.107*  0.133*** 
Child(ren) minimum age 5-17  0.053  0.017  0.042 
Tenure (base=not owned)  
Owned with mortgage  0.140***  0.006  0.086*** 
Owned outright  0.123**  0.060  0.002 
Years paid employment  0.014***  0.010***  0.052 
Sickness benefit (base=no sickness benefit)  1.286***  0.004  0.868*** 
Smoked (base=never smoked)  0.381***  1.009***  0.344*** 
Interactions  
Female*Psychiatric conditions  0.250**  0.063  0.269** 
Aged 50 and over*Age  0.005  0.024***  0.002 
Cut points  
Cut point 1  0.432  0.013  0.015 
Cut point 2  2.146  1.652  1.748 
Cut point 3  4.025  3.515  3.601 
Cut point 4  6.034  5.437  5.459 
Model summary statistics  
Number of respondents  17,190  17,195  17,355 
Chi-squared  3,629.70  3,918.74  3,601.95 
Log-likelihood  -19,648.90  -19,715.08  -20,633.86 
Pseudo R^{2}  0.1084  0.1131  0.1052 
Source: SoFIE Waves 1-3 Version 4, standard longitudinal weights, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Responses in each wave are included in the model separately. The number of observations in each wave is not equal owing to the small number of missing values for variables of interest in certain waves. All variables were included in the model; both significant and insignificant variables or variable categories are retained for completeness.
2. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
3. Psychiatric conditions include depression, manic depression and schizophrenia.
4. The likelihood of different self-rated health states was modelled using an ordinal logistic regression model.
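As an illustration of how the cut points of an ordinal logistic model map onto probabilities for the five self-rated health categories, the sketch below assumes the common parameterisation in which the cumulative probability of being at or below category j is the logistic function of (cut point j minus the linear predictor). The cut points are those of the first column as printed (their signs may not have survived in this copy), and the linear predictor value `xb` is purely hypothetical. The function name is illustrative and is not taken from the paper.

```python
import math

def ordinal_logit_probs(xb, cuts):
    """Return category probabilities for an ordinal logit.

    Assumes P(Y <= j) = logistic(cut_j - xb) with ordered cut points.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities for each cut point, then 1.0 for the top category.
    cumulative = [logistic(c - xb) for c in cuts] + [1.0]

    # Differences of successive cumulative probabilities give category probabilities.
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs

cuts = [0.432, 2.146, 4.025, 6.034]  # cut points as printed (first column)
xb = 1.0                             # hypothetical linear predictor
probs = ordinal_logit_probs(xb, cuts)
for category, p in enumerate(probs, start=1):
    print(f"category {category}: {p:.3f}")
```

A larger linear predictor shifts probability mass towards the higher categories; which end of the self-rated health scale the higher categories represent depends on how the outcome was coded.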
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  -0.421***  0.107  0.000  -0.630  -0.211 
Region (base=Auckland)  
Waikato  0.196**  0.090  0.029  0.020  0.371 
Wellington  0.041  0.075  0.581  -0.106  0.189 
Rest of North Island  0.051  0.068  0.453  -0.082  0.183 
Canterbury  0.122  0.075  0.107  -0.026  0.269 
Rest of South Island  0.033  0.078  0.675  -0.121  0.186 
Born in New Zealand (base=yes)  
No  -0.090  0.067  0.180  -0.222  0.042 
Ethnicity (base=NZ/European)  
Maori  0.000  0.069  0.997  -0.135  0.135 
Pacific Islander  0.101  0.108  0.348  -0.110  0.312 
Other  -0.053  0.100  0.595  -0.249  0.143 
Age at interview date  -0.092***  0.005  0.000  -0.103  -0.082 
Aged 50 and over (base=15-49)  
Aged 50 and over  6.488***  0.692  0.000  5.132  7.845 
Highest qualification (base=school qualification)  
Post-school vocational qualification  0.163***  0.056  0.004  0.053  0.274 
Degree or higher  0.716***  0.077  0.000  0.565  0.868 
No qualification  -0.249***  0.064  0.000  -0.375  -0.123 
Health stock  -0.837***  0.079  0.000  -0.991  -0.682 
Studying (base=no studying)  -0.381***  0.056  0.000  -0.490  -0.272 
Other household income  -0.016***  0.006  0.006  -0.028  -0.005 
Partner (base=working partner)  
Non-working partner  -1.266***  0.109  0.000  -1.479  -1.053 
No partner  -0.978***  0.105  0.000  -1.184  -0.771 
Children (base=no children)  
Child(ren) minimum age 0  0.507***  0.163  0.002  0.188  0.826 
Child(ren) minimum age 5-17  -0.357***  0.108  0.001  -0.569  -0.145 
Years paid employment  0.178***  0.009  0.000  0.161  0.195 
Years paid employment squared  0.001***  0.000  0.000  0.001  0.001 
Unemployment rate  -0.088***  0.032  0.005  -0.150  -0.026 
Interactions  
Female*Child(ren) minimum age 0  -2.782***  0.171  0.000  -3.118  -2.446 
Female*Child(ren) minimum age 5-17  -0.192  0.125  0.126  -0.437  0.054 
Female*Non-working partner  0.082  0.152  0.590  -0.216  0.380 
Female*No partner  0.423***  0.117  0.000  0.193  0.653 
Aged 50 and over*Age  -0.131***  0.013  0.000  -0.156  -0.105 
Constant  4.147***  0.233  0.000  3.690  4.604 
Model summary statistics  Coefficients 

Number of observations  39,270 
Number of unique respondents (clusters)  13,930 
Chi-squared  3,333.23 
Log-likelihood  -12,723.33 
Pseudo R^{2}  0.3201 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. See footnotes 1 and 2 of Table G1.
2. The likelihood of labour market participation was modelled. Not participating was the base category.
3. The number of observations the model is based on is lower than that for the self-rated health or individual disease models owing to small numbers of missing values for the objective health measures.
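The confidence limits reported alongside each coefficient are consistent with the usual Wald interval, coefficient ± 1.96 × standard error. The following is a minimal sketch under that normal-approximation assumption (the function name is illustrative), using the Female row and taking its coefficient as -0.421, the negative sign being implied by the ordering of its printed confidence limits.

```python
def wald_interval(coefficient, standard_error, z=1.96):
    """Return the (lower, upper) limits of a 95% Wald confidence interval."""
    margin = z * standard_error
    return coefficient - margin, coefficient + margin

# Female row: coefficient magnitude 0.421, standard error 0.107.
lower, upper = wald_interval(-0.421, 0.107)
print(f"{lower:.3f} to {upper:.3f}")
```

Small differences in the third decimal place relative to the printed limits are to be expected, because the printed coefficient and standard error are themselves rounded.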
Appendix G (continued)
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Region (base=Auckland)  
Waikato  -0.439  0.414  0.290  -1.250  0.373 
Wellington  -0.613  0.407  0.132  -1.411  0.184 
Rest of North Island  -0.169  0.359  0.638  -0.872  0.534 
Canterbury  -1.006**  0.496  0.042  -1.978  -0.035 
Rest of South Island  -1.126**  0.483  0.020  -2.073  -0.180 
Age at interview date  0.186**  0.081  0.021  0.028  0.344 
Aged 50 and over (base=15-49)  
Aged 50 and over  19.692***  3.449  0.000  12.931  26.452 
Health stock  -0.572***  0.140  0.000  -0.846  -0.298 
Other household income  -0.012  0.013  0.340  -0.037  0.013 
Partner (base=working partner)  
Non-working partner  -1.462***  0.266  0.000  -1.982  -0.942 
No partner  -0.347  0.316  0.272  -0.966  0.273 
Children (base=no children)  
Child(ren) minimum age 0  -0.036  0.382  0.925  -0.784  0.712 
Child(ren) minimum age 5-17  -0.378  0.283  0.182  -0.934  0.177 
Unemployment rate  -0.039  0.150  0.795  -0.332  0.254 
Interactions  
Female*Child(ren) minimum age 0  -2.029***  0.433  0.000  -2.877  -1.180 
Female*Child(ren) minimum age 5-17  -0.419  0.354  0.236  -1.112  0.274 
Female*Non-working partner  0.225  0.346  0.515  -0.453  0.904 
Female*No partner  -0.049  0.359  0.891  -0.753  0.655 
Aged 50 and over*Age  -0.409***  0.069  0.000  -0.544  -0.274 
Model summary statistics  Coefficient 

Number of observations  5,575 
Number of unique respondents (clusters)  1,925 
Chi-squared  302.90 
Log-likelihood  -1,882.69 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Full-time students and those 65 years of age and over are excluded. Variables that do not change over time (ie, gender and place of birth), that are little or slow changing (eg, ethnicity and highest qualification) or that could be impacted on by health changes (ie, studying status and years in paid employment) are excluded from these models. Significant and insignificant variables or variable categories are kept in for completeness.
2. The relationship between changes in self-rated health and participation was modelled.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
Variable  Coefficient  Standard error  P value  95% CI lower  95% CI upper
Sex (base=male)  
Female  -1.913***  0.118  0.000  -2.145  -1.681 
Region (base=Auckland)  
Waikato  0.224*  0.116  0.054  -0.004  0.452 
Wellington  0.113  0.100  0.257  -0.083  0.308 
Rest of North Island  0.036  0.087  0.680  -0.135  0.208 
Canterbury  0.230**  0.096  0.017  0.041  0.419 
Rest of South Island  0.106  0.101  0.293  -0.091  0.303 
Born in New Zealand (base=yes)  -0.482***  0.077  0.000  -0.633  -0.331 
Age at interview date  0.053***  0.004  0.000  0.045  0.062 
Aged 50 and over (base=15-49)  
Aged 50 and over  12.774***  0.698  0.000  11.406  14.142 
Health stock  -0.560***  0.120  0.000  -0.797  -0.324 
Average health stock  -1.148***  0.129  0.000  -1.400  -0.896 
Other household income  -0.026***  0.007  0.000  -0.041  -0.011 
Partner (base=working partner)  
Non-working partner  -1.710***  0.129  0.000  -1.964  -1.457 
No partner  -2.011***  0.129  0.000  -2.265  -1.757 
Children (base=no children)  
Child(ren) minimum age 0  0.226  0.181  0.211  -0.128  0.580 
Child(ren) minimum age 5-17  -0.794***  0.130  0.000  -1.049  -0.539 
Unemployment rate  -0.162***  0.041  0.000  -0.243  -0.080 
Interactions  
Female*Child(ren) minimum age 0  -3.388***  0.197  0.000  -3.773  -3.002 
Female*Child(ren) minimum age 5-17  -0.352**  0.150  0.019  -0.647  -0.058 
Female*Non-working partner  -0.326*  0.177  0.066  -0.673  0.021 
Female*No partner  1.191***  0.143  0.000  0.911  1.470 
Aged 50 and over*Age  -0.263***  0.013  0.000  -0.288  -0.238 
Constant  4.363  0.291  0.000  3.793  4.934 
Model summary statistics  
ln σ_{u}^{2}  1.569  0.023  1.523  1.614 
σ_{u}  2.191  0.025  2.142  2.241 
ρ  0.593  0.006  0.582  0.604 
Coefficients  

Number of observations  39,270 
Number of unique respondents (clusters)  13,925 
Chi-squared  3,642.92 
Log-likelihood  -12,079.28 
Source: SoFIE Waves 1-3 Version 4, unweighted, Statistics New Zealand
Notes:
1. Based on original sample members with responses in all three waves who are aged over 15 at the end of the reference period in Wave 1. Full-time students and those 65 years of age and over are excluded. Variables that are little or slow changing (eg, ethnicity and highest qualification) or that could be impacted on by health changes (ie, studying status and years in paid employment) are excluded from these models. Significant and insignificant variables or variable categories are kept in for completeness.
2. The relationship between participation and both changes in and stocks of self-rated health was modelled.
3. *Significant at the 90% level. **Significant at the 95% level. ***Significant at the 99% level.
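The three variance parameters reported under the model summary statistics (1.569, 2.191 and 0.593) are linked arithmetically. The sketch below assumes a random-effects logit, in which the individual-level error variance is fixed at π²/3, so that the standard deviation of the random effect is exp(1.569/2) and ρ is the share of total error variance attributable to the respondent-level effect. The variable names are illustrative; the printed values are consistent with these relationships.

```python
import math

ln_sigma2_u = 1.569  # log of the random-effect variance, as printed

# Standard deviation of the respondent-level random effect.
sigma_u = math.exp(ln_sigma2_u / 2)

# Share of total error variance due to the random effect,
# assuming a logistic idiosyncratic error with variance pi^2 / 3.
rho = sigma_u ** 2 / (sigma_u ** 2 + math.pi ** 2 / 3)

print(f"sigma_u = {sigma_u:.3f}, rho = {rho:.3f}")
```

A value of ρ close to 0.6 indicates that, under these assumptions, well over half of the unexplained variation in participation is attributable to persistent differences between respondents rather than to wave-to-wave variation.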