SHADAC STATE HEALTH ACCESS DATA ASSISTANCE CENTER
A Health Data Resource for States
April 2002/Issue 5
The University of Minnesota School of Public Health
Sponsored by a grant from The Robert Wood Johnson Foundation.

Overview of Approaches for Estimating Uninsurance Rates at the Sub-state Level

Many states now conduct state household surveys to estimate health insurance coverage. Some states have also begun exploring methods to derive coverage estimates for different populations within their borders, specifically for geographic areas (regions, counties) and racial/ethnic sub-populations. This issue brief highlights three approaches that have been used to estimate uninsurance rates at the sub-state level. We give an overview of the conceptual and methodological issues involved and assess the relative strengths and weaknesses of each approach. We conclude with a list of resources for readers interested in learning more about small-area estimation.

DIRECT APPROACH THROUGH SURVEY DESIGN AND SAMPLING
The direct approach to estimating health insurance coverage within a small area (e.g., a county or city) has two defining features: (1) a measurement instrument (e.g., a state survey) is used to measure health insurance coverage directly, and (2) measurements come from a sample of people drawn from the actual population of interest (e.g., the county or city of interest). For example, to measure health insurance coverage directly within a specific county, researchers could construct a survey instrument designed to measure health insurance and draw a sample of people from the county to serve as survey respondents.

Three conditions must be met to obtain high-quality direct estimates of health insurance coverage. First, the instrument used to measure the concept should be valid: the survey items need to do a good job of determining whether or not someone has health insurance coverage. Second, each member of the population of interest should have a known probability of selection into the sample. For example, suppose you are conducting a survey of 500 people and you draw a simple random sample from a population list that includes all 5,000 people in the county. Each person's probability of selection is then 10 percent. The final condition for deriving direct estimates is a large enough sample size. A good rule of thumb is that the equivalent of 100 simple random sample cases is needed for each population of interest.
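The arithmetic behind the second and third conditions can be sketched in Python. This is a minimal illustration under stated assumptions, not a survey-design tool. The function names are hypothetical. The design effect (the factor by which a clustered or weighted design inflates variance relative to simple random sampling) is an assumed input, used here to translate actual respondents into "simple random sample equivalent" cases. The 10 percent uninsurance rate is also an illustrative value.

```python
import math


def selection_probability(sample_size: int, population_size: int) -> float:
    """Chance that any one person is drawn in a simple random sample
    without replacement: n / N."""
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < sample_size <= population_size")
    return sample_size / population_size


def srs_equivalent_cases(respondents: int, design_effect: float) -> float:
    """Convert actual respondents into simple-random-sample-equivalent cases.
    The design effect is an assumed input estimated elsewhere."""
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    return respondents / design_effect


def standard_error(p: float, n_effective: float) -> float:
    """Approximate standard error of an estimated proportion under simple
    random sampling, ignoring the finite population correction."""
    return math.sqrt(p * (1.0 - p) / n_effective)


if __name__ == "__main__":
    # The brief's example: a sample of 500 drawn from a county list of 5,000.
    print(f"selection probability: {selection_probability(500, 5000):.0%}")

    # Hypothetical: 150 respondents in a design with an assumed design effect of 1.5.
    n_eff = srs_equivalent_cases(150, 1.5)
    se = standard_error(0.10, n_eff)  # assumed 10 percent uninsurance rate
    print(f"SRS-equivalent cases: {n_eff:.0f}")
    print(f"95% margin of error: +/- {1.96 * se * 100:.1f} percentage points")
```

Under these assumptions, 100 SRS-equivalent cases give a margin of error of roughly plus or minus 6 percentage points around a 10 percent uninsurance rate. This helps explain why much smaller samples quickly become too imprecise to be useful.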
Although direct estimates are the most defensible, they are also the most costly to produce. Indeed, the cost of producing high-quality direct estimates for small areas is often prohibitive. When at least one of the three conditions for deriving direct estimates is not met, people often turn to one or more other approaches, depending on their expertise, resources, and available data. The simplest of these alternatives is the "proxy measure" approach to small-area estimation.

PROXY MEASURE APPROACH
The proxy measure approach estimates health insurance coverage with some other measure that can serve as a proxy for coverage. That proxy measure is generally applied to a proxy population within a county. A commonly used proxy measure of uninsurance relies on administrative records from all the hospitals within a county to determine the percent of specified discharge diagnoses that were coded as "self-pay." Specifically, this would entail extracting, for specifically chosen diagnoses, the expected primary source of reimbursement reported in hospital discharge data sets from all hospitals in an area. Patients discharged with one of these diagnoses who are classified as "self-pay" would be designated as uninsured. "Self-pay" means the person, and not an insurance company or the government, was expected to pay the bill. For example, if 8 percent of all patients with these diagnoses were expected to self-pay, then the uninsurance rate in the county could be set at 8 percent as well.

A major strength of this and other proxy measures is low cost. These data are relatively inexpensive to compile and are routinely collected in a majority of states. Moreover, this particular proxy measure avoids the problem of basing estimates on small survey samples. There will generally be reasonably large numbers of discharges for the selected diagnoses within a specific geographic area.

There are, however, some concerns with bias and measurement error. Not everyone discharged from a county's hospitals is a resident of that county, which can bias the estimate for the referent county. A county's estimated rate of uninsurance can also differ from its actual rate because not every patient living in the county will have gone to one of the county's hospitals. Furthermore, for the diagnoses selected for this analysis, the decision to be admitted to a hospital must be completely independent of whether one has insurance coverage. Some diagnoses leave "discretion" about the need to be hospitalized. For such a diagnosis, individuals with insurance coverage are more likely to be admitted to a hospital than those without coverage. To the extent that you include this type of diagnosis in the set of diagnoses forming your overall proxy measure of uninsurance, you would underestimate the amount of uninsurance in the county. This "self-pay" proxy measure does use data from the county of interest. It nevertheless relies on a "proxy" population, which can be expected to yield an estimate of uninsurance of greater or lesser accuracy.

Finally, actual insurance coverage is only correlated with expected self-pay status; the two are not the same thing. Use of this proxy measure of coverage can therefore involve error. For example, an individual may be classified as "self-pay" at the time of discharge but later receive retroactive Medicaid coverage for this hospital expense. In that case, the proxy measure would yield too high an estimated uninsurance rate unless some adjustment could be made to account for this kind of error. Such an adjustment is difficult to make, and subject to imprecision, when only expected primary payment data are available.

Proxy measures often have fairly large sample sizes (for example, expected payer information on discharges from all hospitals within a county for an entire year). Even so, the proxy measure approach is generally considered a last resort, because the potential for bias with proxy measures is high. If nothing else is available, you may want to consider it. At a minimum, however, you should exercise great care in selecting the proxy. Preferably, use only proxies that have been rigorously evaluated for potential bias.
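The self-pay proxy calculation can be sketched in a few lines. The record layout, field names, diagnosis codes, and county names below are all hypothetical; real discharge data sets use their own coding schemes. The sketch also assumes that the patient's county of residence is recorded. To address the non-resident bias noted above, it counts only county residents. It does not correct for residents treated in other counties, for admission decisions that depend on coverage, or for retroactive Medicaid coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discharge:
    diagnosis_code: str   # hypothetical diagnosis identifier
    expected_payer: str   # expected primary source of payment, e.g. "self-pay"
    patient_county: str   # county of residence (assumed to be recorded)


def self_pay_proxy_rate(discharges, selected_diagnoses, county):
    """Share of a county's resident discharges, for the selected diagnoses,
    whose expected primary payer is "self-pay"."""
    eligible = [
        d for d in discharges
        if d.diagnosis_code in selected_diagnoses and d.patient_county == county
    ]
    if not eligible:
        raise ValueError(f"no eligible discharges for {county}")
    self_pay = sum(1 for d in eligible if d.expected_payer == "self-pay")
    return self_pay / len(eligible)


if __name__ == "__main__":
    # Hypothetical discharge records for one year.
    records = (
        [Discharge("DX-A", "self-pay", "Adams")] * 4
        + [Discharge("DX-A", "private", "Adams")] * 30
        + [Discharge("DX-B", "medicaid", "Adams")] * 16
        + [Discharge("DX-A", "self-pay", "Brown")] * 10  # non-residents: excluded
        + [Discharge("DX-Z", "self-pay", "Adams")] * 5   # diagnosis not selected: excluded
    )
    rate = self_pay_proxy_rate(records, {"DX-A", "DX-B"}, "Adams")
    print(f"proxy uninsurance rate: {rate:.0%}")  # 4 of 50 eligible discharges
```

With these made-up records, 4 of the 50 eligible resident discharges are self-pay, which reproduces the 8 percent example in the text.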
MODEL-BASED APPROACH
Direct estimation is not possible or desirable when the sample size within a geographic area is too small, or when no national or state survey data on insurance coverage are available. Under these conditions, statisticians and researchers must use several sources of data and statistical analyses to develop direct and indirect estimates of health insurance coverage. We illustrate the spectrum of model-based approaches with a "simple model-based approach" and a "complex model-based approach."

SIMPLE MODEL-BASED APPROACH
The simple modeling approach predicts health insurance coverage for a specific geographic area using (1) one or more variables correlated with health insurance coverage and (2) a correlation estimated from data obtained from the geographic area of interest. It is then possible to predict coverage for other geographic areas that lack a measure of health insurance coverage. To do so, one inserts the values of the correlated measures into the model and uses the resulting model-based estimate as the health insurance coverage estimate.

One example of this approach uses unemployment rates to estimate the level of uninsurance. Unemployment rates are attractive for two reasons: (1) they are correlated with health insurance coverage rates, and (2) the Bureau of Labor Statistics publishes them in a timely manner for every county in the United States. Suppose statistical analysis found that, across a large number of counties, the uninsurance rate was on average 1.5 times the unemployment rate. Then, in counties without any direct measure of uninsurance, uninsurance would be estimated as 1.5 times the unemployment rate prevailing in the county. With such a simple model, it is clearly preferable that the counties used to develop the model meet three conditions. They should be as demographically similar as possible, be located within the same state, and be as close as possible to the counties that use the model to predict their uninsurance rates.
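The unemployment example can be sketched as a ratio-of-means model. The ratio of total uninsurance to total unemployment across counties with survey data (equivalently, the ratio of their means) is applied to a county's Bureau of Labor Statistics rate. The county values below are hypothetical. The choice of a ratio-of-means estimator, rather than least squares through the origin or a regression with an intercept, is an assumption made for simplicity.

```python
def fit_ratio(unemployment, uninsurance):
    """Ratio-of-means model: mean uninsurance rate divided by mean
    unemployment rate across counties where both are measured."""
    if len(unemployment) != len(uninsurance) or not unemployment:
        raise ValueError("need matching, non-empty lists of county rates")
    return sum(uninsurance) / sum(unemployment)


def predict_uninsurance(ratio, county_unemployment):
    """Model-based estimate for a county with no direct uninsurance measure."""
    return ratio * county_unemployment


if __name__ == "__main__":
    # Hypothetical rates (percent) for four similar counties with survey data.
    unemployment = [4.0, 5.5, 7.0, 8.5]
    uninsurance = [6.2, 8.1, 10.4, 13.3]
    ratio = fit_ratio(unemployment, uninsurance)
    print(f"fitted ratio: {ratio:.2f}")
    # A county with a 5.0 percent unemployment rate but no survey estimate.
    print(f"estimated uninsurance: {predict_uninsurance(ratio, 5.0):.1f}%")
```

With these hypothetical values, the fitted ratio is 38.0 / 25.0 = 1.52. A county with a 5.0 percent unemployment rate would therefore be assigned an estimated uninsurance rate of about 7.6 percent.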
COMPLEX MODEL-BASED APPROACH
The pre-eminent example of this model-based approach in current use (unfortunately not for uninsurance) is the Census Bureau's Small-Area Income and Poverty Estimates program, known by its acronym SAIPE. In the SAIPE program, up-to-date estimates of the number of school-age children living in poverty in U.S. counties are obtained from a combination of two estimates. The first estimate is available for counties that have been sampled by the annual March Supplement to the Current Population Survey (CPS). For these counties, the survey provides a direct estimate of the number of school-age children in poverty. Even for counties that have been sampled, however, this direct estimate is usually based on very small samples. Researchers may combine three years of March CPS information to form one direct estimate. Even then, the estimate is still likely to carry too much sampling error to be of much policy use on its own. In addition, only about one-third of counties nationwide are included in the March CPS sample in any given year, so no direct estimate is possible for the majority of counties in the country.

To overcome this deficiency, researchers have developed regression models to provide indirect, or synthetic, estimates of a county's number of school-age children in poverty. This approach begins by assembling a large data set on all the counties in the country that have been included in the CPS samples. The data for this project come from three sources. The CPS itself supplies each county's number of school-age children in poverty. Internal Revenue Service (IRS) data on individual tax returns and data from the federal food stamp program are aggregated to the county level to yield predictors of school-age children in poverty. These predictors include county-specific measures such as the number of child exemptions reported by families in poverty in the county and the number of people receiving food stamps in the county. These data are then used in regression models. The models establish the statistical relationship between the expected number of school-age children in poverty in each county and the levels of these predictor variables for the county. Importantly, these predictor variables are selected in part because the Census Bureau can feasibly obtain reasonably up-to-date values for them for all the counties in the country. It is therefore possible to use these up-to-date predictor values to estimate each county's number of school-age children in poverty. Finally, this regression model has been estimated on a large data set (all counties in the country with CPS samples). As a result, the synthetic or indirect estimates derived from it can achieve reasonably high levels of "predictive" accuracy.
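The regression step can be sketched with ordinary least squares. This is a simplified, hypothetical illustration. The two predictors are named after the county-level measures mentioned above, but the values are invented. They were constructed to lie exactly on a plane so the fitted coefficients are easy to check. The Census Bureau's actual SAIPE specification is more elaborate than a plain linear fit. The solver uses the normal equations for brevity; a real analysis would use a numerically sturdier routine from a statistics library.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    X is a list of predictor rows; an intercept column is prepended here.
    Returns [intercept, slope_1, slope_2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, x):
    """Synthetic estimate from fitted coefficients and a county's predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


if __name__ == "__main__":
    # Hypothetical CPS-sampled counties: (child exemptions reported by families
    # in poverty, food stamp recipients), both in hundreds, and a direct count
    # of poor school-age children, also in hundreds.
    X = [(10, 20), (15, 25), (20, 40), (30, 35), (25, 50)]
    y = [13.0, 17.0, 24.0, 27.5, 29.5]
    beta = ols_fit(X, y)
    print("coefficients:", [round(b, 3) for b in beta])
    # Synthetic estimate for a hypothetical county outside the CPS sample.
    print("synthetic estimate:", round(predict(beta, (12, 30)), 1))
```

The toy data satisfy y = 2 + 0.5 x1 + 0.3 x2 exactly, so the fit recovers those coefficients. The county outside the sample, with predictors (12, 30), receives a synthetic estimate of 17.0.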
The SAIPE model estimates of school-age children in poverty are formed as a mixture of two inputs: the direct estimates (for counties included in the March CPS sample) and the model predictions, or indirect estimates. The two estimates are blended in a sophisticated manner that takes into account the accuracy of each. As a result, the blended estimate is better than either the direct or the model-based estimate would be alone. Importantly, the SAIPE model also provides estimates for counties not included in the March CPS samples. The SAIPE model estimates have two other advantages. They can be updated on an annual or biennial schedule, and they can be expected to have less error than outdated census estimates, which are the alternative to them.
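One standard way to blend a direct estimate with a model prediction according to their accuracy is precision (inverse-variance) weighting. This is the idea behind Fay-Herriot-type small-area estimators. The sketch below shows that simplified technique, not the Census Bureau's exact SAIPE procedure. It assumes that the variances of both estimates are known and that their errors are independent. The example values are hypothetical.

```python
from typing import Optional, Tuple


def blend(direct: Optional[float], direct_var: Optional[float],
          model: float, model_var: float) -> Tuple[float, float]:
    """Precision-weighted combination of a direct survey estimate and a model
    prediction, assuming known variances and independent errors. Counties
    without a direct estimate simply receive the model prediction."""
    if direct is None or direct_var is None:
        return model, model_var
    w = model_var / (model_var + direct_var)  # weight on the direct estimate
    estimate = w * direct + (1.0 - w) * model
    variance = direct_var * model_var / (direct_var + model_var)
    return estimate, variance


if __name__ == "__main__":
    # Hypothetical county in the CPS sample: noisy direct estimate, tighter model.
    print(blend(0.20, 0.0016, 0.15, 0.0004))
    # Hypothetical county outside the CPS sample: model prediction only.
    print(blend(None, None, 0.15, 0.0004))
```

With these numbers, the direct estimate (0.20, variance 0.0016) receives weight 0.2 and the model prediction (0.15, variance 0.0004) receives weight 0.8. The blended result is 0.16 with variance 0.00032, smaller than either input's variance. As a county's direct sample grows and its variance falls, the weight shifts toward the direct estimate.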
The major disadvantage is that producing these model-based estimates requires substantial resources. The models must first be developed and then evaluated by highly trained statisticians. They require access to large amounts of data, preferably nationwide, not all of which may be in the public domain. The models themselves must also be updated periodically, which entails further large resource costs.

DISCUSSION/CONCLUSION
Desirable levels of accuracy for well-defined sub-populations and specific time periods at the sub-state level are obtainable only at very substantial cost, because they are achievable only from large-sample direct estimates. Conversely, estimates using proxy measures are generally possible with low resource costs. However, they are very unlikely to provide enough accuracy or sensitivity to be useful for most evaluation purposes. Specifically, the proxy measure and model-based approaches in general will not be sensitive to specific interventions within a geographic area. For example, suppose a county implements an intervention to increase insurance coverage. Its impact will be detectable from a model only if either (1) one or more of the correlates are directly affected by the intervention itself and hence are directly related to uninsurance status (e.g., "self-pay" status for specific diagnoses), or (2) the blended model contains a significant number of directly measured cases from the area. In the second case, it really becomes best to use the direct estimate approach. Proxy measure and blended-model approaches therefore face complex, difficult-to-achieve, and/or costly requirements if they are to serve the needs of most evaluation uses.

Model-based estimates could prove considerably more useful if the Census Bureau ever developed a counterpart to the SAIPE model estimates for uninsurance in small areas. Such a program would produce what might be called Small-Area Uninsurance Rate Estimates (SAURE). These estimates would be based on a large data set, again including all the counties in the country with a sample in the March Supplement to the CPS. They could also use many predictor variables that are available only to the Census Bureau, and on a reasonably timely basis. Such models can generate estimates with reasonably high predictive accuracy in a reasonably timely manner. But like the SAIPE model estimates for children in poverty, SAURE would have to be three-year average estimates. This three-year time dimension would not accommodate many evaluation uses, although it might prove satisfactory for less rigorous monitoring purposes. Nonetheless, SHADAC is working with staff at the Census Bureau to assess the feasibility of estimating uninsurance rates in small areas using the CPS.

In conclusion, selecting the appropriate estimation approach is not straightforward. It requires an assessment of the principal strengths and weaknesses of each approach (Table 1). Unfortunately, each of the desired properties for small-area estimates of uninsurance discussed above is achievable only at the price of steep trade-offs among the others. When evaluating the relative merits of the approaches, one must also consider how easily or with what difficulty the results can be described. Specifically, it will be important (and difficult) to give policymakers an appropriate understanding of the complex statistical and methodological issues associated with the proxy, direct, and model-based approaches. End users of the information these approaches generate must also be told of the cautions needed to guard against over-interpretation of the data.

University of Minnesota
Division of Health Services Research and Policy
2221 University Avenue, Suite 345
Minneapolis, MN 55414
Phone 612-624-4802
Fax 612-624-1493
www.shadac.org
IB-05-0202

TABLE 1. SUMMARY OF PRINCIPAL STRENGTHS AND WEAKNESSES OF THE DIFFERENT APPROACHES TO ESTIMATE UNINSURANCE RATES AT THE SUB-STATE LEVEL

APPROACH                                     PRINCIPAL STRENGTH     PRINCIPAL WEAKNESS
Direct Estimation Through Survey Sampling    Precision              Cost
Proxy Measure Approach                       Cost                   Bias
Model-based Approach                         Predictive accuracy    Complexity

Related References

USE OF HOSPITAL ADMINISTRATIVE DATA
Rask, K.J. (1994). Hospital discharge data and the uninsured. Journal of Health Care for the Poor and Underserved, 5, 275-279.
Turner, C., & Campbell, E. (1999). Counting the uninsured using state-level hospitalization data. Public Health Reports, 114, 149-156.

MODEL-BASED APPROACHES
Chand, N., & Alexander, C.H. (1999). Using administrative records for small area estimation in the American Community Survey. http://www.fcsm.gov/papers/mcf.html
Chand, N., & Malec, D. (2001). Small area estimates from the American Community Survey using a housing unit model. http://www.fcsm.gov/01_papers/Chand.pdf
Diehr, P., Madden, C.W., Cheadle, A., Patrick, D., Fishman, P., Char, P., & Skillman, S. (1991). Estimating county percentages of people without health insurance. Inquiry, 28, 413-419.
Malec, D., Davis, W.W., & Cao, X. (1999). Model-based small area estimates of overweight prevalence using sample selection adjustment. Statistics in Medicine, 18, 189-200.
Malec, D., Sedransk, J., Moriarty, C., & LeClere, F.B. (1997). Small area inference for binary variables in the National Health Interview Survey. Journal of the American Statistical Association, 92, 825-826.
National Research Council (2000). Small area income and poverty estimates: Priorities for 2000 and beyond. Washington, D.C., Committee on National Statistics, National Academy of Sciences.
Popoff, C., Judson, D.H., & Fadali, B. (2001). Measuring the number of people without health insurance: A test of a synthetic estimates approach for small areas using SIPP microdata. http://www.fcsm.gov/01_papers/Popoff.pdf
Schaible, W.L., Brock, D.B., Casady, R.J., & Schnack, G.A. (1979). Small area estimation: An empirical comparison of conventional and synthetic estimators for states. U.S. Dept. of Health, Education, and Welfare, Public Health Service, Office of Health Research, Statistics, and Technology, National Center for Health Statistics; ISBN: 084060176X.
Schirm, A.L., Zaslavsky, A.M., & Czajka, J.L. (1999). Large numbers of estimates for small areas. http://www.fcsm.gov/papers/schirm.pdf