Stratification [message #1276]
Mon, 03 February 2014 20:09 |
hlyons
Messages: 5 Registered: February 2014 Location: Seattle
Hello -- I was wondering whether there are known errors in the "strata" variable in some of the surveys. My example is a random one, Zambia DHS IV:
Using the children's recode and R, I've put a somewhat artificial example at the bottom -- artificial in the sense that the outcome (urban/rural split for children) is not something I'm really looking at. Basically, it looks like there are too many strata in v022: 153 distinct values, where I expected 18 (9 provinces by urban/rural). A less theoretical example would be DPT3 coverage nationally for urban and rural areas -- I think the urban standard errors in the final report are more compatible with stratifying on v022. It's hard to say for sure, because I get some differences when trying to duplicate the results in the final report. Maybe I'll write another post about that later...
Thoughts, anybody?
Thanks!
Hil
# R example
library(foreign)  # provides read.dta()
library(survey)
tmp.data = read.dta("ZMKR42FL.DTA")
# PSU's, checks out: 320
length(unique(tmp.data$v021))
# province, checks out: 9
length(unique(tmp.data$v024))
# province and urban/rural combinations: 18
nrow(unique(tmp.data[,c("v023","v025")]))
# strata: 153 instead of 18
length(unique(tmp.data$v022))
# sampling weight: v005 is stored multiplied by 1,000,000
tmp.data$wt = tmp.data$v005/1000000
# example of standard errors under two designs
# first, stratify on v022; second, on province and u/r combination
DHSdesign.v022 = svydesign(ids = ~v021, strata = ~v022, weights = ~wt, data = tmp.data)
DHSdesign.prov.ur = svydesign(ids = ~v021, strata = ~interaction(v023, v025, drop = TRUE), weights = ~wt, data = tmp.data)
# proportion urban amongst children
svymean(~v025, design = DHSdesign.v022) #SE = 0.0099
svymean(~v025, design = DHSdesign.prov.ur) #SE = 0.0227
Re: Stratification [message #1464 is a reply to message #1454]
Fri, 28 February 2014 11:40 |
Trevor-DHS
Messages: 803 Registered: January 2013
DHS used to use an approach of producing implicit strata, based on combining clusters into pairs or small groups of 3 clusters. These implicit strata were used in the calculation of sampling errors. This approach was used in the World Fertility Surveys and in earlier DHS surveys where there was an implicit stratification based on an ordering of the clusters. The DHS sampling experts no longer recommend this approach for the stratification. In the Zambia DHS IV survey, v022 contains this pairing or grouping, as used for the sampling error calculations reported in the final report.
You will find this true in many DHS surveys, particularly the older ones.
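For anyone who wants to see what such a pairing looks like, here is a minimal sketch in base R. It assumes the clusters are paired in frame order within each explicit stratum and that a leftover cluster joins the last pair to make a group of 3; the exact rule used in a given survey is not stated here, so treat both the rule and the toy data (`clusters`, `pair_ids`, `implicit`) as hypothetical. It is at least consistent with the Zambia numbers above: 320 clusters over 153 values of v022 is about 2.1 clusters per group.

```r
# Toy cluster list (made-up values): 'stratum' is the explicit stratum,
# 'order' is the cluster's position in the sorted sampling frame.
clusters <- data.frame(
  psu     = 1:11,
  stratum = c(rep("A", 5), rep("B", 6)),
  order   = c(1:5, 1:6)
)

# Group labels for n consecutive clusters: pairs, with an odd
# leftover cluster absorbed into the last pair (a group of 3).
pair_ids <- function(n) {
  if (n < 2) return(rep(1L, n))
  g <- (seq_len(n) + 1L) %/% 2L   # 1, 1, 2, 2, 3, 3, ...
  pmin(g, n %/% 2L)               # cap at the number of full pairs
}

# Sort by frame order within stratum, then label the groups.
clusters <- clusters[order(clusters$stratum, clusters$order), ]
grp <- ave(clusters$order, clusters$stratum,
           FUN = function(x) pair_ids(length(x)))
clusters$implicit <- paste(clusters$stratum, grp, sep = "-")

# Number of clusters in each implicit stratum
table(clusters$implicit)
```

A variable built this way has far more distinct values than the explicit strata, which is the pattern reported for v022 in the first post.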
In cases like this, also check v023 to see if it contains the strata. In many surveys it actually contains the regions rather than the strata. If the strata are not provided directly in the data set, check Appendix A of the final report to see what strata were actually used. In many cases the strata will be urban and rural areas within region (in the case of the Zambia survey, urban and rural areas within province). However, in some surveys there is no stratification beyond the region, while in other surveys there are 3 or 4 strata within each region. Appendix A of the final report should provide the information for this.
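A quick way to run this kind of check is sketched below in base R, on made-up records that borrow the DHS variable names. It assumes the candidate strata are urban/rural within region; the data frame `toy` and the derived `stratum` column are hypothetical, and the counts should be compared against whatever Appendix A reports for the survey in question.

```r
# Toy child-level records using DHS-style names; all values are made up.
toy <- data.frame(
  v021 = c(1, 1, 2, 3, 3, 4, 5, 6, 7, 8),   # PSU / cluster
  v023 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),   # region
  v025 = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 2)    # 1 = urban, 2 = rural
)

# Candidate strata: urban/rural within region.
toy$stratum <- interaction(toy$v023, toy$v025, drop = TRUE)

# One row per PSU, then count the PSUs in each candidate stratum.
psu <- unique(toy[, c("v021", "stratum")])
psus_per_stratum <- table(psu$stratum)

nlevels(toy$stratum)         # number of candidate strata
psus_per_stratum             # PSUs per stratum
any(duplicated(psu$v021))    # TRUE would mean a PSU falls in two strata
```

If the number of strata or the PSU counts disagree with the sample design in the report, the variable is probably not the explicit stratification.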
I hope this helps.
Re: Stratification [message #1477 is a reply to message #1476]
Mon, 03 March 2014 18:08 |
hlyons
Thanks for the additional info. I had never encountered it before, but it seems to be (or have been) a pretty common (even standard?) method with systematic sampling, where within an explicit stratum the frame is usually sorted geographically. From a quick search, there may be a further description of it in Kish's Survey Sampling, which I don't have. Anyway, thanks for the tip and docs.
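To make the idea concrete, here is a rough sketch in base R of equal-probability systematic selection from a geographically sorted frame. It is a simplification under stated assumptions: the frame, the district names, and the equal-probability selection are all made up for illustration, and real designs may select with probability proportional to size.

```r
set.seed(1)

# Hypothetical frame of 20 enumeration areas within one explicit
# stratum, already sorted geographically by district.
frame <- data.frame(
  ea       = 1:20,
  district = rep(c("north", "central", "south", "east"), each = 5)
)

n <- 4                       # clusters to select
k <- nrow(frame) / n         # sampling interval (5 here)
start <- runif(1, 0, k)      # random start within the first interval

# Take every k-th unit from the random start.
picks <- floor(start + k * (0:(n - 1))) + 1
frame[picks, ]
```

Because the frame is sorted and the interval here equals the district size, each district gets exactly one selected cluster, which is the implicit stratification effect. Pairing neighbouring selections afterwards is what produces the grouped variance strata.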
Hil