Survey Design Issue [message #22488]
Fri, 19 March 2021 17:44
hamid
I have recently been having trouble setting up the DHS survey design in R for the Afghanistan 2015 Standard DHS (DHS-7).
After loading the Stata format individual recode dataset (AFIR71FL) into R:
library(foreign)
library(survey)
data <- read.dta("AFIR71FL.DTA")
data$wt <- data$v005 /1000000
I simply run the following command (as indicated in the DHS documentation):
DHSdesign<-svydesign(id = data$v021, strata = data$v023, weights = wt, data=data)
and I get the following error:
Error in svydesign.default(id = data$v021, strata = data$v023, weights = wt, data = data) :
Clusters not nested in strata at top level; you may want nest=TRUE.
which is basically telling me that PSU IDs are not unique across strata (province + rural/urban).
Some additional notes:
- The same code worked just fine over the past month (I was working on it every day);
- I tried a fresh install of R + packages without any success;
- R & packages are up to date;
- I get the same error using different machines;
- The AfDHS file in use is the most recent;
- Using the option "nest=T", as suggested by R, is not a workable solution because it creates problems when running "svyglm" regressions.
I am wondering whether this is a bug in the R survey package or something else.
Thanks,
Hamid
[Updated on: Sat, 20 March 2021 05:33]
Re: Survey Design Issue [message #22510 is a reply to message #22488]
Mon, 22 March 2021 16:32
Bridgette-DHS
Following is a response from DHS Senior Sampling Specialist, Mahmoud Elkasabi:
Apparently this is due to an error in the IR dataset (I assume it exists in the other ones as well; I haven't checked). One woman in cluster 476 is coded as rural in v025 although the cluster is urban. As a result, cluster 476 appears in two different strata, which causes the error message.
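For anyone who wants to locate this kind of miscoding themselves, here is a minimal sketch in base R. It uses a small made-up data frame (`toy`) rather than the real IR file, and the helper names `stratum`, `strata_per_cluster` and `bad_clusters` are purely illustrative. It assumes v025 holds numeric codes (1 = urban, 2 = rural). On the real data you would run the same check on v021 against v023 (or against a stratum rebuilt from v024 and v025).

```r
# Toy data standing in for the IR file; all values are made up for illustration.
toy <- data.frame(
  v021 = c(475, 475, 476, 476, 476, 477),  # cluster (PSU) id
  v024 = c(1, 1, 1, 1, 1, 2),              # region
  v025 = c(1, 1, 1, 1, 2, 2)               # assumed coding: 1 = urban, 2 = rural
)

# Stratum label: region combined with type of residence.
toy$stratum <- paste(toy$v024, toy$v025)

# Count how many distinct strata each cluster falls into.
strata_per_cluster <- tapply(toy$stratum, toy$v021,
                             function(s) length(unique(s)))

# A cluster that spans more than one stratum is not nested in strata.
bad_clusters <- names(strata_per_cluster)[strata_per_cluster > 1]
print(bad_clusters)

# Inspect the rows of the offending clusters to spot the miscoded case.
print(toy[toy$v021 %in% bad_clusters, ])
```

In the toy data only cluster 476 is flagged, because one of its rows carries a different v025 code from the rest of the cluster.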
Here are two possible solutions:
1- Use the nest option after you re-construct v021 and v023:
IRdata$STRAT <- as.integer(factor(with(IRdata, paste(v024, v025))))
IRdata$CLUST <- as.integer(factor(with(IRdata, paste(v021))))
DHSdesign <- svydesign(id = IRdata$CLUST, strata = IRdata$STRAT, weights = IRdata$v005, data = IRdata, nest = TRUE)
2- Recode a new v025 variable (say v025r) where you assign all cases with v021=476 to v025r=1 (otherwise v025r=v025) and then proceed with the code below:
IRdata$STRAT <- as.integer(factor(with(IRdata, paste(v024, v025r))))
IRdata$CLUST <- as.integer(factor(with(IRdata, paste(v021))))
DHSdesign <- svydesign(id = IRdata$CLUST, strata = IRdata$STRAT, weights = IRdata$v005, data = IRdata)
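The recode in option 2 is described in words only, so here is one possible sketch of it in base R, again on a made-up data frame (`toy`) rather than the real IR file. It assumes v025 is stored as numeric codes with 1 = urban and 2 = rural; if the file was read with factor labels (the default for `read.dta`), the comparison and the assigned value would need to use those labels instead. The final check (`nested`) is only an illustration that each cluster now sits in exactly one stratum, with the cluster id taken from v021.

```r
# Toy data standing in for the IR file; all values are made up for illustration.
toy <- data.frame(
  v021 = c(475, 475, 476, 476, 476, 477),  # cluster (PSU) id
  v024 = c(1, 1, 1, 1, 1, 2),              # region
  v025 = c(1, 1, 1, 1, 2, 2)               # assumed coding: 1 = urban, 2 = rural
)

# Recode: every case in cluster 476 becomes urban (1); all others keep v025.
toy$v025r <- ifelse(toy$v021 == 476, 1, toy$v025)

# Rebuild stratum and cluster ids from the corrected variable.
toy$STRAT <- as.integer(factor(paste(toy$v024, toy$v025r)))
toy$CLUST <- as.integer(factor(toy$v021))

# Verify nesting: each cluster should now belong to exactly one stratum.
strata_per_cluster <- tapply(toy$STRAT, toy$CLUST,
                             function(s) length(unique(s)))
nested <- all(strata_per_cluster == 1)
print(nested)
```

If `nested` comes out TRUE on the real data, the svydesign call above should no longer need nest = TRUE.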
Re: Survey Design Issue [message #22511 is a reply to message #22510]
Tue, 23 March 2021 05:12
hamid
Thanks a lot. This actually works!
I prefer the second approach. This issue does not apply to the couples recode (probably because this woman is not listed there).
As I wrote earlier, the nest=T option creates problems when running some R functions (e.g., lmtest::waldtest fails with an error about unequal sample sizes between models).
Kind regards,
Hamid