Accounting for different sampling areas over different years [message #3783]
Tue, 10 February 2015 23:57
UAB_user
Messages: 21 Registered: September 2014 Location: Alabama
Member
Hello,
I am using the Nepal DHS to look at factors affecting migration across the 2001, 2006, and 2011 survey years.
I have de-normalized the weights for each year according to Ruilin's suggestions, but do I also have to account for the different sampling areas in each year? Would it be OK to merge all three years, use the cluster (V001) and strata (V023) variables in my analysis, and assume the areas are the same for each survey round?
If I do have to adjust them, how do you recommend I go about doing so?
Thank you
Derek

Re: Accounting for different sampling areas over different years [message #4202 is a reply to message #3825]
Thu, 16 April 2015 13:08
mmr-UMICH
Messages: 21 Registered: February 2015 Location: A2, MI
Member
Saying that strata are consistent across surveys for a country means that the codes/values of the strata variable (after combining the region and residence variables) are the same across the survey waves (e.g., 2001, 2006, 2011). If a country has 5 regions and urban/rural residence, there are 10 strata codes (say, 1 to 10) for each survey year. My understanding is that in the pooled data set the number of strata should still be 10. Because the stratification was the same but the clusters within each stratum were sampled separately for each survey year, the cluster codes must differ across survey waves within the same stratum. If we treat the strata codes as different across the surveys, that affects not only the variance estimation but also the degrees of freedom, confidence intervals, and p-value calculations.
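To make the two options concrete, here is a minimal Python sketch (not anything from the DHS tools) that builds pooled identifiers and counts the design degrees of freedom as number of clusters minus number of strata. The record keys `year`, `v001`, and `v023`, the helper names, and the toy data are all assumptions for illustration only.

```python
def pooled_design_ids(records, shared_strata=True):
    """Attach pooled PSU and stratum identifiers to each record.

    Each record is a dict with hypothetical keys 'year', 'v001' (cluster)
    and 'v023' (stratum). Cluster IDs are always made unique per survey
    year; strata are either shared across years or made year-specific.
    """
    rows = []
    for r in records:
        row = dict(r)
        row["psu"] = (r["year"], r["v001"])
        row["stratum"] = r["v023"] if shared_strata else (r["year"], r["v023"])
        rows.append(row)
    return rows


def design_df(rows):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    psus = {row["psu"] for row in rows}
    strata = {row["stratum"] for row in rows}
    return len(psus) - len(strata)


# Toy data (invented): 2 strata, 3 clusters per stratum, 3 survey years.
# The same cluster codes repeat in every year, as they can in real files.
records = [
    {"year": year, "v023": stratum, "v001": 10 * stratum + k}
    for year in (2001, 2006, 2011)
    for stratum in (1, 2)
    for k in (1, 2, 3)
]

df_shared = design_df(pooled_design_ids(records, shared_strata=True))
df_separate = design_df(pooled_design_ids(records, shared_strata=False))
print(df_shared, df_separate)
```

In this toy case the shared-strata coding leaves more degrees of freedom than the year-specific coding, because the number of PSUs is the same under both while the number of strata triples in the second.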

Re: Accounting for different sampling areas over different years [message #4203 is a reply to message #4202]
Thu, 16 April 2015 16:05
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
My intuition is that you would want to use different strata too - the idea being that the stratification was done separately by survey round, even if they overlap - but I think this is probably, if not an open question in the survey analysis literature, at least sufficiently esoteric that there is no agreed-upon course of action. That said, I do have two points I'm more sure about:
1 - you say "If we treat strata codes different across the surveys, the variance estimation is not only affected but also the degrees of freedom, confidence intervals, and p-value calculations." But variance estimation will always affect CIs and p-values, and the loss of DF should barely change the critical values, given the large number of DF that remain.
2 - depending on your variables of interest and how those are constructed, you might want to use a standard error estimator that is robust to more forms of correlation than the one you would use if you were just looking at a single, individual-level covariate from one survey. Error terms are likely correlated across time within region (worse if you are using aggregated or constructed variables on the right hand side of your regression) and the standard DHS method won't account for this, but clustering by spatial region across survey rounds would.
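As a rough illustration of point 2, the sketch below compares the standard error of a simple mean when clustering on the survey-specific cluster versus clustering on region across rounds. It is a simplified, unweighted estimator that ignores strata, the data are invented so that each region's level persists across rounds, and all names are hypothetical. Keep in mind that with only a handful of regions a cluster-robust estimate is itself unreliable.

```python
from collections import defaultdict
from math import sqrt


def cluster_se_of_mean(values, cluster_ids):
    """Cluster-robust standard error of an unweighted sample mean.

    Residuals are summed within each cluster, and the usual G / (G - 1)
    small-sample correction is applied. Weights and strata are ignored.
    """
    n = len(values)
    mean = sum(values) / n
    totals = defaultdict(float)
    for y, g in zip(values, cluster_ids):
        totals[g] += y - mean
    n_clusters = len(totals)
    variance = (n_clusters / (n_clusters - 1)) * sum(
        t * t for t in totals.values()
    ) / n**2
    return sqrt(variance)


# Invented data: 4 regions whose level persists across 2 survey rounds,
# 2 clusters per region and round, 2 respondents per cluster.
region_effect = {1: 2.0, 2: 1.0, 3: -1.0, 4: -2.0}
values, psu_ids, region_ids = [], [], []
for region, effect in region_effect.items():
    for year in (2006, 2011):
        for k in (1, 2):
            for noise in (-0.5, 0.5):
                values.append(effect + noise)
                psu_ids.append((year, 10 * region + k))  # round-specific cluster
                region_ids.append(region)                # same across rounds

se_by_psu = cluster_se_of_mean(values, psu_ids)
se_by_region = cluster_se_of_mean(values, region_ids)
print(se_by_psu, se_by_region)
```

Because the outcome is correlated within region across rounds in this toy data, the region-clustered standard error comes out larger than the one based on survey-specific clusters.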