The DHS Program User Forum
Discussions regarding The DHS Program data and results
Accounting for different sampling areas over different years [message #3783] Tue, 10 February 2015 23:57
UAB_user
Messages: 21
Registered: September 2014
Location: Alabama
Member
Hello,

I am using the Nepal DHS to look at factors affecting migration across the 2001, 2006, and 2011 survey rounds.

I have de-normalized the weights for each year according to Ruilin's suggestions, but do I also have to account for the different sampling areas in each year? Would it be OK to merge all three years, use the cluster (V001) and strata (V023) variables in my analysis, and assume the areas are the same in each survey round?

If I do have to adjust them, how do you recommend I go about doing so?

Thank you
Derek
Re: Accounting for different sampling areas over different years [message #3791 is a reply to message #3783] Wed, 11 February 2015 10:44
Bridgette-DHS
Messages: 3016
Registered: February 2013
Senior Member
Following is a response from Senior DHS Specialists Ruilin Ren & Trevor Croft:


You need to distinguish cluster numbers (V001) that have the same value but come from different surveys. Each cluster from each survey should stand alone as a cluster in your merged file.

You can create a new cluster number as follows:
* Convert Nepali calendar years to the Gregorian survey year
gen year = .
replace year = 2011 if v007 == 2067 | v007 == 2068
replace year = 2006 if v007 == 2062 | v007 == 2063
replace year = 2001 if v007 == 2057 | v007 == 2058
* One cluster ID per combination of survey year and original cluster number
egen newcluster = group(year v001)
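
Before relying on the new cluster variable, it may be worth confirming that every record received a survey year and that each round contributes the expected number of clusters. The following is only a sketch: it assumes the three rounds are already appended into one file and that year and newcluster exist as created above; firstobs is a hypothetical helper name.

```stata
* Sketch only: assumes the three rounds are already appended and that
* year and newcluster were created as in the reply above.

* Every record should have been assigned a survey year
assert !missing(year)

* Cross-check the Nepali-calendar values against the assigned year
tabulate v007 year, missing

* Count the distinct clusters contributed by each survey round
* (firstobs is a hypothetical helper; it flags one record per cluster)
egen byte firstobs = tag(year v001)
tabulate year if firstobs
```

If the assert fails, some v007 values fall outside the six Nepali years listed in the reply and would need to be inspected before pooling.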

Re: Accounting for different sampling areas over different years [message #3794 is a reply to message #3791] Wed, 11 February 2015 18:14
UAB_user
Messages: 21
Registered: September 2014
Location: Alabama
Member
Great!

Thank you
Derek
Re: Accounting for different sampling areas over different years [message #3813 is a reply to message #3791] Tue, 17 February 2015 15:56
UAB_user
Messages: 21
Registered: September 2014
Location: Alabama
Member
Would I have to do this for V023 as well?
Re: Accounting for different sampling areas over different years [message #3825 is a reply to message #3813] Wed, 18 February 2015 07:58
Trevor-DHS
Messages: 787
Registered: January 2013
Senior Member
While the strata in v023 are consistent across the three surveys and represent the same areas (unlike v001, which identifies different clusters in each survey year), I would still recommend following the same procedure as for v001 to create separate strata for each survey year.
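
A minimal sketch of what this recommendation could look like in Stata, mirroring the egen approach used for v001. It assumes year and newcluster already exist in the pooled file; wt is a hypothetical variable holding the de-normalized weight, and the singleunit(centered) option is one possible choice for handling strata with a single cluster, not something prescribed in this thread.

```stata
* Sketch only: assumes year and newcluster already exist, and that wt is a
* hypothetical variable holding the de-normalized weight for each record.

* One stratum ID per combination of survey year and original stratum
egen newstrata = group(year v023)

* Declare the pooled design: clusters as PSUs, survey-specific strata
svyset newcluster [pweight = wt], strata(newstrata) singleunit(centered)

* Summarise strata and PSUs, e.g. to spot strata with a single cluster
svydescribe
```

Subsequent estimation commands would then be prefixed with svy: so that they pick up this design.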
Re: Accounting for different sampling areas over different years [message #4202 is a reply to message #3825] Thu, 16 April 2015 13:08
mmr-UMICH
Messages: 21
Registered: February 2015
Location: A2, MI
Member
Saying that strata are consistent across surveys for a country means that the codes of the strata variable (after combining the region and residence variables) are the same in every survey wave (e.g., 2001, 2006, 2011). If a country has 5 regions and an urban/rural split, there are 10 strata codes (say, 1 to 10) in each survey year. My understanding is that the pooled data set should still have 10 strata. Because the stratification was the same but the clusters sampled within each stratum differed by survey year, the cluster codes must differ within identical strata across the survey waves. If we instead treat the strata codes as different across surveys, this affects not only the variance estimates but also the degrees of freedom, confidence intervals, and p-values.
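
One way to see how much the choice matters for a given analysis is to declare the design both ways and compare the results. This is only a sketch: wt (the de-normalized weight), newstrata (survey-specific strata built with egen group(year v023)), and the outcome migrated are hypothetical names that are not part of the DHS recode files.

```stata
* Sketch only: wt, newstrata, and migrated are hypothetical names,
* not DHS recode variables; newcluster is the pooled cluster ID.

* Design A: strata codes shared across survey rounds
svyset newcluster [pweight = wt], strata(v023)
svy: mean migrated
display "shared strata:   strata = " e(N_strata) "  PSUs = " e(N_psu) "  df = " e(df_r)

* Design B: separate strata for each survey round
svyset newcluster [pweight = wt], strata(newstrata)
svy: mean migrated
display "separate strata: strata = " e(N_strata) "  PSUs = " e(N_psu) "  df = " e(df_r)
```

Stata computes the design degrees of freedom as the number of PSUs minus the number of strata, so Design B has fewer degrees of freedom than Design A by the number of additional strata.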
Re: Accounting for different sampling areas over different years [message #4203 is a reply to message #4202] Thu, 16 April 2015 16:05
Reduced-For(u)m
Messages: 292
Registered: March 2013
Senior Member

My intuition is that you would want to use different strata too - the idea being that the stratification was done separately by survey round, even if they overlap - but I think this is probably, if not an open question in the survey analysis literature, at least sufficiently esoteric that there is no agreed-upon course of action. That said, I do have two points I'm more sure about:

1 - You say "If we treat strata codes different across the surveys, the variance estimation is not only affected but also the degrees of freedom, confidence intervals, and p-value calculations." But variance estimation will always affect CIs and p-values, and the loss of DF should not noticeably change the critical values, given how large the DF still is.

2 - Depending on your variables of interest and how they are constructed, you might want a standard error estimator that allows for broader error correlation than you would assume if you were just looking at a single, individual-level covariate from one survey. Error terms are likely correlated across time within region (worse if you are using aggregated or constructed variables on the right-hand side of your regression), and the standard DHS method won't account for this, but clustering by spatial region across survey rounds would.
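
As a rough illustration of the second point, clustering on a spatial unit that is shared across rounds could look like the sketch below. It treats the original strata (v023), described earlier in this thread as consistent across the three surveys, as the spatial unit; that choice, the outcome migrated, and the weight wt are all assumptions made for illustration.

```stata
* Sketch only: migrated and wt are hypothetical names. v023 is used as the
* spatial unit on the assumption that its codes match across all rounds.
logit migrated i.year [pweight = wt], vce(cluster v023)
```

Cluster-robust standard errors can be unreliable when the number of clusters is small, so this is best read as a sensitivity check alongside the svy-based estimates.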