The DHS Program User Forum
Discussions regarding The DHS Program data and results
Pooling 3 rounds of DHS Nepal -- weights? Tue, 08 October 2019 13:38
LeahBevis (Member; Messages: 1; Registered: October 2019)
I am working on a project where we pool 3 rounds (2006, 2011, 2016) of Nepal's DHS data in the Terai region only, doing analysis in Stata with all observations at once. While I know how to use DHS weights for a single cross-section, I am not fully sure how to adjust the weights, PSU and strata variables in order to have correct SEs in all analyses. Four questions.

1. It looks like I should generate weights in this way, where 2497.704, 2660.37095, and 2773.8584 are the total population of Nepal in each round. (I obtained those populations by first creating wt as below (v005/1000000), then using tab year [iweight=wt].)

gen wt = v005/1000000
replace wt=wt/2770.8256 if year==2006
replace wt=wt/2660.37095 if year==2011
replace wt=wt/2773.8584 if year==2016

After doing this, I find that the weighted average of my variable wt is 1 in each round. I think this is correct?

2. I am unsure how to handle the strata within Nepal, in part because I'm not clear on how the strata were constructed in each year, and also because I don't know whether I would ideally want the strata to be the same across years or unique by year. Right now, the strata in the Terai range from 9-13 in 2006 and 2011, but from 1-14 in 2016. If the strata should be the SAME in all years, then I need to recode the 2016 strata to match the previous years. (And I would need to know, from DHS, how to do this so that the locations are consistent.) If the strata are supposed to be UNIQUE by year, I simply need to differentiate the numbers 9-13 from each round.

3. Similarly, the primary sampling unit IDs (held in variable v021) changed in 2016. The IDs range up to 7,000 in 2006 and 2011, but stop at 400 in 2016. Similar to the question above, I'm not really sure what the goal is here... do I want PSUs to be unique by round, or the same across rounds? If the same across rounds, I need to know how to recode the 2016 PSUs into the 2006 and 2011 PSUs. (Also, is it a problem that while many PSU IDs are given in both 2006 and 2011, there are also many PSU IDs that are only in 2006, or only in 2011?)

4. And just to be sure, after having created an adjusted PSU var and an adjusted strata var, I believe I use the wt variable defined above, and then run:

Thank you!
Leah
Re: Pooling 3 rounds of DHS Nepal -- weights? [message #18252 is a reply to message #18185] Mon, 21 October 2019 06:59
Bridgette-DHS (Senior Member; Messages: 1785; Registered: February 2013)

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

Say that you have combined the three surveys (by appending them) and you use v021 as the PSU id and v023 as the stratum id. Then run these two lines:

egen clusterid=group(survey v021)
egen stratumid=group(survey v023)

This is the easiest way to get unique ids in the combined file for clusters and strata. There is no need to do any other kind of spatial reconciliation.
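With unique cluster and stratum ids in place, the survey design could then be declared along these lines. This is only a minimal sketch: it assumes that wt is the adjusted weight variable from the original post, that survey identifies the round, and that some_outcome is a hypothetical placeholder for whatever variable is being analyzed.

```stata
* clusterid and stratumid come from the two egen lines above;
* wt is assumed to be the adjusted weight from the original post
svyset clusterid [pweight=wt], strata(stratumid)

* some_outcome is a hypothetical placeholder for an analysis variable
svy: mean some_outcome, over(survey)
```

Once svyset has been run, any estimation command prefixed with svy: uses the declared clusters, strata, and weights when computing standard errors.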

What you have done with the weights seems ok to me. Yes, the means will be 1 within each survey and overall. However, there is a problem of scale. You divided v005 by 1000000, but when you divide again by 2770.8256, etc. you will have a number which is much less than 1. You should avoid having decimal points or anything to the right of a decimal point in a weight. There are some kinds of weights (such as fweight in Stata) that require a weight to be an integer. I have encountered weighting procedures that will truncate a weight and ignore anything to the right of the decimal point, which means that a weight between 0 and 1 will be treated like 0--that is, the case will be ignored entirely! You might want to rescale your weights somehow with an arbitrary multiplier such as 1000000.
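One possible way to carry out that rescaling without hard-coding the per-round totals is sketched below. It assumes that survey identifies the round and that v005 is the raw DHS weight; the names v005_total and wt_scaled are made up for illustration, and the multiplier 1000000 is arbitrary.

```stata
* Sum of the raw weights within each round
* (assumes `survey' identifies the round)
bysort survey: egen double v005_total = total(v005)

* Rescale so that each round's weights sum to 1,000,000
* (an arbitrary multiplier that keeps the weights well above 1)
gen double wt_scaled = 1000000 * v005 / v005_total

* Check: the sum of wt_scaled should be 1,000,000 in every round
tabstat wt_scaled, by(survey) statistics(sum)
```

Computing the totals with egen avoids retyping the sums by hand, so the divisor used for each round always matches the data in memory.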