When/How to use (de-normalized) weights for pooled data analysis (four waves of BD child anthro data [message #1558]
Thu, 13 March 2014 10:49
annadinnyc
Hello,
I am interested in getting region-level means for the child anthropometric variables from the Bangladesh DHS 1999/2000, 2004, 2007, and 2011. As my base, I am using the PR files for 2004, 2007, and 2011 and the KR file for 2000 (since only children of interviewed women were measured/weighed). Also, for 2004 and 2007, I am using the HW files to get the anthro variables related to the new WHO standards, which I merge with the base files.
Once I pool the data, I understand (from the note by Rulin Ren) that I need to de-normalize the weights (hv005) using the following formula: hv005_denorm = hv005 × (number of residential households in the country at the time of the survey) / (number of households interviewed in the survey).
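A minimal Stata sketch of that formula, under stated assumptions: hh_total and hh_interviewed are hypothetical variables (constant within each survey year) that would have to be filled in by hand from each survey's final report, since they are not in the DHS recode files. DHS stores hv005 with six implied decimal places, so the sketch divides by 1,000,000 first; that rescaling is an addition to the formula as quoted and does not change relative weights.

```stata
* Sketch only: hh_total and hh_interviewed are hypothetical variables
* supplied by the analyst for each survey year; they are not DHS variables.
gen double wt = hv005 / 1000000                    // hv005 carries 6 implied decimals
gen double hv005_denorm = wt * hh_total / hh_interviewed
```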
At this point, I am not sure how to use the survey weights in calculating the region-level means. My plan is to do the following: first calculate household means; then calculate region-level means. Do I use the de-normalized weights when calculating the household means or only when calculating the region-level means?
Also, I want to confirm that this is the correct code to set the survey design in Stata, where hv005_denorm is the de-normalized weight, hv021 is the primary sampling unit variable for BD, and hv023 is the strata id variable for BD (note: even though stratification changes over time, my understanding is that these variables capture the correct survey design for each survey year):
svyset hv021 [pweight=hv005_denorm], strata(hv023)
Any suggestions would be much appreciated. Thank you.
Anna
Re: When/How to use (de-normalized) weights for pooled data analysis (four waves of BD child anthro data [message #1562 is a reply to message #1558]
Thu, 13 March 2014 14:50
Reduced-For(u)m
Hi Anna,
Well... this is a tough one. I have a couple of thoughts, but there isn't really an agreed-upon answer as best I can tell.
First - by switching recodes, you might be accidentally switching weighting schemes. Why can't you use the child recode for all? (I forget, I thought I had used all 4 child recodes from Bangladesh).
Second - what 'weight' do you want each total survey to have? Equal? You could just sum up the weights within each survey round and divide each observation's weight by that sum. That gives you pweights that preserve the relative weights within a survey and sum to 1 for each survey.
Third - for regions, you could do the above, but at the region level (sum up regionXsurvey, divide, so each region sums to 1). Then do a weighted summarize (or collapse, with weights, by regionXsurvey) to recover weighted region means. Not sure the value of household means as an intermediate - at least not if you use the child recode for everything.
Fourth - I would create new strata and new PSUs for each survey round - I usually, say, multiply the survey year by 10,000 and add the psu to get a unique psu for each survey round, and something similar for strata. I don't think you want to lump them together.
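The second through fourth suggestions above can be sketched in Stata roughly as follows. This is one possible reading, not a definitive recipe: year is a hypothetical survey-round identifier (for example 2000, 2004, 2007, 2011) added when the files are pooled, hv024 is assumed to be the region variable, and hv021/hv023 are assumed to be populated in every round.

```stata
* Sketch only: "year" is a hypothetical survey-round variable created when pooling.
gen double wt = hv005 / 1000000

* Second: rescale so the weights sum to 1 within each survey round
bysort year: egen double wt_sum = total(wt)
gen double wt_survey = wt / wt_sum

* Third: same idea within region-by-survey cells (hv024 assumed to be region)
bysort year hv024: egen double wt_sum_reg = total(wt)
gen double wt_region = wt / wt_sum_reg

* Fourth: PSU and strata identifiers that cannot collide across rounds
gen long psu_pooled    = year * 10000 + hv021
gen long strata_pooled = year * 10000 + hv023

svyset psu_pooled [pweight=wt_survey], strata(strata_pooled)
```

Because the rescaling divides by a constant within each cell, weighted means computed within a region-by-survey cell are the same with wt, wt_survey, or wt_region; the choice only matters once cells are combined.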
Some small things: you can use a new Stata package "zscore06" to get the new WHO standards for all surveys using raw height/age/gender data. Why would you only want those for the 04/07? Or are the newer variables merged into the older files already? Seems weird to use two different standards.
Running a bit out of time (train WiFi!), but if this isn't clear, write back and I'll try again. Also, you don't have to take my advice. This is just one way to think about it and do it. Like I said, I don't think there is full agreement on this.
Re: When/How to use (de-normalized) weights for pooled data analysis (four waves of BD child anthro data [message #1610 is a reply to message #1608]
Mon, 17 March 2014 17:00
Reduced-For(u)m
Hi Anna,
Not a whole lot I can offer, but as for your options 1/2 on weighting: what I suggest is a kind of de-normalizing, just one where you only want to preserve relative within-survey probability of selection. The other version you mentioned earlier tries to re-adjust the weights from the household weights to get a "fraction of total households in the area that were surveyed" and would, I think, allow you to use a fully pooled sample and make that "population representative" by (sort of) accounting for changing population over time (supposing you had different Nhouseholds estimates for each survey year). But since you want separate estimates for each region-X-survey (yes?), you shouldn't need to do that.
Like I said, I tend to think of each survey as population-representative, and usually conceive the population as being static (I don't want to weight my 2000 results less than my 2007 just because the population was larger in 2007). But that is a preference, not a rule.
Question - do you really need to pool these surveys at all to do what you want? Why not just collapse each survey down into regional cells, weighting by the given weights? Oh, except that 2004 claims (in the coding, but not the documentation) it is not regionally representative, which is odd.
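A minimal sketch of that collapse-to-regional-cells idea in Stata, run either on one survey file at a time or on an appended file. The names are assumptions: haz is a hypothetical height-for-age z-score variable already cleaned of flagged and missing codes, year is a hypothetical survey-round variable, and hv024 is assumed to be the region variable.

```stata
* Sketch only: haz and year are hypothetical variable names.
gen double wt = hv005 / 1000000

* One weighted mean per region-by-survey cell, using the weights as given
collapse (mean) haz [pweight=wt], by(year hv024)
list year hv024 haz, sepby(year)
```

This gives point estimates only; standard errors would still need the survey design or a clustered estimator.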
Also, a last comment on stratification (which obviously I'm still not great on): accounting for strata shouldn't affect point estimates, only standard errors. In fact, accounting for strata will typically shrink your SEs a little. So you could be "conservative" with your p-values/CIs and not account for strata at all. You could even, to be more conservative, skip the "svyset" step, specify the weight in the regression itself, and use the cluster-robust SE option clustered on PSU. Those SEs will be, maybe, slightly too big (conservative), but not by much, and if your results are significant that way, they should stay significant however you specify the strata (I know, this is a somewhat non-compelling argument for many reasons, but I find the stratification specification doesn't matter too much here).
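The "skip svyset" route described above might look like this in Stata. It is a sketch under assumptions: haz is a hypothetical outcome variable, year is a hypothetical survey-round variable, and the PSU identifier is made unique across rounds so that clusters from different surveys are not merged.

```stata
* Sketch only: haz and year are hypothetical variable names.
gen double wt = hv005 / 1000000
gen long psu_pooled = year * 10000 + hv021     // unique PSU id across rounds

* Weighted regression with cluster-robust SEs on PSU, ignoring strata
regress haz i.year [pweight=wt], vce(cluster psu_pooled)
```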
Can I ask what you want region-X-survey-round estimates for? You'll end up with like N = 10 regions X 3 surveys = 30 mean anthropometric measurements. Not a lot to do further analysis on, right? I ask because I have special interest in Bangladesh, but would totally understand if you had some really cool idea you didn't want to share yet.