Re: Question re-weighting combined survey data [message #593 is a reply to message #233]
Wed, 03 July 2013 18:17
Registered: March 2013
Location: Silver Spring Maryland
I am doing some similar work, so I'm going to take a stab at this. However, I am not sure I understand exactly what is meant by "de-normalization". I take it to mean reversing the process of normalizing the weights to the sample size, so my comments follow that interpretation.
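To make that interpretation concrete, here is a minimal sketch of normalizing weights to the sample size and then reversing it. It assumes normalization multiplied every weight in the file by a single constant, n / sum(w); the function names and the weights are hypothetical, and `population_total` stands for the extra piece of information that has to come from outside the data file.

```python
# Hypothetical sketch: normalize design weights to sum to the sample size n,
# then "de-normalize" them using an externally known population total.

def normalize(weights):
    """Rescale weights by the constant n / sum(w) so they sum to n."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

def denormalize(norm_weights, population_total):
    """Rescale normalized weights so they sum to population_total.

    population_total is an assumption: normalization discards it, so it
    must come from outside the data set (e.g. sampling frame counts).
    """
    n = len(norm_weights)
    return [w * population_total / n for w in norm_weights]

design = [120.0, 80.0, 200.0, 100.0]        # made-up design weights
norm = normalize(design)                    # sums to 4, the sample size
restored = denormalize(norm, sum(design))   # back to the original scale
print([round(w, 6) for w in norm])
print([round(w, 6) for w in restored])
```

Without the external total, only the relative sizes of the weights survive normalization, which is why the question of a second piece of information comes up.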
I understand DHS-User's trepidation. Regarding the statement about needing a second piece of information, I think that may vary by survey. For example, I have been working with the Uganda AIS surveys (the standard DHS uses the same sampling scheme), and the documentation indicates that the number of households in each PSU (enumeration area) is obtained by a canvass of the PSU before interviewing starts. So only the household count of the PSU is needed, not its population, at least in that survey. That is, if one feels that de-normalization is necessary at all. It would be nice if DHS posted the spreadsheet of counts for each stage of weighting.
Now on to the question of whether it is necessary. I don't usually work with normalized weights; DHS is the only data I use that has them. I am also assuming that survey design software is being used to analyze the data. So I've always wondered what the impact of normalizing is. For one task (creating weights for the couple file) I ran a test to confirm that we got the same results using the normalized household weight as with the original, non-normalized weight. We know that E[cX] = cE[X], and normalizing multiplies every weight by the same constant c, which cancels from the numerator and denominator of a weighted mean, so I expected that it would not matter, and it doesn't. But this is not a case of combining surveys; it is just one survey.
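A small numerical check of that argument, using made-up data: scaling every weight by the same constant c leaves the weighted mean unchanged but scales the weighted total by c. The helper names are hypothetical, and this sketch only covers point estimates.

```python
# Sketch with made-up data: a common scale factor on the weights cancels in
# a weighted mean but carries straight through to a weighted total.

def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_total(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1, 0, 1, 1, 0]                      # e.g. an indicator variable
w = [250.0, 400.0, 125.0, 300.0, 175.0]  # hypothetical design weights
c = len(w) / sum(w)                      # normalizing constant n / sum(w)
w_norm = [c * wi for wi in w]            # normalized weights

print(weighted_mean(x, w), weighted_mean(x, w_norm))    # same proportion
print(weighted_total(x, w), weighted_total(x, w_norm))  # differ by factor c
```

This is why the scale of the weights matters for totals but not for means, proportions, or ratios.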
What is the purpose of combining surveys, and what is the resulting estimate? That is probably the more important question. We regularly combine multiple years of U.S. surveys (NHIS, NHANES, MEPS, etc.) and we do not de-normalize the weights; in fact, those weights are much more complex than the DHS weights. What we do is adjust the weight to reflect the number of surveys being combined, which usually just means dividing the weight by the number of surveys. We do this for totals, not means. The result is an estimate for the midpoint (in time) of the combined survey period. Of course, these surveys are designed for this, and that may have an impact.
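A minimal sketch of that adjustment, assuming each survey's weights are on the population scale (not normalized). Dividing every weight by k, the number of surveys pooled, makes the pooled weighted total equal the average of the per-survey totals. The function name and data are hypothetical.

```python
# Sketch: pool k survey years and divide each weight by k so that a pooled
# weighted total estimates the average annual total.

def pooled_total(surveys):
    """surveys: list of (values, weights) pairs, one per survey year."""
    k = len(surveys)
    total = 0.0
    for values, weights in surveys:
        for y, w in zip(values, weights):
            total += (w / k) * y   # weight adjusted for number of surveys
    return total

year1 = ([1, 0, 1], [1000.0, 1500.0, 500.0])   # made-up values and weights
year2 = ([1, 1, 0], [800.0, 1200.0, 900.0])
per_year = [sum(w * y for y, w in zip(v, ws)) for v, ws in (year1, year2)]
print(pooled_total([year1, year2]), sum(per_year) / 2)
```

For a weighted mean the division by k cancels, consistent with the point above that the adjustment is done for totals.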
So what is the purpose of combining DHS surveys? Is it to get a sufficient sample to evaluate a small population? Is it to compare countries? I can't tell from the question posted, and the answer may affect what should be done with the weights. What I will point out is that the stratum variable needs to be modified to be survey specific. The strata and PSU information must not be combined across surveys unless that is planned for in the survey design, and to my knowledge none of the DHS surveys do this. This is easy to do: just add a multiple of 1000 to the stratum variable for each survey, i.e. 1000 for country A, 2000 for country B, 3000 for country C, etc. (Make sure the additive is of a higher order of magnitude than the largest stratum code in every survey, so codes from different surveys cannot collide.)
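The recoding can be sketched as follows. The data layout (one list of (stratum, PSU) pairs per survey) and the function name are hypothetical; the sketch also offsets the PSU codes as a precaution, since some software treats repeated PSU codes as the same cluster, and it refuses to run if the offset does not exceed every code.

```python
# Hypothetical sketch: make stratum and PSU codes unique across surveys by
# adding survey_number * offset, checking the offset exceeds every code.

def make_unique_codes(surveys, offset=1000):
    """surveys: list of lists of (stratum, psu) pairs, one list per survey.

    Returns pooled rows as (survey_number, new_stratum, new_psu).
    """
    max_code = max(max(s, p) for rows in surveys for s, p in rows)
    if max_code >= offset:
        raise ValueError(f"offset {offset} must exceed largest code {max_code}")
    pooled = []
    for i, rows in enumerate(surveys, start=1):   # 1000 for A, 2000 for B, ...
        for stratum, psu in rows:
            pooled.append((i, i * offset + stratum, i * offset + psu))
    return pooled

country_a = [(1, 1), (1, 2), (2, 3)]   # made-up (stratum, PSU) codes
country_b = [(1, 1), (2, 2)]           # same codes reused in another survey
rows = make_unique_codes([country_a, country_b])
print(rows)
```

After recoding, stratum 1 of country A (1001) and stratum 1 of country B (2001) are treated as separate strata.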
Back to the question of the weights. If the only reason for combining surveys is to make it easier to produce survey-specific estimates, the weights are certainly fine as they are. Just fix the stratum variable.
If the purpose is some form of pooled estimate, then it may also depend on when the surveys were conducted. For example, if multiple countries that were all surveyed at approximately the same time are being pooled, I don't see a reason to alter the weight variable. If multiple surveys of the same country from different time points are being combined, however, this might be more complicated. Even here I am not certain it is necessary to de-normalize; I think this specific case requires more investigation.
One final thought. Perhaps de-normalizing is necessary when using software that does not have survey capabilities. In that case the strata and clustering are being ignored anyway, so we know the variances are not design-correct. The solution here is simple: use the correct software. But I generally don't see a reason why the scale of the weights matters when using the correct software.
So, my take is this: 1) use survey-design-capable software; 2) modify the stratum variable to be survey specific; and 3) altering the weights is not necessary, but using the weights is.
Social & Scientific Systems, Inc.