Re: Weighting after de-normalization [message #3987 is a reply to message #3642]
Mon, 16 March 2015 11:39
kinsukmanisinha@gmail.com
Messages: 9 Registered: January 2015 Location: Milan
Member
Hi,
Many thanks for the discussion; it helped me understand a few critical points.
I am new to survey analysis and hence have a very basic question.
In the discussion you mention:
"After pooling the data and de-normalizing the weight, you can use the de-normalized weight for any kind of analysis, restricted to a domain or not, except for estimating totals, because the weight is not in the right scale for totals."
What do you mean by "estimating totals"? What does "totals" stand for? From what I understand, is it the mean?
In that case, does it imply that pooled data should not be used to perform cross-country (over time) descriptive analysis?
I would really appreciate any clarification.
Many thanks..!!!!
Regards
Kinsuk
Re: Weighting after de-normalization [message #3993 is a reply to message #3987]
Mon, 16 March 2015 15:08
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
I think Ruilin means that if you wanted to, say, count the number of stunted children, you would get the wrong answer, because in that calculation the value of the weights has a meaning in itself (instead of just the relative values of the weights). You will get the same mean stunting rate with original or de-normalized weights, but you wouldn't get the same total number of stunted children.
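To make the mean-versus-total point concrete, here is a minimal sketch in Python. The data, the weights and the scale factor are all made up for illustration; the only assumption taken from DHS practice is that the normalized weights average to 1. Multiplying every weight by the same constant leaves the weighted mean unchanged but multiplies the weighted total by that constant.

```python
# Hypothetical toy data: 1 = stunted, 0 = not stunted.
stunted = [1, 0, 0, 1, 0, 1]

# Hypothetical normalized weights (they average to 1.0).
w_norm = [0.5, 1.5, 1.0, 0.8, 1.2, 1.0]

# Hypothetical scale factor; the real one depends on the
# de-normalization procedure you use.
scale = 2500.0
w_denorm = [w * scale for w in w_norm]


def weighted_total(y, w):
    """Sum of weight * value: depends on the scale of the weights."""
    return sum(wi * yi for wi, yi in zip(w, y))


def weighted_mean(y, w):
    """Weighted total divided by the sum of weights: the scale cancels."""
    return weighted_total(y, w) / sum(w)


mean_norm = weighted_mean(stunted, w_norm)
mean_denorm = weighted_mean(stunted, w_denorm)
total_norm = weighted_total(stunted, w_norm)
total_denorm = weighted_total(stunted, w_denorm)

print(mean_norm, mean_denorm)    # the two means agree
print(total_norm, total_denorm)  # the two totals differ by the factor `scale`
```

The scale factor appears in both the numerator and the denominator of the mean, so it cancels; in the total it appears only once, so it does not.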
Re: Weighting after de-normalization [message #4006 is a reply to message #3995]
Tue, 17 March 2015 05:57
kinsukmanisinha@gmail.com
Messages: 9 Registered: January 2015 Location: Milan
Member
Hi,
Thanks a lot for the previous explanations.
As I mentioned, I am new to survey analysis and the DHS database. Consequently, I have a few more questions and I would appreciate any help.
Please find attached to this message an Excel sheet which contains the list of countries, respective years and surveys that I intend to pool for my analysis.
I am interested in child birth and health variables, women's empowerment variables, and household living conditions. I know that for the child health variables I need to look in the child file. However, I found that the women's data file is not always available for every country. Am I missing somewhere to search?
Then, I learned that in order to pool the datasets I need to de-normalize the sampling weights. ( http://userforum.dhsprogram.com/index.php?t=msg&th=1189&start=0&S=dac787ddfcaa55c72987b9d7b09759fa )
And, also change the PSU variable. So, if I understand well because I pool surveys of different phases from different countries, I will have two PSU. First, at country level and then at household level, right?
Once I de-normalize the weights, fix the PSU and append the datasets, the database is ready for analysis, right? As in I don't need to do something else to the weights or the PSU after I pool in the database. I am sorry if you have already answered this question, I am not confident with how to proceed.
My analysis will consist of descriptive statistics. I intend to perform descriptive analysis with the country level databases (these will be country level for multiple years) and then a regression analysis for the pooled dataset.
Do, I need to take into account some special treatment for the weight and PSU for the above two analysis, apart from what I already mentioned?
Once again many thanks, this forum has been very helpful, thanks a lot..!!!
Regards
Kinsuk
Re: Weighting after de-normalization [message #4017 is a reply to message #4006]
Tue, 17 March 2015 21:14
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
"if I understand well because I pool surveys of different phases from different countries, I will have two PSU. First, at country level and then at household level, right?" - actually you just want 1 PSU number for each sample PSU; it just needs to be unique across years and countries. So for example, you could generate your PSU numbers by giving each country a number between 10 and 99 and then generating a PSU by "PSU*1000000 + Year*100 + CountryNumber" ... something like that (you could also probably concatenate the variables in some way; you just need to create a unique number for each country-X-survey-X-PSU).
"Once I de-normalize the weights, fix the PSU and append the datasets, the database is ready for analysis, right" - yep. You just need to use svyset and the svy: prefix before your regressions (so as to actually use the weights and PSUs).
"Do, I need to take into account some special treatment for the weight and PSU for the above two analysis, apart from what I already mentioned?" - nope, just set the svyset*.
* This is just a technical note: you will be implicitly weighting regressions here not just by the probability weight, but also by the sum of the total weights for the country (that is, by the number of survey rounds and the size of the country, which determine the values of the de-normalized weights). If you have big countries, or countries with many more survey rounds than other countries, they will get higher weight in your regression. Maybe they should, maybe they shouldn't - that is just an issue regarding interpretation of the regression coefficients.
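One way to see how strong this implicit weighting is before running anything: add up the de-normalized weights by country and look at each country's share of the grand total. A rough sketch in Python follows; the records and weight values are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical pooled records: (country, survey_year, de-normalized weight).
# Country A is larger and has two survey rounds; country B has one.
records = [
    ("A", 2005, 1000.0), ("A", 2005, 1500.0),
    ("A", 2010, 1200.0), ("A", 2010, 1300.0),
    ("B", 2008, 200.0), ("B", 2008, 300.0),
]


def weight_share_by_country(rows):
    """Return each country's share of the summed de-normalized weights."""
    totals = defaultdict(float)
    for country, _year, weight in rows:
        totals[country] += weight
    grand_total = sum(totals.values())
    return {country: t / grand_total for country, t in totals.items()}


shares = weight_share_by_country(records)
for country, share in sorted(shares.items()):
    print(country, round(share, 3))
```

In this made-up pool, country A carries roughly 91% of the total weight, so pooled regression coefficients would mostly reflect country A. Whether that is what you want is the interpretation question raised in the note.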
Re: Weighting after de-normalization [message #4023 is a reply to message #4017]
Wed, 18 March 2015 08:56
Bridgette-DHS
Messages: 3190 Registered: February 2013
Senior Member
Following is a response from Senior DHS Sampling Specialist, Ruilin Ren:
If you de-normalized the weight properly, you can apply the de-normalized weight in your analysis, either restricted to certain domains or to certain sections of the questionnaire. Define a new variable name for the de-normalized weight in your pooled data set, and then declare that variable as your weight variable.
As for the problem of estimating "totals", here "totals" means estimates of population totals, such as the "total number of children under 5 years of age who had fever in the last two weeks in a country". Because the sampling weight in the data file is a relative weight without a scale, it is not valid for estimating population totals. While the de-normalized weight provides a weight that can be used for your analysis, the weight produced may be user specific, depending on the de-normalization procedure used, so different users may produce different estimates of the same indicator. Therefore we do not recommend using the de-normalized weight for estimating population totals. However, the scale of the weight has little to no effect on other analyses, such as estimating means, proportions, ratios and rates, and correlation analysis.
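The "user specific" point can be sketched in Python. This assumes, purely for illustration, that de-normalization multiplies each normalized weight by (population figure) / (number of respondents); the thread does not spell out the procedure, and the function name, population figures and data below are all hypothetical. Two users who pick different population figures get different totals but the same proportion.

```python
def denormalize(weights, population, n_interviewed):
    """Rescale normalized weights by an assumed population/sample factor."""
    factor = population / n_interviewed
    return [w * factor for w in weights]


def weighted_total(y, w):
    return sum(wi * yi for wi, yi in zip(w, y))


def weighted_proportion(y, w):
    return weighted_total(y, w) / sum(w)


# Hypothetical normalized weights and a 0/1 "had fever" indicator.
w_norm = [0.5, 1.5, 1.0, 1.0]
had_fever = [1, 0, 1, 0]

# Two users take their population figure from different sources.
w_user_a = denormalize(w_norm, population=1_000_000, n_interviewed=4)
w_user_b = denormalize(w_norm, population=1_200_000, n_interviewed=4)

total_a = weighted_total(had_fever, w_user_a)
total_b = weighted_total(had_fever, w_user_b)
prop_a = weighted_proportion(had_fever, w_user_a)
prop_b = weighted_proportion(had_fever, w_user_b)

print(total_a, total_b)  # totals differ between the two users
print(prop_a, prop_b)    # proportions are identical
```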
Re: Weighting after de-normalization [message #4057 is a reply to message #4023]
Tue, 24 March 2015 13:00
kinsukmanisinha@gmail.com
Messages: 9 Registered: January 2015 Location: Milan
Member
Many thanks for the replies and the clarification..!!!
"actually you just want 1 PSU number for each sample PSU; it just needs to be unique across years and countries. So for example, you could generate your PSU numbers by giving each country a number between 10 and 99 and then generating a PSU by "PSU*1000000 + Year*100 + CountryNumber" ... something like that (you could also probably concatenate the variables in some way; you just need to create a unique number for each country-X-survey-X-PSU)"
So, for every country for every year I will have one PSU. And, this is not affected by the survey. Hence when you say country-X-survey-X-PSU, you basically mean country and year.
Thanks once again, it has been very helpful..!!!
Re: Weighting after de-normalization [message #4059 is a reply to message #4057]
Tue, 24 March 2015 16:59
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
Suppose you have datasets from two countries and two years for each country:
Country A Year 1 - 20 PSUs
Country A Year 2 - 20 PSUs
Country B Year 1 - 30 PSUs
Country B Year 2 - 35 PSUs
Here you would want a total of 105 PSUs. You want the number of PSUs in the final dataset to equal the SUM of the number of PSUs in EACH dataset. This is NOT one PSU per dataset (I'm calling a dataset a country-round or country-year - that is, a point in time and space where a DHS is conducted); it is N PSUs per dataset, where N is the number of PSUs in the original sampling design.
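Here is a small Python sketch of that count, using the "PSU*1000000 + Year*100 + CountryNumber" idea from earlier in the thread. The country numbers (10 and 11) and the calendar years (2005 and 2010) are placeholders standing in for Country A/B and Year 1/2, and the function name is hypothetical.

```python
def pooled_psu_id(psu, year, country_number):
    """Combine PSU number, survey year and a two-digit country number."""
    if not 10 <= country_number <= 99:
        raise ValueError("country_number must be between 10 and 99")
    if not 0 <= year <= 9999:
        raise ValueError("year must have at most four digits")
    # year * 100 + country_number stays below 1,000,000, so the PSU
    # number and the country-year part never overlap.
    return psu * 1_000_000 + year * 100 + country_number


# (country_number, year, number of PSUs in the original design)
datasets = [
    (10, 2005, 20),  # Country A, Year 1
    (10, 2010, 20),  # Country A, Year 2
    (11, 2005, 30),  # Country B, Year 1
    (11, 2010, 35),  # Country B, Year 2
]

pooled_ids = set()
raw_psu_numbers = set()
for country_number, year, n_psus in datasets:
    for psu in range(1, n_psus + 1):
        pooled_ids.add(pooled_psu_id(psu, year, country_number))
        raw_psu_numbers.add(psu)

print(len(pooled_ids))       # 105 distinct PSUs, one per sampled cluster
print(len(raw_psu_numbers))  # 35: the raw numbers collide across datasets
```

Two caveats. If a country has two surveys in the same calendar year, the year alone will not separate them and you would need a survey identifier instead. And if you build the ID in Stata, it would need to be stored as long or double, since the default float type cannot hold this many digits exactly.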