Guidance Needed on Weighting for Pooled DHS Data in Logistic Regression [message #30319] |
Sat, 02 November 2024 08:49 |
ygkim127
Messages: 3 Registered: November 2024
Hello,
I am reaching out with a question regarding data weighting. I am conducting research on "The Effect of Girls' Empowerment on Adolescent Pregnancy in Sub-Saharan Africa," aiming to investigate whether greater empowerment among girls aged 15-19 reduces adolescent pregnancy rates in this region.
I plan to pool data from 27 Sub-Saharan African countries and will be using DHS-7 and DHS-8 data from the IR datasets of these countries. The explanatory variable will be women's empowerment, while the dependent variable will be the pregnancy status of adolescents aged 15-19. I intend to perform logistic regression analysis using Stata.
Since I need pooled weights that represent Sub-Saharan Africa as a whole, I want to make sure that I am calculating and applying them correctly. I have read previous posts on this user forum and the "Note on DHS standard weight de-normalization" file, but I would appreciate your guidance to confirm my understanding and approach.
I have de-normalized the weights for each country using the formula: v005 × (total females age 15-49 in the country at the time of the survey) / (number of women age 15-49 interviewed in the survey)
I have extracted data only for married women aged 15-19 from each country.
I have used the "append" function to pool the data from the 27 countries into one dataset.
I want to apply weights when conducting logistic regression analysis, but I am unsure how to do so.
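The steps above can be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the file names, the survey code, and the population figure are placeholders (the population total would come from an external source), and wt_denorm is a hypothetical variable name.

```stata
* Sketch for ONE survey; repeat for each of the 27 IR files.
* ASSUMPTIONS: "country1_IR.dta" and "country1_sub.dta" are placeholder file names.
use "country1_IR.dta", clear

* Number of women age 15-49 interviewed = number of records in the IR file
quietly count
local n_interviewed = r(N)

* PLACEHOLDER value: replace with an external estimate of the total
* number of females age 15-49 in the country at the time of the survey
local pop_f1549 = 5000000

* De-normalized weight, following the formula given in the post
gen double wt_denorm = v005 * `pop_f1549' / `n_interviewed'

* Survey identifier, so that surveys can be told apart after pooling
gen int survey = 1

* Restrict to the analysis sample only AFTER the weight has been built
* (v012 is the respondent's current age; add any further restrictions here)
keep if inrange(v012, 15, 19)
save "country1_sub.dta", replace

* Pool the per-survey files once each one has been prepared
use "country1_sub.dta", clear
append using "country2_sub.dta"   // placeholder; list the remaining files
```

The count is taken before the age restriction so that the denominator matches "number of women age 15-49 interviewed" in the formula.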
Please let me know if there are any mistakes in the sequence of these steps or in the weight de-normalization process. Additionally, I would greatly appreciate the exact Stata code for applying weights in the pooled dataset and conducting the logistic regression analysis.
I look forward to your response.
Thank you.
|
|
|
Re: Guidance Needed on Weighting for Pooled DHS Data in Logistic Regression [message #30322 is a reply to message #30319] |
Mon, 04 November 2024 11:47 |
Bridgette-DHS
Messages: 3214 Registered: February 2013
Following is a response from Senior DHS staff member, Tom Pullum:
What you have done so far looks fine to me. For the last step, actually applying the weights, please check previous posts on the "svyset" command. This will adjust for the clusters and strata in the surveys, as well as the weights. As you will see in the earlier posts, you need to construct a new variable for the cluster ID, to distinguish between v001 in different surveys. For example, you can enter "egen cluster_ID=group(survey v001)". You will also need a new variable for the stratum ID. We recently posted (again) a file that specifies the stratification variable in all the surveys. In recent surveys it is v022=v023 and in most older surveys it is v024 x v025, but there are exceptions, and they are given in that file.
In the analysis I would include a fixed effect for survey with "i.survey". A multi-level model with a random effect for survey is not justified, in my opinion, and it would add complexity.
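As a minimal sketch of how these pieces might fit together in the pooled file: the names wt_denorm (de-normalized weight), pregnant (outcome), and empower (explanatory variable) are hypothetical, v022 is assumed to be the stratification variable in every survey (check the posted file for exceptions), and the singleunit(centered) option is an assumption added to handle strata left with a single cluster, not part of the advice above.

```stata
* ASSUMPTIONS: survey, wt_denorm, pregnant and empower already exist in the
* pooled dataset; these names are illustrative, not DHS standard variables.

* Cluster and stratum IDs that are unique across surveys
egen cluster_ID = group(survey v001)
egen stratum_ID = group(survey v022)   // v022 assumed; verify per survey

* Declare the survey design: clusters, strata, and de-normalized weights
svyset cluster_ID [pweight = wt_denorm], strata(stratum_ID) singleunit(centered)

* Logistic regression with a fixed effect for survey
svy: logistic pregnant empower i.survey
```

"svy: logistic" reports odds ratios; "svy: logit" fits the same model and reports coefficients instead.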
I expect that your outcome variable has huge variation from one survey to another, as well as variation within most surveys. The whole concept of pooling surveys with this kind of an outcome seems to me to be unnecessary, but of course you can do what you want.