Weighted analysis in R [message #29696]
Tue, 23 July 2024 10:22
Mahir
Messages: 12 Registered: September 2023
Member
Dear DHS team,
I have one more question regarding the analysis of data using sampling weights.
I am using tbl_svysummary() in R to generate a weighted table. However, after accounting for the weights with the following code, I get an unexpected warning:
library(survey)
library(gtsummary)  # tbl_svysummary() comes from gtsummary
library(dplyr)      # for the %>% pipe
tb1 <- svydesign(ids = ~v021, weights = ~wt, data = tb_data, strata = ~v022, nest = TRUE) %>%
  tbl_svysummary(
    include = c(age, v025, WI),
    statistic = list(all_continuous() ~ "{mean} ({sd})", all_categorical() ~ "{n} ({p}%)"),
    digits = all_continuous() ~ 2,
    label = list(age ~ "Age", v025 ~ "Residence", WI ~ "Wealth Index")
  )
I have checked the syntax and believe it is correct. However, I get the following warning:
Warning message:
There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `df_stats = pmap(...)`.
Caused by warning in `svymean.survey.design2()`:
! Sample size greater than population size: are weights correctly scaled?
I am using the DHS_Benin data and I will do the same for seven other countries (Cameroon, Cote d'Ivoire, Ghana, Kenya, Liberia, Nigeria and Uganda). Could you help me with this? It would be a big help. I have already posted the question on the DHS user forum. Thank you!
Best
Mahir
Re: Weighted analysis in R [message #29723 is a reply to message #29696]
Fri, 26 July 2024 14:57
Janet-DHS
Messages: 921 Registered: April 2022
Senior Member
Following is a response from Senior DHS staff member, Ali Roghani:
This warning likely stems from how the weights are currently scaled in your dataset. To eliminate this warning, you can adjust the weights column so that it represents the total population size rather than just the number of survey respondents.
Here's a step-by-step guide to making this adjustment:
First, define the total population size, replacing XXXXXX with your actual population size:
total_population <- XXXXXX
Next, scale the weights in your dataset:
tb_data$wt_scaled <- tb_data$wt * total_population / sum(tb_data$wt)
Finally, create your survey design object using the scaled weights:
tb1 <- svydesign(ids = ~v021, weights = ~wt_scaled, data = tb_data, strata = ~v022, nest = TRUE)
Using the adjusted weights in your code should remove the warning.
The best source of estimates of population size for different countries and years is the U.N. Population Division: https://population.un.org/wpp/.
Please let us know if this does not solve the problem or if you have other questions.
Re: Weighted analysis in R [message #29814 is a reply to message #29807]
Thu, 08 August 2024 09:29
Mahir
Messages: 12 Registered: September 2023
Member
Dear DHS team,
Thank you for your response.
In my case, I am using DHS data from eight countries: Benin 2017-18, Cameroon 2018, Ivory Coast 2021, Ghana 2022, Liberia 2018-19, Kenya 2022, Nigeria 2021 and Uganda 2016. I am interested in the IYCF indicators from the KR file for children aged 0-23 months. From the nutrition data I have created a diet index, and I want to look at the association of the diet index with wealth index, mother's education, place of residence, etc. I want to do this using the weighted data. I created a subset of the KR file that contains only children aged 0-23 months. As I have already mentioned, when I use the scaled weight, I get a table with the total population of Benin. Should I use only the total population of children aged 0-23 months to scale the weight? In the case of Benin, for example, there are a total of 3937 children aged 0-23 months, but when I use the weights to generate a table, the table shows 2933 children (with the original weight variable, not the scaled weight). This should not be the case. Do you have any advice on this?
Secondly, I would also like to create a new population-level weight by denormalizing the weight for the pooled data of all eight countries, just to make comparisons between countries. Can you tell me how I can do this? I am using R for data analysis. I can figure out the code if you could just explain how to do this conceptually. But if you can share the R code, that would also be great. :)
Best
Mahir
Re: Weighted analysis in R [message #29825 is a reply to message #29814]
Fri, 09 August 2024 09:35
Bridgette-DHS
Messages: 3223 Registered: February 2013
Senior Member
Following is a response from Senior DHS staff member, Tom Pullum:
You are making a bigger deal out of the weights than necessary. You are looking for some multiplier to re-scale the weights, but almost all DHS estimates (rates, means, proportions, regression coefficients, etc.) are invariant with respect to such a multiplier. You can confirm this by running an analysis with the weight v005 as given in the data, then multiplying v005 by 2 or 10 or 100 and re-running. Nothing will change, not even the standard errors.
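The invariance described here can be checked on made-up data. The following is a minimal sketch in base R, assuming a hypothetical data frame `d` with a simulated weight `v005`, an outcome `y` and a covariate `x`; none of the values come from a DHS file. It compares point estimates only. Confirming the design-based standard errors would need the same comparison on two svydesign() objects.

```r
# Simulated data; every value here is made up for illustration
set.seed(1)
n <- 200
d <- data.frame(
  v005 = round(runif(n, 0.3, 2.5) * 1e6),  # DHS-style weight, includes the factor 1e6
  x    = rnorm(n),
  y    = rbinom(n, 1, 0.4)
)

# Three versions of the same weight, differing only by a constant multiplier
d$w1 <- d$v005
d$w2 <- d$v005 / 1e6
d$w3 <- d$v005 * 100

# Weighted proportion of y == 1 under each version
p1 <- weighted.mean(d$y, d$w1)
p2 <- weighted.mean(d$y, d$w2)
p3 <- weighted.mean(d$y, d$w3)

# Weighted regression coefficients under each version
b1 <- coef(lm(y ~ x, data = d, weights = w1))
b2 <- coef(lm(y ~ x, data = d, weights = w2))
b3 <- coef(lm(y ~ x, data = d, weights = w3))

# Each comparison checks equality up to floating-point tolerance
print(c(
  prop_w1_vs_w2 = isTRUE(all.equal(p1, p2)),
  prop_w1_vs_w3 = isTRUE(all.equal(p1, p3)),
  coef_w1_vs_w2 = isTRUE(all.equal(b1, b2)),
  coef_w1_vs_w3 = isTRUE(all.equal(b1, b3))
))
```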
The only exception would be if you are trying to estimate, say, the NUMBER of children age 0-23 months at the national level who were never breastfed. If that's what you want to do, you should re-scale to the population number of children age 0-23 months. You could get an estimate of that population number from the U.N. World Population Prospects 2024.
Weights for that purpose are called expansion weights. DHS reports, in both final reports and research reports, so far as I am aware, have never used expansion weights.
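As a rough illustration of the distinction, the sketch below uses simulated data (not a DHS file) and a hypothetical population figure `N_pop`. It shows that a weighted proportion is unaffected by re-scaling, while an estimated count depends on the weights summing to the population number. The names `w`, `never_bf` and `N_pop` are assumptions made for the example.

```r
# Simulated sample; all values are made up for illustration
set.seed(2)
n        <- 500
w        <- runif(n, 0.3, 2.5)   # simulated sample weights
never_bf <- rbinom(n, 1, 0.05)   # simulated indicator: 1 = never breastfed

N_pop <- 800000                  # hypothetical population of children age 0-23 months

# Expansion weights: re-scale so that the weights sum to the population number
w_exp <- w * N_pop / sum(w)

# The proportion is the same with either set of weights
prop_sample <- sum(w * never_bf) / sum(w)
prop_exp    <- sum(w_exp * never_bf) / sum(w_exp)

# The estimated NUMBER of children requires the expansion weights
count_exp <- sum(w_exp * never_bf)

print(all.equal(prop_sample, prop_exp))
print(all.equal(count_exp, prop_sample * N_pop))
```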
If you want to make pooled estimates, for example for "West Africa" (I put this in quotes because there are alternative lists of countries and in any list DHS has not had surveys in all countries) then you could multiply v005 by P/p, where P is the national population and p is the sum of the weights in the sample (without the factor of 1 million). But I would strongly recommend against such pooled estimates, because the pooled population is never well-defined and the results will be dominated by the largest country, such as Nigeria, obscuring differences between countries. If the goal of your pooling is to look at DIFFERENCES between countries, which is common, then you do not need to adjust the weights at all.
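The P/p multiplier can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `pooled`, the country codes A, B and C, and the populations in `P` are all hypothetical placeholders, not real DHS data or real population figures.

```r
# Simulated pooled file for three made-up countries; not DHS data
set.seed(3)
pooled <- data.frame(
  country = rep(c("A", "B", "C"), times = c(100, 150, 250)),
  v005    = round(runif(500, 0.3, 2.5) * 1e6),
  stringsAsFactors = FALSE
)

# Hypothetical national populations P (placeholders, not real figures)
P <- c(A = 2e6, B = 5e6, C = 4e7)

# Remove the factor of 1 million from v005
pooled$wt <- pooled$v005 / 1e6

# p = sum of the weights within each country, repeated on every row
pooled$p <- ave(pooled$wt, pooled$country, FUN = sum)

# Multiply each weight by P/p for its own country
pooled$wt_pop <- pooled$wt * unname(P[pooled$country]) / pooled$p

# The re-scaled weights now sum to P within each country
totals <- tapply(pooled$wt_pop, pooled$country, sum)
print(totals)

# Share of the pooled weight held by each country
print(totals / sum(totals))
```

With these placeholder populations, country C carries about 85% of the pooled weight (40 of 47 million), which illustrates the concern raised in the reply that the largest country dominates a pooled estimate.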
There have been many postings on weights on the forum. We have nothing to add to what has already been stated.
Re: Weighted analysis in R [message #29872 is a reply to message #29863]
Thu, 15 August 2024 10:09
Bridgette-DHS
Messages: 3223 Registered: February 2013
Senior Member
Following is a response from Senior DHS staff member, Tom Pullum:
The weight of a child in the KR file is the same as the weight of the mother in the IR file. The cases in the KR file are not respondents, but are the births in the past 5 years to the women in the IR file.
If you are rescaling to the number of women in the population, then you are basically rescaling the weights in the IR file, NOT the weights in the KR file.
For this survey, the total number of women in the IR file is 15928. The weights in the IR file, v005, are scaled so that their total matches 15928 (with a factor of 1000000).
You want to rescale so that the sum of the weights in the IR file would be 2717666. You do this by multiplying v005 by 2717666/15928.
If you multiply v005 in the KR file by this same ratio, the total will be (approximately) the number of births in the past 5 years to the women in the population. That's what you would want it to be. Note that the KR file includes children who died, as well as children who survived.
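The arithmetic in this reply can be sketched as follows. The figures 2717666 and 15928 are taken from the reply. The data frames `ir` and `kr` are simulated stand-ins for the IR and KR files, with a made-up `id` column and made-up numbers of births, so the KR total printed at the end is illustrative only.

```r
# Figures quoted in the reply
n_women_ir  <- 15928     # women in the IR file
n_women_pop <- 2717666   # target population number of women
ratio <- n_women_pop / n_women_ir   # roughly 170.6

# Simulated IR file: sum(v005 / 1e6) equals the number of women.
# Real v005 values are integers; they are left unrounded here so the total is exact.
set.seed(4)
raw <- runif(n_women_ir, 0.3, 2.5)
ir  <- data.frame(
  id   = seq_len(n_women_ir),
  v005 = raw / sum(raw) * n_women_ir * 1e6
)

# Simulated KR file: one row per birth in the past 5 years,
# each row carrying the mother's v005
births <- rpois(n_women_ir, 0.8)   # made-up number of births per woman
kr <- ir[rep(ir$id, times = births), ]

# Apply the same ratio in both files
ir$wt_pop <- ir$v005 / 1e6 * ratio
kr$wt_pop <- kr$v005 / 1e6 * ratio

print(sum(ir$wt_pop))   # matches n_women_pop
print(sum(kr$wt_pop))   # approximate population number of births in the past 5 years
```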
Let us know if you still have a problem.