The DHS Program User Forum
Discussions regarding The DHS Program data and results
Survey design in R (getting a warning that sample size is greater than population size)
Survey design in R [message #29599] Mon, 08 July 2024 10:18
RobertB
Messages: 4
Registered: June 2024
Member
Hello all,

Has anyone encountered the warning below when creating tabulations or statistical summaries of an outcome across various socio-economic characteristics? It appears after accounting for the survey design. The function used for the statistical summary is tbl_svysummary() from the gtsummary package.

Warning: There were 48 warnings in `mutate()`.
The first warning was:
ℹ In argument: `df_stats = pmap(...)`.
Caused by warning in `svymean.survey.design2()`:
! Sample size greater than population size: are weights correctly scaled?
ℹ Run dplyr::last_dplyr_warnings() to see the 47 remaining warnings.

Code used for the survey design:
# Formula interface: variables are looked up in `data`
svydesign(id = ~hv021, data = mydata, strata = ~hv023,
          weights = ~wt, nest = TRUE)
options(survey.lonely.psu = "adjust")

Notably, when I run a similar analysis in Stata I do not get this warning, and despite the warning the proportions produced in R are similar to those produced in Stata.

Should I be concerned about the warning in R, or can I ignore it?

Thanks in advance.

Best wishes,
RobertB

Re: Survey design in R [message #29601 is a reply to message #29599] Tue, 09 July 2024 10:15
Bridgette-DHS
Messages: 3151
Registered: February 2013
Senior Member

Following is a response from Senior DHS staff member, Ali Roghani:

The warning you are encountering when using tbl_svysummary() from the gtsummary package is likely related to how the weights are scaled in your survey design. As its text indicates, svymean.survey.design2() raises it when the weights sum to less than the number of observations being summarized, so the implied population size is smaller than the sample size. To eliminate the warning, you can inflate the weights column so that it sums to the number of individuals the sample actually represents, rather than the number of survey respondents. Here's how you can do that:

library(survey)

# Adjust weights to sum to the actual population size
total_population <- 25000000  # Replace with your actual population size
mydata$wt_scaled <- mydata$wt * total_population / sum(mydata$wt)

svy_design <- svydesign(id = ~hv021, data = mydata, strata = ~hv023,
                        weights = ~wt_scaled, nest = TRUE)

Using the adjusted weights in your svydesign() may eliminate the warning.
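
On the question of whether the proportions themselves are affected: a weighted proportion is a ratio of weighted sums, so multiplying every weight by the same constant leaves it unchanged. The base-R sketch below illustrates this with made-up toy values; `outcome`, `wt`, and the population total are hypothetical stand-ins and not DHS data, and the example covers point estimates only.

```r
# Toy data (made-up values): a binary outcome and unequal sampling weights.
outcome <- c(1, 0, 1, 1, 0, 0, 1, 0)
wt      <- c(0.8, 1.2, 0.5, 1.5, 1.0, 0.7, 1.3, 1.0)

# Hypothetical population total; replace with the real figure in practice.
total_population <- 25000000

# Rescale so that the weights sum to the population total.
wt_scaled <- wt * total_population / sum(wt)

# Weighted proportion under the original and the rescaled weights.
p_original <- weighted.mean(outcome, wt)
p_scaled   <- weighted.mean(outcome, wt_scaled)

print(c(original = p_original, scaled = p_scaled))

# The scaling constant cancels in the ratio, so the two estimates agree.
print(isTRUE(all.equal(p_original, p_scaled)))
```

This is consistent with the proportions matching between Stata and R in the original post: rescaling changes what the weights sum to, not the weighted proportions.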