The DHS Program User Forum: Weighting data » Confirming correct survey design



Confirming correct survey design [message #25024]

Sun, 21 August 2022 19:54

vbrar4
Messages: 13
Registered: February 2022

Member

Once a survey design has been set, what is the best way to determine whether it has been set correctly, for a dataset whose final report has only Volume 1 released (Volume 2 is not out yet)? I am working with the NFHS-5 dataset, specifically the men's file, and recently set the svy design with the code "generate weight = mv005/1000000" and "svyset [pw=weight], psu(mv001) strata(mv022)", as recommended by experts on this forum. However, the final report seems to lack an appendix, and I cannot find any mention of the precise number of strata and PSUs (just a brief mention of 30,198 PSUs at the beginning of the report). I ask because whenever I tabulate anything with "svy: tab predictorname", Stata shows the number of strata and PSUs in the top-left corner of the output, and the number of PSUs it reports is 9,102, not 30,198.

Am I making an error or approaching this the wrong way? I see no other way to confirm the survey design, and nothing I get is consistent with the final report.


Re: Confirming correct survey design [message #25039 is a reply to message #25024]

Mon, 22 August 2022 20:26

vbrar4
Messages: 13
Registered: February 2022

Member

Hi!

I'm not exactly sure how this helps me confirm it. Doing this still gives exactly the same Stata results (e.g., for tabulations, logistic regressions, etc.). It also did not change the number of strata and PSUs that Stata reports whenever a command is run.

I have been running "generate weight = mv005/1000000", "egen cluster_ID = group(mv024 mv001)", and "svyset [pw=weight], psu(cluster_ID) strata(mv022) singleunit(centered)".

I understand why this may be the "correct" method, since clusters are numbered within states, but it does not change any output, and it still gives nothing close to the number of PSUs mentioned in the final report.

[Updated on: Mon, 22 August 2022 20:33]


Re: Confirming correct survey design [message #25046 is a reply to message #25039]

Tue, 23 August 2022 09:37

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

I made a mistake yesterday when I said that you needed to construct a new cluster id, and we are taking down that response to avoid confusion. v001 has already been constructed to be unique across states. Using egen group, as I suggested, will not change the results (as you found out).
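
Why regrouping changes nothing here can be checked with a small sketch outside Stata. The Python below uses made-up toy rows, not NFHS-5 data, and its `group_ids` helper is a hypothetical stand-in that only roughly imitates Stata's `egen newvar = group(...)`. It compares the clusters defined by the cluster number alone with those defined by the (state, cluster) regrouping, once for cluster numbers that are already unique across states and once for numbers that restart within each state.

```python
# Hypothetical toy rows of (state, cluster). Here the cluster number is already
# unique across states, the situation described for v001/mv001 in NFHS-5.
unique_across_states = [(1, 101), (1, 101), (1, 102), (2, 201), (2, 202), (3, 301)]
# Contrast case (also hypothetical): cluster numbers restart at 1 in each state.
numbered_within_states = [(1, 1), (1, 2), (2, 1), (2, 2)]


def group_ids(keys):
    """Roughly imitate `egen newvar = group(...)`: number each distinct
    key 1, 2, ... in sorted key order."""
    mapping = {k: i for i, k in enumerate(sorted(set(keys)), start=1)}
    return [mapping[k] for k in keys]


def same_partition(a, b):
    """True if labelings a and b split the rows into exactly the same groups."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))


for name, rows in [("unique", unique_across_states), ("within", numbered_within_states)]:
    cluster_only = [cluster for _, cluster in rows]
    regrouped = group_ids(rows)  # like group(state cluster)
    print(name, len(set(cluster_only)), len(set(regrouped)),
          same_partition(cluster_only, regrouped))
# unique 5 5 True
# within 2 4 False
```

Svy variance estimates depend only on which observations share a PSU (within strata), not on the PSU labels themselves, so an identical partition gives identical output, as observed earlier in this thread. Regrouping only matters in the second case.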

If you enter "codebook mv001 mv022" you will see that there are indeed 9,102 clusters (unique values of v001) and 2,681 strata (unique values of v022). Those numbers, which Stata repeats for any svy command given your svyset, are correct.
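
For readers who want to see what these counts are summarizing, here is a minimal Python sketch of the same check: the number of distinct clusters and distinct strata, plus the weight scaling used in this thread. The rows are invented for illustration and only borrow the DHS recode names; per the reply above, the real MR file would give 9,102 and 2,681. The single-PSU check is an extra illustration of which strata an option such as singleunit() deals with.

```python
# Hypothetical toy rows of (mv001 cluster, mv022 stratum, mv005 raw weight);
# real values would come from the NFHS-5 men's (MR) file.
rows = [
    (1, 10, 1_250_000),
    (1, 10, 1_250_000),
    (2, 10, 800_000),
    (3, 11, 950_000),
    (4, 12, 1_100_000),
]

# DHS stores sample weights as integers with six implied decimal places,
# which is why the thread divides mv005 by 1,000,000.
weights = [mv005 / 1_000_000 for _, _, mv005 in rows]

# Like `codebook mv001 mv022`: count distinct values of each design variable.
n_psus = len({mv001 for mv001, _, _ in rows})
n_strata = len({mv022 for _, mv022, _ in rows})

# Strata containing only one PSU are the ones Stata's singleunit() option
# has to handle during variance estimation.
psus_by_stratum = {}
for mv001, mv022, _ in rows:
    psus_by_stratum.setdefault(mv022, set()).add(mv001)
single_psu_strata = sorted(s for s, psus in psus_by_stratum.items() if len(psus) == 1)

print(n_psus, n_strata, single_psu_strata)  # 4 3 [11, 12]
print(weights)  # [1.25, 1.25, 0.8, 0.95, 1.1]
```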

I see that on page 2 of the final report the number of PSUs is given as 30,198, as you say, a number that does not match the data files. We will post an explanation of the difference between 9,102 and 30,198. However, what matters now is that there is nothing wrong with your svyset command.


Re: Confirming correct survey design [message #25047 is a reply to message #25046]

Tue, 23 August 2022 13:13

vbrar4
Messages: 13
Registered: February 2022

Member

Thank you for clearing up this confusion! Where can I expect to see this explanation once it is posted?


Re: Confirming correct survey design [message #25061 is a reply to message #25047]

Thu, 25 August 2022 07:23

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from DHS Research & Data Analysis Director, Tom Pullum (Fred Arnold contributed to this response):

The NFHS-5 had a 15% subsample for men and for some topics in the survey of women. To achieve a representative 15% subsample, men were interviewed in alternate households in 30 percent of the randomly selected clusters. That's the reason why there were about 30,000 clusters in the entire survey, and that many distinct values of hv001 in the HR and PR files and v001 in the IR, KR, and BR files, but only about 9,000 distinct values of mv001 in the MR file. There should be no problem with svyset and svy, or with merges. This design is unusual but it makes sense as a strategy to get a 15% subsample in such a large survey.
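
The arithmetic behind this design can be checked directly. The short Python sketch below uses only figures stated in this thread (30,198 PSUs in the report, 9,102 distinct mv001 values, 30 percent of clusters, alternate households); the variable names are illustrative.

```python
# Figures from this thread: PSUs reported in the NFHS-5 final report
# and distinct mv001 values in the men's (MR) file.
total_clusters = 30_198
mr_clusters = 9_102

# Design described above: men interviewed in alternate households (1/2)
# within 30 percent of the selected clusters.
cluster_fraction = 0.30
household_fraction = 0.5

overall_fraction = cluster_fraction * household_fraction
expected_mr_clusters = cluster_fraction * total_clusters
observed_share = mr_clusters / total_clusters

print(f"{overall_fraction:.0%}")    # 15%
print(round(expected_mr_clusters))  # 9059
print(f"{observed_share:.1%}")      # 30.1%
```

Half of the households in 30 percent of the clusters gives the 15% subsample, and the 9,102 clusters in the MR file are about 30.1% of 30,198, in line with the roughly 30 percent of clusters described in the reply.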

