The DHS Program User Forum
Discussions regarding The DHS Program data and results
NFHS-5 data from STATA not matching the factsheets [message #28361] Wed, 20 December 2023 14:32
anshul.11
Hi!

For a project, I need to extract data from the Household data file. However, although I believe my tabulation commands are correct, the results do not match the fact sheets published by DHS. For example, these are the commands I run to define svyset:

gen pwt=.
replace pwt= hv005/1000000
tab hv206
egen cluster_id = group( hv021 hv024 )
egen stratum_id = group( hv023 hv024)
svyset cluster_id [pw=pwt], strata(stratum_id)

After that, I try to tabulate the percentage of households that have electricity in each state:

svy: tabulate hv024 hv206, row

However, the results do not match the factsheets. For example, the Stata output gives 95.61% of households with electricity in Bihar, but the factsheet reports 96.3% (accessed from: http://rchiips.org/nfhs/NFHS-5_FCTS/Bihar.pdf). Similarly, for the country as a whole, the output gives 96.53%, while the factsheet gives 96.8%. The same discrepancy exists for other indicators. Can anyone please tell me what I am doing wrong?






Re: NFHS-5 data from STATA not matching the factsheets [message #28367 is a reply to message #28361] Thu, 21 December 2023 14:55
Bridgette-DHS

Following is a response from Senior DHS staff member, Tom Pullum:

I see that you used the HR file, in which households are units. Your calculation is correct for households. However, in the Bihar report, and elsewhere, the label is "Population living in households with electricity (%)". If you run the same lines on the PR file, in which individuals are units, you will match the report. Or, see below, you can match the report with the HR file if you multiply the weight by the household size.

For the percentages you only need the weights. You do not need the full svyset command. The adjustments for clusters and strata can be omitted. Also you do not need all the steps for the weights. You just need the following two lines in the HR file:

* The following line gives the percentages for households
tab hv024 hv206 [iweight=hv005/1000000], row

* The following line gives the percentages for individuals
tab hv024 hv206 [iweight=hv009*hv005/1000000], row

In the second command, I multiply the weight by hv009, which is the number of individuals in the household.
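
To see why the two weights give different percentages, here is a minimal sketch on a toy dataset. The four households and their values are hypothetical and are not taken from NFHS; only the variable names (hv005, hv009, hv206) follow the HR file. Larger households count for more once the weight is multiplied by household size, so the population percentage moves toward whatever the larger households have.

```stata
* Toy data (hypothetical values, not NFHS): four households with equal weights
clear
input hhid hv009 hv206 hv005
1 6 1 1000000
2 4 1 1000000
3 2 0 1000000
4 8 1 1000000
end

* Percentage of households with electricity: 3 of 4 households = 75%
tab hv206 [iweight=hv005/1000000]

* Percentage of population living in households with electricity:
* 6 + 4 + 8 = 18 of 20 people = 90%
tab hv206 [iweight=hv009*hv005/1000000]
```

In this toy case the only household without electricity is also the smallest, so the population figure (90%) is higher than the household figure (75%). The real files can differ in either direction, depending on how household size relates to the indicator.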
Re: NFHS-5 data from STATA not matching the factsheets [message #28370 is a reply to message #28367] Fri, 22 December 2023 01:36
anshul.11
Thank you so much Tom and Bridgette! It solved my issue.

I have a follow-up question. Since we do not need svyset for percentages across the dataset, can you please specify where it is needed? Which use case does the svyset command serve with respect to NFHS? For example, if we are using the wealth index estimates at the district level, will [iw=hv005/1000000] suffice?

Thanks in advance.
Re: NFHS-5 data from STATA not matching the factsheets [message #28374 is a reply to message #28370] Fri, 22 December 2023 09:08
Bridgette-DHS

Following is a response from Senior DHS staff member, Tom Pullum:

In general, you should always adjust for weights, one way or another, in order to get unbiased estimates of population values. Adjustments for clusters and strata are only relevant for the calculation of standard errors, which are used for confidence intervals or statistical tests. I'm sure that many users include the full svyset adjustments even when they are not producing confidence intervals or test statistics, and it's ok to do that. In general, just as a matter of principle, I prefer simplicity over complexity and I don't like to include options that are not needed. I'll admit that's somewhat retro, in a world that is increasingly complex!
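
The distinction between weights and the full design can be checked with a small sketch. The data below are hypothetical (two strata, two clusters each, eight households) and are not NFHS values. The variable names cluster_id, stratum_id, and wt are made up for the illustration. The sketch estimates the same weighted proportion twice, once with weights only and once after svyset.

```stata
* Toy data (hypothetical values, not NFHS): 2 strata, 2 clusters per stratum
clear
input stratum_id cluster_id hv206 hv005
1 1 1 1500000
1 1 1 1500000
1 2 0 500000
1 2 1 500000
2 3 1 1000000
2 3 0 1000000
2 4 0 1000000
2 4 0 1000000
end
gen wt = hv005/1000000

* Weights only: the weighted mean of a 0/1 variable is the weighted proportion
mean hv206 [pweight=wt]

* Full design: the point estimate is unchanged, but the standard error
* now accounts for clustering and stratification
svyset cluster_id [pweight=wt], strata(stratum_id)
svy: mean hv206
```

Both commands report the same estimate (4.5/8 = 0.5625 here). Only the standard error and confidence interval depend on the cluster and stratum settings, which is why svyset matters mainly when those are being reported.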


