The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Clubbing individual recode and mens recode file to calculate overall prevalence [message #13904 is a reply to message #13853] Tue, 23 January 2018 07:13
Bridgette-DHS
Following is a response from Senior DHS Stata Specialist, Tom Pullum:

There are a couple of issues. First, your measure of tobacco use is defective. There is a binary variable in the IR file (v463z) and in the MR file (mv463z) that takes the value 1 for "no tobacco use" and 0 otherwise. Therefore "anytobaccouse" should be defined as 1-v463z or 1-mv463z. Second, this is one of several surveys with only a subsample of men. Fifteen percent of the households have hv027=1, meaning "household selected for male interview". The other households have hv027=0. If the household was selected, then all men in the household (who satisfied other eligibility requirements such as age) would be interviewed. If hv027=0, no men would be interviewed. It can happen that a household has hv027=1, but there were no eligible men in the household. For that reason, you cannot identify all of the households with hv027=1 by just looking at the MR file. The HR and PR files are the only files that include hv027.

There are at least four ways to estimate the combined percentage of men+women who use tobacco. The first way is to restrict to men and women in the households with hv027=1. That requires merging the IR and MR files with the PR file and selecting the men and women who are in such households. Using weights, I then get anytobaccouse percentages of 6.8% for women, 45.5% for men, and 25.4% for women and men combined. This is a good estimate but it ignores most of the women, who were in households with hv027=0.

A second way is to multiply the weight for men in the households with hv027=1 by 6.67, or 1/.15, because their probability of selection was only 15% as high as that for women. I then get percentages of 6.8% for women, 45.5% for men, and 26.8% for women and men combined.
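To see what the re-weighting does to a pooled mean, here is a minimal sketch on made-up toy data (not DHS data). The variable names sex, y, w and w_adj are hypothetical and chosen only for this illustration; the factor 1/.15 is the one described above.

```stata
* Toy data, not DHS data: four women (sex==2) and two men (sex==1), all with weight 1
clear
input sex y w
2 0 1
2 0 1
2 0 1
2 1 1
1 1 1
1 0 1
end

* Inflate the men's weights by 1/.15, as in the second approach
gen w_adj = w
replace w_adj = (1/.15)*w if sex==1

* Pooled mean with the original weights, then with the adjusted weights
summarize y [iweight=w]
summarize y [iweight=w_adj]
```

With the adjusted weights the men count for more in the pooled mean, so it moves toward the men's value.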

I will paste below the Stata code for the first two approaches.

A third way is to inflate the weights for men by a "post-stratification" factor rather than by a simple factor of 1/.15. We do not advise this because of the complexity of the sampling design for this survey.

Finally, a fourth way to approach this would be to calculate the percentage for women, using all the women, calculate the percentage for men, using all the men, and then estimate the pooled mean with a calculator or spreadsheet. You could use census data to estimate the numbers of men and women in the population in the specified age interval. If a fraction f of the population is female and a fraction m is male (f+m=1), and Pf is the percentage of women who use tobacco and Pm is the percentage of men who use tobacco, then calculate f*(Pf) + m*(Pm). That will be a good estimate of the pooled mean for adults (men and women combined) in the age interval.
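The calculator step can also be done in Stata. This is only a sketch: the population shares f and m below are placeholders, not census figures, and must be replaced with real values for the age interval. Pf and Pm are the percentages reported above for women and men.

```stata
* Placeholder population shares (an assumption, not census data); f + m must equal 1
scalar f = 0.50
scalar m = 1 - f

* Percentages of women and men who use tobacco, as estimated above
scalar Pf = 6.8
scalar Pm = 45.5

* Pooled percentage for women and men combined: f*(Pf) + m*(Pm)
scalar pooled = f*Pf + m*Pm
display "Pooled percentage: " %5.2f scalar(pooled)
```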

These procedures would apply to any outcomes that are obtained from both men and women in surveys that involve a subsample of men. The question is actually very general.


* Prepare IR file for merge
use e:\DHS\DHS_data\IR_files\IAIR71FL.dta, clear 
gen anytobaccouse_women=1-v463z
keep v001 v002 v003 v005 any
gen hv001=v001
gen hv002=v002
gen hvidx=v003
sort hv001 hv002 hvidx
save e:\DHS\DHS_data\scratch\IAIRtemp.dta, replace

* Prepare MR file for merge
use e:\DHS\DHS_data\MR_files\IAMR71FL.dta, clear 
gen anytobaccouse_men=1-mv463z
keep mv001 mv002 mv003 mv005 any
gen hv001=mv001
gen hv002=mv002
gen hvidx=mv003
sort hv001 hv002 hvidx
save e:\DHS\DHS_data\scratch\IAMRtemp.dta, replace

* Prepare PR file for merge
use e:\DHS\DHS_data\PR_files\IAPR71FL.dta, clear
* hv027: household selected for male interview
keep hv001 hv002 hvidx hv005 hv104 hv027
sort hv001 hv002 hvidx

* Merge IR with PR
merge hv001 hv002 hvidx using  e:\DHS\DHS_data\scratch\IAIRtemp.dta
drop _merge
sort hv001 hv002 hvidx

* Merge MR with IR+PR
merge hv001 hv002 hvidx using  e:\DHS\DHS_data\scratch\IAMRtemp.dta
drop _merge

* This file is MR+IR+PR
gen anytobaccouse=.

replace anytobaccouse=anytobaccouse_men   if hv104==1
replace anytobaccouse=anytobaccouse_women if hv104==2

gen weight=.
replace weight=mv005 if hv104==1
replace weight= v005 if hv104==2

* Calculate estimate for women and men combined using only the cases with hv027=1 
summarize any* [iweight=weight/1000000] if hv027==1
* The estimates are 6.8% (women), 45.5% (men), 25.4% (women+men) if limited to the households with hv027=1

* Calculate estimate for women and men combined using all cases but re-weighting the men
summarize any* [iweight=weight/1000000]
gen weight_adjusted=weight
replace weight_adjusted= (1/.15)*weight if hv104==1 & hv027==1
summarize any* [iweight=weight_adjusted/1000000] 
* The estimates are 6.8% (women), 45.5% (men), 26.8% (women+men)
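
The merges above use the older "merge varlist using" form, which needs both files sorted on the key variables. In Stata 11 and later the same one-to-one merge can be written with merge 1:1, which does not need the sort. Below is a minimal sketch on toy data (not DHS data); the tempfile names pr and ir are hypothetical, and the variable names follow the code above.

```stata
* Toy person-level file standing in for the PR file
clear
input hv001 hv002 hvidx hv104 hv027
1 1 1 1 1
1 1 2 2 1
1 2 1 2 0
end
tempfile pr
save `pr'

* Toy women's file standing in for the prepared IR file
clear
input hv001 hv002 hvidx anytobaccouse_women
1 1 2 0
1 2 1 1
end
tempfile ir
save `ir'

* One-to-one merge on cluster, household and line number; no _merge variable is kept
use `pr', clear
merge 1:1 hv001 hv002 hvidx using `ir', nogenerate
list
```

The man in the toy PR file has no record in the women's file, so anytobaccouse_women is missing for him after the merge.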

 