The DHS Program User Forum
Discussions regarding The DHS Program data and results
Pooling male and female files [message #13158] Fri, 29 September 2017 15:29
claurti
I'm working with 2013 Nigeria DHS data.

I would like to pool the male and female files. I have already prepared each file individually (e.g., renamed variables) and appended the male file to the female file.

However, I need help with the following:

1. When I run a simple tabulation on gender, 69% of the sample is female. I've applied the sampling weights, so I was expecting this would be more of a 50% / 50% split, reflecting the population. Is another step needed?

2. In addition to the PSU, I need to adjust for the household stage of sampling. What is an elegant way in Stata to identify the household level?

Many thanks for all your help. I'm a new DHS user, so I appreciate any guidance you can provide.

Re: Pooling male and female files [message #13172 is a reply to message #13158] Sun, 01 October 2017 01:12
schoumaker
The sampling rate is usually lower among men (e.g. only one out of every 2 eligible men is selected).
The weights are computed to adjust for differences in sampling rates within each sample (e.g. among men), so within each file their sum is equal to the sample size.
What I would do:
- If the sampling rate of men is half the sampling rate of women, multiply each man's weight (mv005) by 2, since each interviewed man then represents twice as many people as the men's file alone implies.
- Then divide each weight by the average weight in the full (pooled) sample to normalize them (not necessary in Stata, but necessary in other software packages).
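
The normalization step could look roughly like this in Stata. This is only a sketch: it assumes the pooled file is in memory and that a variable called wt (a hypothetical name, not a DHS variable) already holds the rescaled weight for every woman and man.

```stata
* Assumption: "wt" is the rescaled weight in the pooled women + men file
summarize wt, meanonly

* Divide by the mean so the normalized weights average 1
generate double wt_norm = wt / r(mean)

* Check: the mean of wt_norm should now be 1
summarize wt_norm
```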

Bruno Schoumaker
Centre for Demographic Research
Université catholique de Louvain

Re: Pooling male and female files [message #13190 is a reply to message #13172] Mon, 02 October 2017 14:40
Bridgette-DHS

Following is a response from Senior DHS Stata Specialist, Tom Pullum:

Regarding your second question, we do not have the sampling fractions or weights for households within clusters. The weight we provide is for the combination of sampling clusters and sampling households within clusters. There have been several earlier posts on this.
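
If the aim is simply to identify households in Stata and declare the design at the cluster level, something along these lines may be enough. It is a sketch that assumes the standard IR recode names (v001 cluster, v002 household number, v005 weight, v021 PSU, v022 strata); hh_id and wt are illustrative names, and you should check that v022 matches the stratification described in the survey report. If you renamed the variables before pooling, substitute your own names.

```stata
* Household identifier: household number is only unique within a cluster
egen hh_id = group(v001 v002)

* DHS weights are stored multiplied by 1,000,000
generate double wt = v005/1000000

* Declare clusters as PSUs; no separate household-stage weight is available
svyset v021 [pweight=wt], strata(v022) singleunit(centered)
```

hh_id can then be used wherever a household grouping variable is needed, for example as a level in a multilevel model.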

Regarding the first question, if you open the PR file and enter "tab1 hv117 hv118" you will see that this survey subsampled only half of the men. When you combine with women, you need to approximately double the weight for the men.

The following routine will calculate the correct weight for the men relative to the women:

use e:\DHS\DHS_data\PR_files\NGPR6AFL.dta, clear

* Reduce the PR file to men who are eligible by age and are de facto
keep if hv105>=15 & hv105<=49
keep if hv103==1
keep if hv104==1

* Total hh weight for men who are eligible by age and are de facto
summarize hv005
scalar W=r(sum)

* Total hh weight for men who are eligible by age and are de facto and are subsampled
summarize hv005 if hv104==1 & hv118==1
scalar W1=r(sum)

* Calculate the ratio 
scalar factor=W/W1
scalar list factor

use e:\DHS\DHS_data\MR_files\NGMR6AFL.dta, clear

* Multiply mv005 by the ratio and round to an integer
* scalar() makes sure the scalar is used even if a variable name begins with "factor"
gen mv005_rewtd=round(mv005*scalar(factor))

* Then append the IR and MR files and use mv005_rewtd as the weight for men
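
The append step might be sketched as follows. Several things here are assumptions rather than part of the routine above: the scalar factor is still in memory, the IR file sits in an IR_files folder under the name NGIR6AFL.dta, and the common names cluster, psu, stratum, wt and male are purely illustrative.

```stata
* Men: apply the factor and move to common variable names
use "e:\DHS\DHS_data\MR_files\NGMR6AFL.dta", clear
gen double wt = round(mv005*scalar(factor))
gen byte male = 1
rename (mv001 mv021 mv022) (cluster psu stratum)
keep cluster psu stratum wt male
tempfile men
save `men'

* Women: original weight, same common names (file path is assumed)
use "e:\DHS\DHS_data\IR_files\NGIR6AFL.dta", clear
gen double wt = v005
gen byte male = 0
rename (v001 v021 v022) (cluster psu stratum)
keep cluster psu stratum wt male

* Pool the two files and check the weighted sex distribution
append using `men'
svyset psu [pweight=wt], strata(stratum) singleunit(centered)
svy: tabulate male
```

In practice you would keep your analysis variables as well; the keep lines are only there to make the sketch short.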

It would be slightly better to calculate separate factors within each stratum. You can re-normalize so that the mean weight in the combined file of women and men is 1000000, but if you will only be using pweights, you can skip that step because, with pweights, Stata's estimates of means, proportions and regression coefficients do not depend on the scale of the weights.
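
A stratum-specific version of the factor could be sketched like this. It assumes that hv022 in the PR file and mv022 in the MR file carry the same stratum codes, which should be checked for this survey before relying on the result; w_all, w_sub and factor are illustrative names.

```stata
use "e:\DHS\DHS_data\PR_files\NGPR6AFL.dta", clear

* De facto men aged 15-49
keep if hv104==1 & hv103==1 & inrange(hv105,15,49)

* Weight totals per stratum: all such men, and those subsampled
gen double w_all = hv005
gen double w_sub = hv005*(hv118==1)
collapse (sum) w_all w_sub, by(hv022)
gen double factor = w_all/w_sub

* Rename the key to match the MR file (assumes identical coding)
rename hv022 mv022
keep mv022 factor
tempfile factors
save `factors'

use "e:\DHS\DHS_data\MR_files\NGMR6AFL.dta", clear
merge m:1 mv022 using `factors', keep(master match) nogenerate

* Any missing factor means a stratum did not match or had no subsampled men
count if missing(factor)
gen mv005_rewtd = round(mv005*factor)
```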
