The DHS Program User Forum
Discussions regarding The DHS Program data and results
Understanding dataset structure [message #13515] Mon, 13 November 2017 09:44
popanalyst
Messages: 2
Registered: November 2017
Hi, I'm working with the Malawi 2015-2016 DHS data to identify out-of-school adolescent girls and young women aged 15-24 who have an HIV test result. I am a bit confused about all the different datasets and wondering from where I should pull observations. Currently, I have tried to pull from the individual recode and couples recode, and match them to their HIV status. I'm wondering if I should pull from the household roster, or if I have already counted those by using the individual dataset? It looks like I will need to pull current schooling status from the household dataset as well, and I'm wondering how to connect people in the roster to the other datasets, if possible. Thanks!
Re: Understanding dataset structure [message #13516 is a reply to message #13515] Mon, 13 November 2017 10:06
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
If you are merging datasets, please refer to . If you want a description of the various datasets and how you can use them for analysis, please refer to . We also recommend The Guide to DHS Statistics https:// onnaires-and-manuals.cfm and The Standard Recode Manual https:// ires-and-manuals.cfm. We also have various YouTube videos available to assist users.
DHS Dataset Types in 60 Seconds:
Introduction to DHS Datasets:
Introduction to DHS Data Structure:
De Jure and De Facto:

After reviewing these resources, if you still have questions, please feel free to post again.
Thank you!
Re: Understanding dataset structure [message #14577 is a reply to message #13515] Sat, 21 April 2018 22:14
kingx025
Messages: 95
Registered: August 2016
Location: Minneapolis. Minnesota
Senior Member
The households covered by the household member (PR) files are randomly selected from within primary sampling areas, regardless of whether they include a woman of childbearing age. Women of childbearing age within those households (most commonly defined as women age 15-49) then receive the long individual women's questionnaire and go into the IR (women's) and couples files. The young women age 15-24 from the household roster are already included in the IR file, so I'm afraid you would be double counting them if you used the same women from the PR file without merging between the file types. Though I don't have personal experience working with the couples file, I believe the women in the couples file are a subset of the women in the IR file.
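
To make the double-counting point concrete, here is a minimal sketch in plain Python using invented toy rows. The field names follow the usual DHS recode conventions (hv001/v001 for cluster, hv002/v002 for household number, hvidx/v003 for line number, hv104 for sex with 2 = female, hv105/v012 for age), but the values are made up and the names should be checked against the recode manual for your survey.

```python
# Toy PR-style roster rows: one row per household member (values invented).
pr_rows = [
    {"hv001": 1, "hv002": 10, "hvidx": 1, "hv104": 1, "hv105": 45},
    {"hv001": 1, "hv002": 10, "hvidx": 2, "hv104": 2, "hv105": 19},
    {"hv001": 1, "hv002": 11, "hvidx": 1, "hv104": 2, "hv105": 67},
    {"hv001": 2, "hv002": 3, "hvidx": 4, "hv104": 2, "hv105": 23},
]

# Toy IR-style rows: one row per interviewed woman (values invented).
ir_rows = [
    {"v001": 1, "v002": 10, "v003": 2, "v012": 19},
    {"v001": 2, "v002": 3, "v003": 4, "v012": 23},
]

# Person identifiers built from cluster, household number, and line number.
pr_keys = {(r["hv001"], r["hv002"], r["hvidx"]) for r in pr_rows}
ir_keys = {(r["v001"], r["v002"], r["v003"]) for r in ir_rows}

# In this toy data, every interviewed woman also appears in the roster.
ir_is_subset = ir_keys <= pr_keys

# Young women (15-24) identified separately in each file.
pr_young = {
    (r["hv001"], r["hv002"], r["hvidx"])
    for r in pr_rows
    if r["hv104"] == 2 and 15 <= r["hv105"] <= 24
}
ir_young = {
    (r["v001"], r["v002"], r["v003"])
    for r in ir_rows
    if 15 <= r["v012"] <= 24
}

# Adding the two file counts counts the same women twice.
naive_count = len(pr_young) + len(ir_young)
# Counting distinct person identifiers does not.
distinct_count = len(pr_young | ir_young)

print(ir_is_subset, naive_count, distinct_count)
```

In real data the overlap will not be this tidy (for example, some eligible women in the roster are not interviewed), so treat this only as an illustration of why the two files should be linked rather than stacked.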

It may help to think of how the data are collected, in terms of the survey forms. Usually there is just a household survey form (including the roster of household members), a woman's survey form (which also collects information for the children and birth recode files), and maybe a men's survey form. There is no separate survey for couples, and the women who get the individual women's form are drawn from the larger group of randomly selected households with their household rosters. While it's easy to think of the separate DHS files as having their own separate existence, they are really rearrangements of material collected via the 3 forms of household, women, and men.

We hope to eventually do such merging of IR and PR files within the IPUMS-DHS project, but so far we have only linked across household, women (IR), child (KR), and birth (BR) files. I wonder if you might roughly approximate the education data you need using the data on total years of schooling or highest level of schooling for women in your age group of interest (V106, V133) in the IR files. If you need to link between the IR file and the PR file for the school attendance data, I assume you would match on the household ID number together with the person's line number, which is V003 in the IR file and HVIDX (the person's line number in the household) in the PR file.
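
A rough sketch of that IR-to-PR link, in plain Python with invented toy rows, might look like the following. It assumes the household is identified by cluster plus household number (V001/V002 in the IR file, HV001/HV002 in the PR file), and it treats HV121 as the school attendance variable with 0 meaning not attending; both the variable choice and its coding are assumptions to verify in the recode manual for the survey. The helper name `link_ir_to_pr` is hypothetical.

```python
def link_ir_to_pr(ir_rows, pr_rows):
    """Attach PR fields to each IR row, matching on cluster, household, line."""
    # Index roster rows by (cluster, household number, line number).
    pr_by_key = {(r["hv001"], r["hv002"], r["hvidx"]): r for r in pr_rows}
    linked = []
    for woman in ir_rows:
        key = (woman["v001"], woman["v002"], woman["v003"])
        match = pr_by_key.get(key)
        if match is not None:
            # Combine the woman's IR fields with her roster fields.
            linked.append({**woman, **match})
    return linked


# Toy IR-style rows (values invented): v012 is the woman's age.
ir_rows = [
    {"v001": 1, "v002": 10, "v003": 2, "v012": 19},
    {"v001": 2, "v002": 3, "v003": 4, "v012": 23},
    {"v001": 2, "v002": 5, "v003": 1, "v012": 31},
]

# Toy PR-style rows (values invented). ASSUMPTION: hv121 == 0 means the
# household member is not attending school in the current school year.
pr_rows = [
    {"hv001": 1, "hv002": 10, "hvidx": 1, "hv121": 0},
    {"hv001": 1, "hv002": 10, "hvidx": 2, "hv121": 2},
    {"hv001": 2, "hv002": 3, "hvidx": 4, "hv121": 0},
    {"hv001": 2, "hv002": 5, "hvidx": 1, "hv121": 0},
]

linked = link_ir_to_pr(ir_rows, pr_rows)

# Women age 15-24 who are out of school under the assumed hv121 coding.
out_of_school_15_24 = [
    w for w in linked if 15 <= w["v012"] <= 24 and w["hv121"] == 0
]

print(len(linked), len(out_of_school_15_24))
```

Because each woman is matched to at most one roster row, the linked result has one row per woman and avoids counting anyone twice. With real files the same key-based match would typically be done in Stata, R, or a data-frame library rather than with hand-built dictionaries.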

Good luck!

Miriam King

Dr. Miriam King
IPUMS-DHS Project Manager