The DHS Program User Forum      
Discussions regarding The DHS Program data and results
Challenges and limitations of using big DHS data sets [message #13124] Sun, 24 September 2017 17:30
sko16
Messages: 12
Registered: June 2016
Location: Addis Ababa, Ethiopia
I am carrying out a study based primarily on the women's recode files from the Ethiopian DHS (EDHS) 2000, 2005, 2011 and 2016. The units of analysis are women of reproductive age (15-49). I plan to do a further analysis of women's maternal health care utilization at different levels (individual vs. community). Could you tell me which community (cluster) level variables exist in these Ethiopian DHS data sets? I would also appreciate your professional explanation of the challenges and limitations of using big DHS data sets. Thank you very much in advance for addressing my concern.
Re: Challenges and limitations of using big DHS data sets [message #13133 is a reply to message #13124] Tue, 26 September 2017 10:53
Bridgette-DHS
Messages: 1200
Registered: February 2013
Senior Member

Following is a response from Senior DHS Stata Specialist, Tom Pullum:

If you are using a good statistical package, the size of the files will not be a problem. The main difficulty will be learning how to use the package.

There are no cluster-level variables in the data files. You have to construct them as the means and percentages of individual-level variables for the women in each cluster. (The cluster id code is v001.) If you are using Stata, you can do this with some of the "egen" commands. I hope other forum users can offer specific suggestions.
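
As a language-neutral illustration of the construction described above (not Stata code, and not part of the original reply), here is a minimal Python sketch that averages an individual-level variable within each cluster and attaches the cluster mean back to every woman. Only v001 comes from the DHS files; the indicator name facility_delivery, the toy records, and the helper function names are hypothetical, and the means are unweighted.

```python
from collections import defaultdict

# Toy individual-level records. v001 is the DHS cluster id;
# "facility_delivery" is a hypothetical 0/1 indicator used for illustration.
women = [
    {"v001": 1, "facility_delivery": 1},
    {"v001": 1, "facility_delivery": 0},
    {"v001": 2, "facility_delivery": 1},
    {"v001": 2, "facility_delivery": 1},
    {"v001": 2, "facility_delivery": 0},
]

def cluster_means(records, cluster_key, var):
    """Return {cluster id: unweighted mean of var}, skipping missing values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        value = r[var]
        if value is None:
            continue  # missing values do not contribute to the mean
        totals[r[cluster_key]] += value
        counts[r[cluster_key]] += 1
    return {c: totals[c] / counts[c] for c in counts}

def attach_cluster_mean(records, cluster_key, var, new_name):
    """Copy each record and add its cluster's mean of var under new_name."""
    means = cluster_means(records, cluster_key, var)
    return [{**r, new_name: means.get(r[cluster_key])} for r in records]

women_with_context = attach_cluster_mean(
    women, "v001", "facility_delivery", "cl_facility_delivery"
)
```

In a multilevel model the attached value would serve as a community-level covariate. Whether to apply sample weights or to exclude the woman's own value from her cluster mean is an analytic choice that this sketch leaves open.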

Re: Challenges and limitations of using big DHS data sets [message #13159 is a reply to message #13124] Fri, 29 September 2017 17:48
boyle014
Messages: 15
Registered: December 2015
Location: Minneapolis
Dear sko16,

If you use IPUMS-DHS, you won't have any trouble with file size. IPUMS-DHS is the original DHS data, harmonized and integrated. It allows you to select just the Ethiopian samples (under SELECT SAMPLES) and just the variables of interest to you, and download them in a single data file. For most people, this file will be much smaller than even one IR file.

You will find the variables you need under TOPICS -> WOMEN & INFANT HEALTH.

In terms of community-level variables, IPUMS-DHS makes it relatively easy to link the Ethiopian DHS with Ethiopian census data, which are available through IPUMS-International.

Here's how the linking works. Both IPUMS-DHS and IPUMS-International have a variable called DHS_IPUMSI_ET, which you can find under TOPICS -> GEOGRAPHY. Create a data extract in both projects and make sure they both include this variable. Then collapse the census data that interests you by DHS_IPUMSI_ET and merge the collapsed file with your IPUMS-DHS file.
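
The collapse-and-merge step above can be sketched as follows. This is a minimal illustration in plain Python, assuming both extracts have already been read into lists of records. Only the variable name DHS_IPUMSI_ET comes from the description above; the region codes, the census variable literate, the caseid values, and the function names are made up for the example.

```python
from collections import defaultdict

# Toy census extract (IPUMS-International side). "literate" is a
# hypothetical 0/1 census variable; the region codes are invented.
census = [
    {"DHS_IPUMSI_ET": 1, "literate": 1},
    {"DHS_IPUMSI_ET": 1, "literate": 0},
    {"DHS_IPUMSI_ET": 2, "literate": 1},
    {"DHS_IPUMSI_ET": 2, "literate": 1},
]

# Toy IPUMS-DHS extract with the same linking variable.
dhs = [
    {"caseid": "a", "DHS_IPUMSI_ET": 1},
    {"caseid": "b", "DHS_IPUMSI_ET": 2},
    {"caseid": "c", "DHS_IPUMSI_ET": 3},  # region absent from the census toy data
]

def collapse_mean(records, key, var):
    """Collapse records to one mean of var per value of key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[key]] += r[var]
        counts[r[key]] += 1
    return {k: totals[k] / counts[k] for k in counts}

def merge_on_key(individuals, region_values, key, new_name):
    """Attach the collapsed regional value to each individual record.

    Records whose region has no match get None, so unmatched cases
    remain visible instead of being dropped silently.
    """
    return [{**r, new_name: region_values.get(r[key])} for r in individuals]

region_literacy = collapse_mean(census, "DHS_IPUMSI_ET", "literate")
linked = merge_on_key(dhs, region_literacy, "DHS_IPUMSI_ET", "region_literacy")
```

Keeping unmatched records with a missing value makes it easy to check how many DHS cases failed to link before any modelling.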

This process will aggregate variables from the census at the level of DHS regions. Over the next year, we will be adding finer-grained environmental and social context variables directly to IPUMS-DHS using the GPS location of clusters.

Your DHS ID and password will work for IPUMS-DHS. You'll have to register for IPUMS-International, but approval is typically very straightforward.

Note that we're currently in the process of adding the Ethiopia 2016 DHS to IPUMS-DHS. We plan to release it by the end of the year.

Hope this helps. Feel free to post again if you have any questions.

Liz Boyle, University of Minnesota

Professor Elizabeth Boyle
Sociology & Law, University of Minnesota, USA
Principal Investigator, IPUMS-DHS