The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging and appending data files (I would like to 1) append data files of different countries and survey waves, 2) merge hh characteristics and coordinates to individuals)
Re: Merging and appending data files [message #24903 is a reply to message #24902] Mon, 01 August 2022 12:03
Bridgette-DHS
Messages: 3199
Registered: February 2013
Senior Member
Following is another response from DHS Research & Data Analysis Director, Tom Pullum:

I'll add some suggestions but they may not answer all your questions.

In the HR file, there is one very wide line of data for each household, with household members identified by subscripts that range from 1 to 20. The HR file can be very efficient for merging strictly household-level variables, such as water, sanitation, or the length of the household interview. However, matching the line number in the IR file (v003=1, 2, 3, etc.) with the line number in the HR file (subscripts _01, _02, _03, etc.) is just too much work. Maybe someone can do it, but I have never even tried! The "long" format of the PR file is simply much easier.
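
A minimal sketch of the two kinds of merge described above, not a definitive recipe. The file names (XXHR.dta, XXPR.dta, XXIR.dta) are hypothetical placeholders, and the variables kept (hv201, hv205, hv105) are only examples. It assumes cluster and household number identify households uniquely in the survey at hand, which should be checked before relying on it.

```stata
* Sketch only: file names are hypothetical placeholders.

* Step 1: household-level variables from the HR file (one row per household)
use hv001 hv002 hv201 hv205 using "XXHR.dta", clear
rename (hv001 hv002) (v001 v002)
tempfile hh
save `hh'

* Step 2: person-level variables from the long-format PR file
* (one row per household member; hvidx is the line number)
use hv001 hv002 hvidx hv105 using "XXPR.dta", clear
rename (hv001 hv002 hvidx) (v001 v002 v003)
tempfile members
save `members'

* Step 3: attach both to the women in the IR file
use "XXIR.dta", clear
merge m:1 v001 v002 using `hh', keep(master match) nogenerate
merge 1:1 v001 v002 v003 using `members', keep(master match) nogenerate
```

The household merge is m:1 because several women can share a household; the PR merge is 1:1 because each woman corresponds to one household line number.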

Merging and then appending, in that sequence, is simpler. If you append and then merge, you will have to match on a survey ID code, and the data files do not include a unique survey ID code. You may think that v000 is a survey identifier, but it is not. Two surveys conducted within the same phase of DHS (for example, the current phase is 8) will have the same value of v000. Also, if you append first, you will have an extremely long file (lots of cases) and the data processing time will go way up. Merges in individual surveys are very fast.
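
The merge-then-append sequence could look roughly like the sketch below. The file stems in the local macro and the variable survey_id are assumptions introduced for illustration; survey_id is created by hand because, as noted above, v000 does not identify a survey uniquely.

```stata
* Sketch only: "AA" and "BB" are hypothetical file stems, one per survey.
local surveys "AA BB"

* Start with an empty pooled file
clear
tempfile pooled
save `pooled', emptyok

foreach s of local surveys {
    * Household-level variables for this survey
    use hv001 hv002 hv201 hv205 using "`s'HR.dta", clear
    rename (hv001 hv002) (v001 v002)
    tempfile hh
    save `hh'

    * Merge within the survey first (fast), then tag the survey
    use "`s'IR.dta", clear
    merge m:1 v001 v002 using `hh', keep(master match) nogenerate
    gen survey_id = "`s'"

    * Append the surveys processed so far and update the pooled file
    append using `pooled'
    save `pooled', replace
}

use `pooled', clear
tabulate survey_id
```

Keeping only the variables needed before appending would further reduce the size of the pooled file.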

There are a few old surveys in which v001 is missing but in those surveys it is given by v021. In almost all surveys, both v001 and v021 are included and are equal.
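
One hedged way to handle this in practice, assuming both variables exist in the file and that "missing" means v001 holds missing values (the file name is a hypothetical placeholder):

```stata
use "XXIR.dta", clear

* How many cases have different cluster codes in v001 and v021?
* (a missing v001 with a non-missing v021 is counted as different)
count if v001 != v021

* Fill v001 from v021 only where v001 is missing
replace v001 = v021 if missing(v001)
```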

The following will tell you how to "unpack" the columns of caseid (or hhid).

* Open an IR file and enter this:

describe caseid

* this will tell you the string length, for example 12. Then:

forvalues li=1/12 {
    gen col_`li'=substr(caseid,`li',1)
}

list col* v001 v002 v003 if _n<=50, table clean
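
As a variation on the steps above, the string length can be read from the data rather than typed in by hand. This is only a sketch: the file name is a hypothetical placeholder, and the names id_len and ch_ are introduced here so that they do not clash with the col_ variables created above.

```stata
use "XXIR.dta", clear

* Length of caseid in each row; the maximum sets the number of columns
gen int id_len = strlen(caseid)
quietly summarize id_len
local n = r(max)

* One single-character variable per column of caseid
forvalues li = 1/`n' {
    gen str1 ch_`li' = substr(caseid, `li', 1)
}

* Compare the columns with the cluster, household, and line numbers
list ch_* v001 v002 v003 if _n<=50, table clean
```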

Good luck!
 