The DHS Program User Forum: Merging data files » Merging and appending data files



Merging and appending data files [message #24845]

Wed, 20 July 2022 04:48

nora-dhs

Hello, I am working with the DHS data and have some problems with merging different data files. This is what I want to do:

1) Append data files from different survey waves and countries, e.g., append all HR data files for the African continent over the period 1999-2019 into one single data file called appended_HR. I then want to do the same with the other file types. I think I managed this first step and ended up with four data files called appended_HR, appended_IR, appended_MR, and appended_KR.

2) Merge household characteristics and coordinates to the individuals. To do so, I want to merge appended_HR to the other appended files, which requires unique identifiers. This is where I struggle. I noticed that some identifiers seem to be incorrectly coded (e.g., v001, v002, v003 are missing or do not correspond to mcaseid/caseid). I tried to resolve these inconsistencies, but my approach does not work:

appended_HR:

duplicates tag v007 v000 v001 v002, gen(duple)

gen lhhid = strlen(hhid) // should be 12-character string
	
drop if duple != 0 & lhhid != 12 // drop if it's a duplicate and hhid is not of correct length
	
gen helpvar_v002 = substr(hhid,8,3) if duple != 0
destring helpvar_v002, gen(helpvar_v002num)
replace v002 = helpvar_v002num if duple != 0
drop helpvar_v002 helpvar_v002num duple

appended_IR, etc.:

duplicates tag v007 v000 v001 v002 v003, gen(duple)

gen lcaseid = strlen(caseid) // should be 15-character string
	
drop if duple != 0 & lcaseid != 15 // drop if it's a duplicate and caseid is not of correct length
	
gen helpvar_v002 = substr(caseid,8,3) // does not work, sometimes on another position
destring helpvar_v002, gen(helpvar_v002num) // does not work, Stata says: "contains nonnumeric characters; no generate"
replace v002 = helpvar_v002num if duple != 0
drop helpvar_v002 helpvar_v002num 
	
gen helpvar_v003 = substr(caseid,11,2) // same here
destring helpvar_v003, gen(helpvar_v003num) // same here
replace v003 = helpvar_v003num if duple != 0
drop helpvar_v003 helpvar_v003num

Does anyone know what the correct approach would be?
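One possible way to link household records to individuals is to merge on the numeric cluster and household numbers instead of on pieces cut out of caseid. The sketch below is only an illustration on toy data. It assumes that the household file carries the standard recode names hv001 and hv002, and that a survey identifier exists; survey_id is a hypothetical variable that would have to be created when the files are appended (for example from the source file name), and all values shown are invented. Whether these keys are unique in every survey is not guaranteed, so the sketch checks with isid before merging.

```stata
* Toy household file standing in for appended_HR.
* survey_id is hypothetical; the values are invented for illustration.
clear
input str6 survey_id hv001 hv002 hv025
"KE2014" 12 3 1
"KE2014" 12 4 2
"UG2016" 12 3 2
end

* Give the household keys the same names as in the individual file
rename (hv001 hv002) (v001 v002)

* Stops with an error if survey + cluster + household is not unique
isid survey_id v001 v002
tempfile hh
save `hh'

* Toy individual file standing in for appended_IR
clear
input str6 survey_id v001 v002 v003
"KE2014" 12 3 2
"KE2014" 12 3 4
"UG2016" 12 3 1
end

* Many individuals per household, so the merge is m:1
merge m:1 survey_id v001 v002 using `hh', keep(master match)
tab _merge
```

Running this as one do-file keeps the tempfile macro in scope. If isid fails on the real data, that points to households that share a cluster and household number within a survey, which would need a closer look before any merge.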

Also, can you tell me what parts the caseid consists of in the below example? What does the "1" between "12" and "3" mean?

caseid .....12..1.3..4
v001 12
v002 3
v003 4
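
Counting characters in the example string (reading each dot as a space), the string is 15 characters long, the "3" that equals v002 sits at position 12, and the last three positions hold "  4", which equals v003. In this string substr(caseid,8,3) therefore returns "  1" rather than v002. The following is a hedged sketch, assuming (not confirmed for every survey) that the first 12 characters of caseid correspond to hhid and the last 3 to the line number. It uses real(), which returns missing instead of stopping when a piece is not numeric; the variable names hhid_part and line_part are made up for the illustration.

```stata
* Rebuild the example string from the post, with dots read as spaces
clear
set obs 1
gen str15 caseid = "     12  1 3  4"

* Assumption: first 12 characters = household id, last 3 = line number
gen str12 hhid_part = substr(caseid, 1, 12)
gen line_part = real(strtrim(substr(caseid, 13, 3)))

* real() yields missing for non-numeric text, so count such cases explicitly
count if missing(line_part)
list caseid hhid_part line_part
```

Comparing hhid_part with hhid in the household file, and line_part with v003, would show whether the assumption holds in a given survey.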

Thank you very much for your help!!
All the best,  Nora


