The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging and appending data files [message #24845] Wed, 20 July 2022 04:48
nora-dhs
Hello, I am working with the DHS data and have some problems with merging different data files. This is what I want to do:

1) append data files of different survey waves and countries, e.g., append all HR data files for the African continent over the period 1999-2019 into one single data file called appended_HR. I then want to do the same with the other data types. I think I have managed this first step and ended up with four data files called appended_HR, appended_IR, appended_MR, appended_KR.
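
As a minimal sketch of step 1, the block below stacks two toy household files in place of the real HR files. The values are invented, and it assumes the household files keep the standard recode names hv000, hv001 and hv002. The variable source is a hypothetical marker of which file each row came from; it acts as the survey identifier here instead of the interview year, which may vary within one survey.

```stata
* Toy stand-ins for two HR files (invented values, not real DHS data)
clear
input str3 hv000 hv001 hv002
"AO7" 1 1
"AO7" 1 2
end
tempfile hr_a
save `hr_a'

clear
input str3 hv000 hv001 hv002
"BJ7" 1 1
"BJ7" 2 1
end
tempfile hr_b
save `hr_b'

* Stack the files; source is 0 for the file in memory, 1 for the appended one
use `hr_a', clear
append using `hr_b', generate(source)

* Within one source file, cluster and household number should be unique;
* isid stops with an error if they are not
isid source hv001 hv002
```

Cluster 1, household 1 appears in both toy files, which is why the file marker has to be part of the key.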

2) merge household characteristics and coordinates to the individuals. To do this, I want to merge appended_HR into the other appended files, for which I need unique identifiers. This is where I struggle. I noticed that some identifiers seem to be incorrectly coded (e.g., v001, v002, v003 are missing or do not correspond to mcaseid/caseid). I tried to resolve these inconsistencies, but my approach does not work:

appended_HR:
duplicates tag v007 v000 v001 v002, gen(duple)

gen lhhid = strlen(hhid) // should be 12-character string
	
drop if duple != 0 & lhhid != 12 // drop if it's a duplicate and hhid is not of correct length
	
gen helpvar_v002 = substr(hhid,8,3) if duple != 0
destring helpvar_v002, gen(helpvar_v002num)
replace v002 = helpvar_v002num if duple != 0
drop helpvar_v002 helpvar_v002num duple

appended_IR, etc.:
duplicates tag v007 v000 v001 v002 v003, gen(duple)

gen lcaseid = strlen(caseid) // should be 15-character string
	
drop if duple != 0 & lcaseid != 15 // drop if it's a duplicate and caseid is not of correct length
	
gen helpvar_v002 = substr(caseid,8,3) // does not work, sometimes on another position
destring helpvar_v002, gen(helpvar_v002num) // does not work, Stata says: "contains nonnumeric characters; no generate"
replace v002 = helpvar_v002num if duple != 0
drop helpvar_v002 helpvar_v002num 
	
gen helpvar_v003 = substr(caseid,11,2) // same here
destring helpvar_v003, gen(helpvar_v003num) // same here
replace v003 = helpvar_v003num if duple != 0
drop helpvar_v003 helpvar_v003num

Does anyone know what the correct approach would be?
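
For reference, the merge I am aiming for in step 2 would look roughly like the sketch below. It uses invented toy data, a hypothetical survey variable identifying each survey, and assumes the household file's hv001/hv002 are renamed to match v001/v002 in the individual file.

```stata
* Toy household file (invented values); one row per household
clear
input survey hv001 hv002 hv025
1 12 3 1
1 12 4 2
2 12 3 2
end
rename (hv001 hv002) (v001 v002)
tempfile hh
save `hh'

* Toy individual file: several individuals can share one household
clear
input survey v001 v002 v003
1 12 3 1
1 12 3 4
1 12 4 2
2 12 3 1
end

* Many individuals to one household, matched within each survey
merge m:1 survey v001 v002 using `hh'

* Every individual should find a household in this toy example
assert _merge == 3
drop _merge
```

The m:1 form makes Stata check that survey, v001 and v002 uniquely identify households in the household file, so duplicate keys show up as an error rather than as silently wrong matches.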

Also, can you tell me what parts the caseid consists of in the example below? What does the "1" between "12" and "3" mean?
caseid .....12..1.3..4
v001 12
v002 3
v003 4
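
A minimal sketch of how such a caseid might be split, assuming the first 12 characters are the hhid and the last 3 hold the line number (v003). The caseid values are toy strings in which blanks replace the dots shown above, and the new variable names are hypothetical. Using real() instead of destring returns missing for anything nonnumeric rather than stopping with an error.

```stata
* Toy caseid values (invented), 15 characters each
clear
set obs 2
gen str15 caseid = ""
replace caseid = "     12  1 3  4" in 1
replace caseid = "     12  1 3 11" in 2

* Assumption: characters 1-12 are the household id, 13-15 the line number
gen str12 hhid_from_case = substr(caseid, 1, 12)
gen line_from_case = real(strtrim(substr(caseid, 13, 3)))

list caseid hhid_from_case line_from_case
```

This only separates the household part from the line number; it does not resolve how the household part itself is composed, which is what I am unsure about.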

Thank you very much for your help!!
All the best, 
Nora
 