Merging Egypt 2000 KR and HW files [message #3583]
Tue, 13 January 2015 15:06
rejone4@emory.edu
Messages: 8 Registered: November 2014 Location: United States
Member
Hello,
I am trying to merge the KR and HW files from Egypt's 2000 DHS. I have used this code for other years with the same variables and have had no problems. However, when I run the code below for these two files, the code runs but then there are no observations.
*Create the household ID by dropping the 3-character line number from caseid
gen hwhhid = substr(caseid,1,length(caseid)-3)
*make a copy of the household line number
clonevar hwline = b16
*Sort on the household ID and the line number from the household schedule
sort hwhhid hwline
*merge the data
merge m:1 hwhhid hwline using "H:\Thesis\Egypt_2008\Egypt_2000(IV)\EGHW01FL.dta"
Re: Merging Egypt 2000 KR and HW files [message #11541 is a reply to message #11523]
Mon, 09 January 2017 21:43
Bridgette-DHS
Messages: 3189 Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
Quote: This merge is non-standard because caseid in the KR file for this survey is left-adjusted, with no leading blanks. To get an hhid that will match with the HW file, you have to move the trailing blanks to the beginning and remove the woman's line number. This formatting was a mistake, but unfortunately it comes up in a few other surveys too. To figure this out I had to do quite a bit of column-by-column deciphering of caseid. The following lines will work, but you have to change the paths. I am using an older version of the merge command, which I much prefer to the current version.
* How to merge the KR and HW files from the Egypt 2000 DHS survey
set more off
* Prepare the KR file
* caseid is left-adjusted with no leading blanks, but has trailing blanks;
* caseid is str15, but the longest value is str13. It should be hhid plus 3 cols for v003.
* I must convert it to be right-adjusted and without the woman's line number
use e:\DHS\DHS_data\KR_files\EGKR42FL.dta, clear
codebook caseid
recast str13 caseid
gen length=strlen(caseid)
tab length
* remove the last three non-missing columns of caseid, which are the line number of the woman, to get hhid
* blank2, blank3, and blank4 must hold exactly 2, 3, and 4 spaces
gen str2 blank2="  "
gen str3 blank3="   "
gen str4 blank4="    "
gen str12 caseidrev=" "
replace caseidrev=blank4+substr(caseid,1,8) if length==11
replace caseidrev=blank3+substr(caseid,1,9) if length==12
replace caseidrev=blank2+substr(caseid,1,10) if length==13
rename caseidrev hhid
rename b16 hvidx
sort hhid hvidx
save e:\DHS\scratch\temp.dta, replace
* Prepare the HW file and do the merge
use e:\DHS\DHS_data\HW_files\EGHW41FL.dta, clear
* hwhhid is right adjusted with no trailing blanks, but has leading blanks, as is normal
* hwhhid is str12
rename hwhhid hhid
rename hwline hvidx
sort hhid hvidx
merge hhid hvidx using e:\DHS\scratch\temp.dta
tab _merge
* then keep if _merge==3, drop _merge, and save the file
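As a cross-check on the length-by-length approach above, here is a minimal sketch on made-up caseid values (not real survey data). It first shows that a left-adjusted ID and its right-adjusted counterpart are different strings, which is why a merge on the unconverted ID matches nothing, and then pads the truncated ID with leading blanks in a loop instead of branching on length. The variable names hhid_left and hhid_right are hypothetical, and the 12-character target width is taken from the str12 hwhhid described above.

```stata
* Made-up, left-adjusted caseid values of length 11, 12, and 13
clear
input str15 caseid
"12345678  1"
"123456789  2"
"1234567890 12"
end

* Drop the last 3 characters (the woman's line number)
gen str12 hhid_left = substr(caseid, 1, strlen(caseid) - 3)

* Pad on the left, one blank per pass, until each value is 12 characters wide
gen str12 hhid_right = hhid_left
forvalues i = 1/12 {
    quietly replace hhid_right = " " + hhid_right if strlen(hhid_right) < 12
}

* Same digits, different strings: only the right-adjusted form can match hwhhid
assert hhid_left != hhid_right
assert trim(hhid_left) == trim(hhid_right)
assert strlen(hhid_right) == 12
list caseid hhid_left hhid_right, clean
```

On the real file it is still worth running tab _merge afterwards, since this sketch assumes caseid has no leading blanks, as stated above.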
Re: Merging Egypt 2000 KR and HW files [message #20168 is a reply to message #20145]
Mon, 05 October 2020 07:00
Bridgette-DHS
Messages: 3189 Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
The tricky thing about these merges is that the HW files store the household ID as a string. With "describe" you can see that hwhhid is a 12-character string. You then need to extract v001 and v002 from hwhhid. Below I show a trick for finding out which columns hold v001 and which hold v002, using substr() and destring. I show how to do it with the 2000 survey, but the alignment is the same in the 2005 survey.
* Prepare the HW file
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\EGHW41FL.DTA", clear
rename hwline b16
* Split hwhhid into one variable per column to see where v001 and v002 sit
local li=1
while `li'<=12 {
gen col`li'=substr(hwhhid,`li',1)
local li=`li'+1
}
list hwhhid col* if _n<=20, table clean
drop col*
* Columns 3-9 hold v001 and columns 10-12 hold v002
gen v001=substr(hwhhid,3,7)
gen v002=substr(hwhhid,10,3)
destring v001 v002, replace
sort v001 v002 b16
save e:\DHS\DHS_data\scratch\EGtemp.dta, replace
* Prepare the KR file and do the merge
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\EGKR42FL.DTA", clear
* Drop children whose household line number b16 is 0 or missing
drop if b16==0 | b16==.
sort v001 v002 b16
merge v001 v002 b16 using e:\DHS\DHS_data\scratch\EGtemp.dta
tab _merge
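For readers who prefer the current merge syntax, the same join can be sketched as a merge 1:1 on the three numeric keys. The example below runs on made-up toy data; haz_toy and age_toy are hypothetical variables, and the assumption that v001 v002 b16 uniquely identifies a child in both files should be checked on the real data (here with isid) before relying on it.

```stata
* Toy stand-in for the HW file after v001 and v002 have been extracted
clear
input v001 v002 b16 haz_toy
101 1 3 -120
101 2 4 55
102 1 3 10
end
isid v001 v002 b16
tempfile hwtoy
save `hwtoy'

* Toy stand-in for the KR file after dropping b16==0 and b16==.
clear
input v001 v002 b16 age_toy
101 1 3 2
101 2 4 1
103 5 2 4
end
isid v001 v002 b16

* 1:1 merge: stops with an error if the key is not unique in either file
merge 1:1 v001 v002 b16 using `hwtoy'
tab _merge
```

Unlike the older syntax, merge 1:1 does not need the data to be sorted first, and it refuses to run when the key is duplicated, which can surface ID problems early.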
Re: Merging Egypt 2000 KR and HW files [message #20198 is a reply to message #20172]
Wed, 07 October 2020 07:51
Bridgette-DHS
Messages: 3189 Registered: February 2013
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
DHS has some naming rules that can be confusing. In the PR file, some variables (such as anthropometry) have the prefix hc for children, ha for women, and (if included) hb for men. In the IR, KR, and BR files the prefix for children is hw. If you find hc70 in the PR file and hw70 in the KR file, they are the same variable, with the same values for the same child. The only difference is that there will be some children in the PR file who are not in the KR file (because their mother was not in the household), so there are a few more cases with hc70 than with hw70. The point is that it's the same variable, just in a different file. So when you merge the HW file with the PR file, the prefix should be hc, and when you merge with the KR file it should be hw. But it's a minor matter, just a naming convention.
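If the variables in the file at hand carry the wrong prefix for the file you are merging into, a wildcard rename is one way to switch them. This is only a sketch on made-up values, assuming Stata 12 or later for the wildcard form of rename; hc70 and hc71 stand in for whatever hc-prefixed variables are actually present in your file.

```stata
* Made-up values for two hc-prefixed variables
clear
input hc70 hc71
-120 -50
55 10
end

* Rename every variable that starts with hc so that it starts with hw instead
rename hc* hw*

* Confirm the new names exist
confirm variable hw70 hw71
describe
```

Because the wildcard catches every variable beginning with hc, it is worth running describe first to see exactly which variables would be renamed.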