The DHS Program User Forum
Discussions regarding The DHS Program data and results
Confirming the correctness of merging two datasets [message #25738] Fri, 02 December 2022 07:08
gebretsh@gmail.com
Dear DHS experts,
As usual, thank you for the much-needed assistance you provide on my questions about DHS data.
I merged the 2000 KR file with the 2000 wealth index (WI) file in Stata. The KR file has 10873 observations and the WI file has 14072. After the merge, all 10873 KR observations are matched to WI observations. However, 7095 WI observations are unmatched, far more than the 3199 I expected (14072-10873=3199). I suspect this is because I used a many-to-one merge, but I am not sure, so I would appreciate your confirmation.

Regards,
Re: Confirming the correctness of merging two datasets [message #25756 is a reply to message #25738] Mon, 05 December 2022 13:13
Bridgette-DHS
Following is a response from Senior DHS staff member, Tom Pullum:

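Your merge is ok. The WI file has one record for every household in the survey, and there were 7095 households that had no children under 5. Put differently, the 10873 children in the KR file come from 14072-7095=6977 households, so the number of unmatched WI records reflects households without a KR record, not the difference between the two file sizes. Below I will give the Stata code for this merge, because it shows how to unpack whhid in the WI file. I use an older version of the merge command, which I prefer because it does not require specifying 1:m, etc.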

* Specify a workspace
cd e:\DHS\DHS_data\scratch

* Prepare the WI file
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\ETWI41FL.DTA" 
describe whhid

* whhid is str12

forvalues li=1/12 {
gen col`li'=substr(whhid,`li',1)
}

list col* if _n<=20, table clean
tab1 col*

* It appears that hv001 is cols 7-9 and hv002 is cols 10-12
gen hv001=substr(whhid,7,3)
gen hv002=substr(whhid,10,3)

destring hv001, generate(cluster)
destring hv002, generate(hh)

sort cluster hh
save ETWItemp.dta, replace


* Prepare the KR file
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\ETKR41FL.DTA" 
summarize v001 v002
* v001 and v002 are numeric, with values of 1-3 digits
gen cluster=v001
gen hh=v002

list cluster hh if _n<=20, table clean
sort cluster hh

* Do the merge
merge cluster hh using ETWItemp.dta
tab _merge

* _merge=2 for 7095 cases; these are households that have no children under 5; drop them

drop if _merge==2
drop _merge
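
For readers on a current Stata version, the counting logic can be sketched on a toy dataset with the newer merge syntax. Everything in this sketch is made up for illustration (the five households, the four children, and the variable names wealth and child); it is not the Ethiopia data, and m:1 is assumed to be the appropriate merge type because the household file has one record per cluster and household. Note that it starts with clear, so run it in a fresh session.

```stata
* Toy "using" file: one record per household (5 households, made-up values)
clear
input cluster hh wealth
1 1 1
1 2 2
1 3 3
2 1 4
2 2 5
end
* Confirm that cluster and hh uniquely identify households
isid cluster hh
tempfile wi
save `wi'

* Toy "master" file: one record per child (4 children in 2 households)
clear
input cluster hh child
1 1 1
1 1 2
2 2 1
2 2 2
end

* Many children can share one household record, hence m:1
merge m:1 cluster hh using `wi'
tab _merge

* All 4 children match, but 3 households are unmatched,
* not 5 - 4 = 1, because two children share each matched household
count if _merge==3
count if _merge==2
```

The unmatched count from the household file is the number of households with no child record (here 3), which is why it generally differs from the difference between the two file sizes.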

Re: Confirming the correctness of merging two datasets [message #25760 is a reply to message #25756] Tue, 06 December 2022 04:21
gebretsh@gmail.com
Thanks so much