The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Merging KR file with PR file India DHS 2019-21 (NFHS-5) [message #27435 is a reply to message #27392] Mon, 14 August 2023 10:55
Bridgette-DHS
Following is a response from Senior DHS staff member, Tom Pullum:

We apologize for the delay in this response, due to travel. I have slightly revised your code, as follows:

* specify workspace
cd e:\DHS\DHS_data\scratch

use "...IAPR7EFL.DTA", clear 
keep hv001 hv002 hvidx hv024
gen cluster=hv001
gen hh=hv002
gen chline=hvidx
gen state=hv024
sort state cluster hh chline
save IAPR_temp.dta, replace 


use "...IAKR7EFL.DTA", clear 
keep v001 v002 b16 v024
keep if b16>0 & b16<.
gen cluster=v001
gen hh=v002
gen chline=b16
gen state=v024
sort state cluster hh chline

* At this point a 1:1 merge will fail, because of a few duplicated values of b16 in the KR file.
* Identify the duplicates, using the same key as the merge below.
egen repeated_b16=seq(), by(state cluster hh chline)
tab repeated_b16

* There are 7 households in which the same value of b16 appears twice
* Remove the second case in duplicates. Note: it is possible that in some households it is the
*   first, not the second, that should be removed. Values of b16 could be edited.
list if repeated_b16>1, table clean
drop if repeated_b16>1
drop repeated_b16

* sort again
sort state cluster hh chline
* Note that this is a 1:1 merge
merge 1:1 state cluster hh chline using IAPR_temp.dta
tab _merge

* There is one case with _merge==1. This is a child in the KR file with a valid value of b16 but
*  the value does not appear in the PR file. This should not happen. Drop the case here; an
*  alternative is to edit b16.
list if _merge==1, table clean
drop if _merge==1

There are three issues. First, the KR file includes 7 households in which the same value of b16 is repeated for different children. I simply drop the repeats. Someone could probably work out a better solution for those 7 cases, but there are nearly 222,000 children and the effect is negligible. Second, the KR file includes one child who has a valid value of b16, but that line number is not found in the child's household in the PR file. Again, I just drop the case, but there are other options, such as editing b16. Third, you did this as an m:m merge, but it should be 1:1. The children with _merge==3 are the ones matched in the two files.
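
For readers who want to try the logic before running it on the full files, here is a minimal sketch on toy data. All values are invented for illustration and the two small datasets only stand in for the PR and KR extracts. It swaps the egen seq() step for Stata's built-in duplicates and isid commands, which is an alternative and not the approach used in the code above. It also shows why 1:1 is the safer choice: merge 1:1 stops with an error if the key is not unique, whereas an m:m merge would run without complaint.

```stata
* Toy household-member file standing in for the PR extract (invented values)
clear
input state cluster hh chline
1 101 1 1
1 101 1 3
1 101 2 4
2 205 7 2
end
tempfile pr_toy
save `pr_toy'

* Toy child file standing in for the KR extract. Line 3 is repeated on purpose,
* and line 9 has no counterpart in the household file.
clear
input state cluster hh chline
1 101 1 3
1 101 1 3
1 101 2 4
2 205 7 9
end

* Tabulate how many records share each value of the merge key
duplicates report state cluster hh chline

* Keep only the first record for each duplicated key
duplicates drop state cluster hh chline, force

* isid stops with an error unless the key now uniquely identifies the records
isid state cluster hh chline

merge 1:1 state cluster hh chline using `pr_toy'
tab _merge

* _merge==1: child with no matching household line
* _merge==2: household member with no child record
* _merge==3: child matched in both files
keep if _merge==3
```

In this toy case two children remain after the final keep. On the real files, listing the _merge==1 cases before dropping them, as in the code above, is still worthwhile.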

[Updated on: Mon, 14 August 2023 10:57]
