Issue merging NFHS 2 women and household dataset [message #13962]
Tue, 30 January 2018 03:45
Mrinal
Messages: 14 Registered: January 2018 Location: Bhubaneswar, India
I am trying to merge the women's and household datasets of NFHS 2, but the merged results do not make sense. The NFHS 2 women's dataset has 90,303 observations and the household dataset has 92,486 observations, yet the merged file shows 104,979 observations. Is this result correct, or is there a problem with the merge? I am also getting the following messages:
Quote:
variables v001 v002 do not uniquely identify observations in the master data
variables v001 v002 do not uniquely identify observations in D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA
The complete merge result and the code used for the merge are given below for reference.
"_merge" result
tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |      5,722        5.45        5.45
          3 |     99,257       94.55      100.00
------------+-----------------------------------
      Total |    104,979      100.00
Merge code
**Merging household on women dataset**
**Round 2**
use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
gen int v001 = hv001
gen int v002 = hv002
sort v001 v002
save "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA", replace
use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
sort v001 v002
merge v001 v002 using "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA"
save "D:\Desktop\dhs\data\nfhs\2\IA_HR_IR_42FL.DTA", replace
Thanks
Mrinal
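The two messages quoted above indicate that v001 v002 repeat within both files, so on their own they cannot serve as a merge key. A minimal sketch for confirming this, assuming the same file paths as in the post:

```stata
* Count how often each cluster/household pair occurs in the household (HR) file
use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
duplicates report hv001 hv002

* Same check in the women's (IR) file
use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
duplicates report v001 v002
```

Repeats are expected in the IR file, where one household can contain several interviewed women. Repeats in the HR file would suggest that the household identifier has a component beyond cluster and household number.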
Re: Issue merging NFHS 2 women and household dataset [message #13992 is a reply to message #13973]
Thu, 01 February 2018 07:13
Bridgette-DHS
Messages: 3199 Registered: February 2013
Another response from Tom Pullum:
The lines I gave yesterday will work for almost every DHS survey. For this India survey, however, the ID code includes an additional number. I saw this by comparing hhid in the PR file with caseid in the IR file. If you check, you will find that hhid is a 12-character string and caseid is a 15-character string consisting of hhid in columns 1-12 and v003 in columns 13-15.
I answered a related question on the forum yesterday regarding the Mali 2006 survey. In Mali 2006 and some other surveys in West Africa, a sub-household code is embedded in hhid and caseid. For this India survey the extra code is the state, not the sub-household, but the strategy is basically the same: match caseid in the IR file with hhid hvidx in the PR file.
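The caseid layout described above can be checked directly before relying on it. The sketch below assumes the IR file path used elsewhere in this reply; caseid_len is a hypothetical helper variable.

```stata
use e:\DHS\DHS_data\IR_files\IAIR42FL.dta, clear

* Length of caseid; a single value of 15 is expected if the layout holds
gen byte caseid_len = strlen(caseid)
tab caseid_len

* Columns 13-15 should reproduce v003; a count of 0 is consistent with that
count if real(trim(substr(caseid,13,3))) != v003
```

If the count is not zero, the substr() positions would need to be adjusted before building the merge key.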
As an alternative, since state is given by v024, you could match v024 v001 v002 v003 in the IR file with hv024 hv001 hv002 hvidx in the PR file.
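A rough sketch of this numeric-key alternative follows. It assumes Stata 11 or later (for the merge 1:1 syntax), the same file paths as in this reply, and that the whole block is run in one do-file so the temporary file survives; irkeys is a hypothetical name.

```stata
* Prepare IR file: copy the keys under the names used in the PR file
use e:\DHS\DHS_data\IR_files\IAIR42FL.dta, clear
gen hv024 = v024
gen hv001 = v001
gen hv002 = v002
gen hvidx = v003
* isid stops with an error if the four variables do not identify each woman
isid hv024 hv001 hv002 hvidx
tempfile irkeys
save `irkeys'

* Merge IR onto PR by state, cluster, household, and line number
use e:\DHS\DHS_data\PR_files\IAPR42FL.dta, clear
isid hv024 hv001 hv002 hvidx
merge 1:1 hv024 hv001 hv002 hvidx using `irkeys'
tab _merge
```

The isid lines are a safeguard rather than part of the match itself: if either one fails, the assumption that state plus cluster plus household plus line number is unique does not hold for that file.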
set more off
* Prepare IR file for merge
use e:\DHS\DHS_data\IR_files\IAIR42FL.dta, clear
gen hhid=substr(caseid,1,12)
gen hvidx=v003
sort hhid hvidx
save e:\DHS\DHS_data\scratch\IAIRtemp.dta, replace
* Prepare PR file for merge
use e:\DHS\DHS_data\PR_files\IAPR42FL.dta, clear
sort hhid hvidx
* Merge IR with PR
merge hhid hvidx using e:\DHS\DHS_data\scratch\IAIRtemp.dta
tab _merge
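If the goal is the one in the original post, attaching household (HR) variables to each woman, a many-to-one merge on hhid is one possible approach. This is only a sketch: it assumes Stata 11 or later, the file paths from the original post, and that the HR file carries the same 12-character hhid as the PR file.

```stata
* Build the household id in the women's (IR) file
use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
gen hhid = substr(caseid,1,12)

* Many women can share one household, so the match is m:1;
* merge stops with an error if hhid is not unique in the HR file
merge m:1 hhid using "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA"
tab _merge
```

Under these assumptions, observations with _merge equal to 2 would be households with no interviewed woman, and they can be dropped with keep if _merge==3 when only women's records are needed.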