The DHS Program User Forum      
Discussions regarding The DHS Program data and results
Issue merging NFHS 2 women and household dataset [message #13962] Tue, 30 January 2018 03:45
Mrinal
I am trying to merge the women's and household datasets of NFHS 2, but the results do not make sense. The NFHS 2 women's dataset has 90,303 observations and the household dataset has 92,486, yet the merged file shows 104,979 observations. Is this result correct, or is there a problem with the merge? I am also getting this error:
Quote:
variables v001 v002 do not uniquely identify observations in the master data
variables v001 v002 do not uniquely identify observations in D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA


The complete merge results and the code I used are given below for reference.

"_merge" result

tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |      5,722        5.45        5.45
          3 |     99,257       94.55      100.00
------------+-----------------------------------
      Total |    104,979      100.00



Merging codes
**Merging household on women dataset**
	**Round 2**
	use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
	gen int v001 = hv001
	gen int v002 = hv002
	sort v001 v002
	save "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA", replace
	
	use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
	sort v001 v002 
	merge v001 v002 using "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA"
    
	save "D:\Desktop\dhs\data\nfhs\2\IA_HR_IR_42FL.DTA", replace


Thanks
Mrinal
Re: Issue merging NFHS 2 women and household dataset [message #13965 is a reply to message #13962] Tue, 30 January 2018 08:42
Bridgette-DHS
Another response from Tom Pullum:

You should not use the HR file for this merge. The HR file has one record for each household. Instead, use the PR file, which has one record for every person in the household. The following lines will do the merge. In the output file, the cases with _merge=3 are the women in the IR file. The cases with _merge=1 are the other people in the PR file, that is, the household members other than the women in the IR file.

set more off
* Prepare IR file for merge
use e:\DHS\DHS_data\IR_files\IAIR42FL.dta, clear 
gen hv001=v001
gen hv002=v002
gen hvidx=v003
sort hv001 hv002 hvidx
save e:\DHS\DHS_data\scratch\IAIRtemp.dta, replace


* Prepare PR file for merge
use e:\DHS\DHS_data\PR_files\IAPR42FL.dta, clear
sort hv001 hv002 hvidx

* Merge IR with PR
merge hv001 hv002 hvidx using  e:\DHS\DHS_data\scratch\IAIRtemp.dta
tab _merge
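
Before a merge like this, it can help to check whether the intended key actually identifies one record per person. The following is a minimal sketch on toy data, not the real NFHS-2 files: the variable name state and all values are hypothetical, chosen to mimic a survey in which cluster and household numbers repeat across states. The commands duplicates report and isid are standard Stata.

```stata
* Toy data (hypothetical values): two states reuse the same cluster and
* household numbers.
clear
input state hv001 hv002 hvidx
1 1 1 1
1 1 1 2
2 1 1 1
2 1 1 2
end

* Report how many surplus copies of each candidate key exist.
duplicates report hv001 hv002 hvidx

* isid exits with an error if the key is not unique; capture stores the
* return code in _rc instead of stopping the do-file.
capture isid hv001 hv002 hvidx
display "cluster + household + line number: return code " _rc

capture isid state hv001 hv002 hvidx
display "with state added: return code " _rc
```

A nonzero return code from the first check would signal that a merge on hv001 hv002 hvidx alone can pair the wrong records.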

Re: Issue merging NFHS 2 women and household dataset [message #13973 is a reply to message #13965] Wed, 31 January 2018 02:38
Mrinal
Thanks, both of you again!!

But I am really puzzled by the results.

IR file has 90,303 observations
PR file has 517,379 observations

However, the merged file shows 143,944 observations with _merge=3, which, as you said, should be the women in the IR file.

What's going on?

One more thing: can I use the IR file as the base and merge in only the matching observations from the PR file?

Regards,
Mrinal
Re: Issue merging NFHS 2 women and household dataset [message #13992 is a reply to message #13973] Thu, 01 February 2018 07:13
Bridgette-DHS

Another response from Tom Pullum:

The lines I gave yesterday will work for almost every DHS survey. However, I see that for this survey of India the ID code includes another number. I saw this by comparing hhid in the PR file with caseid in the IR file. If you check, you will find that hhid is a 12-character string and caseid is a 15-character string consisting of hhid in columns 1-12 and v003 in columns 13-15.

I answered a related question on the forum yesterday regarding the Mali 2006 survey. In the case of Mali 2006 and some other surveys in West Africa, there is a sub-household code embedded in hhid and caseid. For this India survey, the extra code is the state, not the sub-household, but the strategy is basically the same: match caseid in the IR file with hhid and hvidx in the PR file.

As an alternative, since state is given by v024, you could match v024 v001 v002 v003 in the IR file with hv024 hv001 hv002 hvidx in the PR file.
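
The alternative key described above could be sketched as follows. This is an illustration on toy data with hypothetical values, not the real IAIR42FL and IAPR42FL files, and it assumes v024 and hv024 use the same state coding in both files, which is worth confirming first. It uses the merge 1:1 and rename group syntax from newer Stata releases rather than the older merge syntax used elsewhere in this thread.

```stata
* Toy stand-in for the IR file (hypothetical values): one record per woman.
clear
input v024 v001 v002 v003
1 1 1 2
2 1 1 2
end
* Give the IR identifiers the names used in the PR file.
rename (v024 v001 v002 v003) (hv024 hv001 hv002 hvidx)
tempfile irtemp
save `irtemp'

* Toy stand-in for the PR file: one record per household member.
clear
input hv024 hv001 hv002 hvidx
1 1 1 1
1 1 1 2
2 1 1 1
2 1 1 2
end

* merge 1:1 stops with an error if the key is not unique in either file.
merge 1:1 hv024 hv001 hv002 hvidx using `irtemp'
tab _merge
```

Because the state is part of the key, the two women who share cluster, household and line numbers are matched to their own PR records rather than to each other's.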


set more off
* Prepare IR file for merge
use e:\DHS\DHS_data\IR_files\IAIR42FL.dta, clear 
gen hhid=substr(caseid,1,12)
gen hvidx=v003
sort hhid hvidx
save e:\DHS\DHS_data\scratch\IAIRtemp.dta, replace


* Prepare PR file for merge
use e:\DHS\DHS_data\PR_files\IAPR42FL.dta, clear
sort hhid hvidx

* Merge IR with PR
merge hhid hvidx using  e:\DHS\DHS_data\scratch\IAIRtemp.dta
tab _merge
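
On the earlier question of using the IR file as the base and keeping only matched records, one possible approach is sketched below. It runs on toy data with hypothetical identifier values and assumes the caseid layout described above (hhid in columns 1-12, v003 in columns 13-15). The variable hv105 (age of household member) is included only to show a PR variable being carried over. The keep(match) option belongs to the newer merge 1:1 syntax; with the older syntax used in this thread, keep if _merge==3 after the merge has a similar effect.

```stata
* Toy stand-in for the PR file (hypothetical values): 12-character hhid
* plus line number, one record per household member.
clear
input str12 hhid hvidx hv105
"010010010001" 1 40
"010010010001" 2 35
"020010010001" 1 52
"020010010001" 2 28
end
tempfile prtemp
save `prtemp'

* Toy stand-in for the IR file: caseid is hhid followed by v003 in 3 columns.
clear
input str12 hh v003
"010010010001" 2
"020010010001" 2
end
gen str15 caseid = hh + string(v003, "%3.0f")
drop hh

* Rebuild the PR keys from the IR identifiers.
gen str12 hhid = substr(caseid, 1, 12)
gen hvidx = v003

* IR is the master file; keep(match) drops PR members who are not in IR.
merge 1:1 hhid hvidx using `prtemp', keep(match)
tab _merge
count
```

The result has one record per woman, with the PR variables attached.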
