The DHS Program User Forum
Discussions regarding The DHS Program data and results
STATA codes to merge women and household datasets [message #13937] Fri, 26 January 2018 11:53
Mrinal
Messages: 14
Registered: January 2018
Location: Bhubaneswar, India
Member
I am working with the NFHS datasets and need to merge the women's and household datasets in Stata, which I rarely use. Could someone please share the Stata code to merge these datasets for all four rounds of the NFHS? I would greatly appreciate it.

Thanks and regards,
Mrinal
Re: STATA codes to merge women and household datasets [message #13957 is a reply to message #13937] Mon, 29 January 2018 20:05
Bridgette-DHS
Messages: 3013
Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:

If you want to combine, say, the household member (PR) files from the successive surveys, you should use the "append" command. This is distinct from a "merge", in which, say, the KR records and PR records from a single survey could be combined child by child.

Some variables will have different codes and categories in different surveys. For example, v023 may not be defined the same way in every survey. That must be taken into account. When you use the append command, each variable keeps only one set of labels: by default Stata retains the label definitions already in memory and ignores conflicting ones in the appended files, so codes that mean different things across surveys can end up under the wrong label.

The main reason for appending files is for convenience of file manipulation. In the case of the India surveys, all the files are very large and after appending they will be enormous--and slow to work with. I would recommend trimming the files and just carrying along the variables you need for analysis.

Issues related to appending have been discussed in other forum postings.
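As a rough illustration of the trim-then-append advice above, the sketch below stacks two tiny made-up files that stand in for two survey rounds. The data values, the tempfile names, and the choice of hv104 as the only analysis variable are assumptions for illustration; with real data the tempfiles would be replaced by the paths to the PR files. The generate() option adds a marker for the source file, which helps when codes such as v023 differ between rounds.

```stata
* Toy stand-in for an earlier round (made-up values, not NFHS data)
clear
input hv001 hv002 hvidx hv104 hv105
1 1 1 2 34
1 1 2 1 36
end
tempfile round_a
save `round_a'

* Toy stand-in for a later round
clear
input hv001 hv002 hvidx hv104 hv105
1 1 1 1 50
2 3 1 2 22
2 3 2 2 19
end
tempfile round_b
save `round_b'

* Load only the variables needed for analysis (hv105 is left behind)
use hv001 hv002 hvidx hv104 using `round_a', clear

* Stack the second file; src = 0 for the file in memory, 1 for the appended file
append using `round_b', keep(hv001 hv002 hvidx hv104) generate(src)
tabulate src
```

Trimming before appending keeps the combined file small, and src can later be recoded into a survey-round variable.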
Re: STATA codes to merge women and household datasets [message #13961 is a reply to message #13957] Tue, 30 January 2018 03:28
Mrinal
Thank you, Bridgette and Pullum. Actually, I was more interested in Stata code for merging than for appending. However, I managed to put together the merge code for NFHS-2, which is given below.

**Merging household on women dataset**
	**Round 2**
	use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
	gen int v001 = hv001
	gen int v002 = hv002
	gen byte v003 = hv003
	sort v001 v002 v003
	save "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA", replace
	
	use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
	sort v001 v002 v003
	merge v001 v002 v003 using "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA"
    
	save "D:\Desktop\dhs\data\nfhs\2\IA_HR_IR_42FL.DTA", replace


Thanks and regards,
Mrinal
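One point worth checking in the code above: in the household (HR) recode, hv003 is the line number of the person who answered the household questionnaire, whereas v003 in the women's (IR) file is the woman's own line number, so keying on both would link only women who were themselves the household respondent. A minimal sketch of the alternative, a many-to-one merge on the household-level identifiers, is shown below on made-up data. The state variable is included because a later reply in this thread says the India files need it; hv009 is used only as an example household variable, and the tempfile name is hypothetical.

```stata
* Toy household file: one row per household (made-up values)
clear
input hv024 hv001 hv002 hv003 hv009
1 1 1 2 4
1 1 2 1 3
2 1 1 1 5
end
rename (hv024 hv001 hv002) (v024 v001 v002)
isid v024 v001 v002          // the key must be unique in the household file
tempfile hh
save `hh'

* Toy women's file: one row per interviewed woman
clear
input v024 v001 v002 v003
1 1 1 2
1 1 1 3
1 1 2 2
2 1 1 2
end

* Many women per household, one household record each: m:1
merge m:1 v024 v001 v002 using `hh', keepusing(hv009)
tabulate _merge
```

With this key every woman in the toy file picks up her household's record, regardless of who answered the household questionnaire.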
Re: STATA codes to merge women and household datasets [message #13996 is a reply to message #13961] Thu, 01 February 2018 11:14
boyle014
Messages: 78
Registered: December 2015
Location: Minneapolis
Senior Member
Mrinal,

You should consider using IPUMS-DHS, which has already harmonized all of the variables across the surveys. You select the samples you want--all of the Indian ones it sounds like--and then the variables you want. You don't have to download separate files and merge or append them. You can download a single file with multiple surveys. The latest Indian sample is being uploaded into the system now. It will be available in March. The other three Indian samples are already there.

Liz Boyle



Professor Elizabeth Boyle
Sociology & Law, University of Minnesota, USA
Principal Investigator, IPUMS-DHS
Re: STATA codes to merge women and household datasets [message #14029 is a reply to message #13996] Mon, 05 February 2018 01:07
Mrinal
Dear Elizabeth,

First of all, I would like to thank and congratulate you and your team for the much-needed IPUMS-DHS initiative. It is such a relief to have a resource like this when working with huge databases such as the NFHS. In fact, the first thing I did after getting access to DHS was to browse the IPUMS-DHS project. However, I realized that I should take this as an opportunity to learn a thing or two while preparing the database for analysis. Once again, I heartily appreciate your contribution in smoothing the path to analyzing DHS data.


Mrinal
Re: STATA codes to merge women and household datasets [message #14035 is a reply to message #14029] Mon, 05 February 2018 13:08
boyle014
Thanks, Mrinal!
Re: STATA codes to merge women and household datasets [message #14536 is a reply to message #13996] Wed, 18 April 2018 18:52
Gowokani
Messages: 9
Registered: October 2016
Location: Malawi
Member
IPUMS-DHS is simply unavailable. See the message I am getting (attached).

Gowo

  • Attachment: Capture.PNG


Gowokani Chijere Chirwa
Re: STATA codes to merge women and household datasets [message #14542 is a reply to message #14536] Fri, 20 April 2018 14:00
boyle014
Dear Gowo,

Sorry to hear that you got a Temporarily Unavailable page. We uploaded lots of new data this week. When we do this, the IT people sometimes have to take the website offline for a minute or two to fix bugs. It's working again now!

Liz Boyle

Re: STATA codes to merge women and household datasets [message #16025 is a reply to message #13996] Tue, 23 October 2018 17:18
priyoma
Messages: 7
Registered: January 2017
Member
Hi!

I have a similar question. I wish to merge the PR file with the IR file for NFHS-2 (India). I had a look at IPUMS, and it is indeed very helpful. However, when I try to select variables from two different types of files (IR and PR), the webpage says my selection (data cart) will not be retained.


Is there any way around this?

I primarily need all of the IR variables and the education variables from the PR file (i.e., the education information of every member of the household).

Best,
Priyoma
Re: STATA codes to merge women and household datasets [message #16073 is a reply to message #16025] Thu, 01 November 2018 17:20
boyle014
Hi priyoma,

Thanks for the query. You've uncovered a temporary weakness with the system when using household members as the unit of analysis--the Household Number (HHID) variable is not available for selection.

To put all the household members' education on each woman's record in IPUMS DHS, you would first create a woman's data file (extract) with all the other variables you need. Then you would go back into the system, select household members as the unit of analysis and create a second extract with the additional variables. You would then merge the files on the HHID variable.

We are in the process of fixing this now. HHID and a few other technical variables will become available for household members extracts next week. We will be adding them when we release a new set of samples from Afghanistan, Angola, Burundi, Lesotho, Myanmar, Namibia, and Senegal. I will post again when that process is complete.
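The two-extract approach described above could be sketched roughly as follows once both files are downloaded. Everything here is made up for illustration: the toy values, the variable names lineno, educ, wline and age, and the lowercase spelling hhid are assumptions, not the actual IPUMS-DHS extract layout. The idea is to reshape the household-member extract to one row per household and then attach it to each woman's record.

```stata
* Toy household-member extract (variable names are assumptions)
clear
input str6 hhid lineno educ
"A1" 1 2
"A1" 2 0
"A1" 3 1
"B7" 1 3
"B7" 2 3
end

* One row per household: educ1, educ2, ... hold each member's education
reshape wide educ, i(hhid) j(lineno)
tempfile members
save `members'

* Toy woman extract: one row per woman
clear
input str6 hhid wline age
"A1" 2 31
"B7" 2 27
end

* Attach every household member's education to each woman's record
merge m:1 hhid using `members'
list
```

Reshaping first keeps the merge many-to-one, so each woman still appears exactly once in the result.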
Re: STATA codes to merge women and household datasets [message #17989 is a reply to message #13961] Sun, 11 August 2019 08:48
Isabelle
Messages: 2
Registered: July 2019
Member
Hello Mrinal and the community,

I am working with the NFHS-2 and am trying to merge the "household" and "individual" datasets exactly as you described earlier. I used the same code; however, the merge fails with the error
"variables v001 v002 v003 do not uniquely identify observations in the master data".

I have included the commands I used below. I cannot identify what I am doing wrong, so I would be thankful for any advice.
___________

use "C:\Users\Isabelle\Desktop\DHS Data India\1998-99\Household\IAHR42FL.dta" // household dataset
gen int v001 = hv001
gen int v002 = hv002
gen byte v003 = hv003
sort v001 v002 v003
save "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\1998-99HouseholdSort.dta", replace

use "C:\Users\Isabelle\Desktop\DHS Data India\1998-99\Individual\IAIR42FL" //women dataset
sort v001 v002 v003
merge 1:m v001 v002 v003 using "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\1998-99HouseholdSort.dta"

save "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\1998-99RecMerge.dta"
___________

Kind regards,
Isabelle
Re: STATA codes to merge women and household datasets [message #18038 is a reply to message #17989] Fri, 23 August 2019 13:40
Bridgette-DHS

Following is another response from Senior DHS Stata Specialist, Tom Pullum:

For all of the India surveys (and ONLY the India surveys), the household ID and case ID include state (generically "region"), given by hv024 in the PR file and v024 in the IR file. Just add that into the merge, along with v001, v002, v003 (in any order) and you should be ok. Let us know if you still have a problem.
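A quick way to see the effect of adding state to the key is to test uniqueness before merging. The sketch below uses a tiny made-up women's file in which, as the reply above implies for the India surveys, the same cluster and household numbers recur in different states; the values are illustrative only.

```stata
* Toy women's file: the same v001/v002 values recur in two states (made-up values)
clear
input v024 v001 v002 v003
1 1 1 2
1 1 2 2
2 1 1 2
2 1 2 3
end

* Without state, two women share the same v001 v002 v003
duplicates report v001 v002 v003

* isid exits with an error when the key is not unique; capture records the return code
capture isid v001 v002 v003
display "return code without state: " _rc

* With state included, each woman is uniquely identified and isid is silent
isid v024 v001 v002 v003
```

Running isid on the intended key in both files before the merge turns the "do not uniquely identify observations" error into an explicit check.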
Re: STATA codes to merge women and household datasets [message #18042 is a reply to message #18038] Sat, 24 August 2019 09:56
Isabelle
Thank you, Bridgette, for your response and help.

However, I have a follow-up question after merging the datasets with the fourth identifier, v024/hv024, included.

When merging the datasets IR and PR using

household member:
gen int v001 = hv001
gen int v002 = hv002
gen byte v003 = hv003
gen int v024 = hv024
sort v001 v002 v003 v024

individual:
sort v001 v002 v003 v024
merge 1:m v001 v002 v003 v024 using "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\V21998-99HHMSort.dta"

I get the following result:
not matched: 337,481
    from master: 49,203
    from using: 288,278

matched: 229,101

The household dataset has 517,379 observations.
The individual dataset has 90,582 observations.

Does this mean that only 41,379 individuals (90,582 - 49,203) from my master (individual) file are matched to a household, and that I cannot use the other 49,203 in my further analysis?

Thank you in advance for your help!
Best
Isabelle
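One possible reading of these counts: in the PR file, hv003 holds the line number of the person who answered the household questionnaire and is repeated on every member of the household, while hvidx is each member's own line number. Keying on hv003 would therefore match a woman to all members of her household when she was the household respondent, and to nobody otherwise. The sketch below reproduces that pattern on made-up data; the values and the tempfile name are assumptions.

```stata
* Toy PR file: hv003 repeats the household respondent's line number on every member
clear
input v024 v001 v002 hvidx hv003
1 1 1 1 2
1 1 1 2 2
1 1 1 3 2
1 1 2 1 1
1 1 2 2 1
end
generate v003 = hv003        // the key used in the merge above
tempfile pr
save `pr'

* Toy IR file: one interviewed woman (line 2) in each household
clear
input v024 v001 v002 v003
1 1 1 2
1 1 2 2
end

merge 1:m v024 v001 v002 v003 using `pr'
tabulate _merge
```

In this toy case the first woman matches all three members of her household and the second matches none. Building the key with generate v003 = hvidx instead gives exactly one match per woman.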


Re: STATA codes to merge women and household datasets [message #18063 is a reply to message #18042] Tue, 03 September 2019 15:37
Bridgette-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

Hi Isabelle--Here is how I would do the merge. I use the old version of the merge command but you would get the same thing if you used 1:1. I also introduce a variable called "in_IR", which is coded 1 for every case in the IR file. It just clarifies the "_merge" code, which is described in your results with different terms.

ALL of the 90,303 cases in the IR file are also in the PR file. 427,076 people in the PR file are NOT also in the IR file. Hope this makes sense. Tom


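* Working directory for the temporary file
cd e:\DHS\DHS_data\scratch

* IR (women's) file: copy the identifiers under the PR variable names
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAIR42FL.DTA" , clear
gen hv024=v024
gen hv001=v001
gen hv002=v002
gen hvidx=v003
* Flag every record that comes from the IR file
gen in_IR=1
sort hv024 hv001 hv002 hvidx
save IAIR42_temp.dta, replace

* PR (household member) file: merge the women onto their household-member records
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAPR42FL.DTA" , clear
sort hv024 hv001 hv002 hvidx
merge hv024 hv001 hv002 hvidx using IAIR42_temp.dta
* Household members with no IR record get in_IR = 0
replace in_IR=0 if in_IR==.
tab _merge
tab _merge in_IR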

The following table is produced:

  • Attachment: table.PNG
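For readers on current Stata versions, the same merge can be written with the 1:1 syntax mentioned above. The following is only a sketch on made-up data, with tempfiles standing in for the IR and PR files; the in_IR flag is used as in the code above.

```stata
* Toy IR file (women), renamed to the PR key names; values are made up
clear
input v024 v001 v002 v003
1 1 1 2
1 1 2 1
end
rename (v024 v001 v002 v003) (hv024 hv001 hv002 hvidx)
generate in_IR = 1
tempfile ir
save `ir'

* Toy PR file: one row per household member
clear
input hv024 hv001 hv002 hvidx
1 1 1 1
1 1 1 2
1 1 2 1
1 1 2 2
1 1 2 3
end

* Each person appears at most once in each file, so the merge is 1:1
merge 1:1 hv024 hv001 hv002 hvidx using `ir'
replace in_IR = 0 if missing(in_IR)
tabulate _merge in_IR
```

In the toy data both women are matched to their household-member records, and the remaining three household members appear only in the master (PR) file.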
Re: STATA codes to merge women and household datasets [message #19139 is a reply to message #18063] Sun, 26 April 2020 00:54
vpatil
Messages: 9
Registered: March 2019
Member
Hi,

I am trying to merge the NFHS-4 household and individual datasets. I have tried all the options discussed earlier in this thread, without success. This is what I get:

merge 1:m hv001 hv002 hv003 hv024 using "/Users/drvaishalipatil/Desktop/DHS datasets/RO1 India/RO12015sort.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                     1,280,793
        from master                   591,308  (_merge==1)
        from using                    689,485  (_merge==2)

    matched                            10,201  (_merge==3)
    -----------------------------------------

The final numbers are wrong. I have tried 1:1, 1:m, m:1, and every other possibility. Please tell me what I am doing wrong.