	| 
		
			| STATA codes to merge women and household datasets [message #13937] | Fri, 26 January 2018 11:53  |  
			| 
				
				
					|  Mrinal Messages: 14
 Registered: January 2018
 Location: Bhubaneswar, India
 | Member |  |  |  
	| I am working with NFHS datasets and need to merge the women and household datasets using Stata, which I hardly use. May I please have the Stata code to merge these datasets for all four rounds of NFHS? I would certainly appreciate it. 
 Thanks and regards,
 Mrinal
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #13957 is a reply to message #13937] | Mon, 29 January 2018 20:05   |  
			| 
				
				
					|  Bridgette-DHS Messages: 3230
 Registered: February 2013
 | Senior Member |  |  |  
	| Following is a response from Senior DHS Stata Specialist, Tom Pullum: 
 If you want to combine, say, the household (PR) files from the successive surveys, you should use the "append" command.  This is distinct from a "merge", in which, say, the KR records and PR records from a single survey could be combined child by child.
 
 Some variables will have different codes and categories in different surveys.  For example, v023 may not be defined the same way in every survey. That must be taken into account.  When you use the append command, the variable names and labels from the last survey in the append command will override any previous names and labels.
 
 The main reason for appending files is for convenience of file manipulation.  In the case of the India surveys, all the files are very large and after appending they will be enormous--and slow to work with.  I would recommend trimming the files and just carrying along the variables you need for analysis.
 
 Issues related to appending have been discussed in other forum postings.
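 
 A minimal sketch of the trim-then-append approach described above. The file names, the variable list, and the "round" indicator are placeholders chosen for illustration, not taken from the response; substitute the files and variables your analysis needs.

```stata
* Sketch only: file names and the variable list below are placeholders.
* Step 1: load just the needed variables from each round and tag the round.
use hv001 hv002 hvidx hv005 hv024 using "PR_round2.DTA", clear
gen byte round = 2
save pr_round2_trim.dta, replace

use hv001 hv002 hvidx hv005 hv024 using "PR_round3.DTA", clear
gen byte round = 3
save pr_round3_trim.dta, replace

* Step 2: stack the trimmed files, one survey on top of the other.
use pr_round2_trim.dta, clear
append using pr_round3_trim.dta

* Step 3: confirm each round contributed the expected number of records.
tab round
```

 Loading only a subset of variables with "use varlist using" keeps each file small before appending. It is worth checking codes and labels (for example with "codebook" or "label list") after the append, since definitions can differ between surveys.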
 
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #13961 is a reply to message #13957] | Tue, 30 January 2018 03:28   |  
			| 
				
				
					|  Mrinal Messages: 14
 Registered: January 2018
 Location: Bhubaneswar, India
 | Member |  |  |  
	| Thank you, Bridgette and Pullum. Actually, I was more interested in Stata code for merging than for appending. However, I managed to construct the merge code for NFHS-2, which is given below. 
 
 **Merging household on women dataset**
	**Round 2**
	use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
	gen int v001 = hv001
	gen int v002 = hv002
	gen byte v003 = hv003
	sort v001 v002 v003
	save "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA", replace
	
	use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
	sort v001 v002 v003
	merge v001 v002 v003 using "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA"
    
	save "D:\Desktop\dhs\data\nfhs\2\IA_HR_IR_42FL.DTA", replace
 Thanks and regards,
 Mrinal
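 
 A possible variant using the newer merge syntax, offered as a sketch rather than a tested recipe. It assumes that the HR file holds one record per household and that state, cluster, and household number together identify a household; the "isid" line checks that assumption and stops with an error if it fails. File paths are placeholders.

```stata
* Sketch only: paths are placeholders; the key variables are an assumption
* that isid verifies before the merge.
use "IAHR42FL.DTA", clear
gen v024 = hv024          // state
gen v001 = hv001          // cluster number
gen v002 = hv002          // household number
isid v024 v001 v002       // error if these do not uniquely identify households
save hr_temp.dta, replace

use "IAIR42FL.DTA", clear
* Many women can belong to one household, hence m:1.
merge m:1 v024 v001 v002 using hr_temp.dta
tab _merge
```

 With the newer syntax the files do not need to be sorted first, and _merge==2 marks households that have no interviewed woman.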
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #13996 is a reply to message #13961] | Thu, 01 February 2018 11:14   |  
			| 
				
				
					|  boyle014 Messages: 78
 Registered: December 2015
 Location: Minneapolis
 | Senior Member |  |  |  
	| Mrinal, 
 You should consider using IPUMS-DHS, which has already harmonized all of the variables across the surveys. You select the samples you want--all of the Indian ones it sounds like--and then the variables you want. You don't have to download separate files and merge or append them. You can download a single file with multiple surveys. The latest Indian sample is being uploaded into the system now. It will be available in March. The other three Indian samples are already there.
 
 Liz Boyle
 
 
 
 Professor Elizabeth Boyle
 Sociology & Law, University of Minnesota, USA
 Principal Investigator, IPUMS-DHS
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #14542 is a reply to message #14536] | Fri, 20 April 2018 14:00   |  
			| 
				
				
					|  boyle014 Messages: 78
 Registered: December 2015
 Location: Minneapolis
 | Senior Member |  |  |  
	| Dear Gowo, 
 Sorry to hear that you got a Temporarily Unavailable page. We uploaded lots of new data this week. When we do this, the IT people sometimes have to take the website offline for a minute or two to fix bugs. It's working again now!
 
 Liz Boyle
 
 Professor Elizabeth Boyle
 Sociology & Law, University of Minnesota, USA
 Principal Investigator, IPUMS-DHS
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #16073 is a reply to message #16025] | Thu, 01 November 2018 17:20   |  
			| 
				
				
					|  boyle014 Messages: 78
 Registered: December 2015
 Location: Minneapolis
 | Senior Member |  |  |  
	| Hi priyoma, 
 Thanks for the query. You've uncovered a temporary weakness with the system when using household members as the unit of analysis--the Household Number (HHID) variable is not available for selection.
 
 To put all the household members' education on each woman's record in IPUMS DHS, you would first create a woman's data file (extract) with all the other variables you need. Then you would go back into the system, select household members as the unit of analysis and create a second extract with the additional variables. You would then merge the files on the HHID variable.
 
 We are in the process of fixing this now. HHID and a few other technical variables will become available for household members extracts next week. We will be adding them when we release a new set of samples from Afghanistan, Angola, Burundi, Lesotho, Myanmar, Namibia, and Senegal. I will post again when that process is complete.
 
 
 Professor Elizabeth Boyle
 Sociology & Law, University of Minnesota, USA
 Principal Investigator, IPUMS-DHS
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #18042 is a reply to message #18038] | Sat, 24 August 2019 09:56   |  
			| 
				
				
					|  Isabelle Messages: 2
 Registered: July 2019
 | Member |  |  |  
	| Thank you, Bridgette, for your response and help. 
 However, I have a follow-up question after merging the datasets with the fourth identifier, v024/hv024, included.
 
 When merging the datasets IR and PR using
 
 household member:
 gen int v001 = hv001
 gen int v002 = hv002
 gen byte v003 = hv003
 gen int v024 = hv024
 sort v001 v002 v003 v024
 
 individual:
 sort v001 v002 v003 v024
 merge 1:m v001 v002 v003 v024 using "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\V21998-99HHMSort.dta"
 
 I get the following result:
 not matched: 337,481
     from master: 49,203
     from using: 288,278
 
 matched: 229,101
 
 The household dataset has 517,379 observations.
 The individual dataset has 90,582 observations.
 
 Does this mean that, of the 90,582 individuals in my master (individual) file, only 41,379 (90,582 - 49,203) are matched to a household, and that I cannot use the remaining 49,203 in my further analysis?
 
 Thank you in advance for your help!
 Best
 Isabelle
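 
 The counts in the output above can be cross-checked with a few lines of arithmetic. This is only a reading aid: it assumes that, in a 1:m merge, one individual record can pair with several household-member rows, which is why the matched total can exceed the number of individuals.

```stata
* Counts copied from the merge output quoted above
display 49203 + 288278    // not matched in total: 337481
display 288278 + 229101   // rows on the household-member side: 517379
display 90582 - 49203     // individual (master) records that found a match: 41379
```

 Under that assumption, the 229,101 matched observations are household-member rows, and they correspond to 41,379 distinct individual records.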
 
 
 
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #18063 is a reply to message #18042] | Tue, 03 September 2019 15:37   |  
			| 
				
				
					|  Bridgette-DHS Messages: 3230
 Registered: February 2013
 | Senior Member |  |  |  
	| Following is a response from DHS Research & Data Analysis Director, Tom Pullum: 
 Hi Isabelle--Here is how I would do the merge.  I use the old version of the merge command but you would get the same thing if you used 1:1.  I also introduce a variable called "in_IR", which is coded 1 for every case in the IR file.  It just clarifies the "_merge" code, which is described in your results with different terms.
 
 ALL of the 90,303 cases in the IR file are also in the PR file. 427,076 people in the PR file are NOT also in the IR file. Hope this makes sense.  Tom
 
 
 
 cd e:\DHS\DHS_data\scratch
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAIR42FL.DTA" , clear
gen hv024=v024
gen hv001=v001
gen hv002=v002
gen hvidx=v003
gen in_IR=1
sort hv024 hv001 hv002 hvidx
save IAIR42_temp.dta, replace
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAPR42FL.DTA" , clear
sort hv024 hv001 hv002 hvidx
merge hv024 hv001 hv002 hvidx using IAIR42_temp.dta
replace in_IR=0 if in_IR==.
tab _merge
tab _merge in_IR
The following table is produced:
 
	 Attachment: table.PNG
 |  
	|  |  |  
	| 
		
			| Re: STATA codes to merge women and household datasets [message #19139 is a reply to message #18063] | Sun, 26 April 2020 00:54  |  
			| 
				
				
					|  vpatil Messages: 9
 Registered: March 2019
 | Member |  |  |  
	| Hi 
 I am trying to merge the NFHS-4 household and individual datasets. I have tried all the options discussed earlier in this thread, but without success. This is what I get:
 
 merge 1:m hv001 hv002 hv003 hv024 using "/Users/drvaishalipatil/Desktop/DHS datasets/RO1 India/RO12015sort.dta"
 
     Result                           # of obs.
     -----------------------------------------
     not matched                     1,280,793
         from master                   591,308  (_merge==1)
         from using                    689,485  (_merge==2)
 
     matched                            10,201  (_merge==3)
     -----------------------------------------
 
 The final numbers are wrong. I have tried 1:1, 1:m, m:1, and all the other possibilities. Please tell me what I am doing wrong.
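 
 One thing that may be worth checking, following the earlier reply from Tom Pullum in this thread, is the line-number key: his code matches v003 in the IR file to hvidx in the PR file, not to hv003. The sketch below restates that approach with the newer merge syntax. File names are placeholders, and whether cluster, household, and line number are sufficient keys in a given round is an assumption that the "isid" lines test.

```stata
* Sketch only: file names are placeholders; isid checks the key assumption.
use "IR_file.dta", clear
gen hv001 = v001                 // cluster number
gen hv002 = v002                 // household number
gen hvidx = v003                 // woman's line number in the household
gen byte in_IR = 1               // flag: record comes from the IR file
isid hv001 hv002 hvidx
save ir_temp.dta, replace

use "PR_file.dta", clear
isid hv001 hv002 hvidx           // one record per household member
merge 1:1 hv001 hv002 hvidx using ir_temp.dta
replace in_IR = 0 if missing(in_IR)
tab _merge in_IR
```

 If either "isid" line fails, the keys do not uniquely identify records in that file, and an additional identifier such as hv024 (used earlier in this thread) may be needed.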
 
 |  
	|  |  | 