STATA codes to merge women and household datasets [message #13937]
Fri, 26 January 2018 11:53
Mrinal
Messages: 14 Registered: January 2018 Location: Bhubaneswar, India
Member
I am working with NFHS datasets and need to merge the women's and household datasets using Stata, which I hardly use. May I please have the Stata code to merge these datasets for all four rounds of NFHS? I would certainly appreciate it.
Thanks and regards,
Mrinal
Re: STATA codes to merge women and household datasets [message #13957 is a reply to message #13937]
Mon, 29 January 2018 20:05
Bridgette-DHS
Messages: 3218 Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
If you want to combine, say, the household (PR) files from the successive surveys, you should use the "append" command. This is distinct from a "merge", in which, say, the KR records and PR records from a single survey could be combined child by child.
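To make the distinction concrete, here is a minimal sketch on made-up data. The file contents and the variable weight_kg are illustrative assumptions, not taken from any DHS file. It shows append stacking the rows of a second survey under the first, and merge adding a column to matching person records within one survey.

```stata
* Toy data only; nothing here comes from a real DHS file.
clear
input hv001 hv002 hvidx age
1 1 1 30
1 1 2 4
end
tempfile survey_a
save `survey_a'

clear
input hv001 hv002 hvidx age
1 1 1 41
2 1 1 19
end
tempfile survey_b
save `survey_b'

* append: rows of survey B are stacked under survey A.
use `survey_a', clear
append using `survey_b', generate(from_b)
assert _N == 4              // 2 rows + 2 rows

* merge: within survey A, add a child-level variable to person records.
clear
input hv001 hv002 hvidx weight_kg
1 1 2 15
end
tempfile child
save `child'

use `survey_a', clear
merge 1:1 hv001 hv002 hvidx using `child'
assert _N == 2              // no rows added, only the column weight_kg
tab _merge                  // one person matched, one found only in master
```

Because the sketch uses tempfile macros, it is meant to be run as a whole do-file rather than line by line.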
Some variables will have different codes and categories in different surveys. For example, v023 may not be defined the same way in every survey. That must be taken into account. When you use the append command, variable and value labels already in memory take precedence: same-named label definitions in the files being appended do not replace them, so a category that is coded differently in a later survey can end up carrying the wrong label.
The main reason for appending files is for convenience of file manipulation. In the case of the India surveys, all the files are very large and after appending they will be enormous--and slow to work with. I would recommend trimming the files and just carrying along the variables you need for analysis.
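A minimal sketch of that trimming, on toy files: "use varlist using filename" loads only the named variables, and append accepts a keep() option that does the same for the appended file. The values, the extra1/extra2 columns, and the round variable are assumptions for illustration only.

```stata
* Toy stand-ins for two large survey files (values are made up).
clear
input v001 v002 v003 v012 v106 extra1 extra2
1 1 2 25 1 0 0
1 2 1 31 2 0 0
end
tempfile round_a
save `round_a'

clear
input v001 v002 v003 v012 v106 extra1 extra2
1 1 1 19 0 9 9
end
tempfile round_b
save `round_b'

* Load only the variables needed from the first file ...
use v001 v002 v003 v012 using `round_a', clear
gen round = 1

* ... and keep the same subset while appending the second.
append using `round_b', keep(v001 v002 v003 v012)
replace round = 2 if missing(round)

describe, short
assert c(k) == 5            // four kept variables plus round
assert _N == 3
```

Carrying a survey indicator such as round along makes it possible to recode variables survey by survey after the files are stacked.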
Issues related to appending have been discussed in other forum postings.
Re: STATA codes to merge women and household datasets [message #13961 is a reply to message #13957]
Tue, 30 January 2018 03:28
Mrinal
Messages: 14 Registered: January 2018 Location: Bhubaneswar, India
Member
Thank you, Bridgette and Pullum. Actually, I was more interested in Stata code for merging than for appending. However, I managed to construct the merge code for NFHS-2, which is given below.
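* Merge household (HR) variables onto the women's (IR) file
* Round 2 (NFHS-2)
use "D:\Desktop\dhs\data\nfhs\2\IAHR42FL.DTA", clear
* Give the household identifiers the names used in the IR file.
* hv003 (line number of the household respondent) is not used as a key:
* matching on it would link only women who answered the household questionnaire.
* The state (hv024/v024) is part of the key, as in the later replies in this thread.
gen int v024 = hv024
gen int v001 = hv001
gen int v002 = hv002
sort v024 v001 v002
save "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA", replace
use "D:\Desktop\dhs\data\nfhs\2\IAIR42FL.DTA", clear
* Several women can belong to one household, so this is a many-to-one merge.
merge m:1 v024 v001 v002 using "D:\Dropbox\stata\nfhs\IAHR42FL_sort.DTA"
* Check how many women matched a household record before saving.
tab _merge
save "D:\Desktop\dhs\data\nfhs\2\IA_HR_IR_42FL.DTA", replace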
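Before a merge like this, it can be worth checking that the key variables identify each household exactly once in the household file. The following is a minimal sketch on toy data, assuming (as comes up later in this thread) that cluster numbers may repeat across states, so that v024 is needed in the key.

```stata
* Toy household file: cluster 1, household 1 appears in two states.
clear
input v024 v001 v002
1 1 1
1 1 2
2 1 1
end

* isid exits with an error when the variables do not uniquely identify rows.
capture isid v001 v002
local rc = _rc
display "return code without state: `rc'"
assert `rc' != 0

* With the state added, the key is unique, so isid passes silently.
isid v024 v001 v002
```

If isid fails on the real file, "duplicates report" on the same variables shows how many repeated keys there are.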
Thanks and regards,
Mrinal
Re: STATA codes to merge women and household datasets [message #13996 is a reply to message #13961]
Thu, 01 February 2018 11:14
boyle014
Messages: 78 Registered: December 2015 Location: Minneapolis
Senior Member
Mrinal,
You should consider using IPUMS-DHS, which has already harmonized all of the variables across the surveys. You select the samples you want--all of the Indian ones it sounds like--and then the variables you want. You don't have to download separate files and merge or append them. You can download a single file with multiple surveys. The latest Indian sample is being uploaded into the system now. It will be available in March. The other three Indian samples are already there.
Liz Boyle
Professor Elizabeth Boyle
Sociology & Law, University of Minnesota, USA
Principal Investigator, IPUMS-DHS
Re: STATA codes to merge women and household datasets [message #14542 is a reply to message #14536]
Fri, 20 April 2018 14:00
boyle014
Messages: 78 Registered: December 2015 Location: Minneapolis
Senior Member
Dear Gowo,
Sorry to hear that you got a Temporarily Unavailable page. We uploaded lots of new data this week. When we do this, the IT people sometimes have to take the website offline for a minute or two to fix bugs. It's working again now!
Liz Boyle
Professor Elizabeth Boyle
Sociology & Law, University of Minnesota, USA
Principal Investigator, IPUMS-DHS
Re: STATA codes to merge women and household datasets [message #16073 is a reply to message #16025]
Thu, 01 November 2018 17:20
boyle014
Messages: 78 Registered: December 2015 Location: Minneapolis
Senior Member
Hi priyoma,
Thanks for the query. You've uncovered a temporary weakness with the system when using household members as the unit of analysis--the Household Number (HHID) variable is not available for selection.
To put all the household members' education on each woman's record in IPUMS DHS, you would first create a woman's data file (extract) with all the other variables you need. Then you would go back into the system, select household members as the unit of analysis and create a second extract with the additional variables. You would then merge the files on the HHID variable.
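The two-extract approach might look like the following sketch. The data are toy values, and the names hhid, lineno, educ, and caseid are hypothetical stand-ins rather than the actual IPUMS-DHS variable names. Because a household has several members, the member file is first reshaped to one row per household so that each woman's record picks up every member's education.

```stata
* Extract 2 (household members): one row per person. Toy values.
clear
input hhid lineno educ
101 1 2
101 2 1
102 1 0
end
reshape wide educ, i(hhid) j(lineno)   // now one row per household
tempfile members_wide
save `members_wide'

* Extract 1 (women): one row per woman. Toy values.
clear
input hhid caseid age
101 1 34
102 1 22
end

* Several women can share a household, hence m:1.
merge m:1 hhid using `members_wide'
assert _merge == 3
list hhid caseid educ1 educ2
```

Households with fewer members simply have missing values in the higher-numbered educ columns.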
We are in the process of fixing this now. HHID and a few other technical variables will become available for household members extracts next week. We will be adding them when we release a new set of samples from Afghanistan, Angola, Burundi, Lesotho, Myanmar, Namibia, and Senegal. I will post again when that process is complete.
Professor Elizabeth Boyle
Sociology & Law, University of Minnesota, USA
Principal Investigator, IPUMS-DHS
Re: STATA codes to merge women and household datasets [message #18042 is a reply to message #18038]
Sat, 24 August 2019 09:56
Isabelle
Messages: 2 Registered: July 2019
Member
Thank you Bridgette for your response and help.
However, I have a follow-up question after merging the datasets with the fourth identifier, v024/hv024, included.
When merging the datasets IR and PR using
household member:
gen int v001 = hv001
gen int v002 = hv002
gen byte v003 = hv003
gen int v024 = hv024
sort v001 v002 v003 v024
individual:
sort v001 v002 v003 v024
merge 1:m v001 v002 v003 v024 using "C:\Users\Isabelle\Desktop\DHS Data India\Single Datasets Recode\1998-99Recode\V21998-99HHMSort.dta"
I get the following result:
not matched: 337,481
from master 49,203
from using: 288,278
matched: 229,101
The household dataset has 517,379 observations.
The individual dataset has 90,582 observations.
Does this mean that only 41,379 (90,582 - 49,203) of the individuals in my master (individual) file are matched to a household, and that I cannot use the remaining 49,203 in my further analysis?
Thank you in advance for your help!
Best
Isabelle
Re: STATA codes to merge women and household datasets [message #18063 is a reply to message #18042]
Tue, 03 September 2019 15:37
Bridgette-DHS
Messages: 3218 Registered: February 2013
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
Hi Isabelle--Here is how I would do the merge. I use the old version of the merge command but you would get the same thing if you used 1:1. I also introduce a variable called "in_IR", which is coded 1 for every case in the IR file. It just clarifies the "_merge" code, which is described in your results with different terms.
ALL of the 90,303 cases in the IR file are also in the PR file. 427,076 people in the PR file are NOT also in the IR file. Hope this makes sense. Tom
cd e:\DHS\DHS_data\scratch
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAIR42FL.DTA" , clear
gen hv024=v024
gen hv001=v001
gen hv002=v002
gen hvidx=v003
gen in_IR=1
sort hv024 hv001 hv002 hvidx
save IAIR42_temp.dta, replace
use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAPR42FL.DTA" , clear
sort hv024 hv001 hv002 hvidx
merge hv024 hv001 hv002 hvidx using IAIR42_temp.dta
replace in_IR=0 if in_IR==.
tab _merge
tab _merge in_IR
The following table is produced:
Attachment: table.PNG (11.14 KB)
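For readers on a recent Stata version, the same logic with the newer merge 1:1 syntax can be sketched on toy data as follows. The three-person household is made up; only the variable names hv024, hv001, hv002, hvidx, and in_IR follow the code above.

```stata
* Toy IR-like file: one interviewed woman, keys already renamed to PR names.
clear
input hv024 hv001 hv002 hvidx
1 1 1 2
end
gen in_IR = 1
tempfile ir_temp
save `ir_temp'

* Toy PR-like file: three household members.
clear
input hv024 hv001 hv002 hvidx
1 1 1 1
1 1 1 2
1 1 1 3
end

* 1:1 because each person appears once in each file; no prior sort is needed.
merge 1:1 hv024 hv001 hv002 hvidx using `ir_temp'
replace in_IR = 0 if in_IR == .
tab _merge in_IR

* The IR case is matched; the other members exist only in the PR file.
count if _merge == 3
assert r(N) == 1
count if _merge == 1
assert r(N) == 2
```

With 1:1, Stata stops with an error if the key variables do not uniquely identify records in either file, which is a useful safeguard the old syntax does not provide.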
Re: STATA codes to merge women and household datasets [message #19139 is a reply to message #18063]
Sun, 26 April 2020 00:54
vpatil
Messages: 9 Registered: March 2019
Member
Hi
I am trying to merge the NFHS-4 household and individual datasets. I have tried all the options discussed in the previous thread on this topic, but without success. This is what I get:
merge 1:m hv001 hv002 hv003 hv024 using "/Users/drvaishalipatil/Desktop/DHS datasets/RO1 India/RO12015sort.dta"
Result # of obs.
-----------------------------------------
not matched 1,280,793
from master 591,308 (_merge==1)
from using 689,485 (_merge==2)
matched 10,201 (_merge==3)
-----------------------------------------
The final numbers are wrong. I have tried 1:1, 1:m, m:1, and every other possibility. Please tell me what I am doing wrong.