Issue merging household and individual (women's) file [message #13600]
Mon, 27 November 2017 19:28
Registered: October 2017
I am having trouble interpreting the results of merging the household and individual (women's) files for Tanzania 2015-2016. According to the DHS Final Report for Tanzania, 12,563 households were successfully interviewed, and 13,266 women were successfully interviewed in those households. When I merged the household and women's data files, all 13,266 women matched to a household, as expected. However, the merge reported 3,033 households for which no woman was found.
• This figure (3,033) does not match the number of households with 0 women present; according to the household file, there should be 2,907 households for which no woman was found.
• When I load my merged file, it has 16,299 observations instead of the 13,266 women I expected to see. It appears that the 13,266 matched women and the 3,033 unmatched households are being added together (13,266 + 3,033 = 16,299).
• The mystery 3,033 observations have missing values for almost all variables.
I have two questions. Why are there 3,033 households for which no woman was found instead of 2,907? How should I handle the 3,033 observations?
Here is my Stata code for the merge in case it is helpful:
*import household file to rename variables
*generate variables for the match in the household file
*sort household file
*save revised household dataset
save "C:\Users\student\Desktop\TZHR7HFL_sorted.DTA", replace
*open base data file (women's)
*sort women's file
*save revised women's file
save "C:\Users\student\Desktop\TZIR7HFL_sorted.DTA", replace
*merge the two files
*merge many to one using v001 and v002
merge m:1 v001 v002 using "C:\Users\student\Desktop\TZHR7HFL_sorted.DTA"
*check the merge
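The observation count described above can be reproduced on a toy example. The sketch below uses made-up data, not the Tanzania files: the variable hh_members, the tempfile name households, and all values are hypothetical. It relies on the fact that Stata's merge keeps unmatched observations from the using file by default and flags them with _merge == 2, so households without a matched woman are appended as extra rows whose women's-file variables are missing.

```stata
* Toy household file (hypothetical values): three households in cluster 1.
clear
input v001 v002 hh_members
1 1 4
1 2 6
1 3 2
end
tempfile households
save `households'

* Toy women's file (hypothetical values): three women in two of the households.
clear
input v001 v002 v003
1 1 2
1 2 2
1 2 3
end

* Many women to one household, on the same key variables as in the post.
merge m:1 v001 v002 using `households'

* _merge == 3: women matched to a household
* _merge == 2: households with no matched woman (from the using file only)
tabulate _merge

* Total observations = matched women + unmatched households.
count if _merge == 3
count if _merge == 2
count

* One option for a woman-level file: keep only the matched records.
keep if _merge == 3
drop _merge
```

If the household-only rows are never needed, adding the keep(match) option to the merge command should have the same effect as the keep if _merge == 3 step. Whether dropping them is appropriate depends on the analysis: they carry household information but no woman-level data.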
Thank you very much for any assistance you can provide.