Home » Topics » Wealth Index » Merge wealth index with individual recode - Malawi 2000
Merge wealth index with individual recode - Malawi 2000 [message #10793] |
Wed, 14 September 2016 20:58 |
jswindle
Messages: 5 Registered: September 2016
|
Member |
|
|
Hi all,
I would like to merge the household wealth index data file for Malawi 2000 with the individual recode data file for Malawi 2000. I would like to do this because I am then merging the Malawi 2000, 2004, and 2010 individual recode data files together, and the 2004 and 2010 data files include the wealth index variables (v190 and v191), but the 2000 data file does not.
My understanding is that I will need to merge the 2000 wealth index data file with the 2000 household recode data file first, after which I can merge this new edited 2000 household recode data file with the 2000 individual recode data file. If that is incorrect and I can merge the 2000 wealth index data file with the 2000 individual recode data file directly, then I would prefer to do that, but am not sure how I would do it.
When I follow my plans to merge the wealth and household files first, and then merge the household and individual files, I run into problems with the output of the merge between the 2000 individual recode and 2000 household recode data files. The files merge, but the merge "match" results appear to be quite off. The total n for the new merged file is 20,052, which is the total n of the two data files put together, rather than the n of the original individual recode data file alone, which is 13,220.
Below is the code I am using in Stata. Any advice would be appreciated. Thank you in advance.
STATA CODE
* Add wealth index to 2000 individual recode
use "H:\DHS\Malawi\2000 wealth recode\MWwi42fl.DTA", clear
rename whhid hhid
merge 1:1 hhid using "H:\DHS\Malawi\2000 household recode\MWHR41FL.DTA"
drop _merge
rename hv001 v001
rename hv002 v002
rename hv003 v003
rename wlthindf v190
rename wlthind5 v191
keep v001 v002 v003 v190 v191
sort v001 v002 v003
save "H:\DHS\Malawi\2000 wealth recode\wealthmergeprep.DTA", replace
clear
use "H:\DHS\Malawi\2000 individual recode\MWIR41FL.DTA", clear
sort v001 v002 v003
merge m:1 v001 v002 v003 using "H:\DHS\Malawi\2000 wealth recode\wealthmergeprep.DTA"
This is the output generated after the merge command:
. merge m:1 v001 v002 v003 using "H:\DHS\Malawi\2000 wealth recode\wealthmergeprep.DTA"
Result # of obs.
-----------------------------------------
not matched 12,671
from master 5,839 (_merge==1)
from using 6,832 (_merge==2)
matched 7,381 (_merge==3)
-----------------------------------------
.
end of do-file
|
|
|
Re: Merge wealth index with individual recode - Malawi 2000 [message #10835 is a reply to message #10793] |
Thu, 22 September 2016 20:56 |
Liz-DHS
Messages: 1516 Registered: February 2013
|
Senior Member |
|
|
Dear User,
A response from one of our experts, Dr. Sarah Staveteig:
Quote:
The household ID (whhid) in the Malawi 2000 wealth index file is a concatenation of the cluster number and the household number. It is similar to the caseid string found in the IR file except that caseid also has the respondent's line number at the end.* To directly merge the IR file with the WI file, you have three options: (1) create a new whhid in the IR file based only on v001 and v002, (2) parse out a numeric cluster number and numeric household number from whhid in the WI file to merge with v001 and v002, or (3) truncate the caseid string in the IR file to match whhid by excluding the respondent's line number.
Option 1 requires that you know the rules for spacing strings within whhid and add a certain number of spaces based on the number of digits in the cluster number. Option 2 involves splitting a string and transforming it into numeric values. Option 3 is the most straightforward, as you only need to know the length of whhid and truncate caseid to that width.
To determine how long whhid is in the WI file, you can click on "variable properties" in the bottom right corner of your Stata screen, or type:
Quote:
Stata will tell you that it is a 12-character string. Based on that information, you can use the substring command to return that number of characters from caseid in the MWIR41FL file. Namely:
gen whhid = substr(caseid,1,12)
Quote:
Now, if you merge whhid between IR and WI files they will match.
For clarity, the full code is:
use MWIR41FL
gen whhid = substr(caseid,1,12)
merge m:1 whhid using "MWwi42fl"
* According to the DHS recode manual, CASEID is typically a concatenation of v001, v002, and v003. To confirm that this is true in Malawi, browse the first few columns of MWIR41FL or open that dataset and type
list caseid v001-v003 in 1/10
Then browse the first few rows of MWwi42fl or open it and type
You can see that caseid is indeed v001, v002, and v003 concatenated together with extra spaces for alignment and that whhid is similar to caseid except--as it is a household-level dataset--there is no respondent line number.
[Updated on: Thu, 22 September 2016 21:10] Report message to a moderator
|
|
|
|
|
|
Goto Forum:
Current Time: Sun Dec 22 13:17:07 Coordinated Universal Time 2024
|