* How to merge ZMWI41FL.dta with ZMIR42FL.dta * Tom Pullum, tom.pullum@icfi.com, Feb. 26, 2014 * Note that this merge will be at the household level, not the individual level * PART 1. First figure out the id codes! use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear describe whhid list whhid if _n<=10, table clean use c:\DHS\DHS_data\IR_files\ZMIR42FL.dta, clear keep v001 v002 describe v001 v002 list v001 v002 if _n<=50, table clean * You pause here and examine the codes. Unfortunately, you will have to play around with them. * whhid is a 12-character string; v001 and v002 are numbers. We need to break whhid into two strings and * then convert those strings to numbers. * Here is what I did. First I have to break out the 12 columns of whhid to find which columns are being used. use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear keep whhid keep if _n==1 gen col_1=substr(whhid,1,1) gen col_2=substr(whhid,2,1) gen col_3=substr(whhid,3,1) gen col_4=substr(whhid,4,1) gen col_5=substr(whhid,5,1) gen col_6=substr(whhid,6,1) gen col_7=substr(whhid,7,1) gen col_8=substr(whhid,8,1) gen col_9=substr(whhid,9,1) gen col_10=substr(whhid,10,1) gen col_11=substr(whhid,11,1) gen col_12=substr(whhid,12,1) * By comparison with the IR id codes, we see that v001 is columns 1-8 of whhid and v002 is columns 9-12 of whhid. * There are embedded blanks in columns 9-12 and they must be replaced with 0 replace col_10="0" if col_10==" " & col_9~=" " replace col_11="0" if col_11==" " & col_10~=" " replace col_12="0" if col_12==" " & col_11~=" " gen str8 v001=col_1+col_2+col_3+col_4+col_5+col_6+col_7+col_8 gen str4 v002=col_9+col_10+col_11+col_12 destring v001, replace destring v002, replace list, table clean ********************************************************************************************** * PART 2. Now start over and do the actual merge * Open the smaller file, rename / copy the id codes for the match, and sort use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear gen col_1=substr(whhid,1,1) gen col_2=substr(whhid,2,1) gen col_3=substr(whhid,3,1) gen col_4=substr(whhid,4,1) gen col_5=substr(whhid,5,1) gen col_6=substr(whhid,6,1) gen col_7=substr(whhid,7,1) gen col_8=substr(whhid,8,1) gen col_9=substr(whhid,9,1) gen col_10=substr(whhid,10,1) gen col_11=substr(whhid,11,1) gen col_12=substr(whhid,12,1) replace col_10="0" if col_10==" " & col_9~=" " replace col_11="0" if col_11==" " & col_10~=" " replace col_12="0" if col_12==" " & col_11~=" " gen str8 v001=col_1+col_2+col_3+col_4+col_5+col_6+col_7+col_8 gen str4 v002=col_9+col_10+col_11+col_12 drop col* destring v001, replace destring v002, replace sort v001 v002 save c:\temp_workspace\ZMWI4.dta, replace * Open the larger file, sort use c:\DHS\DHS_data\IR_files\ZMIR42FL.dta sort v001 v002 save c:\temp_workspace\ZMIR4.dta, replace * Do the merge merge v001 v002 using c:\temp_workspace\ZMWI4.dta * Check the merge tab _merge * _merge=3 means v001 and v002 appeared in both files * _merge=2 means v001 and v002 appeared only in the WI file, meaning there was a household * with no eligible women. Those cases should be dropped. drop if _merge==2 drop _merge save c:\temp_workspace\ZMIR4_with_WI4.dta, replace