* How to merge ZMWI41FL.dta with ZMIR42FL.dta
* Tom Pullum, tom.pullum@icfi.com, Feb. 26, 2014

* Note that this merge will be at the household level, not the individual level

* PART 1. First figure out the id codes!


use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear

describe whhid
list whhid if _n<=10, table clean


use c:\DHS\DHS_data\IR_files\ZMIR42FL.dta, clear

keep v001 v002
describe v001 v002
list v001 v002 if _n<=50, table clean


* You pause here and examine the codes.  Unfortunately, you will have to play around with them.

* whhid is a 12-character string; v001 and v002 are numbers.  We need to break whhid into two strings and
*  then convert those strings to numbers.

* Here is what I did.  First I have to break out the 12 columns of whhid to find which columns are being used. 

use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear
keep whhid
keep if _n==1
gen col_1=substr(whhid,1,1)
gen col_2=substr(whhid,2,1)
gen col_3=substr(whhid,3,1)
gen col_4=substr(whhid,4,1)
gen col_5=substr(whhid,5,1)
gen col_6=substr(whhid,6,1)
gen col_7=substr(whhid,7,1)
gen col_8=substr(whhid,8,1)
gen col_9=substr(whhid,9,1)
gen col_10=substr(whhid,10,1)
gen col_11=substr(whhid,11,1)
gen col_12=substr(whhid,12,1)

* By comparison with the IR id codes, we see that v001 is columns 1-8 of whhid and v002 is columns 9-12 of whhid.

* There are embedded blanks in columns 9-12 and they must be replaced with 0

replace col_10="0" if col_10==" " & col_9~=" "
replace col_11="0" if col_11==" " & col_10~=" "
replace col_12="0" if col_12==" " & col_11~=" "

gen str8 v001=col_1+col_2+col_3+col_4+col_5+col_6+col_7+col_8

gen str4 v002=col_9+col_10+col_11+col_12

destring v001, replace
destring v002, replace

list, table clean



**********************************************************************************************

* PART 2. Now start over and do the actual merge


* Open the smaller file, rename / copy the id codes for the match, and sort

use c:\DHS\DHS_data\WI_files\ZMWI41FL.dta, clear

gen col_1=substr(whhid,1,1)
gen col_2=substr(whhid,2,1)
gen col_3=substr(whhid,3,1)
gen col_4=substr(whhid,4,1)
gen col_5=substr(whhid,5,1)
gen col_6=substr(whhid,6,1)
gen col_7=substr(whhid,7,1)
gen col_8=substr(whhid,8,1)
gen col_9=substr(whhid,9,1)
gen col_10=substr(whhid,10,1)
gen col_11=substr(whhid,11,1)
gen col_12=substr(whhid,12,1)
replace col_10="0" if col_10==" " & col_9~=" "
replace col_11="0" if col_11==" " & col_10~=" "
replace col_12="0" if col_12==" " & col_11~=" "

gen str8 v001=col_1+col_2+col_3+col_4+col_5+col_6+col_7+col_8

gen str4 v002=col_9+col_10+col_11+col_12
drop col*

destring v001, replace
destring v002, replace


sort v001 v002

save c:\temp_workspace\ZMWI4.dta, replace



* Open the larger file, sort
use c:\DHS\DHS_data\IR_files\ZMIR42FL.dta

sort v001 v002

save c:\temp_workspace\ZMIR4.dta, replace


* Do the merge

merge v001 v002 using c:\temp_workspace\ZMWI4.dta

* Check the merge

tab _merge

* _merge=3 means v001 and v002 appeared in both files
* _merge=2 means v001 and v002 appeared only in the WI file, meaning there was a household
*  with no eligible women.  Those cases should be dropped.

drop if _merge==2
drop _merge


save c:\temp_workspace\ZMIR4_with_WI4.dta, replace