* do e:\DHS\Requests_and_queries\questions_on_merges\PH_health_do_25Aug2015_v2.txt

use e:\DHS\DHS_data\PR_files\PHPR61FL.dta, clear
cd c:\scratch

* It appears that all the relevant variables have the form sh2*_1 through sh2*_9.
* Say that there is another set of variables you want to save; I will say they are hv0* and hv1*.

set more off
* missing() catches every missing code (., .a, .b, ...), not just the system missing value
drop if missing(sh207) | sh207==0
save tempa.dta, replace

* All the desired information is exactly the same on every line in the household,
* so each person's own values have to be picked out of slots _1 through _9.
keep hv001 hv002 hvidx sh208*_* sh209*_* sh210*_* sh211*_* sh212*_* sh213a*_*
* add any other variables that you want to keep (e.g. hv0* hv1*)

* Stems of the variables to pull out for each person; add any other sh variables here
local stems sh209 sh210 sh211 sh212 sh213a
foreach v of local stems {
    gen new`v' = .
}

* seqno is the sequence number: the slot of sh208 (1 through 9) that holds the
* person's line number. It stays missing if hvidx is not in any sh208 slot.
gen seqno = .
forvalues li = 1/9 {
    replace seqno = `li' if sh208_`li'==hvidx
}

* Now pull off just the variables with that sequence number
quietly forvalues li = 1/9 {
    foreach v of local stems {
        replace new`v' = `v'_`li' if seqno==`li'
    }
}

* Attach the value label used by the _1 variable (avoids guessing label names such as SH213a)
foreach v of local stems {
    local lbl : value label `v'_1
    label values new`v' `lbl'
}

* Use the next line to check
*list hv001 hv002 hvidx sh208_1-sh208_3 sh209_1-sh209_3 newsh209 if _n<=100, table clean nolabel

foreach v of local stems {
    drop `v'_*
}
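
* Optional check (a sketch added for illustration, not part of the original approach):
* count how many sh208 slots hold each person's line number. nmatch==0 means the
* person was not found, so seqno and the new* variables stay missing; nmatch>1 means
* the line number appears in more than one slot, and the seqno loop kept the highest one.
* nmatch is a hypothetical helper variable; it is dropped again after the tabulation.
* This relies on sh208_* and seqno still being in memory, which they are at this point.
gen byte nmatch = 0
quietly forvalues li = 1/9 {
    replace nmatch = nmatch + (sh208_`li'==hvidx)
}
tab nmatch, missing
drop nmatch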