--------------------------------------------------------------------------------------------------------------------------------------------------------------------
      name:
       log:  e:\DHS\programs\tests\diffs_between_surveys_log_22July2016.txt
  log type:  text
 opened on:  25 Jul 2016, 08:28:09

.
. * Tom Pullum, tom.pullum@icfi.com, July 25, 2016
.
. set more off

. cd e:\DHS\DHS_data\KR_files
e:\DHS\DHS_data\KR_files

.
. ****************************************************************
.
. program define setup1
  1.
. * Construct the indicator, number the surveys, save the needed variables
.
. scalar ssurvey=ssurvey+1
  2. local lsurvey=ssurvey
  3. gen survey=ssurvey
  4.
. * CONSTRUCT THE INDICATOR
.
. * values other than 0 and 1 should be interpreted as .
. replace g100=. if g100>1
  5. replace g102=. if g102>1
  6.
. gen y = .
  7. replace y=0 if g100<.
  8. replace y=1 if g102==1
  9.
. keep v005 v021 v023 y survey
 10. save temp_`lsurvey'.dta, replace
 11.
. end

.
. ****************************************************************
.
. program define setup2
  1.
. * Combine the surveys into one file
.
. use temp_1.dta, clear
  2. append using temp_2.dta
  3.
. egen cluster=group(v021 survey)
  4. egen stratum=group(v023 survey)
  5.
. save temp.dta, replace
  6.
. end

.
. ****************************************************************
.
. program define analyze
  1.
. * Test whether the "survey" variable is statistically significant
.
. svyset cluster [pweight=v005], strata(stratum) singleunit(scaled)
  2.
. tab survey y
  3. tab survey y [iweight=v005/1000000], row
  4.
. * Test for significance of change or difference
. svy: logit y i.survey
  5. scalar p=e(p)
  6. scalar list p
  7.
. end

.
. ****************************************************************
. ****************************************************************
. ****************************************************************
. ****************************************************************
. ****************************************************************
. * EXECUTION BEGINS HERE
.
. * Example: difference between two surveys in FGM prevalence
.
. * Kenya 27.1% in 2008-09 vs 21.0% in 2014
.
. * Uganda 0.6% in 2006 vs. 1.4% in 2011
.
. scalar ssurvey=0

.
. use e:\DHS\DHS_data\IR_files\KEIR52FL.dta, clear

. setup1
(7 real changes made, 7 to missing)
(4 real changes made, 4 to missing)
(8,444 missing values generated)
(8,437 real changes made)
(2,541 real changes made)
file temp_1.dta saved

.
. use e:\DHS\DHS_data\IR_files\KEIR70FL.dta, clear

. setup1
(0 real changes made)
(0 real changes made)
(31,079 missing values generated)
(14,739 real changes made)
(4,377 real changes made)
file temp_2.dta saved

.
. setup2
file temp.dta saved

. analyze

      pweight: v005
          VCE: linearized
  Single unit: scaled
     Strata 1: stratum
         SU 1: cluster
        FPC 1:

           |           y
    survey |         0          1 |     Total
-----------+----------------------+----------
         1 |     5,896      2,541 |     8,437
         2 |    10,362      4,377 |    14,739
-----------+----------------------+----------
     Total |    16,258      6,918 |    23,176

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |           y
    survey |         0          1 |     Total
-----------+----------------------+----------
         1 | 6,154.278  2,284.252 |  8,438.53
           |     72.93      27.07 |    100.00
-----------+----------------------+----------
         2 | 11,557.23  3,065.936 |14,623.162
           |     79.03      20.97 |    100.00
-----------+----------------------+----------
     Total |  17,711.5  5,350.188 | 23,061.69
           |     76.80      23.20 |    100.00
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =       100                  Number of obs     =         23,176
Number of PSUs     =     1,991                  Population size   = 23,061,691,753
                                                Design df         =          1,891
                                                F(   1,   1891)   =          10.60
                                                Prob > F          =         0.0011

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    2.survey |   -.335849    .103141    -3.26   0.001    -.5381312   -.1335669
       _cons |  -.9911088   .0958041   -10.35   0.000    -1.179002    -.803216
------------------------------------------------------------------------------
         p =  .00114906

.
.
. clear

.
. scalar ssurvey=0

.
. * In the Uganda surveys we must construct the strata and rename the outcome variables
.
. * Uganda 2006
. use e:\DHS\DHS_data\IR_files\UGIR52FL.dta

. rename s643a g100

. rename s643b g102

. drop v023

. egen v023=group(v024 v025)

. setup1
(1 real change made, 1 to missing)
(13 real changes made, 13 to missing)
(8,531 missing values generated)
(8,530 real changes made)
(61 real changes made)
file temp_1.dta saved

.
. * Uganda 2011
. use e:\DHS\DHS_data\IR_files\UGIR60FL.dta

. rename s631d g100

. rename s631f g102

. drop v023

. egen v023=group(v024 v025)

. setup1
(33 real changes made, 33 to missing)
(3 real changes made, 3 to missing)
(8,674 missing values generated)
(8,641 real changes made)
(158 real changes made)
file temp_2.dta saved

.
. setup2
file temp.dta saved

. analyze

      pweight: v005
          VCE: linearized
  Single unit: scaled
     Strata 1: stratum
         SU 1: cluster
        FPC 1:

           |           y
    survey |         0          1 |     Total
-----------+----------------------+----------
         1 |     8,469         61 |     8,530
         2 |     8,483        158 |     8,641
-----------+----------------------+----------
     Total |    16,952        219 |    17,171

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |           y
    survey |         0          1 |     Total
-----------+----------------------+----------
         1 | 8,475.667  54.700871 |8,530.3683
           |     99.36       0.64 |    100.00
-----------+----------------------+----------
         2 | 8,514.466 123.720288 | 8,638.186
           |     98.57       1.43 |    100.00
-----------+----------------------+----------
     Total | 16,990.13 178.421159 | 17,168.55
           |     98.96       1.04 |    100.00
(running logit on estimation sample)

Survey: Logistic regression

Number of strata   =        36                  Number of obs     =         17,171
Number of PSUs     =       772                  Population size   = 17,168,554,314
                                                Design df         =            736
                                                F(   1,    736)   =           4.56
                                                Prob > F          =         0.0331

------------------------------------------------------------------------------
             |             Linearized
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    2.survey |   .8115765   .3802565     2.13   0.033     .0650597    1.558093
       _cons |  -5.043075   .3438202   -14.67   0.000     -5.71806    -4.36809
------------------------------------------------------------------------------
         p =  .03314916

.
.
. * p is the significance of a test of H0: in the population, there was no difference
. * in the prevalence of the outcome across the surveys
.
.
end of do-file

. exit, clear
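
The logit coefficients reported above can be tied back to the weighted row percentages with a little arithmetic. The lines below are a minimal sketch, not part of the original do-file: they only re-use coefficient values copied by hand from this log and Stata's built-in invlogit() and exp() functions. With a single binary predictor, the inverse logit of _cons should match the weighted prevalence in survey 1, and the inverse logit of _cons plus the 2.survey coefficient should match survey 2.

```stata
* Sketch only: coefficient values are copied by hand from the output above.

* Kenya: expect roughly 27.07% (2008-09) and 20.97% (2014)
display "Kenya survey 1 prevalence (%):  " %6.2f 100*invlogit(-.9911088)
display "Kenya survey 2 prevalence (%):  " %6.2f 100*invlogit(-.9911088 - .335849)
display "Kenya odds ratio, 2 vs 1:       " %6.3f exp(-.335849)

* Uganda: expect roughly 0.64% (2006) and 1.43% (2011)
display "Uganda survey 1 prevalence (%): " %6.2f 100*invlogit(-5.043075)
display "Uganda survey 2 prevalence (%): " %6.2f 100*invlogit(-5.043075 + .8115765)
display "Uganda odds ratio, 2 vs 1:      " %6.3f exp(.8115765)
```

An odds ratio below 1 (Kenya) corresponds to a decline in prevalence between the two surveys, and one above 1 (Uganda) to an increase. The scalar p listed by the analyze program is the p-value for the test that this odds ratio equals 1.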