The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Merging data files different years same IR [message #11919 is a reply to message #11917] Fri, 03 March 2017 08:31
Bridgette-DHS
Following is a response from Senior DHS Stata Specialist, Tom Pullum:

In this situation you do not want to merge the files. In fact, you cannot merge files from different surveys, because they contain different cases. Instead, you want to append the files: that is, you make one long file in which the records from one survey follow the records from another. To test for changes or differences, the variable names must be consistent across surveys and you need a variable that distinguishes one survey from another.

I do not use SPSS, but I can provide an example of how to do this in Stata. The following lines define three sub-programs called setup1, setup2, and analyze. Execution begins after the multiple lines of asterisks. The program is set up for two surveys but can include any number: insert a "use" line and a "setup1" line for each survey, and add a matching "append using" line in setup2. The paths would have to be changed. The "analyze" routine could be modified to test differences between survey 1 and survey 2, survey 1 and survey 3 (if there is a third survey), etc. You can also add other covariates to the logit models, do chi-square tests, etc., within the analyze routine.

set logtype text
log using e:\DHS\programs\tests\diffs_between_surveys_log_22July2016.txt, replace

* Tom Pullum, tom.pullum@icfi.com, July 25, 2016

set more off
cd e:\DHS\DHS_data\KR_files

****************************************************************

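* Drop any earlier definition so the do-file can be re-run in the same session
capture program drop setup1
program define setup1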

* Construct the indicator, number the surveys, save the needed variables

scalar ssurvey=ssurvey+1
local lsurvey=ssurvey
gen survey=ssurvey

* CONSTRUCT THE INDICATOR

* values other than 0 and 1 should be interpreted as .
replace g100=. if g100>1
replace g102=. if g102>1

* y = 0 for every woman with a valid (0/1) value on g100,
* then y = 1 for those with g102 == 1
gen y = .
replace y=0 if g100<.
replace y=1 if g102==1

keep v005 v021 v023 y survey
save temp_`lsurvey'.dta, replace

end

****************************************************************

capture program drop setup2
program define setup2

* Combine the surveys into one file

use temp_1.dta, clear
append using temp_2.dta

egen cluster=group(v021 survey) 
egen stratum=group(v023 survey) 

save temp.dta, replace

end

****************************************************************

capture program drop analyze
program define analyze

* Test whether the "survey" variable is statistically significant

svyset cluster [pweight=v005], strata(stratum) singleunit(scaled)

tab survey y
tab survey y [iweight=v005/1000000], row

* Test for significance of change or difference
svy: logit y i.survey
scalar p=e(p)
scalar list p

* p is the significance of a test of H0: in the population, there was no difference 
*  in the prevalence of the outcome across the surveys

end

****************************************************************
****************************************************************
****************************************************************
****************************************************************
****************************************************************
* EXECUTION BEGINS HERE

* Example:  difference between two surveys in FGM prevalence

* Kenya 27.1% in 2008-09 vs 21.0% in 2014

scalar ssurvey=0

use e:\DHS\DHS_data\IR_files\KEIR52FL.dta, clear 
setup1

use e:\DHS\DHS_data\IR_files\KEIR70FL.dta, clear 
setup1

setup2
analyze

* Close the log opened at the top of the file
log close
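
The extension to three or more surveys mentioned in the text could look roughly like the sketch below. It is not part of Tom Pullum's program: the names setup2_all and analyze_pairs are hypothetical, and the sketch assumes that setup1 has already been run once per survey (so the scalar ssurvey holds the number of surveys and temp_1.dta, temp_2.dta, ... exist) and that at least three surveys were loaded.

```stata
* Hypothetical variants of setup2 and analyze for three or more surveys.
* Assumes setup1 was run once per survey, so scalar ssurvey = number of
* surveys and the files temp_1.dta ... temp_N.dta exist in the working folder.

capture program drop setup2_all
program define setup2_all

    * Append every temp_i.dta instead of hard-coding two files
    local nsurveys = ssurvey
    use temp_1.dta, clear
    forvalues i = 2/`nsurveys' {
        append using temp_`i'.dta
    }

    * Cluster and stratum ids must be unique across surveys
    egen cluster = group(v021 survey)
    egen stratum = group(v023 survey)

    save temp.dta, replace

end

capture program drop analyze_pairs
program define analyze_pairs

    svyset cluster [pweight=v005], strata(stratum) singleunit(scaled)

    svy: logit y i.survey

    * Survey 1 is the base category, so each coefficient is a contrast with it
    test 2.survey
    test 3.survey

    * Survey 2 versus survey 3
    test 2.survey = 3.survey

    * Design-based chi-square test of survey by outcome
    svy: tabulate survey y, row pearson

end

setup2_all
analyze_pairs
```

Any covariate added to the logit model would also have to be added to the "keep" line in setup1, since only v005, v021, v023, y, and survey are saved in the temporary files.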

 