Merging data files different years same IR [message #11917]
Thu, 02 March 2017 13:47
Davison
Messages: 5 Registered: March 2017 Location: Germany
Member
Hello everyone,
First, thank you for this great forum; it helps me a lot. I have read many questions in this forum about merging data files from the same year but different questionnaires.
I am analyzing the DHS data for Lesotho from 2004 to 2014, using SPSS. I want to combine the datasets from the different years, only the IR records, into one file so that I can run a chi-square test for significant differences. Can anyone help me? Is there anything else I need to keep in mind regarding weighting and my plan files?
Thank you very much.
Many greetings from Germany!
[Updated on: Thu, 02 March 2017 13:55]
Re: Merging data files different years same IR [message #11919 is a reply to message #11917]
Fri, 03 March 2017 08:31
Bridgette-DHS
Messages: 3215 Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
In this situation you do not want to merge the files. You cannot, in fact, merge the files for different surveys, because you have different cases. Instead, you want to append the files. That is, you make one long file in which the records in one survey appear after the records for another survey. To do a test of changes or differences you must be sure to be consistent in the variable names and you must have a code that distinguishes one survey from another. I do not use SPSS, but I can provide an example of how to do this in Stata. The following lines include sub-programs called setup1, setup2, and analyze. The execution of the program begins after the multiple lines of asterisks. It is set up for two surveys but can include any number of surveys, with "use" and "setup1" lines inserted for each survey. The paths would have to be changed. The "analyze" routine could be modified to test differences between survey 1 and survey 2, survey 1 and survey 3 (if there is a 3rd survey), etc. You can add other covariates to the logit models, do chi-square tests, etc., within the analyze routine.
set logtype text
log using e:\DHS\programs\tests\diffs_between_surveys_log_22July2016.txt, replace
* Tom Pullum, tom.pullum@icfi.com, July 25, 2016
set more off
cd e:\DHS\DHS_data\KR_files
****************************************************************
program define setup1
    * Construct the indicator, number the surveys, save the needed variables
    scalar ssurvey=ssurvey+1
    local lsurvey=ssurvey
    gen survey=ssurvey
    * CONSTRUCT THE INDICATOR
    * values other than 0 and 1 should be interpreted as .
    replace g100=. if g100>1
    replace g102=. if g102>1
    gen y = .
    replace y=0 if g100<.
    replace y=1 if g102==1
    keep v005 v021 v023 y survey
    save temp_`lsurvey'.dta, replace
end
****************************************************************
program define setup2
    * Combine the surveys into one file
    use temp_1.dta, clear
    append using temp_2.dta
    egen cluster=group(v021 survey)
    egen stratum=group(v023 survey)
    save temp.dta, replace
end
****************************************************************
program define analyze
    * Test whether the "survey" variable is statistically significant
    svyset cluster [pweight=v005], strata(stratum) singleunit(scaled)
    tab survey y
    tab survey y [iweight=v005/1000000], row
    * Test for significance of change or difference
    svy: logit y i.survey
    scalar p=e(p)
    scalar list p
    * p is the significance of a test of H0: in the population, there was no difference
    * in the prevalence of the outcome across the surveys
end
****************************************************************
****************************************************************
****************************************************************
****************************************************************
****************************************************************
* EXECUTION BEGINS HERE
* Example: difference between two surveys in FGM prevalence
* Kenya 27.1% in 2008-09 vs 21.0% in 2014
scalar ssurvey=0
use e:\DHS\DHS_data\IR_files\KEIR52FL.dta, clear
setup1
use e:\DHS\DHS_data\IR_files\KEIR70FL.dta, clear
setup1
setup2
analyze
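The setup2 routine above names temp_1.dta and temp_2.dta explicitly, so a third survey would need another append line. The following is a minimal sketch of a variant that appends however many temp files setup1 has written. It assumes the scalar ssurvey still holds the number of surveys processed; the name setup2_all is hypothetical and is not part of the original program.

```stata
* Hypothetical variant of setup2 (the name setup2_all is an assumption)
program define setup2_all
    * ssurvey was incremented once per setup1 call, so it equals the number of surveys
    local nsurveys = ssurvey
    use temp_1.dta, clear
    * Stack the remaining surveys below the first; the loop is skipped if there is only one
    forvalues i = 2/`nsurveys' {
        append using temp_`i'.dta
    }
    * Cluster and stratum numbers are survey-specific, so group them with survey
    egen cluster=group(v021 survey)
    egen stratum=group(v023 survey)
    save temp.dta, replace
end
```

It would be called in place of setup2, after the last setup1 call and before analyze.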
Re: Merging data files different years same IR [message #11923 is a reply to message #11919]
Mon, 06 March 2017 12:15
bakerchowdhury
Messages: 25 Registered: April 2014
Member
I am working on a similar project (IR files from multiple years). Using Stata, I managed to append the data for Bangladesh from 1999 to 2014, starting with the 2014 data, with year as an indicator variable for each survey.
I plan to run multiple linear regression, binary logistic regression, and multinomial logistic regression for three different types of outcomes.
Since I will be using the "svy" commands, I need to account for the PSU (V021) and strata (V022). However, I noticed that the values of V021 and V023 (number of unique values and range) are not the same in each year.
I was wondering whether that might be a problem for my analysis.
Thank you so much for your help.
Baker
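One way to see these per-year counts side by side is sketched below. It assumes the appended file contains year, v021 and v023; tag_psu and tag_str are hypothetical variable names.

```stata
* Flag exactly one record per cluster (and per stratum) within each survey year
egen byte tag_psu = tag(year v021)
egen byte tag_str = tag(year v023)
* The sums of the flags are the numbers of distinct clusters and strata per year
tabstat tag_psu tag_str, by(year) statistics(sum)
```

Different counts across years would be expected if each survey drew its own sample of clusters, which is consistent with the program earlier in this thread building its cluster and stratum identifiers with group() over the survey variable.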
Re: Merging data files different years same IR [message #11927 is a reply to message #11917]
Tue, 07 March 2017 14:26
bakerchowdhury
Messages: 25 Registered: April 2014
Member
Thank you so much for the response. Yes, my survey indicator variable is 'year' which takes values 1999, 2004,...2014.
Here are the steps in my code:
* create sample weight variable
gen wgt=v005/1000000
* generate new psuid and stratumid variables for the svy commands
egen psuid=group(year v021)
egen stratumid=group(year v022)
* declare the survey design
svyset [pw=wgt], psu(psuid) strata(stratumid)
* Example regression
svy: reg y x v012 i.v190
Could you please check whether this looks okay?
Best regards,
Baker
Re: Merging data files different years same IR [message #11964 is a reply to message #11930]
Mon, 13 March 2017 15:25
bakerchowdhury
Messages: 25 Registered: April 2014
Member
Thank you so much for the recommendation. The code worked. Could you please explain why we are using the singleunit(centered) option? Also, is there anything I need to pay attention to when using the svy commands for the modeling (multiple and logistic regression)?
Thanks again.
Baker