The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging datasets from multiple countries [message #4010] Tue, 17 March 2015 10:32
bwbennett09
I am working on an ecological study across 7 countries on the relationship between measures of women's empowerment and HIV status. What would be the best procedure for merging the country-level datasets in Stata? I have already merged the HIV datasets with the Individual Recode datasets for each country.

Once I merge these country-level datasets into one large dataset for analysis, what would be the best way to weight the dataset? Thanks!
Re: Merging datasets from multiple countries [message #4018 is a reply to message #4010] Tue, 17 March 2015 21:15
Reduced-For(u)m

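1) You don't want to merge them; you want to append them. Merging matches observations in order to add variables, while appending stacks the observations from each file on top of one another. It is easy: append using "filename"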

2) Yes. You have to consider weights carefully and de-normalize them before using them. There is a lot of discussion on the boards about that - look in the "weighting data" threads.
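
As a rough sketch of the append step: the file names below (country1.dta and so on) are hypothetical placeholders for each country's already-merged IR + HIV file, and the variable name survey is likewise just an illustrative choice. The generate() option of append records which file each observation came from, so it can double as a survey identifier.

```stata
* Sketch only: file names and the name "survey" are placeholders.
* Load the first country's file as the master dataset.
use "country1.dta", clear

* Stack the remaining countries underneath it.
* generate(survey) creates a new variable equal to 0 for the master file,
* 1 for the first appended file, 2 for the second, and so on.
append using "country2.dta" "country3.dta", generate(survey)

* Quick check that every file contributed observations.
tabulate survey

save "pooled.dta", replace
```

With seven countries, the append line would simply list the other six files.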
Re: Merging datasets from multiple countries [message #4022 is a reply to message #4018] Wed, 18 March 2015 08:25
Bridgette-DHS
Following is a response from DHS Senior Stata Specialist, Tom Pullum:


You can combine several countries/surveys in the way you describe with "append". It's an easy but powerful command. You can get the syntax with "help append". Here are a few suggestions.

First, be sure to have some variable (I call it "survey") to identify the different surveys. You cannot always rely on hv000 or v000 to do this.

Second, understand that some variables, such as region, are country-specific, and the codes will mean different things in different surveys.

Third, only one set of variable and value labels survives the append: when the same label is defined in more than one file, Stata keeps the definition already in memory and does not replace it with the one from the appended file.

Fourth, do not keep more variables than you need, because the file can get very large.

Fifth, if you use svyset, you need to re-define the cluster and stratum variables so that they are unique across surveys, for example with "egen cluster=group(survey v001)", and probably re-normalize the weights. I prefer to weight each survey equally, for example by forcing the total weight in each survey to be one billion (hv005 or v005 is constructed to have a mean value of 1 million). The steps to do this sort of thing are described elsewhere on the forum.
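
The cluster, stratum, and weight steps in the fifth suggestion could look something like the following. This is a minimal sketch under several assumptions, not a definitive recipe: the pooled file is assumed to already contain a numeric survey identifier named survey; v023 is assumed to hold the sampling stratum, which is not true of every survey, so check each survey's documentation; and the new variable names (cluster, stratum, wtotal, wt) are illustrative. For HIV outcomes, the HIV weight (typically hiv05) would usually take the place of v005.

```stata
* Sketch only: assumes "survey" identifies each survey in the pooled file.

* Cluster numbers (v001) repeat across surveys, so build IDs that are
* unique within the pooled data.
egen cluster = group(survey v001)

* Same idea for strata. ASSUMPTION: v023 is the stratum variable here.
egen stratum = group(survey v023)

* Rescale the weights so that every survey has the same total weight.
* One billion follows the example in the post; only the equality of
* the totals across surveys matters, not the particular value.
bysort survey: egen double wtotal = total(v005)
generate double wt = v005 / wtotal * 1e9

* Confirm that each survey's rescaled weights sum to the same total.
tabstat wt, by(survey) statistics(sum)

* Declare the survey design using the re-defined variables.
svyset cluster [pweight = wt], strata(stratum)
```

Equal totals give each survey the same influence on pooled estimates; weighting surveys in proportion to population size instead would need a different target total per survey.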