The DHS Program User Forum: Dataset use in Stata » Merging datasets from multiple countries

Merging datasets from multiple countries [message #4010]

Tue, 17 March 2015 10:32

bwbennett09
Messages: 3
Registered: March 2015
Location: Providence, RI

Member

I am working on an ecological study across 7 countries on the relationship between measures of women's empowerment and HIV status. What is the best procedure for combining the country-level datasets in Stata? I have already merged the HIV datasets with the Individual Recode datasets for each country.

Once I merge these country level datasets into one large dataset for analysis, what would be the best way to weight the dataset? Thanks!

Re: Merging datasets from multiple countries [message #4018 is a reply to message #4010]

Tue, 17 March 2015 21:15

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

1) You don't want to merge them, you want to append them. It is easy: append using "filename"

2) Yes. You have to consider weights carefully and de-normalize them before using them. There is a lot of discussion on the boards about that - look in the "weighting data" threads.
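A minimal sketch of the append step in point 1 might look like the following. The two toy datasets and their values are hypothetical stand-ins for the per-country files (real files would be loaded with `use`), and the `generate(source)` option is one possible way to record which file each row came from, not something the thread prescribes.

```stata
* Toy stand-ins for two country files (hypothetical values, not real DHS records)
clear
input str3 v000 v001 v005
"AA6" 1 1200000
"AA6" 2  800000
end
tempfile countryA
save `countryA'

clear
input str3 v000 v001 v005
"BB6" 1  900000
"BB6" 1 1100000
"BB6" 2 1000000
end
tempfile countryB
save `countryB'

* append stacks observations (rows); merge would add variables (columns)
use `countryA', clear
append using `countryB', generate(source)   // source: 0 = master file, 1 = first using file
tabulate v000 source
```

With seven countries, several filenames can be listed after `append using` in a single call.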

Re: Merging datasets from multiple countries [message #4022 is a reply to message #4018]

Wed, 18 March 2015 08:25

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from DHS Senior Stata Specialist, Tom Pullum:

You can combine several countries/surveys in the way you describe with "append". It's an easy but powerful command. You can get the syntax with "help append". Here are a few suggestions. First, be sure to have some variable (I call it "survey") to identify the different surveys. You cannot always rely on hv000 or v000 to do this. Second, understand that some variables, such as region, are country-specific and the codes will mean different things in different surveys. Third, the variable and value labels for the final survey in the append will be the only ones that are saved. Fourth, do not keep more variables than you need because the file can get very large. Fifth, if you use svyset, you need to re-define the cluster and stratum variables, for example with "egen cluster=group(survey v001)", and probably re-normalize the weights. I prefer to weight each survey equally, for example by forcing the total weight in each survey to be one billion (hv005 or v005 is constructed to have a mean value of 1 million). The steps to do this sort of thing are described elsewhere on the forum.
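The fifth suggestion could be sketched roughly as follows. The toy data are hypothetical, and the choice of `v022` as the stratum variable is an assumption: the appropriate stratum variable differs between surveys and should be checked for each one. The names `stratum`, `wtsum`, and `wt` are illustrative only.

```stata
* Toy pooled file (hypothetical values); in practice this is the appended data
clear
input survey v001 v022 v005
1 1 1 1200000
1 2 1  800000
1 3 2 1000000
2 1 1  500000
2 2 2 1500000
2 3 2 1000000
2 4 1 1000000
end

* Cluster and stratum codes restart in every survey, so make them unique
* across the pooled file
egen cluster = group(survey v001)
egen stratum = group(survey v022)   // v022 is an assumed stratum variable

* Rescale so the weights in each survey sum to one billion, which gives
* every survey the same total weight in pooled estimates
bysort survey: egen double wtsum = total(v005)
generate double wt = 1e9 * v005 / wtsum

svyset cluster [pweight = wt], strata(stratum)
```

This gives each survey equal influence regardless of sample size. Weighting surveys in proportion to country population instead would require external population figures, which are not part of this sketch.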
