The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Merging DHS data in Stata [message #11176 is a reply to message #71] Tue, 15 November 2016 11:26
jack.murphy
Messages: 10
Registered: September 2016
Location: USA
Member

Thank you for these instructions.

Regarding the domain variable, I have merged the Burkina Faso datasets (BFIR62, BFMR62, BFAR62) and have found that v023 is described as the "Stratification used in sample design." Is this still the domain variable I should use in svyset? If not, could you please give me steps on how to generate a domain variable?

At the moment, this is my svyset code:

svyset [pw=hiv05], psu(v001) strata(v023)
Re: Merging DHS data in Stata [message #11183 is a reply to message #11176] Wed, 16 November 2016 09:32
Bridgette-DHS
Messages: 3013
Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:

Quote:
In the past, the terminology (domains vs. strata) has not always been consistent. Please interpret v023 as the stratification variable. That is what you should use in svyset, as in "svyset...., strata(v023)".

Sometimes v023 is empty. We are slowly producing a complete list of the stratification variables for all the surveys. In almost all cases, v023 (and/or the correct stratification variable) is the combination of region (v024) and place of residence (v025). You can construct that with "egen stratumid=group(v024 v025)" and then "svyset....., strata(stratumid)". Actually I recommend "svyset....., strata(stratumid) singleunit(centered)", or another singleunit option, to avoid an error message when a stratum contains only one PSU.
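
A minimal sketch of the construction described above, run on made-up toy data rather than a real recode file. The values are invented for illustration, and the weight is used as stored; only the variable names (v001, v024, v025, hiv05) come from the thread.

```stata
* Toy data standing in for a DHS recode file (all values are made up)
clear
input v001 v024 v025 hiv05
1 1 1 1200000
1 1 1 1200000
2 1 2  800000
2 1 2  800000
3 2 1 1500000
4 2 2  900000
end

* One stratum per combination of region (v024) and residence (v025)
egen stratumid = group(v024 v025)

* Declare the design; singleunit(centered) handles strata with a single PSU
svyset v001 [pweight = hiv05], strata(stratumid) singleunit(centered)

* Summarize strata and PSUs to check the declaration
svydes
```

In this toy file every stratum holds exactly one cluster, which is the situation the singleunit() option is meant for.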

Re: Merging DHS data in Stata [message #11207 is a reply to message #11183] Thu, 17 November 2016 16:53
jack.murphy

Brilliant, this worked! Thank you for your help.

I have another question about using svyset for a master dataset created by appending multiple countries together. I generated a country identifier (called "country_ID", a string variable holding each country's name). Then I used the dropdown menus to build a two-stage svyset command with country_ID as the higher-level strata variable and stratumid (from your earlier post) as the lower-level strata variable. Here is my code:

svyset v001 [pweight=hiv05], strata(country_ID) vce(linearized) singleunit(centered) || v001, strata(stratumid)

Is this accurate?
Re: Merging DHS data in Stata [message #11221 is a reply to message #11207] Fri, 18 November 2016 10:31
Bridgette-DHS
Another response from Tom Pullum:

Quote:

Hi Jack--The only change I would make to that line would be with v001. That's the cluster (PSU) id code and it is typically numbered 1, 2, 3, etc., within each survey. If you don't renumber the clusters in the combined file, Stata will think that cluster 1 in each survey is the same cluster, etc. You need to renumber the clusters in the same way that you renumbered the strata (in Stata the command would be "egen clusterid=group(survey v001)").

For a very few surveys (only the Egypt surveys, so far as I know) the PSU variable is v021 rather than v001. Usually v001 and v021 are exactly the same. If they are not, you would use v021 in svyset.
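
A small sketch of the renumbering, using an invented two-survey file. The variable "survey" is assumed to be a unique identifier you have already created for each survey, and building the strata with the same grouping is an assumption about what "in the same way" means here, not something taken from the DHS documentation.

```stata
* Toy pooled file: two surveys that both number their clusters 1, 2 (made-up values)
clear
input survey v001 v021 v024 v025
1 1 1 1 1
1 2 2 1 2
2 1 1 1 1
2 2 2 1 2
end

* Cluster ids that are unique across the pooled file
egen clusterid = group(survey v001)

* Assumption: strata are numbered within survey in the same way
egen stratumid = group(survey v024 v025)

* Check whether v001 and v021 ever disagree; if they do, v021 is the PSU variable
count if v001 != v021
```

Without the "survey" term in group(), cluster 1 of the first survey and cluster 1 of the second would receive the same id.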

Re: Merging DHS data in Stata [message #11224 is a reply to message #11221] Fri, 18 November 2016 11:28
jack.murphy

Thanks very much!
Re: Merging DHS data in Stata [message #11225 is a reply to message #11221] Fri, 18 November 2016 11:38
jack.murphy

Oh I forgot to ask: in the line of code you wrote,

egen clusterid=group(survey v001)

By "survey" did you mean the v000 variable that identifies the country and phase of the survey? Or would you recommend that I create a new variable called "survey" with a unique number for each country, the same way we created "sex" in the men's and women's recode files before appending them?
Re: Merging DHS data in Stata [message #11226 is a reply to message #11225] Fri, 18 November 2016 14:36
Bridgette-DHS
Another response from Tom Pullum:

Quote:
Sorry about that. Yes, you need an identifier for each survey. You cannot use v000 for that purpose, because it can happen that DHS does two surveys in the same country with the same version of the core questionnaire. (The third character of v000 is often taken to be the phase of DHS--that's what I thought until relatively recently--but it actually identifies the version of the questionnaire.) You can find many instances of two successive surveys in the same country having the same value of v000. It is best to construct a unique identifier for each survey, as you describe. That's what I meant by "survey" in that line.
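
One possible way to build such an identifier, sketched on invented data: the generate() option of append records which file each observation came from. The v000 value "XX6" and the tempfile are placeholders for illustration, not real survey codes.

```stata
* First toy survey (made-up values), saved to a temporary file
clear
input str3 v000 v001
"XX6" 1
"XX6" 2
end
tempfile s1
save `s1'

* Second toy survey with the same v000, so v000 cannot tell them apart
clear
input str3 v000 v001
"XX6" 1
"XX6" 2
end

* generate() marks the source: 0 = data in memory, 1 = first appended file
append using `s1', generate(survey)

* Cluster ids that are unique across both surveys
egen clusterid = group(survey v001)
```

Creating a numeric "survey" variable by hand in each file before appending, as described in the question, would work equally well.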
Re: Merging DHS data in Stata [message #11227 is a reply to message #11226] Fri, 18 November 2016 14:37
jack.murphy

Excellent, thank you.
Re: Merging DHS data in Stata [message #14792 is a reply to message #11227] Sun, 06 May 2018 05:58
Hassen
Messages: 121
Registered: April 2018
Location: Ethiopia,Africa
Senior Member
Dear all, thank you. These posts are very helpful!

Hassen Ali (Chief Public Health Professional Specialist)
Re: Merging DHS data in Stata [message #19337 is a reply to message #70] Tue, 02 June 2020 06:21
pie
Messages: 4
Registered: June 2020
Member
Hi all,

I have a similar question and am seeking help. I am using the 2014 Cambodia DHS datasets.

I would like to test differences between sexually active men and women with regard to different variables (age, education, residence, wealth, occupation, PMTCT knowledge, HIV stigma, etc.). I will use basic inferential statistical tests (chi-square, Fisher's exact, t-test, or Wilcoxon) to detect the differences between the two groups.

The IR dataset (women) contains: v005 (sample weight), v021 (sampling unit) and v022 (strata). The MR dataset (men) contains: mv005, mv021 and mv022. These variables are needed to declare the survey design in Stata when doing separate men's and women's analyses.

After appending/merging, which weight should I use to achieve the goal described above? Please advise!

Thanks
Pie

[Updated on: Tue, 02 June 2020 06:22]


Re: Merging DHS data in Stata [message #19373 is a reply to message #19337] Fri, 05 June 2020 15:32
Bridgette-DHS

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

After any merging, if you have multiple versions of the *v021 or *v022 variables, it makes no difference which one you use. But if you have multiple versions of the *v005 variables, it does make a slight difference which one you use. The priority is like this: v005 has priority over hv005 for women and children; mv005 has priority over hv005 for men; mv005 has priority over v005 for couples; and d005 has priority over v005 when using the domestic violence module for women. I don't believe this survey included HIV testing, but for surveys that do, when using HIV test results, hiv05 has priority over any other version of the weight. These rules are based on the general pattern of nonresponse. Each weight includes an adjustment for nonresponse.
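
For the appended men's and women's file asked about above, one reading of these rules is that each respondent keeps the weight from their own file (v005 for women, mv005 for men). The sketch below assumes that reading and uses made-up toy data; "age" and "sex" are hypothetical variable names, not DHS recode names.

```stata
* Toy men's file (made-up values); rename to match the women's names
clear
input mv005 mv021 mv022 age
1100000 1 1 30
 900000 2 1 41
1000000 3 2 25
1000000 4 2 37
end
rename (mv005 mv021 mv022) (v005 v021 v022)
gen sex = 1
tempfile men
save `men'

* Toy women's file (made-up values)
clear
input v005 v021 v022 age
1200000 1 1 28
 800000 2 1 35
1000000 3 2 22
1000000 4 2 44
end
gen sex = 2

* Stack the two files; each row carries the weight from its own file
append using `men'

* Declare the design once for the combined file
svyset v021 [pweight = v005], strata(v022) singleunit(centered)

* Example comparison between the two groups
svy: mean age, over(sex)
```

Once the design is declared, the survey-adjusted commands (for example svy: tabulate for categorical variables) take the place of the unweighted chi-square or t-test.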