The DHS Program User Forum
Discussions regarding The DHS Program data and results
Pooling cross country IPUMS-DHS data for all available surveys; using svset (Pooling IPUMS-DHS data from different countries and years)
Re: Pooling cross country IPUMS-DHS data for all available surveys; using svset [message #17922 is a reply to message #17887] Thu, 18 July 2019 08:34
Bridgette-DHS

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

There have been many postings on this topic. Rather than repeat that information, I'd like to question what you mean by "pooling". The term can mean different things.

Something that many users do, and that we sometimes do here at DHS, is to combine several surveys into a single file, using the "append" command in Stata. Then you can loop through the analysis of each survey, one at a time, within a single combined file. You don't have to keep opening and closing separate files. This can make data processing more efficient but it doesn't affect the results. You can also use a combined file to look at differences between surveys.
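The append-and-loop approach described above might look roughly like the following Stata sketch. The file names and the survey identifier `survey` are hypothetical, and v005 and v025 are the standard DHS recode names for the woman's sample weight and urban/rural residence; IPUMS-DHS extracts may name these variables differently.

```stata
* A minimal sketch: file names below are hypothetical placeholders
use "survey_A.dta", clear

* generate() adds a source marker: 0 = file in memory, 1 = first appended file, ...
append using "survey_B.dta" "survey_C.dta", generate(survey)

* Loop through the surveys one at a time within the combined file
levelsof survey, local(surveys)
foreach s of local surveys {
    display "Survey `s'"
    * v005/1000000 is the usual DHS rescaling of the sample weight
    tabulate v025 if survey == `s' [iweight = v005/1000000]
}
```

Because every command is restricted to one survey with `if`, the results should match what you would get by opening each file separately.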

However, sometimes the term means that you want to produce estimates that refer to some mega-population, such as "East Africa". I would only advise this for the analysis of a relatively rare event, such as fistula, where you may not have enough cases (i.e. statistical power) in the separate surveys. But when you pool surveys this way, the reference time is blurred, you don't usually have a well-defined population, and the relative weight of each survey will be proportional to the sample size for that survey, which is arbitrary (unless you adjust the weights to make each survey count equally or to be proportional to the population of the country). With child mortality, you have enough cases in each survey, and I wouldn't recommend this kind of pooling.
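If you do decide to pool, the two weight adjustments mentioned in parentheses above could be sketched as follows. This assumes a combined file that already contains a survey identifier (called `survey` here, a hypothetical name) and a variable `pop` holding the population that each survey represents, which you would have to supply yourself from an external source; v005 is the standard DHS woman's weight.

```stata
* A sketch only: survey and pop are hypothetical variables you must create
generate double wt = v005/1000000

* Total of the weights within each survey
bysort survey: egen double wt_sum = total(wt)

* Option 1: each survey counts equally (weights sum to 1 within each survey)
generate double wt_equal = wt / wt_sum

* Option 2: each survey counts in proportion to its population
* (weights sum to pop within each survey)
generate double wt_pop = wt_equal * pop
```

Either adjusted weight can then be used in place of the original weight in the pooled analysis; which one is appropriate depends on the population you want the estimate to refer to.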

I wish I knew more about IPUMS, but I HOPE the files include a stratum variable even if they don't include region. Usually the stratum variable is just a crossing of all combinations of v024 and v025, and when it isn't, the stratum variable is MORE appropriate than the crossing of v024 and v025. If you replace "v024 v025" in your command "egen stratid = group (year v024 v025), label" with the name of the stratum variable, you will be ok.

To avoid the problem of strata that contain only a single sampling unit, just add the option "singleunit(centered)" at the end of the svyset command. Hope that works.
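Putting the stratum and single-unit advice together, the setup might look something like this sketch. The names `survey` (a survey identifier) and `stratum` (the stratum variable in your extract) are hypothetical; v021 and v005 are the standard DHS names for the primary sampling unit and the woman's weight, and the corresponding IPUMS-DHS names may differ.

```stata
* A sketch only: survey and stratum are hypothetical variable names
generate double svywt = v005/1000000

* Strata and PSUs must be distinct across surveys in a combined file
egen stratid = group(survey stratum), label
egen psuid   = group(survey v021)

* singleunit(centered) centers single-PSU strata at the grand mean
* instead of reporting missing standard errors
svyset psuid [pweight = svywt], strata(stratid) singleunit(centered)

* Example: survey-specific estimates from the combined file
svy: proportion v025, over(survey)
```

Using the survey identifier rather than year alone in group() keeps strata separate when two countries have surveys in the same year.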