The DHS Program User Forum: Dataset use in Stata » Setting up pooled DHS data as panel

Setting up pooled DHS data as panel [message #2490]

Sun, 29 June 2014 10:36

dan7580
Messages: 5
Registered: June 2014

Member

I have pooled 2 rounds of data (appended) for several different countries separately and would like to set up as longitudinal data sets so that I can examine trends over time. Can you please share how DHS researchers have done this in the past using Stata? I have read about the xtset command but it would be helpful to see how this has been applied to DHS data and what steps are necessary to employ this command.

Thank you for your time.

Dan


Re: Setting up pooled DHS data as panel [message #2491 is a reply to message #2490]

Sun, 29 June 2014 18:48

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

Dan,

I've done this in a few ways, but it really depends on the outcomes you are using and the right-hand-side variables of interest. Can you give me a little more detail?

1 - each country separately, or all countries pooled?

2 - what is the unit of analysis (and which recode)? Family, child, mother?

xtset (and xtreg/whatever) may be useful, but maybe not. It really depends on the kind of variation you are trying to grab and the questions you are asking.

If you just want to look at within-country changes over time, you can run the regressions separately by round, compare the coefficient estimates from the first and second surveys, and then use the "sureg" command (seemingly unrelated regressions) to get standard errors, p-values, and confidence intervals for the comparison. That would essentially compare means from one survey to the next, and it would not require adjusting the weights, re-setting strata, or dealing with the other problems that can arise when appending multiple rounds (see the discussions on merging/appending datasets on the forums, which we have only partially managed to resolve).


Re: Setting up pooled DHS data as panel [message #2493 is a reply to message #2491]

Mon, 30 June 2014 03:37

leo2015
Messages: 4
Registered: June 2014

Member

Thanks very much for your quick reply.

1 - I am looking at each country separately
2 - the unit of analysis is mother (I have merged the IR and HR data sets previously as I needed some variables from the HR data set)

My outcomes are all binary (ANC 4+ times, TT vaccination 2+ times, etc.). I'd like to compare them within each country and see whether the changes over time are significant.

Dan


Re: Setting up pooled DHS data as panel [message #2501 is a reply to message #2493]

Mon, 30 June 2014 15:34

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

Dan,

Then I suggest doing the regressions separately and using "suest"* to compute the standard errors of the differences. That would be the most straightforward. It works something like this:

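* Fit the same model within each survey round and store the results
svy: reg Y X if survey==1
estimates store reg1

svy: reg Y X if survey==2
estimates store reg2

* Combine the stored results, then test whether the coefficient on X differs.
* After regress, suest labels the coefficient equations reg1_mean and reg2_mean;
* check the suest output for the exact equation names.
suest reg1 reg2
test [reg1_mean]X = [reg2_mean]X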

- that should give you a comparison of the coefficients on X from the two regressions, along with standard errors, and accounting for survey design and all that. I haven't used this with the "svy" prefix, but I think it should work fine.

Documentation for "suest" here: http://www.stata.com/manuals13/rsuest.pdf

There are of course ways to do it using the pooled data, but I think this might be the simplest, most transparent way, and the easiest way to get all the weighting/stratification right: you are only estimating parameters within each survey and then combining those estimates across surveys, and the "suest" command does all the work of calculating the "simultaneous (co)variance matrix of the sandwich/robust type".

*I think I called it "sureg" in the previous post - my bad (it's because it uses "seemingly unrelated regressions").
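
For the pooled-data route mentioned in this post, one possibility is a single model with a survey-round interaction. The sketch below is not the poster's code. It runs on made-up toy data, uses v005, v021 and v022 as stand-ins for the usual DHS weight, cluster and stratum variables, and treats Y, X and survey as placeholder names. It also assumes a binary outcome fitted with logit.

```stata
* Toy stand-in for two appended survey rounds (all values are made up)
clear
set seed 12345
set obs 400
gen survey = cond(_n <= 200, 1, 2)
gen v021 = mod(_n - 1, 20) + 1             // cluster numbers 1-20, reused in both rounds
gen v022 = cond(v021 <= 10, 1, 2)          // stratum numbers 1-2, reused in both rounds
gen v005 = 1000000 * (0.5 + runiform())    // DHS-style weight, scaled by 1,000,000
gen X = rnormal()
gen Y = runiform() < invlogit(-0.5 + 0.4*X + 0.3*(survey == 2))

* Survey design on the pooled file, with IDs made unique across rounds
gen wgt = v005 / 1000000
egen psu_id = group(survey v021)
egen strata_id = group(survey v022)
svyset psu_id [pweight = wgt], strata(strata_id)

* The interaction term is the round 2 minus round 1 difference in the coefficient on X
svy: logit Y i.survey##c.X
test 2.survey#c.X
lincom 2.survey#c.X
```

Because every coefficient is allowed to differ by round, the point estimates should match what two separate per-round regressions give. How the weights are scaled across rounds matters more for pooled models that share coefficients between rounds.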


Re: Setting up pooled DHS data as panel [message #2509 is a reply to message #2501]

Wed, 02 July 2014 05:46

dan7580
Messages: 5
Registered: June 2014

Member

Thank you for the information on using the suest command. This is really helpful.

Do you mind sharing a little more about some of the weighting/stratification limitations you mentioned with setting up the data as panel?

Thanks again,
Dan


Re: Setting up pooled DHS data as panel [message #2515 is a reply to message #2509]

Wed, 02 July 2014 21:56

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

Hi Dan,

I wish I could be super helpful, but probably only a little helpful.

A good place to start is with the document I'm attaching, which comes from this thread (the "weighting" thread in general has a lot of discussion on this topic):

http://userforum.dhsprogram.com/index.php?t=tree&goto=82&S=e3a92c3e8a765f0217181d74f8127581#msg_82

The major problem is that the DHS weighting design is intended for national (and sometimes sub-national) representativeness, not for combining survey rounds, whether within a country over time or across countries. Within each survey the sample weights sum to the total sample size, so a weight from one survey has no comparable meaning next to a weight from another survey. The DHS way of handling that is to "de-normalize" the weights, as described in the attached document. Then, in theory, the re-computed weights should work across survey rounds and carry implicit weighting for population size as well.

Unfortunately, if you read that document above, the "de-normalizing" process is complicated and requires you to bring in outside data that may or may not be good/useful. I've proposed other methods, such as forcing weights within a survey to sum to 1, and then multiplying them by some population size estimate to overlay population weights on the (DHS given) probability weights. I haven't done the math to see if this reduces to the formula provided by DHS, but I hope to get around to it at some point (I'm not a statistician/econometrician, so I'm worried I'm missing something with this method, even if the algebra seems reasonable).
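
The "sum to 1, then scale by population" idea can be sketched in a few lines. This only illustrates the arithmetic of that proposal, not the de-normalization formula in the DHS document. The population totals are invented placeholders, and pop, wsum and wgt_pop are made-up variable names.

```stata
* Five toy observations from two survey rounds (values are made up)
clear
input survey v005
1  500000
1 1500000
1 1000000
2 2000000
2 1000000
end

gen double wgt = v005 / 1000000            // DHS weights are stored multiplied by 1,000,000

* Hypothetical population totals for each round (placeholders, not real figures)
gen double pop = cond(survey == 1, 4000000, 5000000)

* Force the weights to sum to 1 within each round, then scale by population
bysort survey: egen double wsum = total(wgt)
gen double wgt_pop = wgt / wsum * pop

* Check: the rescaled weights now sum to the population total within each round
bysort survey: egen double chk = total(wgt_pop)
assert reldif(chk, pop) < 1e-6
list survey wgt wgt_pop
```

The relative weights within a round are unchanged; only the scale of one round against the other moves, and that scale depends entirely on the quality of the outside population figures.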

You also have to re-define strata and cluster variables when using multiple survey rounds, so that cluster "101" in one survey is differentiated from cluster "101" in some other survey (and same with strata).
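
A minimal sketch of that re-definition, assuming the appended file has a round indicator called survey (a placeholder name) and the usual DHS cluster and stratum variables v021 and v022. The data here are made up so that the same cluster and stratum numbers appear in both rounds.

```stata
* Toy pooled file: cluster numbers 1-20 and stratum numbers 1-2 occur in both rounds
clear
set obs 80
gen survey = cond(_n <= 40, 1, 2)
gen v021 = mod(_n - 1, 20) + 1
gen v022 = cond(v021 <= 10, 1, 2)
gen wgt = 1                                // placeholder weight

* One ID per (round, cluster) pair and per (round, stratum) pair
egen psu_id = group(survey v021)
egen strata_id = group(survey v022)

* 20 clusters and 2 strata per round should become 40 and 4 distinct IDs
quietly summarize psu_id
assert r(max) == 40
quietly summarize strata_id
assert r(max) == 4

svyset psu_id [pweight = wgt], strata(strata_id)
svydescribe
```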

I think the "do all the surveys separately" thing might be easiest for you, since you are effectively looking at one survey as one observation - so all the weighting and p-value/standard-error problems reduce to the regular single-round DHS method. Then again, if the sample sizes are fairly constant within-country, you should be able to just use the DHS-provided weights and get a very similar answer. You could also ignore weighting altogether, cluster your standard errors using ", cluster(clustervar)", skip the "svy" part, and just estimate an effect within the sample population that is not scaled to be population representative (only a problem if there are heterogeneous effects of X on Y across the population - otherwise, in a lot of ways, one observation is as good as any other).
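
The unweighted option could look something like the sketch below. It uses toy data and placeholder names (Y, X, survey, psu_id), fits a logit because the outcomes in this thread are binary, and assumes the two rounds are independent samples, so the standard error of the difference is taken as sqrt(se1^2 + se2^2). That independence assumption is not something the thread establishes.

```stata
* Toy stand-in for two appended survey rounds (all values are made up)
clear
set seed 12345
set obs 400
gen survey = cond(_n <= 200, 1, 2)
gen v021 = mod(_n - 1, 20) + 1             // cluster numbers reused in both rounds
gen X = rnormal()
gen Y = runiform() < invlogit(-0.5 + 0.4*X + 0.3*(survey == 2))
egen psu_id = group(survey v021)           // cluster IDs unique across rounds

* Unweighted logit in each round, standard errors clustered on the PSU
logit Y X if survey == 1, vce(cluster psu_id)
scalar b1 = _b[X]
scalar se1 = _se[X]

logit Y X if survey == 2, vce(cluster psu_id)
scalar b2 = _b[X]
scalar se2 = _se[X]

* z-test for the difference, treating the rounds as independent samples
scalar z = (b2 - b1) / sqrt(se1^2 + se2^2)
display "difference = " b2 - b1 "   z = " z "   p = " 2*normal(-abs(z))
```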

It's up to you. That's most of what I know. I'm happy to follow up on anything, but I don't know if I'll have much to offer.


Re: Setting up pooled DHS data as panel [message #2524 is a reply to message #2515]

Mon, 07 July 2014 09:17

dan7580
Messages: 5
Registered: June 2014

Member

Your feedback has been incredibly helpful - thank you.

I want to make sure that I am clear on the suest syntax example you provided previously. I read through the manual but wanted to quickly follow up with you. It looks as though both survey rounds are appended into one dataset, since each regression is set up with an "if survey==1" or "if survey==2" condition. I had understood that the regressions would be run on separate datasets; is that correct?

Thanks again.


Re: Setting up pooled DHS data as panel [message #2526 is a reply to message #2524]

Mon, 07 July 2014 13:06

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

Yes - the survey rounds are actually appended before you do this procedure, you just aren't treating them as though they were appended when analyzing them. Having them in one dataset allows you to avoid problems with clearing the data in between regressions, and thus clearing out the information saved for the suest comparisons.

So you append the sets together, then run the regression on each survey round using an "if" qualifier, then compare the regression results using "suest". That is just the way I have done it; I'm sure there are other ways.
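
As a rough sketch of the append step, the example below builds two tiny stand-in files and stacks them. The temporary files take the place of the real prepared round files, whose names are not given in this thread, and survey is a placeholder name for the round indicator.

```stata
* Stand-in for the prepared round 1 file
clear
set obs 5
gen v001 = _n
gen survey = 1
tempfile round1
save `round1'

* Stand-in for the prepared round 2 file
clear
set obs 5
gen v001 = _n
gen survey = 2
tempfile round2
save `round2'

* Stack the two rounds; the survey variable keeps track of the source round
use `round1', clear
append using `round2'
tabulate survey
assert _N == 10
```

With the rounds in one file, each model can then be run with "if survey==1" or "if survey==2" and the stored results stay available for the comparison.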


Re: Setting up pooled DHS data as panel [message #2530 is a reply to message #2526]

Tue, 08 July 2014 07:48

dan7580
Messages: 5
Registered: June 2014

Member

OK. So in this case, you would not do anything to the weights or strata before using the svy command on the pooled data?


Re: Setting up pooled DHS data as panel [message #2535 is a reply to message #2530]

Tue, 08 July 2014 19:56

Reduced-For(u)m
Messages: 292
Registered: March 2013

Senior Member

Correct - you do not need to adjust the weights or the strata, just use the "if" qualifier.

But - one obvious way to check this is to run some estimate on just one survey round, then run it again on the pooled dataset using the "if" qualifier, and compare the point estimates and standard errors. I think you should get exactly the same results - let me know if you try that, I'm curious. If you get different numbers, try using the "subpop" option on the pooled dataset (with the sub-population being the survey round) and see if that gets you back to the single-survey estimates.

I sort of hate the "svy" command, but that is because it is really hard for me to figure out exactly how it deals with things like clustering and stratification, and what exactly the "subpop" option is doing.
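
The check described in this post could be set up roughly as follows. The data are made up, Y and survey are placeholder names, and v005, v021 and v022 stand in for the usual DHS weight, cluster and stratum variables. No particular result is claimed here; the point is only to put the three sets of output side by side.

```stata
* Toy stand-in for two appended survey rounds (all values are made up)
clear
set seed 12345
set obs 400
gen survey = cond(_n <= 200, 1, 2)
gen v021 = mod(_n - 1, 20) + 1             // cluster numbers reused in both rounds
gen v022 = cond(v021 <= 10, 1, 2)          // stratum numbers reused in both rounds
gen v005 = 1000000 * (0.5 + runiform())
gen Y = runiform() < 0.4 + 0.1*(survey == 2)
gen wgt = v005 / 1000000

* (1) Round 1 on its own, with the original cluster and stratum numbers
preserve
keep if survey == 1
svyset v021 [pweight = wgt], strata(v022)
svy: mean Y
restore

* (2) Pooled file, with cluster and stratum IDs made unique across rounds
egen psu_id = group(survey v021)
egen strata_id = group(survey v022)
svyset psu_id [pweight = wgt], strata(strata_id)

* (2a) Restrict with an if qualifier
svy: mean Y if survey == 1

* (2b) Restrict with the subpop() option
svy, subpop(if survey == 1): mean Y
```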


Re: Setting up pooled DHS data as panel [message #2536 is a reply to message #2535]

Wed, 09 July 2014 04:29

dan7580
Messages: 5
Registered: June 2014

Member

The "if" qualifier does indeed work. Thank you for clarifying and for all of your help!
