The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: how to reproduce malawi 2004 published sampling errors and deft values [message #2154 is a reply to message #2105] Mon, 12 May 2014 14:39
Trevor-DHS
Thanks for your post and for documenting your questions in such detail. Firstly, for the design effect (Deft) of proportions and means, you should not expect to get exactly the numbers published in the DHS final report if you use different software. Theoretically, the Deft is the standard error of an estimator under the actual survey design divided by the standard error the same estimator would have if the sample were a simple random sample. The difficulty lies in estimating that second quantity, the standard error under a hypothetical simple random sample. There is no standard method for producing this estimate, so the Deft calculated by different software packages will differ slightly.
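
In symbols, the definition above can be written as follows. The notation is illustrative and not taken from the report; \hat{\theta} stands for any estimate, such as a mean or a proportion.

```latex
\mathrm{Deft}(\hat{\theta})
  = \frac{SE_{\text{design}}(\hat{\theta})}{SE_{\text{SRS}}(\hat{\theta})},
\qquad
\mathrm{Deff}(\hat{\theta}) = \mathrm{Deft}(\hat{\theta})^{2}
```

The numerator comes from the design-based variance estimate. The denominator is the part that software packages estimate in different ways, which is why small differences in the Deft are expected.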

Secondly, DHS surveys calculate the standard errors of simple estimators (proportion, mean, ratio) using the Taylor linearization method, as explained in Appendix B of the DHS final report. The Jackknife is only used for complex statistics that don't reduce to a simple X/Y form. There are only a few statistics for which we use the Jackknife: primarily age-specific and total fertility rates, child mortality rates, and maternal mortality rates. For the examples you gave, we used Taylor linearization.
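
For reference, the linearized variance of a ratio r = y/x is commonly written in the form below. This is a sketch of the usual textbook form, and the symbols are assumptions that should be checked against Appendix B of the report: h indexes strata, m_h is the number of clusters in stratum h, y_{hi} and x_{hi} are the weighted cluster totals, and f is the overall sampling fraction (usually small enough to ignore).

```latex
\widehat{\mathrm{var}}(r)
  = \frac{1-f}{x^{2}}
    \sum_{h=1}^{H}
    \left[
      \frac{m_h}{m_h - 1}
      \left( \sum_{i=1}^{m_h} z_{hi}^{2} - \frac{z_h^{2}}{m_h} \right)
    \right],
\qquad
z_{hi} = y_{hi} - r\,x_{hi},
\quad
z_h = y_h - r\,x_h
```

The sum runs over strata, which is why the choice of stratification variable discussed below changes the resulting standard error.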

When I reviewed your approach to calculating the sampling errors, I came across a few issues:

1) We now recommend using the stratification that was used in the sample design. The final report seems to indicate that the stratification was urban and rural areas within districts, with 17 of the districts grouped together into a "rest of Malawi" group. The dataset has a variable with the district codes used in the design (sdist) that can be combined with urban/rural residence (v025) to create these strata, e.g.

egen strata = group(sdist v025)

Then you would use your svyset command:

svyset [pweight=weight], psu(v021) strata(strata)

We don't use the singleunit() option, and you don't need it with this specification, as no stratum contains only a single PSU.

This produces a slightly different (and more conservative) estimate of the standard errors than published.
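
Pulled together in one place, the explicit-stratification setup might look like the sketch below. The line that creates weight is an assumption: the commands above use a variable called weight without showing how it was built, and dividing v005 by 1,000,000 is the usual DHS convention for the women's sample weight. The svydescribe line is only a check that no stratum has a single PSU.

```stata
* Assumption: weight is the DHS women's sample weight, rescaled from v005
gen weight = v005/1000000

* Explicit strata: design district crossed with urban/rural residence
egen strata = group(sdist v025)

* Declare the design (same specification as above)
svyset [pweight=weight], psu(v021) strata(strata)

* List any strata that contain only one PSU; none are expected here
svydescribe, single

* Taylor-linearized standard error, then the design effects
svy: mean v201
estat effects, deft
```

The Deft reported by estat effects uses Stata's own convention for the simple random sample variance, so it may differ slightly from the published value even when the standard error agrees.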

2) If you want to produce the standard errors in the same manner as they were produced for the 2004 report, you need to use the implicit stratification approach that DHS used in earlier surveys: neighboring PSUs are grouped into pairs or groups of three to form implicit strata (by contrast, the stratification presented above is sometimes known as explicit stratification). V022 is supposed to contain the implicit stratification that was used at that time. Unfortunately, I discovered an error in the creation of V022 in the recode dataset. I was able to reconstruct the variable, though, and I have attached a .do file that contains the instructions for recoding v021 to produce v022 correctly. (See the attached file, and make sure you drop the variable strata before using it.)

Once I use that file to recode v022 (creating the strata variable too), I can then set up my svyset command as before:

svyset [pweight=weight], psu(v021) strata(strata)

(note that we don't use v025 with v022 in the creation of the strata - it is just based on recoding v021).

Now when I use
svy: mean v201
I get a standard error and Deft that match the final report (the confidence intervals are slightly different because DHS uses +/-2SE, while Stata and other software often use +/-1.96SE).
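
To compare the intervals directly, a DHS-style +/-2SE interval can be computed from the stored results. This is a minimal sketch that assumes the svyset command above is already in effect. Note that Stata's own svy interval uses a t critical value based on the design degrees of freedom, which is close to, but not exactly, 1.96.

```stata
* Estimate the mean with the design already declared by svyset
svy: mean v201

* Design effects: DEFT is the square root of DEFF
estat effects, deft

* DHS-style interval: estimate plus or minus 2 standard errors
display "lower: " %9.4f (_b[v201] - 2*_se[v201])
display "upper: " %9.4f (_b[v201] + 2*_se[v201])
```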

Concerning your comment about the number of replicates when using the Jackknife, there are two reasons for the difference: v022 was combined with v025 (which is not necessary), and v022 is incorrectly coded in the recode dataset. With the correct coding of v022 given in the attached do file, you should find one replicate for every cluster when using the Jackknife.
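
One way to check the replicate count is sketched below. It assumes the design has already been declared with the corrected strata, and the variable name one_per_psu is made up for the illustration. With a delete-one-PSU Jackknife, the number of replications should equal the number of clusters.

```stata
* Count the distinct clusters (PSUs) in the data
egen one_per_psu = tag(v021)
count if one_per_psu

* Jackknife standard error for the same estimate
svy jackknife: mean v201

* Number of replications actually used, for comparison with the count above
display e(N_reps)
```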

I hope this helps.