The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: De-normalizing weights and svyset command in Stata [message #3620 is a reply to message #3570] Fri, 16 January 2015 11:24
Bridgette-DHS
Messages: 3035
Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist Tom Pullum:
Any analysis using hiv03 (the result of the HIV test) should be weighted with hiv05. Once you have merged with the AR file, you should ignore v005, mv005, and hv005.
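
A minimal sketch of declaring the design for one survey with hiv05 as the weight. It runs on simulated data so it is self-contained: clust and stratum are hypothetical stand-ins for v001 and the survey's stratum variable, and hiv03 is simplified to 0/1 (real recodes have more result categories). runiformint() requires Stata 14 or later.

```stata
* Simulated single-survey file (all values are made up for illustration).
clear
set seed 101
set obs 1000
gen int  clust   = ceil(_n/20)                  // hypothetical cluster id (stand-in for v001), 50 clusters
gen byte stratum = ceil(_n/200)                 // hypothetical stratum id, 5 strata of 10 clusters
gen long hiv05   = runiformint(500000, 2000000) // fake weight with 6 implied decimals
gen byte hiv03   = runiform() < 0.08            // simplified: 1 = positive, 0 = negative

* Weight with hiv05, not v005/mv005/hv005.
* Dividing by 1,000,000 is optional for pweights: it does not change
* estimated proportions or their standard errors.
gen double wt = hiv05/1000000
svyset clust [pweight=wt], strata(stratum)
svy: tabulate hiv03
```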

Your within-survey analyses are fine with the original hiv05 as the weight. Any adjustment to the weights related to pooling would be by a survey-specific multiplier and would have no effect on within-survey estimates.
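
The claim that a survey-specific multiplier leaves within-survey estimates unchanged can be checked directly. This sketch uses simulated data (clust, stratum, hivpos and hiv05x are hypothetical names) and an arbitrary multiplier of 0.37; the linearized estimate and standard error of a mean should agree to rounding error.

```stata
* Simulated single-survey file (made-up values).
clear
set seed 202
set obs 1000
gen int  clust   = ceil(_n/20)                  // 50 hypothetical clusters
gen byte stratum = ceil(_n/200)                 // 5 hypothetical strata
gen long hiv05   = runiformint(500000, 2000000) // fake HIV weight
gen byte hivpos  = runiform() < 0.08            // fake 0/1 outcome

* Estimate with the original weight...
svyset clust [pweight=hiv05], strata(stratum)
svy: mean hivpos
scalar b_orig  = _b[hivpos]
scalar se_orig = _se[hivpos]

* ...and again after multiplying the weight by an arbitrary constant.
gen double hiv05x = hiv05*0.37
svyset clust [pweight=hiv05x], strata(stratum)
svy: mean hivpos
display "estimate: " scalar(b_orig) " vs " _b[hivpos]
display "std err:  " scalar(se_orig) " vs " _se[hivpos]
```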

My preferred way to handle the renumbering of clusters and strata in a pooled file is to use the "egen group" command. Within each survey, the cluster variable is always v001 (which is duplicated as v021). The stratum variable is not always the same variable, and it is not always even named correctly. The strata are virtually always the combinations of region x v025 (v025 is urban/rural). I would find or construct that variable and then copy it into a variable named "strata", e.g. "gen strata=v022".

You also need a unique identifier for "survey". You cannot rely on v000 for this, because v000 is a 3-character string such as "NG5", where "NG" is the country code and "5" is the DHS phase. Sometimes there will be two surveys in the same phase, and v000 will be the same for both of them. (This is not an issue if you are using just one survey per country.) Either way, you will need lines such as

egen cluster_pooled=group(survey v001)
egen strata_pooled=group(survey strata)

Then you will have unique identifiers for clusters and strata across the pooled file.
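
The renumbering described above can be sketched end to end on a small simulated pooled file. Assumptions: region is v024, "round" is a hypothetical label you would attach to each file before appending, and both surveys deliberately share v000 == "NG5" to show why v000 alone is not enough.

```stata
* Hypothetical pooled file: two surveys that share v000 (simulated values).
clear
set obs 400
gen str3 v000  = "NG5"
gen str1 round = cond(_n <= 200, "A", "B")                 // hypothetical survey label
gen int  v001  = floor(mod(_n - 1, 200)/10) + 1            // clusters 1-20 in EACH survey
gen byte v024  = cond(v001 <= 10, 1, 2)                    // region (assumed to be v024)
gen byte v025  = cond(mod(v001, 2) == 1, 1, 2)             // 1 = urban, 2 = rural

* Unique survey identifier: v000 alone cannot separate the two surveys.
egen survey = group(v000 round)

* Strata as region x urban/rural, then the pooled identifiers.
egen strata         = group(v024 v025)
egen cluster_pooled = group(survey v001)
egen strata_pooled  = group(survey strata)

* For comparison: grouping on v000 wrongly merges clusters across surveys.
egen cluster_wrong = group(v000 v001)
```

With two surveys of 20 clusters and 4 strata each, cluster_pooled should run from 1 to 40 and strata_pooled from 1 to 8, while cluster_wrong stops at 20.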

To give equal weight to each survey, you need lines such as these FOR EACH SURVEY SEPARATELY:

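* Target total weight for this survey (arbitrary)
scalar TOTWT=1000000
* Sum of the original hiv05 weights in this survey
quietly summarize hiv05
scalar T=r(sum)
* Rescaled weight: sums to TOTWT within this survey
gen hiv05r=hiv05*TOTWT/T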

You can do this adjustment before the pooling, or put those lines in a loop after the pooling, but be sure that the rescaling is survey-specific. These lines remove the arbitrary factor of 1000000 from the original hiv05 and give each survey an arbitrary TOTAL weight of 1000000. (That number could be anything you want.) This approach gives the same weight to every survey, regardless of the population of the country or the size of the sample, so you need to make it very clear that your regional estimates were calculated that way. If, say, you wanted to weight each survey in proportion to its population size, you would set TOTWT to the country's total population, or the population age 15-49, or something like that.
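
A sketch of the "loop after the pooling" option, on a simulated pooled file where survey, cluster_pooled and strata_pooled already exist. The scalars are referenced through scalar() so they cannot be confused with a variable of the same name. The last lines show a population-proportional variant using egen total() instead of a loop; the population figures there are hypothetical placeholders, not real data.

```stata
* Hypothetical pooled file: survey 1 has 200 records, survey 2 has 400.
clear
set seed 303
set obs 600
gen byte survey         = cond(_n <= 200, 1, 2)
gen int  cluster_pooled = ceil(_n/20)    // clusters 1-10 (survey 1), 11-30 (survey 2)
gen byte strata_pooled  = ceil(_n/100)   // strata 1-2 (survey 1), 3-6 (survey 2)
gen long hiv05          = runiformint(400000, 1600000)

* Equal total weight per survey, computed survey by survey.
scalar TOTWT = 1000000
gen double hiv05r = .
quietly levelsof survey, local(surveys)
foreach s of local surveys {
    quietly summarize hiv05 if survey == `s'
    scalar T = r(sum)
    quietly replace hiv05r = hiv05*scalar(TOTWT)/scalar(T) if survey == `s'
}

* Declare the pooled design with the rescaled weight.
svyset cluster_pooled [pweight=hiv05r], strata(strata_pooled)

* Variant: survey totals proportional to population (HYPOTHETICAL figures).
gen double pop = cond(survey == 1, 5000000, 9000000)
bysort survey: egen double Tsurv = total(hiv05)
gen double hiv05p = hiv05*pop/Tsurv
```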

I have not used "binreg", but I have been using "glm ..., family(binomial) link(log)" for many years. The two should be equivalent, and I know glm works with svyset and the svy prefix. I'd be surprised if binreg doesn't, but if it doesn't, you can switch to glm.
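
A minimal sketch of the glm route under svy, on simulated data where prevalence is 10% when the hypothetical covariate x is 0 and 20% when it is 1, so the true prevalence ratio is 2. The eform option displays exponentiated coefficients (prevalence ratios); _b[] still holds the log-scale coefficients.

```stata
* Simulated data (all names except hiv05 are hypothetical).
clear
set seed 404
set obs 2000
gen int  clust   = ceil(_n/20)                   // 100 clusters
gen byte stratum = ceil(_n/400)                  // 5 strata of 20 clusters
gen long hiv05   = runiformint(500000, 2000000)  // fake HIV weight
gen byte x       = runiform() < 0.5
gen byte y       = runiform() < cond(x == 1, 0.20, 0.10)

svyset clust [pweight=hiv05], strata(stratum)
* Log-binomial model; eform reports prevalence ratios.
svy: glm y i.x, family(binomial) link(log) eform
```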

Let us know if any questions remain.

 