The DHS Program User Forum
Discussions regarding The DHS Program data and results
Weighting in pooled data [message #9771] Mon, 16 May 2016 07:54
owraza
I have read extensively about weighting on this forum (and re-watched the DHS webinar on weighting). While doing so I found a comment by Tom Pullum (DHS) in message #6672, quoted here:

"If you construct a cluster-level variable using the collapse command, it is not necessary to use weights at all, because everyone in the same cluster has the same weight. To confirm this, you could collapse WITH weights and then collapse WITHOUT weights, and compare the two sets of numbers. They should be exactly the same.

However, if you want to collapse for a larger aggregate, such as a district or region, which includes more than one cluster, you definitely should use weights as part of the collapse."
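
The check Pullum describes can be sketched in a few lines. This is a minimal illustration in plain Python rather than Stata; the rows, the weights, and the `collapse` helper are all made up for the example and only mimic a weighted mean by group.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def collapse(rows, key, use_weights):
    """Hypothetical stand-in for a (weighted) mean-by-group collapse."""
    groups = {}
    for cluster, weight, outcome in rows:
        w = weight if use_weights else 1.0
        groups.setdefault(key(cluster), []).append((outcome, w))
    return {
        g: weighted_mean([y for y, _ in pairs], [w for _, w in pairs])
        for g, pairs in groups.items()
    }


# Made-up respondents: (cluster id, sampling weight, 0/1 outcome).
# Everyone in the same cluster shares that cluster's weight.
rows = [
    (1, 0.5, 1), (1, 0.5, 0), (1, 0.5, 1), (1, 0.5, 1),
    (2, 2.0, 0), (2, 2.0, 0), (2, 2.0, 1), (2, 2.0, 0),
]

# Collapsing to cluster level: weights make no difference.
by_cluster_weighted = collapse(rows, key=lambda c: c, use_weights=True)
by_cluster_unweighted = collapse(rows, key=lambda c: c, use_weights=False)

# Collapsing both clusters into one larger unit: weights do matter.
region_weighted = collapse(rows, key=lambda c: "region", use_weights=True)
region_unweighted = collapse(rows, key=lambda c: "region", use_weights=False)

print(by_cluster_weighted, by_cluster_unweighted)
print(region_weighted, region_unweighted)
```

Within a cluster the constant weight cancels out of the ratio, so the two cluster-level results agree. Once clusters with different weights are combined, the weighted and unweighted means diverge.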

My question is: is this guidance still valid when I pool several countries together (after collapsing at cluster level) and run regression analyses? Any suggestions on where I should apply weights?
Re: Weighting in pooled data [message #9776 is a reply to message #9771] Mon, 16 May 2016 19:39
Reduced-For(u)m

Owraza,

I think this comes down to a matter of interpretation, not a specific right/wrong way of doing it.

Let's suppose you have 2 countries with 10 regions each. After collapsing, you have 20 observations, 10 from each country. Each of these is representative of a particular region. You now want to infer some parameter value from the data you have - say, the effect of the cluster-aggregated variable X.

As an extreme example, suppose that your two countries were Nigeria (population 180 million) and Burkina Faso (population 17 million). If you wanted "average stunting rate" for children, you would want to weight the Nigerian data to be about 10 times more influential than the BF data. Or, more specifically, you'd want to weight each region by the relevant population (say, children under 5). In that case, you'd just give each region (after collapsing) a new weight that was equal to its population.
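
As a rough sketch of that population weighting, assume each collapsed region carries a stunting estimate and an under-5 population. All numbers below are invented for illustration (they are not DHS estimates), and only two regions per country are shown to keep it short.

```python
# Made-up region-level rows after collapsing:
# (country, stunting estimate, under-5 population in millions)
regions = [
    ("Nigeria", 0.40, 3.0),
    ("Nigeria", 0.30, 3.0),
    ("Burkina Faso", 0.20, 0.3),
    ("Burkina Faso", 0.30, 0.3),
]

# Every region counts equally: an average over regions.
unweighted = sum(rate for _, rate, _ in regions) / len(regions)

# Every child counts equally: each region is weighted by its population.
total_pop = sum(pop for _, _, pop in regions)
pop_weighted = sum(rate * pop for _, rate, pop in regions) / total_pop

print(f"average over regions:  {unweighted:.3f}")
print(f"average over children: {pop_weighted:.3f}")
```

With these invented numbers the population-weighted figure sits closer to the Nigerian regions, because they account for most of the children.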

Now, suppose you are interested instead in the effect of variable X. If the effect of X is the same everywhere, you don't need to do any further weighting if you don't want to. However, there are two reasons you could still weight. 1) the estimates from larger sample sizes within regions are less variable (remember, your region-level aggregates are really region-level "estimates"), and so weighting-up the higher-sample-size regions can gain you some statistical power; 2) we might think that there are variations in the effect of X for different people due to unobserved factors, and what we want to know is the "average treatment effect" for the whole population. In that case, if you can't model the heterogeneity in the effect of X, you might want to weight by population again in order to back out the average effect on the population.
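
A toy version of this, assuming a single regressor and made-up region-level data: `wls_slope` is a hypothetical helper that computes a weighted least-squares slope, and the weights here are invented regional populations (sample sizes could be substituted for reason 1). The design is deliberately balanced, with the same X values and intercept in both countries, so the result is easy to read; real data will not be this tidy.

```python
def wls_slope(x, y, w):
    """Slope of a weighted least-squares fit of y on x (with intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sw
    ybar = sum(wi * yi for yi, wi in zip(y, w)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    den = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return num / den


# Four regions per country, same X values in both (made-up numbers).
x = [0, 1, 2, 3, 0, 1, 2, 3]
pop = [10, 10, 10, 10, 1, 1, 1, 1]   # large country first, then small
equal = [1] * len(x)

# Case 1: the effect of X is 2 everywhere.
y_constant = [2 * xi for xi in x]
constant_unweighted = wls_slope(x, y_constant, equal)
constant_weighted = wls_slope(x, y_constant, pop)

# Case 2: the effect is 2 in the large country and 1 in the small one.
y_hetero = [2 * xi for xi in x[:4]] + [1 * xi for xi in x[4:]]
hetero_unweighted = wls_slope(x, y_hetero, equal)
hetero_weighted = wls_slope(x, y_hetero, pop)

print(constant_unweighted, constant_weighted)
print(hetero_unweighted, hetero_weighted)
```

In the constant-effect case the weights change nothing. In the heterogeneous case the unweighted slope treats the two countries equally, while the population-weighted slope moves toward the larger country's effect; in this balanced toy design it equals the population-weighted average of the two slopes, which need not hold in general.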

One way to think about it is: am I interested in doing inference on levels/effects for Regions or Individuals? If regions, then once you collapse you are fine (each region is its own observation, and another region from whatever country is adding one new, equally important data point). If you want to know about average effects across individuals in the whole population of the two countries (taken as one population for purposes of inference) then you'd probably want to weight these regions by population in some manner.

But like I said, I don't think this is a "right/wrong" kind of thing. It is an "it depends" kind of thing. Once you collapse to region, you have a region-level-estimate that is weighted to be representative of that region. What you do from there depends on what you want that region to tell you and what you want your analysis to capture. In the world of estimating causal effects, we tend to pretend that our constant-effect model is right, in which case you don't need to weight further. But in worlds interested in estimating population-level characteristics, then you do want to acknowledge that different regions are telling you information about different numbers of people.

Help? Or just obscure things more?
Re: Weighting in pooled data [message #9783 is a reply to message #9776] Tue, 17 May 2016 08:28
owraza
First of all, I should apologize for duplicating my question; Tom Pullum has also replied to it in the other post. No, it has not obscured anything. In light of what Pullum suggested about re-scaling the weights and acknowledging the size of the base population, your comment is very helpful. As you noted, because this is an "it depends" kind of thing, it is more difficult to go in one direction and defend that point of view to reviewers.