The DHS Program User Forum
Discussions regarding The DHS Program data and results
Home » Countries » India » Districts as cluster-level for multi-level model (Weighting for multi-level modelling)
Districts as cluster-level for multi-level model [message #18796] Sun, 23 February 2020 05:28
dgodha
Messages: 44
Registered: November 2016
Location: India
Member
Hello,

I would appreciate your expert guidance on my query. We usually use 'psu' as the cluster level in DHS data, but in my case the groups are too small if I use 'psu':

Group Variable |     #Groups    Minimum    Average    Maximum
           psu |     25,063          1        3.1         16

Since NFHS-4 is representative at the district level, and we have to create a variable for the cluster weight anyway, I am wondering whether it is possible to use district as the cluster level. I tried changing my weighting commands from psu to district, but, as you can see in the output, I don't get p-values or CIs.
*Rescaling of weights
	gen wt=v005/1000000
	
*Level 1 weights using scaling method 1: new weights sum to the effective sample size of each district
	gen sqw = wt*wt 
	egen sumsqw = total(sqw), by(sdistri) 
	egen sumw = total(wt), by(sdistri) 
	gen pwt11 = wt*sumw/sumsqw 

* Survey setting
	gen wt2=1
	svyset sdistri, weight(wt2) strata(v023) , singleunit(centered) || _n, weight(pwt11)
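The `gen pwt11` line above can be written out as a formula. The notation here is introduced for illustration: w_ij is the rescaled weight (v005/1,000,000) of respondent i in district j, and n_j is the number of respondents in district j. Multiplying each weight by the district's ratio of the sum of weights to the sum of squared weights makes the scaled weights sum to Kish's effective sample size for that district, which equals n_j only when all weights in the district are equal:

```latex
w^{*}_{ij} = w_{ij}\,\frac{\sum_{k} w_{kj}}{\sum_{k} w_{kj}^{2}}
\qquad\Longrightarrow\qquad
\sum_{i} w^{*}_{ij} = \frac{\left(\sum_{k} w_{kj}\right)^{2}}{\sum_{k} w_{kj}^{2}} \;\le\; n_j
```

The bound follows from the Cauchy-Schwarz inequality. Scaling the weights so that they sum to n_j itself would instead use the factor n_j / (sum of w_kj over k).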

*Output
*******
Number of strata   =     2,509                  Number of obs     =  1,538,126
Number of PSUs     =     2,509                  Population size   =  1,438,715
Subpop. no. obs   =     78,446
Subpop. size      =  73,653.12
Design df         =          0
F(   0,      0)   =          .
Prob > F          =          .


                              Linearized
           y       Coef.    Std. Err.       t    P>|t|     [95% Conf. Interval]

       _cons   -1.585093     .0192937   -82.16       .            .           .

sdistri
  var(_cons)    .1527032     .0153514                            .           .

Note: 5 strata omitted because they contain no subpopulation members.
Note: Strata with single sampling unit centered at overall mean.
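One likely reason for the missing p-values and CIs can be read off the header above. With linearized variance estimation, Stata's design degrees of freedom equal the number of PSUs minus the number of strata. Here the counts are identical, so every stratum contains exactly one PSU (as the last note also indicates), and:

```latex
\text{design df} = \#\text{PSUs} - \#\text{strata} = 2{,}509 - 2{,}509 = 0
```

With zero design degrees of freedom there is no t reference distribution, so the coefficient and its linearized standard error are shown, but P>|t|, the confidence interval, and the model F test are left missing.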

I am not sure what is going wrong and would appreciate any insight.
Thank you
Deepali


Re: Districts as cluster-level for multi-level model [message #18953 is a reply to message #18796] Tue, 24 March 2020 14:36
Bridgette-DHS
Messages: 3016
Registered: February 2013
Senior Member

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

The purpose of the svy adjustment is to compensate for the similarities of respondents within clusters, the under- and over-weighting of clusters, and stratification.

I would not recommend that you change to districts as the sampling units. The adjustments for clustering and sampling weights will be seriously thrown off.

The clusters, by definition, are the primary sampling units. If you shift to districts, you will capture some of the intra-class correlation that goes into the svy calculation, but not nearly all of it, and the weighting adjustment, no matter how you do it, will be incorrect. The new weights would affect all the estimates and tabulations.


Re: Districts as cluster-level for multi-level model [message #18959 is a reply to message #18953] Wed, 25 March 2020 04:57
dgodha
Many thanks for your response.
I do have a follow-up question. If I don't use survey weights, can I then go ahead and use districts as clusters? I need to use districts because 85% of my PSUs have five or fewer observations.


Deepali
Re: Districts as cluster-level for multi-level model [message #18982 is a reply to message #18959] Mon, 30 March 2020 16:05
Bridgette-DHS

Following is another response from DHS Research & Data Analysis Director, Tom Pullum:

If you ignore the weights entirely, your estimates won't mean anything. They will not be corrected for the under- and over-sampling in the survey design. They will not be unbiased estimates of population values.
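To make the point about under- and over-sampling concrete, here is a hypothetical two-stratum example (the numbers are invented for illustration and do not come from NFHS-4). Stratum A has 900 population members and stratum B has 100; 100 respondents are sampled from each, so B is oversampled ninefold relative to A and the design weights are 9 and 1. If the outcome mean is 0.2 in A and 0.6 in B:

```latex
\bar{y}_{\text{unweighted}} = \frac{100(0.2) + 100(0.6)}{200} = 0.40
\qquad
\bar{y}_{\text{weighted}} = \frac{100 \cdot 9 \cdot 0.2 + 100 \cdot 1 \cdot 0.6}{100 \cdot 9 + 100 \cdot 1} = \frac{240}{1000} = 0.24
```

The population mean is (900 x 0.2 + 100 x 0.6)/1000 = 0.24, so only the weighted estimate recovers it; the unweighted mean is pulled toward the oversampled stratum.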

I often ignore the survey design for a data quality assessment or for initial data exploration or for testing a program. However, if you want to do more than that, you need to use the weights to get unbiased estimates and use the clustering and stratification adjustments to get robust standard errors.

In other words, I recommend that you do not treat districts as clusters.