The DHS Program User Forum
Discussions regarding The DHS Program data and results
Clustering [message #9819] Wed, 25 May 2016 09:17
ahmed89o
Messages: 26
Registered: August 2013
Location: Germany
Member
Hello DHS users,
My key regressor is at the city level and my outcome is at the individual level. I cluster my standard errors at the city level. Should I also account for the clustering defined by the DHS cluster variable v001, and if so, how do I do that in Stata? How do I take both levels of clustering into account?
Re: Clustering [message #9820 is a reply to message #9819] Wed, 25 May 2016 12:15
ahmed89o
To clarify, the survey design is as follows: strata are defined by urban or rural residence, clusters are at the district or village level, and then households are sampled and women are interviewed within each household. I study the impact of a city-level variable (a city is bigger than a cluster but smaller than a stratum) on an individual-level outcome. I do not use multilevel modeling but a standard logistic model with standard errors clustered at the city level. So my question is: should I also cluster at the district or village level (the original DHS cluster)? Ahmed Rashad
Re: Clustering [message #9839 is a reply to message #9820] Sun, 29 May 2016 17:53
Reduced-For(u)m
Messages: 291
Registered: March 2013
Senior Member

I think the general rule of thumb here is to cluster at whichever level is larger: the level of your aggregate variable (city) or the level you need to account for the sampling design (the PSU). It sounds like your "city" variable can include multiple PSUs, in which case I would suggest clustering at the city level. You do not need to cluster up to the level of the strata.
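To make "clustering at the city level" concrete, here is a minimal sketch of a CR0 cluster-robust (sandwich) standard error in plain Python. It assumes a linear regression with one regressor in place of the logit model from the question, to keep the arithmetic short. It also assumes no small-sample correction; statistical packages, including Stata's `vce(cluster city)`, typically apply one on top of this. The function name `ols_cluster_se` and the toy data are hypothetical. The toy data put two women in each of four cities, with x constant within each city, and compare clustering by city against clustering by PSU. With one woman per PSU, PSU clustering reduces to ordinary heteroskedasticity-robust errors.

```python
import math
from collections import defaultdict


def ols_cluster_se(y, x, groups):
    """Regress y on a constant and x by OLS (hypothetical helper).

    Returns (slope, CR0 cluster-robust standard error of the slope),
    treating observations that share a value in `groups` as one cluster.
    No small-sample correction is applied.
    """
    n = len(y)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # Closed-form inverse of X'X for X = [1, x]
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    # Sum the score vector (u, x*u) within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, groups):
        u = yi - a - b * xi
        scores[g][0] += u
        scores[g][1] += xi * u
    # "Meat" of the sandwich: sum over clusters of s_g s_g'
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for r in range(2):
            for c in range(2):
                meat[r][c] += s[r] * s[c]
    # Sandwich V = inv * meat * inv; the slope variance is V[1][1]
    tmp = [[sum(inv[r][k] * meat[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    var_b = sum(tmp[1][k] * inv[k][1] for k in range(2))
    return b, math.sqrt(var_b)


# Toy data (hypothetical): 4 cities x 2 women, x constant within a city
city = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"]
psu = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]  # 2 PSUs per city
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, -1, -1, 2, 2, 0, 0]
print(ols_cluster_se(y, x, city))  # slope 1.0, SE 1.0
print(ols_cluster_se(y, x, psu))   # slope 1.0, SE about 0.707
```

In this toy example, the residuals are perfectly correlated within each city. Clustering only at the PSU level therefore understates the standard error of the city-level slope by a factor of about √2.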

I don't know what you mean by "account for the impact of two clusters". You need many clusters for clustered standard error estimates to be reliable (ideally more than 40 or so, and at least more than 15 or 20). Do you mean: what if a city includes more than one DHS "cluster"? In that case, the answer is above: cluster at the city level.
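Before settling on a clustering level, it may help to check the two things raised above: how many cities you have, and whether each DHS cluster (v001) falls entirely inside a single city. The sketch below assumes you have already extracted the city identifier and v001 into two parallel lists. The helper name `describe_clusters` is hypothetical. The default threshold of 40 is only the rough rule of thumb from this post, not a formal cutoff.

```python
from collections import defaultdict


def describe_clusters(city, psu, min_clusters=40):
    """Summarize how PSUs (e.g. v001) sit inside cities (hypothetical helper)."""
    psus_per_city = defaultdict(set)
    cities_per_psu = defaultdict(set)
    for c, p in zip(city, psu):
        psus_per_city[c].add(p)
        cities_per_psu[p].add(c)
    n_cities = len(psus_per_city)
    return {
        "n_cities": n_cities,
        "n_psus": len(cities_per_psu),
        # True when every PSU lies inside exactly one city (PSUs nested in cities)
        "psus_nested_in_cities": all(len(cs) == 1 for cs in cities_per_psu.values()),
        # Cities containing several PSUs: the case where city-level clustering
        # already covers the PSU-level correlation
        "cities_with_several_psus": sum(len(ps) > 1 for ps in psus_per_city.values()),
        # Rough rule of thumb from the thread: roughly 40+ clusters
        "enough_clusters": n_cities >= min_clusters,
    }


print(describe_clusters(["A", "A", "A", "B", "B"], [101, 101, 102, 201, 202]))
```

If PSUs are nested in cities, clustering on the city identifier alone already allows for correlation within each PSU. The remaining concern is whether the number of cities is large enough.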

This way of thinking about it might help: if everyone in a city has the same value of your variable of interest, are you getting N (number of observations) worth of information, or C (number of cities) worth? We probably want to think "something in between", but it is probably closer to C than to N. You get much less information from additional observations that all share the same value of the right-hand-side variable than you would from a whole new city. An additional observation within a city that already has many observations gives you a little more information, but not a full new observation's worth. At the far extreme, you could imagine collapsing everything down to the city level and running your regression on those C observations. Each additional observation within a city would then just make the corresponding city-level observation slightly less variable (a slightly better estimate of the outcome for that city).
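One standard way to formalize "somewhere between C and N" is Kish's design effect for cluster samples. Assume equal-sized clusters of m = N/C observations and an intraclass correlation ρ of the outcome (more precisely, of the regression residual) within cities. Then deff = 1 + (m − 1)ρ and the effective sample size is N / deff. With ρ = 0 this gives N, and with ρ = 1 it gives exactly C. The sketch below also shows the "far extreme" of collapsing to one mean outcome per city. Function names and numbers are hypothetical. Real DHS cities will differ in size, so treat the result as a rough guide only.

```python
def kish_effective_n(n_obs, n_clusters, icc):
    """Effective sample size N / deff, assuming equal-sized clusters."""
    m = n_obs / n_clusters          # observations per city
    deff = 1 + (m - 1) * icc        # Kish design effect
    return n_obs / deff


def collapse_to_city_means(city, y):
    """The 'far extreme': one averaged outcome per city."""
    sums, counts = {}, {}
    for c, yi in zip(city, y):
        sums[c] = sums.get(c, 0.0) + yi
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


# 1,000 women in 50 cities (hypothetical numbers)
for icc in (0.0, 0.1, 1.0):
    print(icc, round(kish_effective_n(1000, 50, icc), 1))
# 0.0 1000.0
# 0.1 344.8
# 1.0 50.0
```

Even a modest within-city correlation of 0.1 cuts 1,000 observations to roughly 345 effective ones here. This is why the answer says the information you get is "closer to C than N".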
Re: Clustering [message #18285 is a reply to message #9839] Tue, 29 October 2019 07:01
Faiza
Messages: 2
Registered: October 2019
Member
Dear DHS users,
Can you please define the cluster number variable available in the DHS? I used it in my multilevel logistic regression model, but my professor wasn't satisfied. He asked me to explain the basis of cluster formation: on what grounds were the clusters generated?