Using community-level variables in regression models [message #6790]
Thu, 16 July 2015 10:25 
Lizzynaija


Dear DHS researchers,
I am analyzing the association of community-level variables with my outcome, neonatal death, in the 2013 Nigeria DHS. Most of these variables do not already exist within the dataset, so I created them using the collapse command to obtain the means/aggregates of the individual-level variables at the cluster level. I am now trying to work with them in logistic regression models, and I am not sure if I am using them in the correct way.
For example, I created a variable to represent the proportion of people that are uneducated within a cluster by creating a 0/1 variable (where 1 = uneducated). Collapsing on this variable gave me the mean of the 0/1 variable, which is the proportion of people within each cluster that are uneducated. And so on for the other variables.
I would now like to use these within my regressions, but I am not sure whether to use them as continuous variables or to categorize them. I tried using the community-level variables as continuous variables, but was not too sure about the interpretation. If I should categorize, should I use a median split, tertiles, or quartiles? And how do I create these categories correctly? I tried using the xtile command, but I am not sure if it is doing what I need it to do.
Also, I would like to ask if it is mandatory to use svy: logit for my regression analyses?
Finally, could you help me with the correct commands to turn off the Stata scientific notation? I keep getting output like "1.2e+04" which is making it difficult to properly calculate my rates.
Thank you in advance for your help,
Elizabeth



Re: Using community-level variables in regression models [message #6792 is a reply to message #6790]
Thu, 16 July 2015 17:31 
ReducedFor(u)m


It seems like investigating the determinants of neonatal mortality in Nigeria has been a popular thing to do lately. Some comments on your analysis plan:
1) Continuous/Categorical: this is up to you. In general, I don't think there is much to gain from turning a perfectly good continuous variable into a categorical one, but that is just my opinion. If you decide to go categorical, you should probably use no more than a few categories, otherwise you are likely to lose a lot of power. Plus, at the cluster level, there will be a lot of noise in your community estimates, and categorizing them across lots of bins is probably not helpful, because the more bins you have the more likely any given cluster is placed in the wrong bin.
2) You need to use the "svy" prefix for two reasons: in part to get the proper weights so that oversampled populations aren't overly influential in your regressions, but more than that (in this context) to get appropriate standard error estimates. Without accounting for the clustering, your standard errors and p-values will be too small.
3) You usually can't directly interpret the results from a logistic regression on either a continuous OR categorical variable without transforming them in some way. You need to turn them into something like marginal effects or odds ratios or something like that. I like marginal effects, but that is a preference and not universal. Stata can do this if you use the mfx command* (replaced by margins in Stata 11 and later) or some other options. If you don't know how to interpret these, you will need to do some background reading. If you are getting coefficients in the tens of thousands range, you are likely either misspecifying something or looking at a coefficient that still needs to be transformed in some way to be interpretable.
*http://www.stata.com/support/faqs/statistics/marginal-effects-methods/
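
For what it's worth, here is a rough Stata sketch of the three points above, in case it helps to see them as commands. It rests on assumptions that are not in this thread: neodeath (a 0/1 outcome) and prop_uneduc (the cluster proportion uneducated, already attached to every individual row) are made-up names, and v001, v005, v021 and v022 are the usual DHS recode names for cluster, sample weight, PSU and strata, which you should check against your own file. It also uses margins rather than mfx, which needs Stata 11 or later.

```stata
* Sketch only: neodeath and prop_uneduc are hypothetical variable names.
* v001/v005/v021/v022 are assumed to be cluster, weight, PSU and strata.

* 1) If you categorize, cut on CLUSTERS rather than on individual rows,
*    so that large clusters do not pull the cut points around.
egen byte onepercluster = tag(v001)          // flags one row per cluster
xtile uneduc_tert = prop_uneduc if onepercluster, nquantiles(3)
* copy each cluster's tertile to all of its rows (missing values sort last)
bysort v001 (uneduc_tert): replace uneduc_tert = uneduc_tert[1]

* 2) Declare the survey design once, then prefix estimation with svy:
generate double wt = v005 / 1000000          // DHS weights carry 6 implied decimals
svyset v021 [pweight = wt], strata(v022) singleunit(centered)

* categorical version, reported as odds ratios
svy: logit neodeath i.uneduc_tert, or

* continuous version
svy: logit neodeath prop_uneduc

* 3) Average marginal effect of the continuous community variable
margins, dydx(prop_uneduc)

* Side note on scientific notation: an explicit display format avoids it
display %12.0f 1.2e+04
```

One thing to keep in mind with the continuous version: prop_uneduc runs from 0 to 1, so the coefficient (or marginal effect) describes a move from a cluster with nobody uneducated to one where everybody is. Multiplying the variable by 10 before fitting would give an effect per 10 percentage points, which is often easier to talk about.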







