Strata with single PSU [message #1602]
Mon, 17 March 2014 09:53
nina
Messages: 2 Registered: February 2014
Member
Hello,
I'm doing an analysis with Stata by pooling all DHS surveys (children recode) from 1900 - 2013.
When I specify svyset to account for the survey design, the regression returns no standard errors because some strata contain only one PSU. In some cases this happens because Stata drops observations with missing outcome or explanatory variables when it runs the regression. In other cases it is a result of the sample design itself.
What should I do with strata that have a single PSU?
Stata has an option that treats each of those PSUs as if it had been selected into the sample with a probability of 1. Is this the right way to handle all the strata with only one PSU?
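For reference, the setup described here might look roughly like the sketch below. It assumes the standard DHS recode names v021 (PSU), v022 (strata) and v005 (sample weight), which may not hold for every survey in a pooled file, and survey_id is a hypothetical variable identifying each survey. The option in question is presumably singleunit(certainty).

```stata
* Sketch only: v021, v022 and v005 are assumed to hold the PSU, strata and weight;
* survey_id is a hypothetical identifier for each pooled survey.
generate double wt = v005 / 1000000

* Make PSU and stratum codes unique across the pooled surveys
egen long psu_id     = group(survey_id v021)
egen long stratum_id = group(survey_id v022)

* Default behaviour: strata with one PSU lead to missing standard errors
svyset psu_id [pweight = wt], strata(stratum_id) singleunit(missing)

* List only the strata that contain a single sampling unit
svydescribe, single

* Treat single-PSU strata as certainty units
* (they then contribute nothing to the variance estimate)
svyset psu_id [pweight = wt], strata(stratum_id) singleunit(certainty)
```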
Thanks a lot for your help!
Re: Strata with single PSU [message #1603 is a reply to message #1602]
Mon, 17 March 2014 10:25
user-rhs
Messages: 132 Registered: December 2013
Senior Member
Hi Nina,
A couple of things. First, are you sure that the observations dropped because of a missing outcome truly have a missing outcome? That is, is the value missing because the question was not asked of that person, or in that survey round for that country? Have you looked in the country-specific variables to make sure the values were not stored there?
Second, if people are being dropped from your model because of missing explanatory variables, you have bigger issues than the svyset problem you are describing. Perhaps they are missing values on just one, but not all, of your explanatory variables. If a substantial number of people are being dropped because of missing values, I suggest you find out which variables are causing it. If they are dropping out in large numbers because of missing values on a small number of variables, you can create a missing flag for each such variable, equal to 1 if the value is missing and 0 otherwise, and recode the missing values in your explanatory variables to 0. That way, you keep every observation that is not missing on the outcome. The coefficients on the missing flags may not have a meaningful interpretation, but at least you are not selectively losing people over one or two variables with missing values out of the 10 or however many you have in your model.
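A minimal sketch of the flag approach, assuming the data are already svyset. The names y, x1 and x2 are hypothetical placeholders for your outcome and explanatory variables, and the filled copies are new variables so the originals stay untouched.

```stata
* y, x1 and x2 are hypothetical names; substitute your own variables
foreach v of varlist x1 x2 {
    * Flag is 1 where the value is missing, 0 otherwise
    generate byte miss_`v' = missing(`v')
    * Copy of the variable with missing values recoded to 0
    generate double `v'_f = cond(missing(`v'), 0, `v')
}

* Observations missing on x1 or x2 now stay in the estimation sample
svy: regress y x1_f x2_f miss_x1 miss_x2
```

Keeping the originals makes it easy to compare this model against the complete-case one.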
Either way, it seems like you might be dealing with selection issues here. I would investigate the explanatory variables first and see if creating those flags helps. Then I would look at the outcome variable. Are the values missing at random, or is there endogeneity/self-selection into answering or not answering the question used for your outcome?
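One possible way to start that investigation is sketched below. Again y, x1 and x2 are hypothetical variable names, and survey_id is a hypothetical identifier for each pooled survey.

```stata
* Count missing values per variable, then show which combinations occur together
misstable summarize y x1 x2
misstable patterns y x1 x2

* Check whether a missing outcome is concentrated in particular surveys,
* which would suggest the question was not asked in those rounds
generate byte y_missing = missing(y)
tabulate survey_id y_missing, row
```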
HTH,
RHS
[Updated on: Mon, 17 March 2014 10:29]