Re: Comparing coefficients across years in the same country [message #483 is a reply to message #481]
Mon, 27 May 2013 15:13
I think this is just about right. A couple of quick points:
1 - Re-normalizing weights: In this case, I think you do want to re-normalize weights. You want each survey to represent the whole country at one point in time, and that is what each survey, re-normalized, will do. I would do it slightly differently than you do though (they will come out very similar). Here's my method:
*First - trim your sample to its final set of observations, assuming you'll lose some for missing covariates and the like, then:
egen totalweight = total(DHSweight), by(survey) /* survey being the survey round, DHSweight being the original DHS sample weight */
gen pweight = DHSweight/totalweight
So now all your weights sum to 1 within a survey. Implicitly, you're ignoring changes in population size over time.
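If you want to confirm the re-normalization did what you expect, here is a quick sanity check. It is only a sketch and assumes the survey and pweight variables from the lines above; checkweight is just a throwaway name.

```stata
* Sanity check (sketch): after re-normalizing, the weights should sum to 1 in each round.
* Assumes survey and pweight exist as created above; checkweight is a scratch variable.
egen double checkweight = total(pweight), by(survey)
* Allow a small tolerance for floating-point rounding.
assert abs(checkweight - 1) < 1e-4
drop checkweight
```

If the assert fails, the usual suspect is that the sample was trimmed after the weights were computed rather than before.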
2 - Are you worried about the standard error/weighting problem here? As in, how to get svyset/svy set up right? Here's one way to think about it. You basically want to do what the DHS FAQs say, but you want to make sure that the region/PSU identifiers are generated separately for each survey round. I usually concatenate the year of the survey to the PSU, and use that as the "cluster" in my svyset. The weights you computed above would be the right weights. So the example code from the website is:
Example Stata code:
*YOUR NEW WEIGHT goes here, i.e. the re-normalized pweight from point 1
generate weight = pweight
*make unique strata values by region/urban-rural (label option automatically labels the results)
egen strata = group(v024 v025), label
*tell Stata the weight (using pweights for robust standard errors), cluster (psu), and strata:
svyset [pweight=weight], psu(v021) strata(strata)
...but you want to replace the "strata" line with something like: gen strataUse = strata+surveyyear*10000 or similar, just some way to keep strata separate by survey round. Also, you'd want to replace the v021 in the "psu" bit with something that makes sure the PSU number isn't repeated across rounds (you don't want v021=45 showing up in two different rounds; I can't remember exactly how many digits v021 has).
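One way to sketch the round-specific strata and PSUs is below. Note that I've swapped in egen group() instead of the add-the-year arithmetic, since it doesn't depend on how many digits v021 has. It assumes surveyyear is a numeric variable holding the survey year and that weight is the re-normalized weight; strataUse and psuUse are just illustrative names.

```stata
* Sketch only: assumes surveyyear (numeric survey year) and weight already exist.
* One stratum id per round x region x urban/rural combination.
egen strataUse = group(surveyyear v024 v025)
* One PSU id per round x original PSU, so v021=45 in two rounds gets two ids.
egen psuUse = group(surveyyear v021)
* Declare the pooled design with the round-specific identifiers.
svyset psuUse [pweight=weight], strata(strataUse)
```

After this, any svy: command uses strata and clusters that never get pooled across survey rounds.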
3 - I think that is close, but not exactly right, though maybe I'm misunderstanding it. So, your idea is to find out how urban males' education levels have changed over time. One thing to do is just:
egen meanEd = mean(Education) if male==1 & urban==1, by(round)
but you want to use regression for standard errors and weighting and whatnot, so cool. Suppose your model has all the year dummies, an urban and a male dummy, and then YearXUrbanXMale dummies too (but no YearXUrban and YearXMale dummies). The coefficient on YearXUrbanXMale would show how education for urban men differed from education for the rest of the population (women, and rural men) in that year. I'm not sure that is exactly what you are going for. Maybe think about the problem like this: MaleXUrban defines 4 potential groups: men in cities, rural men, and the corresponding women's groups. One or more of these groups, within each year, will always be the reference population (right now, implicitly, I think you are comparing urban men to all the other three groups), and the coefficient on the YearXcovariate dummies will be the difference between the mean education for the group of interest and that for the excluded group(s) in that year.
But I think maybe you want to know, say, has the gap between rural/urban education levels for men been increasing or decreasing? In that case, first drop all the women, then regress Education on Year dummies, and dummies for SurveyXUrban (and no Urban dummy alone), then when you graph SurveyXUrban across survey round, you'll have the difference between rural men and urban men in education level for that survey year.
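A rough sketch of that rural/urban gap regression, using factor-variable notation rather than hand-made dummies. It assumes Education, male, urban (coded 0/1), and round exist under those names, and that svyset has already been run with round-specific strata and PSUs.

```stata
* Sketch only: variable names are assumptions taken from the discussion above.
preserve
keep if male == 1
* Round dummies plus urban-within-round terms, with no stand-alone urban dummy.
* Each round#urban coefficient is the urban minus rural gap among men in that round.
svy: regress Education i.round i.round#i.urban
restore
```

If dropping the women makes you nervous about the variance estimates, Stata's svy, subpop(male): prefix is an alternative that restricts the estimation sample while keeping the full design.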
If you just wanted to know how urban men's education was changing over time, just drop everyone who is not an urban man, and look at the survey dummies (for the mean) or the survey dummies interacted with X (for the marginal effect of X in that year). Note, if you include both X itself and Survey*X, the effect in any year is the sum of the two coefficients.
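Here is a sketch of the urban-men-only version, including the "sum of the two coefficients" step. X is a hypothetical covariate, and I'm assuming round is coded 1, 2, 3, ... with round 1 as the base, so adjust the level in lincom to match your coding.

```stata
* Sketch only: X is a hypothetical covariate; round is assumed coded 1, 2, 3, ...
preserve
keep if male == 1 & urban == 1
* Survey dummies alone: differences in mean education relative to the base round.
svy: regress Education i.round
* X plus Survey*X: the slope on X is allowed to differ by round.
svy: regress Education i.round c.X i.round#c.X
* Effect of X in round 2 = coefficient on X + coefficient on the round 2 interaction.
lincom X + 2.round#c.X
restore
```

The lincom line also gives you a standard error and confidence interval for the combined effect, which you can't read off the two coefficients separately.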
That was a really long response, huh? Sorry, had a big cup of coffee with me. If I was unclear let me know.