Creating panel [message #12922]
Thu, 10 August 2017 01:43
Lukresha
Messages: 14 Registered: July 2016
Member
Hello,
I want to do an analysis using data from the 2003, 2008-09 and 2014 KDHS. I have appended the three files into one combined file.
I would like clarification on whether it is feasible, with the data I am using, to declare the data as a panel with xtset in Stata and then carry out a panel analysis.
Re: Creating panel [message #12943 is a reply to message #12942]
Thu, 17 August 2017 13:03
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
This is a reasonable way to do it using that recode. But essentially you are just creating a "year of birth" variable, and it is unclear why this is being used to generate a "pseudo-panel". That said, if you just want to include effects for year of birth, the variable you created will work pretty well. The only issue is that some surveys (I don't know about yours) conduct interviews over more than one calendar year (say, Nov-Feb). In that case you would be a little bit off on birth year for some households, but maybe not in a way that is problematic (it would depend a lot on how you are structuring your regressions).
All that said, your code should get you something very close to "year of birth".
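For illustration only: the recode under discussion is not shown in this thread, so the following is a rough sketch that assumes the date of birth is available as a DHS-style century month code (CMC), where CMC = (year - 1900) * 12 + month. The function name and the example values are hypothetical. Deriving birth year this way does not depend on which calendar year the interview happened to fall in.

```python
def cmc_to_year_month(cmc):
    """Convert a century month code to (year, month).

    Assumes the usual DHS convention: CMC = (year - 1900) * 12 + month,
    so January 1900 has CMC 1.
    """
    year = 1900 + (cmc - 1) // 12
    month = (cmc - 1) % 12 + 1
    return year, month


# Hypothetical examples: a December 2005 birth and a January 2006 birth.
dec_2005 = (2005 - 1900) * 12 + 12
jan_2006 = (2006 - 1900) * 12 + 1
print(cmc_to_year_month(dec_2005))
print(cmc_to_year_month(jan_2006))
```

The two example births are one month apart but land in different birth-year groups, which is the kind of boundary case discussed in this thread.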
Re: Creating panel [message #12945 is a reply to message #12944]
Thu, 17 August 2017 15:05
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
I don't think you gain much by collapsing the data in that way. Why not just use the individual-level observations?
Usually when you collapse data like that it is because you are doing something like bringing in external data that is merged to the cohort (say in your case, information on when schools were built; or maybe cohort variation in exposure to some mandatory schooling laws). Even then you don't necessarily need to collapse the data down, and can just merge the variables into the individual-level data.
As for the problem of getting the birth year wrong - does it really matter? Isn't a mother born in December of, say, 2005 very similar to one born in January of 2006? It isn't clear you would be wrong to lump these two groups together in one time effect. But again, it depends a lot on the data setup, such as whether Dec/Jan born women had different "exposures" to something important.
Also, once you collapse the data down into averages, the variable on the left wouldn't be 0/1 any more - it would be a proportion. In that case the probit model wouldn't be right, and you would instead run a least-squares regression of some sort on the aggregate data. A probit belongs on the individual-level data...so again I don't see the need to create this pseudo-panel.
That said, if there is a reason to generate the pseudo-panel, it is straightforward to do using the "collapse" command in Stata with the "by()" option, as "by(cohort)" or "by(cohort region)" or whatever is appropriate. You would also perhaps want to collapse using the DHS sample weights (to get representative estimates), which is explained in various places on the forum and on Stata help forums. But it isn't clear that you really want or need to do that.
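To make concrete what a weighted collapse by cohort produces, here is a minimal sketch in plain Python (not the Stata command itself). The field names `cohort`, `outcome` and `wt` and the records are made up for illustration; `wt` stands in for a rescaled DHS sample weight.

```python
from collections import defaultdict


def collapse_weighted_mean(rows, by, value, weight):
    """Weighted mean of `value` within each group defined by the `by` keys."""
    num = defaultdict(float)  # sum of weight * value per group
    den = defaultdict(float)  # sum of weights per group
    for row in rows:
        key = tuple(row[k] for k in by)
        num[key] += row[weight] * row[value]
        den[key] += row[weight]
    return {key: num[key] / den[key] for key in num}


# Hypothetical individual-level records with a 0/1 outcome.
rows = [
    {"cohort": 1975, "outcome": 1, "wt": 1.0},
    {"cohort": 1975, "outcome": 0, "wt": 3.0},
    {"cohort": 1980, "outcome": 1, "wt": 2.0},
    {"cohort": 1980, "outcome": 1, "wt": 2.0},
]
cells = collapse_weighted_mean(rows, by=["cohort"], value="outcome", weight="wt")
print(cells)
```

Each cohort cell now holds a weighted proportion rather than a 0/1 value, which is why a probit no longer fits the collapsed data.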
Re: Creating panel [message #12947 is a reply to message #12946]
Thu, 17 August 2017 16:35
Reduced-For(u)m
Messages: 292 Registered: March 2013
Senior Member
Whether the estimate is biased or not depends to a great extent on the setup and on exactly which parameter you are trying to estimate. But in general, yes, you can pool multiple survey rounds together and use a probit; just include controls for survey round. You could also do the analysis on each dataset separately, and then test whether the coefficient you are interested in changes from round to round. There are lots of interpretation issues and problems with aggregation, but none of those are solved by aggregating the data as you describe. Just pool the individual-level data together and control for cohort and/or time effects as you see fit. There is no INHERENT bias in the probit version that wouldn't be there in an aggregate regression...just the same bias you'd have either way, which depends on exactly the model you are fitting and the parameter you are estimating.
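As a rough sketch of what "pool and control for survey round" means for the data layout, the plain-Python snippet below stacks the rounds and adds 0/1 round indicators, leaving the first round as the omitted reference category. The function name, field names and records are hypothetical, and the probit fit itself is not shown.

```python
def pool_rounds(rounds):
    """Stack survey rounds, adding a `round` label and 0/1 round indicators.

    `rounds` maps a round label to a list of record dicts. The first label
    in sorted order is the omitted reference category.
    """
    labels = sorted(rounds)
    pooled = []
    for label in labels:
        for record in rounds[label]:
            row = dict(record)  # copy so the input records stay untouched
            row["round"] = label
            for other in labels[1:]:
                row[f"round_{other}"] = int(label == other)
            pooled.append(row)
    return pooled


# Hypothetical records from the three rounds mentioned in this thread.
rounds = {
    "2003": [{"outcome": 1}, {"outcome": 0}],
    "2008-09": [{"outcome": 1}],
    "2014": [{"outcome": 0}],
}
pooled = pool_rounds(rounds)
for row in pooled:
    print(row)
```

The indicator columns would then enter the individual-level model as regressors alongside whatever cohort or time effects are appropriate.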