Creating unique household id in Women's data [message #11533]
Mon, 09 January 2017 09:59
priyoma
Hello.
I am a student working on my thesis with data from India's NFHS Round 2. The data file is IAIR42FL, i.e. the women's data. The data contain a unique caseid for each of the 90303 women surveyed.
I am analysing the effect of female education on the fertility of women in India. I need to create a unique household ID for this data. Can you please give me Stata code that does that?
When I tab v002 (i.e. household number), I get frequencies of 588 and even higher for some values, but that cannot mean there are 588 individuals in one household, right? So I need a unique HHID, i.e. one that tells me which of the 90303 women belong to which household.
Awaiting a quick reply.
Re: Creating unique household id in Women's data [message #11561 is a reply to message #11533]
Tue, 10 January 2017 16:49
Bridgette-DHS
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
Quote: In IR files, v001 is the cluster number, v002 is the household number (within v001), and v003 is the woman's line number (within v001 and v002). Because v002 is only a number within the cluster, the same value of v002 recurs in many clusters, which is why tabulating v002 on its own shows large frequencies. Together, v001 and v002 identify the household; v001, v002, and v003 identify the individual woman. In IAIR42FL.dta, caseid is a character string that includes v001, v002, and v003. To get a household id, you can remove the last three characters (which give v003) from caseid. That is, use this line in Stata:
gen hhid=substr(caseid,1,12)
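For anyone wanting to check the result, here is a minimal sketch, not part of the original reply. It assumes IAIR42FL.dta sits in the current working directory; the names hh_num, n_women, and hh_tag are made up for illustration. It builds the string id from the reply, adds a numeric alternative from v001 and v002, and then looks at how many interviewed women each household has.

```stata
* Assumes IAIR42FL.dta is in the current working directory
use IAIR42FL.dta, clear

* String household id from the reply: caseid minus its last three characters
gen hhid = substr(caseid, 1, 12)

* Alternative (not from the reply): numeric id, one integer per cluster-household pair
egen hh_num = group(v001 v002)

* Report any duplicates; none are expected if v001, v002 and v003 identify each woman
duplicates report v001 v002 v003

* Number of interviewed women in each household (hypothetical variable name)
bysort v001 v002: gen n_women = _N

* Tag one observation per household so that households, not women, are tabulated
egen hh_tag = tag(v001 v002)
tab n_women if hh_tag
```

If duplicates report shows surplus copies, the identifiers are not unique in the way assumed here, and the ids should be examined before going further.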
Re: Creating unique household id in Women's data [message #11591 is a reply to message #11581]
Fri, 13 January 2017 07:55
Bridgette-DHS
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
Quote: In that example, the household has two women, who are on lines 4 and 7. The hhid is a combination of cluster id (hv001 or v001) and household id (hv002 or v002). (The line number is hvidx or v003.)
Most households will have one eligible respondent--if they have any at all. Only a few will have more than one. You will use up all your degrees of freedom if you have fixed effects for household. I would not even try random effects for households. The density of women per household is too low. Fixed or random effects for cluster would be as far down into the data as I would go.
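You would definitely use factor-variable notation ("i.", or the older "xi:" prefix) for fixed effects, so there is no need to construct dummy variables (fortunately!). For random effects, use an "me" (mixed-effects) command such as mixed or melogit.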
However, we really cannot advise on your choice of model, just on data-related issues and to a limited degree on Stata syntax.
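To illustrate only the syntax mentioned above, here is a rough sketch; it is not a model recommendation. It assumes the standard recode variables v201 (total children ever born) and v133 (education in single years) are present in IAIR42FL.dta, which should be checked against the codebook. It ignores survey weights and the count nature of the outcome.

```stata
* Assumed variables from the DHS standard recode; check them against your codebook:
*   v201 = total children ever born, v133 = education in single years
use IAIR42FL.dta, clear

* Cluster fixed effects through factor-variable notation (no hand-made dummies)
regress v201 v133 i.v001

* Same fixed effects, absorbing the cluster indicators instead of reporting each one
areg v201 v133, absorb(v001)

* Cluster random intercept with a mixed-effects ("me") command
mixed v201 v133 || v001:
```

With several hundred clusters, the areg form is usually the more convenient of the two fixed-effects versions. For a count outcome, mepoisson follows the same "|| v001:" pattern as mixed.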