Keeping caseid in and keeping missing observations out when using Stata "collapse" [message #5611]
Tue, 16 June 2015 10:54
Lizzynaija
Messages: 12 Registered: February 2015 Location: United States
Member
Dear DHS Researchers,
I am analyzing contextual determinants of neonatal mortality using the Nigeria 2013 DHS. I am currently trying to aggregate individual-level statistics to create community-level variables using the Stata collapse command.
I would be grateful if I could get some guidance on how to tackle the following challenges:
1) According to the Stata manual, "collapse" will, by default, use all my observations to calculate the summary statistics; if I want to exclude observations with missing values on any of the variables, I am to specify the "cw" option. However, when I included this option, Stata returned the error message "no observations", and I am not sure how to get around this.
2) I want to collapse by PSU (v001) so as to get the community means for my variables. However, I am running into problems with keeping my caseid variable in the collapsed dataset. I need the caseid variable to stay in the dataset so that I can merge the community-level means back onto the original IR dataset. However, when I put it in the by() portion of "collapse", it causes the dataset to collapse by caseid, and not by PSU.
Below is the code I have been working with:
#delimit;
collapse(mean) commresid=wherelives commregion=region meancommeduc=comm_educlvl communemp=unemployed
commpoverty=poverty commpovlevel=povlevel commwealth=v190 commanc=ancvisits commpostnatal=postchk commsba=birthassist
commdelivery=birthplace commfemeduc=femeduc commeneduc=meneduc commfemjob=femjob commenjob=menjob
commworkprev=workprevyr commfirstmarr=agefirstunion commfirstbirth=matagefirstbirth commallkids=parity
commidealkids=idealkidnum commsons=xxsonsalive commgirls=xxgirlsalive commdecision=all_decision
commviolence=violence commcontrol=control, by(v001 caseid) cw;
#delimit cr
What steps should I take to get this to work properly?
Thank you very much,
Elizabeth
Re: Keeping caseid in and keeping missing observations out when using Stata "collapse" [message #5617 is a reply to message #5611]
Wed, 17 June 2015 09:26
Bridgette-DHS
Messages: 3201 Registered: February 2013
Senior Member
Following is a response from DHS Senior Stata Specialist, Tom Pullum:
If you collapse by v001, you cannot include caseid in the "by" part of the collapse command. You should replace "by(v001 caseid)" with "by(v001)". The collapsed file will have one record per cluster.
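A minimal sketch of a collapse by cluster only, run on a small made-up dataset rather than the IR file (the variables educ and anc and all values are illustrative assumptions, not DHS variables). It also bears on the first question: without cw, collapse computes each mean from the nonmissing values of that variable, whereas cw drops every observation that is missing on any of the listed variables, so with a long variable list it can leave no observations at all.

```stata
* Toy data: 2 clusters with 3 women each; all values are made up
clear
input v001 educ anc
1 2 4
1 . 6
1 4 .
2 1 .
2 3 .
2 . 2
end

* One record per cluster; each mean uses the nonmissing values of that variable
collapse (mean) commeduc=educ commanc=anc, by(v001)
list
```

In this toy example no woman in cluster 2 has complete data on both variables, so adding cw would drop that cluster entirely.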
caseid is a combination of v001, v002, and v003. They are numeric variables but caseid is a string with embedded blanks.
To merge the collapsed data back onto the individual records in the IR files, you only need to sort both files on v001. However, when I do this I sort the IR file on v001 v002 v003, even though it's not really required. Since your cluster-level file does not contain v002 and v003, they are irrelevant for the merge.
So I recommend lines such as the following:
[your collapse command]
sort v001
save temp.dta, replace
use IRdata.dta, clear
sort v001 v002 v003
merge v001 using temp.dta
keep if _merge==3
Like many Stata users, I prefer the old version of the merge command, but the newer one will also work.
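For readers who use the newer merge syntax, the same steps might look like the sketch below. It runs on made-up data standing in for the IR file (the variable educ, the values, and the tempfile names are assumptions for illustration). With m:1, many individual records match one cluster record, and no prior sorting is needed.

```stata
* Toy individual-level data standing in for the IR file (values made up)
clear
input v001 v002 v003 educ
1 1 2 2
1 2 1 4
2 1 1 1
2 3 2 3
end
tempfile ir
save `ir'

* Cluster-level means, one record per cluster
collapse (mean) commeduc=educ, by(v001)
tempfile clustermeans
save `clustermeans'

* Attach the cluster means to every individual record in the same cluster
use `ir', clear
merge m:1 v001 using `clustermeans'
keep if _merge==3
drop _merge
list
```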
Re: Keeping caseid in and keeping missing observations out when using Stata "collapse" [message #6672 is a reply to message #6666]
Thu, 25 June 2015 11:53
Bridgette-DHS
Messages: 3201 Registered: February 2013
Senior Member
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
If you construct a cluster-level variable using the collapse command, it is not necessary to use weights at all, because everyone in the same cluster has the same weight. To confirm this, you could collapse WITH weights and then collapse WITHOUT weights, and compare the two sets of numbers. They should be exactly the same.
However, if you want to collapse for a larger aggregate, such as a district or region, which includes more than one cluster, you definitely should use weights as part of the collapse.
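The weighted versus unweighted comparison for a cluster-level collapse can be sketched as follows, again on made-up data (the variable educ, the weight values, and the names wt, m_w, m_u, and diff are assumptions for illustration). The sketch assumes the usual DHS convention that v005 holds the sample weight multiplied by 1,000,000, and that the weight is constant within each cluster.

```stata
* Toy data: the weight v005 is the same for everyone in a cluster
clear
input v001 v005 educ
1 1500000 2
1 1500000 4
1 1500000 6
2 800000 1
2 800000 3
end
gen wt = v005/1000000

* Weighted cluster means, saved to a temporary file
preserve
collapse (mean) m_w=educ [pweight=wt], by(v001)
tempfile weighted
save `weighted'
restore

* Unweighted cluster means, then compare the two sets of numbers
collapse (mean) m_u=educ, by(v001)
merge 1:1 v001 using `weighted', nogenerate
gen diff = abs(m_w - m_u)
list
```

If by(v001) were replaced with a larger grouping such as region, the clusters within a group would carry different weights and the two sets of means would generally differ, which is why the weights matter at that level.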