The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Keeping caseid in and keeping missing observations out when using Stata "collapse" [message #5617 is a reply to message #5611] Wed, 17 June 2015 09:26
Bridgette-DHS
Messages: 3039
Registered: February 2013
Senior Member
Following is a response from DHS Senior Stata Specialist, Tom Pullum:

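If you collapse by v001, you cannot include caseid in the "by" part of the collapse command. Each caseid identifies a single woman, so grouping on it as well would leave one record per woman and nothing would actually be collapsed. Replace "by(v001 caseid)" with "by(v001)"; the collapsed file will then have one record per cluster.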
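A minimal sketch of the difference, assuming a made-up four-woman dataset in place of a real IR file. v201 (number of children ever born) is used only as an example variable to summarize, and the caseid strings are illustrative stand-ins, not the exact DHS layout.

```stata
* Made-up toy data standing in for an IR file (not real DHS values);
* the caseid strings are illustrative stand-ins only
clear
input v001 v002 v003 str12 caseid v201
1 1 2 "   1   1  2" 3
1 2 1 "   1   2  1" 1
2 1 1 "   2   1  1" 4
2 3 2 "   2   3  2" 2
end

* Adding caseid to by() leaves one row per woman: nothing is collapsed
preserve
collapse (mean) v201, by(v001 caseid)
display _N
restore

* by(v001) alone gives one record per cluster
collapse (mean) v201, by(v001)
list
```

With by(v001 caseid) the row count stays at 4, the number of women; with by(v001) it drops to 2, one per cluster, and v201 then holds each cluster's mean.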

caseid is a combination of v001 (cluster number), v002 (household number), and v003 (respondent's line number). Those three are numeric variables, whereas caseid is a string with embedded blanks.
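As a rough check, again on made-up toy data whose caseid strings are only stand-ins for the real DHS layout, you can confirm the variable types and that the numeric triple identifies each woman just as caseid does:

```stata
* Made-up toy data; caseid values are illustrative stand-ins only
clear
input v001 v002 v003 str12 caseid
1 1 2 "   1   1  2"
1 2 1 "   1   2  1"
2 1 1 "   2   1  1"
end

* v001, v002, v003 are numeric; caseid is a string
confirm numeric variable v001 v002 v003
confirm string variable caseid

* Both keys uniquely identify records (isid stops with an error otherwise)
isid v001 v002 v003
isid caseid
```

Because v001 v002 v003 carry the same information as caseid, the numeric variables are usually the more convenient key for sorting and merging.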

To merge the collapsed data back onto the individual records in the IR file, you only need to sort both files on v001. When I do this I sort the IR file on v001 v002 v003 anyway, even though it is not strictly required. Since your cluster-level file does not contain v002 and v003, they play no role in the merge.

So I recommend lines such as the following:

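* [your collapse command, run on the IR file with by(v001)]
sort v001
save temp.dta, replace
* Reload the individual records and attach the cluster-level file
use IRdata.dta, clear
sort v001 v002 v003
merge v001 using temp.dta
* Keep only women whose cluster was found in the collapsed file
keep if _merge==3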

Like many Stata users, I prefer the old version of the merge command, but the newer one will also work.
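For comparison, here is a sketch of the same workflow with the newer merge syntax (available since Stata 11), where `m:1` states that many women match one cluster record and no prior sort is needed. It uses made-up toy data in place of IRdata.dta; the tempfile names `ir` and `clusters` and the variable clus_v201 are hypothetical.

```stata
* Made-up toy data standing in for IRdata.dta (not real DHS values)
clear
input v001 v002 v003 v201
1 1 2 3
1 2 1 1
2 1 1 4
2 3 2 2
end
tempfile ir clusters
save `ir'

* Cluster-level file: one record per v001, holding the cluster mean of v201
collapse (mean) clus_v201 = v201, by(v001)
save `clusters'

* Reload the individual records and attach the cluster means
use `ir', clear
merge m:1 v001 using `clusters'
keep if _merge == 3
drop _merge
sort v001 v002 v003
list
```

Each woman keeps her own v201 and gains clus_v201, her cluster's mean, so all four toy records survive the merge.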