duplicate caseid [message #24397]
Mon, 09 May 2022 02:51
adis
Messages: 20 Registered: May 2022
Member
Hi, could you please send me the Stata command for identifying duplicate caseid values?
Is there any way to convert the string variable caseid to numeric? It is an important variable for our analysis, but we have not been able to convert it.
Please help us.
Re: duplicate caseid [message #24424 is a reply to message #24397]
Fri, 13 May 2022 10:59
Janet-DHS
Messages: 852 Registered: April 2022
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
What survey and what file are you using? In the KR file, for example, there is a record for every child born in the past five years. caseid is the mother's ID code. Because many women had more than one child in the past five years, there will be several records with the same value of caseid, but children of the same mother will have different values of bidx (1, 2, etc.). In the IR file, there should never be a repeat of the same caseid, although very rarely we will find a duplicate. To check for duplicates in the IR file, in Stata, enter "gen ncases=1", then "collapse (sum) ncases, by(caseid)", then "tab ncases" and "list if ncases>1, table clean".
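The check described above can be sketched as follows. This is only an illustration on made-up toy data standing in for an IR file; the variable name dup is hypothetical. Because collapse replaces the data in memory, the sketch wraps that step in preserve/restore, and it also shows Stata's built-in duplicates commands as an alternative that leaves the file untouched.

```stata
* Toy data standing in for an IR file (values are made up for illustration)
clear
input str15 caseid
"      1  1  2"
"      1  2  1"
"      1  2  1"
"      2  5  3"
end

* Alternative: summarize how many copies of each caseid exist
duplicates report caseid

* Alternative: flag duplicated records without changing the data in memory
* (dup is a hypothetical variable name)
duplicates tag caseid, generate(dup)
list caseid dup if dup > 0, table clean

* The approach from the reply; collapse overwrites the data in memory,
* so preserve/restore brings the original file back afterwards
preserve
gen ncases = 1
collapse (sum) ncases, by(caseid)
tab ncases
list if ncases > 1, table clean
restore
```

In a KR file the same check would be expected to report repeated caseid values, since each child of the same mother has its own record; there the combination of caseid and bidx is the one to test.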
Re: duplicate caseid [message #24480 is a reply to message #24397]
Fri, 20 May 2022 15:45
Janet-DHS
Messages: 852 Registered: April 2022
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
I have to tell you that a comment such as "I appreciate swift responses" will not accelerate our response to a forum question.
You are looking for a way to extract the different columns of caseid (or hhid) and convert them from strings to numeric. The response to a recent forum question (#24358) describes how to do this. "destring caseid, gen(Ind_ID)" will not work because embedded blanks should not be interpreted as zeroes.
Usually, caseid just combines v001 and v002 and v003, and hhid combines hv001 and hv002. You can identify cases just as easily with those components, which are numeric, as with caseid or hhid.
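As a rough sketch of identifying cases through the numeric components rather than the string caseid, the following uses made-up toy data. The variable name woman_id is hypothetical, and the sketch assumes that v001, v002 and v003 together identify a woman in your survey, which you should confirm for your own file.

```stata
* Toy data standing in for an IR file (values are made up for illustration)
clear
input str15 caseid v001 v002 v003
"      1  1  2" 1 1 2
"      1  2  1" 1 2 1
"      2  5  3" 2 5 3
end

* Stops with an error if the three components do not uniquely identify records
isid v001 v002 v003

* One sequential numeric ID per unique combination of the components
* (woman_id is a hypothetical variable name)
egen long woman_id = group(v001 v002 v003)

list caseid v001 v002 v003 woman_id, table clean
```

For household-level files the same idea would apply with hv001 and hv002. The sequential ID is only meaningful within one file, so the components themselves remain the safer choice for merging across files.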
The usual variables in the KR file for having received the basic vaccines are h0, h2, h3, h4, h5, h6, h7, h8, h9, h9a (you should check your survey). These variables are coded 0 if the child did not receive a specific vaccine. You could do something like "drop if (h0+h2+h3+h4+h5+h6+h7+h8+h9+h9a)==0". I recommend caution with dropping cases from the file. An alternative would be "gen condition=0" and "replace condition=1 if (h0+h2+h3+h4+h5+h6+h7+h8+h9+h9a)==0". Then you can exclude those cases from a specific command with something like "tab A B if condition==0".
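The flagging alternative can be sketched like this on made-up toy data. The variable list follows the vaccine variables named at the start of the reply and should be checked against your survey; vacc_sum is a hypothetical helper variable. Note that in Stata a sum involving a missing value is itself missing, so a record with any missing component is not flagged by this rule.

```stata
* Toy data standing in for a KR file (values are made up for illustration)
clear
input h0 h2 h3 h4 h5 h6 h7 h8 h9 h9a
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 0
0 1 0 0 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0
end

* Sum of the vaccine variables; missing if any component is missing
* (vacc_sum is a hypothetical helper variable)
gen vacc_sum = h0 + h2 + h3 + h4 + h5 + h6 + h7 + h8 + h9 + h9a

* Flag children with a code of 0 on every vaccine instead of dropping them
gen condition = 0
replace condition = 1 if vacc_sum == 0

tab condition

* Restrict a later command to the unflagged cases
tab h0 h2 if condition == 0
```

Keeping the flag rather than dropping records leaves the full file available for other tabulations, which is the caution the reply recommends.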