The DHS Program User Forum

Topic: Strange Issues w/ Data Formatting from DHS

Re: Strange Issues w/ Data Formatting from DHS [message #29156 is a reply to message #29145]

Wed, 01 May 2024 12:02

tednoel
Messages: 6
Registered: April 2024

Member

Hi, thanks so much for the response. I've been a bit stumped by this merging process because there are three different datasets (IR, PR, and Wealth Index) that I have to combine and I'm a bit confused as to which one should be my base for merging. I was going to make the IR my base for merging because I've seen code that allows for the renaming of the cluster number, household number, and respondent's line number such as this:

use "/Users/tbear/Desktop/THESIS DATA/Tanzania_1999/Tanzania_1999_dta/TZIR41FL.dta", clear

* keep the variables you want
keep v0*
sort v001 v002 v003
save e:/Users/tbear/Desktop/THESISDATA/Tanzania_1999/Tanzania_199 9_dta/TZIR41FL.dta, replace

* Prepare PR file and merge
use "/Users/tbear/Desktop/THESIS DATA/Tanzania_1999/Tanzania_1999_dta/TZPR41FL.dta", clear
* reduce to women who are eligible for the IR file
keep if hv117==1

* keep the variables you want
keep hv0* sa33 sh*

rename hv001 v001
rename hv002 v002
rename hv003 v003

sort v001 v002 v003
merge v001 v002 v003 using ***Not entirely clear what I should be using here
tab _merge
******

BUT the problem is that the wealth index only has the hhid variable that I can use to merge, and the IR file does not have this; only the PR file does. Should I be using the PR file as my base, merging the IR file, and then appending the wealth index?

Thank you so much for all of your help.

Re: Strange Issues w/ Data Formatting from DHS [message #29157 is a reply to message #29156]

Wed, 01 May 2024 13:01

Trevor-DHS
Messages: 792
Registered: January 2013

Senior Member

Hi
A few notes:
1) It looks like you are opening the IR file, then keeping just a few variables and sorting the file, and then overwriting the original file. This is generally not considered good practice as you are modifying the original file. Generally, you should start with your original file, but save to an interim file with a different name or in a different folder (or both).
2) The naming of your THESISDATA folder seems to vary - in two cases it has a space between THESIS and DATA and in one case it doesn't (this may be a display issue in the user forum as it occasionally puts extra blanks into the text).
3) In terms of the order of merging, I would start by merging the wealth index to the PR file and saving your output to an intermediate file. They both should have hhid so you should be able to merge those without problem. Then merge the info from the PR/wealth data onto the IR file. Below is a rough outline of the process (I haven't tested this, so there may be some bugs - this is just to give you the order of operations):

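* sort the wealth index file by household ID
use TZWIxxxx.dta, clear
sort hhid
save TZWIxxxx.dta, replace

* merge the wealth index onto the PR file (many persons per household)
use TZPRxxxx.dta, clear
* keep the variables you want from the PR file (hvidx is needed below)
keep hhid hvidx hv0* ...
sort hhid
merge m:1 hhid using TZWIxxxx.dta
* drop the merge indicator; it is not needed in the saved file
drop _merge
clonevar v001 = hv001
clonevar v002 = hv002
clonevar v003 = hvidx
sort v001 v002 v003
save TZPRxxxx_temp.dta, replace

* merge the PR/wealth data onto the IR file
use TZIRxxxx.dta, clear
sort v001 v002 v003
merge 1:1 v001 v002 v003 using TZPRxxxx_temp.dta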

It is also possible to construct hhid from hv001 and hv002 or from v001 and v002, and vice versa, but I don't think you need to.

[Updated on: Wed, 01 May 2024 13:02]

Re: Strange Issues w/ Data Formatting from DHS [message #29159 is a reply to message #29157]

Wed, 01 May 2024 14:40

tednoel
Messages: 6
Registered: April 2024

Member

Hi Trevor, thank you SO much for the guidance :). The DHS data is amazing but definitely a little tricky to navigate at first. I've adapted the code to what I'm trying to do, but I've been stuck since your previous message because I keep receiving an error message in Stata telling me that "variable hhid does not uniquely identify observations in the using data".

I'm not sure if this is because I had to clone whhid and set it equal to hhid at first (because the case identifier in these older wealth indexes doesn't exactly match the case identifier in the PR file), but in any case I know for a fact that hhid uniquely identifies cases in the PR file. Could this be a situation wherein I have to construct hhid from hv001 and hv002 or from v001 and v002, as you alluded to in your previous message (really hope not lol)? Below is my code, in case you can see any problems I haven't picked up on so far.

use TZWI41FL.dta
sort whhid
save TZWI41FL.dta, replace

use TZPR41FL.dta, clear
* keep the variables you want from the PR file
clonevar whhid = hhid
keep hhid hv005 hv007 hv025 hv219
sort hhid
merge m:1 hhid using TZPR41FL.dta
clonevar v001 = hv001
clonevar v002 = hv002
clonevar v003 = hvidx
sort v001 v002 v003
save TZPR41FL_temp.dta

use TZIR41FL.dta, clear
sort v001 v002 v003
merge 1:1 v001 v002 v003 using TZPR41FL_temp.dta

As always thank you, thank you for any guidance you might be able to provide!

Attachment: Screen Shot 2024-05-01 at 9.40.04 PM.png

Re: Strange Issues w/ Data Formatting from DHS [message #29161 is a reply to message #29159]

Wed, 01 May 2024 15:50

Trevor-DHS
Messages: 792
Registered: January 2013

Senior Member

The following line:

merge m:1 hhid using TZPR41FL.dta

should refer to the WI file, not the PR file.
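
For context, the error comes up because the PR file has one row per household member, so hhid repeats within each household and cannot serve as the "1" side of an m:1 merge. Putting the fix together with the earlier outline, a rough, untested sketch might look like the one below. It assumes that whhid in the WI file holds the same household ID values, in the same format, as hhid in the PR file. The file name TZWI41FL_temp.dta is a hypothetical interim file, and the variable list is just the one from the post above plus the identifiers needed for the merges.

```stata
* Step 1: prepare the wealth index (WI) file, one row per household
use TZWI41FL.dta, clear
* assumption: whhid matches hhid in the PR file exactly
rename whhid hhid
* isid stops with an error if hhid is not unique in the WI file
isid hhid
* save to an interim file so the original is left untouched
save TZWI41FL_temp.dta, replace

* Step 2: merge the WI data onto the PR file (many persons per household)
use TZPR41FL.dta, clear
* keep the identifiers needed for both merges plus the analysis variables
keep hhid hvidx hv001 hv002 hv005 hv007 hv025 hv219
merge m:1 hhid using TZWI41FL_temp.dta
tab _merge
drop _merge
clonevar v001 = hv001
clonevar v002 = hv002
clonevar v003 = hvidx
save TZPR41FL_temp.dta, replace

* Step 3: merge the PR/wealth data onto the IR file (one row per woman)
use TZIR41FL.dta, clear
merge 1:1 v001 v002 v003 using TZPR41FL_temp.dta
tab _merge
```

If isid reports duplicates, or tab _merge shows many unmatched households after Step 2, the ID values probably differ in format between the two files and would need to be checked before going further.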
