The DHS Program User Forum
Discussions regarding The DHS Program data and results
Link the DHS Individuals (IR) with DHS HIV (AR) [message #2850] Tue, 02 September 2014 08:39
DHS user
I am a Demography and Population Studies student at the University of the Witwatersrand, South Africa. As part of our curriculum we are required to conduct research using DHS data. Our current topic is age at first sex and HIV infection among women in Swaziland. We would like to know how we can link the DHS Individuals data with the DHS HIV data, since we would like to check the profiles of the people tested. We know that the Individuals data contain a variable that indicates whether a blood sample was taken, and that the HIV data contain a unique barcode identifier for each person tested. We will be using Stata as our analysis tool.
Re: Link the DHS Individuals (IR) with DHS HIV (AR) [message #2851 is a reply to message #2850] Tue, 02 September 2014 08:40
Bridgette-DHS
Following is a response from DHS Senior Stata Specialist, Tom Pullum:

Inserted below are the lines to do this. You will have to change the paths.

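* Open the HIV (AR) file and rename its identifiers to match the IR file
* (v001 = cluster, v002 = household, v003 = respondent's line number)
use "c:\DHS\DHS_data\AR_files\SZar51fl.dta", clear
ren hivclust v001
ren hivnumb v002
ren hivline v003

* the old-style merge used below needs both files sorted on the merge keys
sort v001 v002 v003

save "c:\DHS\DHS_data\scratch\temp.dta", replace

* Open the women's (IR) file and merge in the HIV data
use "c:\DHS\DHS_data\IR_files\SZIR51fl.dta", clear
sort v001 v002 v003
merge v001 v002 v003 using "c:\DHS\DHS_data\scratch\temp.dta"
tab _merge

* keep only the women who appear in both files
keep if _merge==3

* hiv03 is the result of the test
* all analysis of the hiv data should use hiv05 for weights, not v005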
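
The lines above use the older merge syntax. As a minimal sketch, the same link could be written with the `merge 1:1` syntax available in newer versions of Stata. The paths are the ones from the post and will need changing. The weight variable name `hivwt` is made up for illustration, and dividing by 1000000 assumes hiv05 carries six implied decimal places, as v005 does.

```stata
* Sketch only: same IR/AR link with the newer merge syntax.
use "c:\DHS\DHS_data\AR_files\SZar51fl.dta", clear
ren hivclust v001
ren hivnumb v002
ren hivline v003
save "c:\DHS\DHS_data\scratch\temp.dta", replace

use "c:\DHS\DHS_data\IR_files\SZIR51fl.dta", clear
* 1:1 requires v001 v002 v003 to identify rows uniquely in both files;
* no prior sort is needed with this syntax.
merge 1:1 v001 v002 v003 using "c:\DHS\DHS_data\scratch\temp.dta"
tab _merge
keep if _merge==3

* hivwt is a hypothetical name; assumes hiv05 has six implied decimals.
gen double hivwt = hiv05/1000000
```

If the identifiers are not unique in either file, `merge 1:1` stops with an error instead of merging silently, which is one reason to prefer it.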

[Updated on: Tue, 02 September 2014 08:42]

Re: Link the DHS Individuals (IR) with DHS HIV (AR) [message #4009 is a reply to message #2851] Tue, 17 March 2015 09:57
bwbennett09
I realize the above answer was a few years ago, but I do have a question regarding the code. Why only keep if _merge==3? Also, is this specific to one country or all countries?

I am following a similar procedure for 7 different nations looking at HIV status and measures of women's empowerment. Thanks!

[Updated on: Tue, 17 March 2015 10:06]

Re: Link the DHS Individuals (IR) with DHS HIV (AR) [message #4021 is a reply to message #4009] Wed, 18 March 2015 07:56
Bridgette-DHS
Following is a response from DHS Senior Stata Specialist, Tom Pullum:

The answer you refer to is about six months old, not several years.... Like many Stata users, I have stayed with old syntax for the merge command, rather than the one that was introduced with Stata 11 or 12. Both the old and the new versions produce a diagnostic variable called "_merge". The variable has three values of most interest, 1, 2, and 3. "1" means the case was only in the "master" file, i.e. the first one in the sequence. "2" means the case was only in the "using" file, i.e. the second one in the sequence. "3" means the case was in both files.
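
To see the three codes in action, here is a small toy illustration. The ids and variable names (`id`, `pr_var`, `kr_var`) are made up and do not come from any DHS file.

```stata
* Toy data only: build a small "using" file with ids 2, 3, 4.
clear
input id kr_var
2 20
3 30
4 40
end
tempfile using_file
save `using_file'

* Build the "master" file with ids 1, 2, 3 and merge.
clear
input id pr_var
1 100
2 200
3 300
end
merge 1:1 id using `using_file'
list id pr_var kr_var _merge

* id 1 is only in the master file -> _merge==1
* ids 2 and 3 are in both files   -> _merge==3
* id 4 is only in the using file  -> _merge==2
```

Run it as a do-file so that the temporary file name stays defined from `save` through `merge`.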

Usually, when I do a merge, I want just the cases with _merge==3, but that is definitely not always the case. Sometimes I also want the cases with 1 or 2 (almost never 1, 2, and 3, but even that could happen). You have to think about what makes sense. Say, for example, that you wanted to merge the children under 5 in the household survey (the PR file) with the children in the KR file. Say you started with the PR file (again restricted to children under 5) and then merged the KR data into it. The PR and KR files would be the master and using files, respectively. The PR file includes all children in the household, including children whose mother is not a resident of the household. Those children will get _merge==1. The KR file includes children who do not live in the same household as the mother, and they will get _merge==2. If the mother and child both live in the household, the child will get _merge==3. They are the only ones for whom you will get both PR and KR variables, so you probably would want to keep just them, but that is not necessarily the case.
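
A rough sketch of the PR/KR example follows, under several assumptions that should be checked against the survey's recode documentation. The file names are hypothetical placeholders. The KR file is assumed to contain b16, the child's line number in the household (0 if the child is not listed there, missing if the child has died), and hv105 is assumed to be age in years. Because children who are not listed in the household have no line number to match on, this sketch drops them from the KR file before merging instead of letting them surface as _merge==2.

```stata
* Sketch only: file names are hypothetical placeholders.
use "KR_file.dta", clear
* Assumes b16 = child's line number in the household
* (0 = not listed in the household, missing = child has died).
keep if b16>0 & b16<.
gen hv001 = v001
gen hv002 = v002
gen hvidx = b16
save "kr_temp.dta", replace

use "PR_file.dta", clear
* Assumes hv105 is age in years; keep children under 5.
keep if hv105<5
* PR is the master file; the KR extract is the using file.
* merge 1:1 stops with an error if the keys are not unique in either file.
merge 1:1 hv001 hv002 hvidx using "kr_temp.dta"
tab _merge
```

Here _merge==1 marks children in the PR file with no matching KR record, and _merge==3 marks children with both PR and KR variables.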