The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging Women-Men-HIV Data: Different Countries, Different Years [message #16860] Sun, 10 March 2019 17:27
Yawo

I am getting started on a big project investigating stigmatization among people living with HIV in sub-Saharan Africa (SSA). Testing rates in SSA are relatively low, so I need to pool data across countries to give my hierarchical linear models (HLM) enough statistical power.

Consequently, I am pooling data from 33 countries with HIV data: Angola, Benin, Burkina Faso, Burundi, Cameroon, Chad, Congo, Congo Democratic Republic, Cote d'Ivoire, Ethiopia, Gabon, Gambia, Ghana, Guinea, Kenya, Lesotho, Liberia, Malawi, Mali, Mozambique, Namibia, Niger, Rwanda, Senegal, Sierra Leone, South Africa, Swaziland, Tanzania, Togo, Uganda, Zambia, Zimbabwe.

For some countries, like Kenya, we have HIV data for two survey years (2003 and 2008), while for others, like Zimbabwe, we have data for three (2005, 2010, and 2015).

So, for each country, I need to:

1. append the men's and women's data, then merge the result with the HIV test results for a specific survey year;
2. repeat the same steps for the next survey year;
3. append the data for year 1, year 2, year 3, and so on;
4. finally, pool all of these data into one dataset covering every country and every year for which it has HIV data.

I have already:
(a) selected the subset of variables I need, making sure they are consistent across countries and years, including the survey design variables (stratum, PSU, etc.);
(b) renamed the male variables from mv* to v*;
(c) put all the datasets (men, women, HIV) into one main folder, which I will use as my working directory.
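Steps (a) and (b) can each be done with a single Stata command. The sketch below is only an illustration: the variable list is a hypothetical subset, the input names `Kenya2003_women_raw.dta` and `Kenya2003_male_raw.dta` are hypothetical names for the original recode files, and the wildcard form of `rename` assumes Stata 12 or later.

```stata
* Hypothetical subset of variables; replace with the list selected in (a)
local keepvars v001 v002 v003 v005 v021 v022 v023

* (a) load only the selected variables from the women's recode file
use `keepvars' using Kenya2003_women_raw.dta, clear
save Kenya2003_women.dta, replace

* (b) load the matching mv* variables from the men's recode file
use mv001 mv002 mv003 mv005 mv021 mv022 mv023 using Kenya2003_male_raw.dta, clear

* rename every mv* variable to v* in one step (wildcard rename, Stata 12+)
rename mv* v*
save Kenya2003_male.dta, replace
```

Loading only the needed variables with `use varlist using` keeps memory use down when the same step is repeated for dozens of surveys.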

Here is the process I have outlined; I would appreciate any comments and suggestions:

/* sort Kenya2003 women's data by key variables */

use Kenya2003_women.dta, clear
sort v001 v002 v003
save Kenya2003_women1.dta, replace

/* sort Kenya2003 men's data by key variables - note, variables already renamed from mv* to v* */
use Kenya2003_male.dta, clear
sort v001 v002 v003
save Kenya2003_male1.dta, replace

/* call Kenya2003 HIV data, rename key variables, then sort */
use Kenya2003_HIV.dta, clear
ren hivclust v001
ren hivnumb v002
ren hivline v003
sort v001 v002 v003
save Kenya2003_HIV1.dta, replace

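/* Append Kenya2003 men to women */
use Kenya2003_women1.dta, clear
append using Kenya2003_male1.dta
save Kenya2003_WomenMen.dta, replace

/* Merge HIV data into the combined Kenya 2003 men and women file.
   merge 1:1 (Stata 11+) checks that v001 v002 v003 uniquely identify
   each person in both files, so no prior sort is needed */
use Kenya2003_WomenMen.dta, clear
merge 1:1 v001 v002 v003 using Kenya2003_HIV1.dta
/* _merge: 1 = interviewed only, 2 = HIV record only, 3 = matched */
tab _merge
save Kenya2003_HIV_MenWomen.dta, replace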

/* Repeat the same steps for Kenya 2008 to create Kenya2008_HIV_MenWomen.dta, then append the two years */
use Kenya2003_HIV_MenWomen.dta, clear
append using Kenya2008_HIV_MenWomen.dta
save Kenya2003-2008_MenWomenHIV.dta, replace

I will then cycle the same process through all countries and, at the end, append the country datasets to each other.
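The per-survey steps above can be wrapped in a loop so that every country-year is processed identically and stacked into one file. A minimal sketch, assuming each survey's three files follow the naming pattern used above (`<stub>_women.dta`, `<stub>_male.dta` with variables already renamed to v*, `<stub>_HIV.dta`); the survey list, the `female` and `survey` variables, and the output name `AllCountries_MenWomenHIV.dta` are hypothetical. It assumes Stata 12 or later for the multi-variable `rename`.

```stata
* Hypothetical list of survey stubs; extend it to every country-year with HIV data
local surveys Kenya2003 Kenya2008 Zimbabwe2005 Zimbabwe2010 Zimbabwe2015

tempfile pooled hiv
local first 1

foreach s of local surveys {
    * HIV results: rename the keys to match the individual recode files
    use `s'_HIV.dta, clear
    rename (hivclust hivnumb hivline) (v001 v002 v003)
    save `hiv', replace

    * women first, flagged as female, then men appended (flag set to 0 for men)
    use `s'_women.dta, clear
    generate byte female = 1
    append using `s'_male.dta
    replace female = 0 if missing(female)

    * survey identifier, needed later to build pooled strata/PSU for svyset
    generate survey = "`s'"

    * keep interviewed respondents whether or not they have an HIV result
    merge 1:1 v001 v002 v003 using `hiv', keep(master match)

    * stack this survey onto the surveys already processed
    if `first' == 0 append using `pooled'
    save `pooled', replace
    local first 0
}

use `pooled', clear
save AllCountries_MenWomenHIV.dta, replace
```

If a variable is stored as a string in one survey and as a number in another, `append` stops with an error, which at least flags the inconsistency at the survey where it occurs.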


1. Assuming everything above is correct, how should I handle the PSU, stratum, and other variables needed for svyset? Should I set this up for each dataset (within a country, within a specific year), or wait until the end and create a grouping variable?

2. Do you have any other advice to help facilitate this process?
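For question 1, one common approach is to wait until everything is appended and then build strata and PSU identifiers that are unique across surveys, because DHS cluster and stratum codes are only unique within a single survey. A minimal sketch, assuming the pooled file (hypothetically `AllCountries_MenWomenHIV.dta`) contains a survey identifier (hypothetically a string variable `survey`, such as "Kenya2003"), the design variables v021 (PSU) and v022 (stratum) from the recode files, and the HIV weight hiv05 from the HIV files:

```stata
use AllCountries_MenWomenHIV.dta, clear

* cluster and stratum codes repeat across surveys,
* so combine them with the survey identifier before svyset
egen long strata_pooled = group(survey v022)
egen long psu_pooled    = group(survey v021)

* DHS weights are stored as integers with six implied decimals
generate double wt_hiv = hiv05 / 1000000

* strata with a single PSU can occur; centered is one of the available options
svyset psu_pooled [pweight = wt_hiv], strata(strata_pooled) singleunit(centered)
```

Whether to rescale the weights so that each country contributes in proportion to its population, or equally, is a separate analytic choice that this sketch does not address.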

Thanks very much in advance for any assistance.

Best, Yy