The DHS Program User Forum
Discussions regarding The DHS Program data and results
Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24072] Fri, 18 February 2022 17:05
mlaradji
Hi all,

I am struggling to link individual demographics to the hepatitis C blood sample results in the 2008 survey. The dataset that contains the hepatitis C results, EGOD5ADT.dta, provides only the cluster number, household number, and line number, none of which can be used directly to determine the demographics of those who tested positive or negative (other than their cluster GPS coordinates). Does anyone know of a way to link these results to individual IDs?


Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24090 is a reply to message #24072] Tue, 22 February 2022 08:25
Bridgette-DHS

Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

The cluster, household, and line number are all you need to identify individuals in any of the data files. For example, in the PR file they are given by hv001, hv002, and hvidx. In the IR file they are given by v001, v002, and v003. You can do a merge (there are actually alternatives) to attach the blood sample results to the right individuals.
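
As a rough illustration of the merge described above, here is a minimal sketch in plain Python that attaches test results to PR records by the (cluster, household, line) triple. Only hv001, hv002, and hvidx come from this thread; the records and the other column names (age, sex, cluster, household, line, result) are made up for the example, and in practice the same keyed merge would be done in Stata or another statistics package.

```python
# Toy stand-ins for a PR (household member) file and a blood-test file.
# All values and the non-hv* column names are hypothetical.
pr_rows = [
    {"hv001": 1, "hv002": 10, "hvidx": 1, "age": 34, "sex": "female"},
    {"hv001": 1, "hv002": 10, "hvidx": 2, "age": 36, "sex": "male"},
    {"hv001": 2, "hv002": 10, "hvidx": 1, "age": 51, "sex": "male"},
]
test_rows = [
    {"cluster": 1, "household": 10, "line": 2, "result": "negative"},
    {"cluster": 2, "household": 10, "line": 1, "result": "positive"},
]

# Index the PR records by (cluster, household, line number).
pr_by_id = {(r["hv001"], r["hv002"], r["hvidx"]): r for r in pr_rows}

# Attach each test result to the person with the same three-part ID.
merged = []
for t in test_rows:
    key = (t["cluster"], t["household"], t["line"])
    person = pr_by_id.get(key)
    if person is not None:
        merged.append({**person, "result": t["result"]})

for row in merged:
    print(row)
```

Note that household 10 appears in both clusters here, so the household and line numbers alone would not identify a person; all three parts of the key are needed.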
Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24279 is a reply to message #24090] Tue, 12 April 2022 03:12
mlaradji
Tom Pullum,

Thank you for responding. While the cluster number, household number, and line number are all present in the PR and OD dta files for 2008, the OB dta file (which contains the hepatitis C blood test results) does not contain cluster numbers, so it cannot be matched uniquely to the PR or OD dta files. Is there an alternative variable that should be used for this purpose?

I am also slightly confused about the difference between the h and w prefixes on variables such as household or line (wnumber vs. hnumber, wline vs. hline, etc.). Could you explain this briefly?
Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24299 is a reply to message #24279] Fri, 15 April 2022 15:58
admin
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

I am looking at EGOB5AFL.dta and see obclust, obnumb, and obline, which are the cluster id, household id, and line number, respectively. All three are coded for all cases. Have you tried merging with these codes? Please let us know if you still have difficulty.
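
Before merging, it can help to confirm that the three ID variables are coded for every case and that together they identify each row uniquely. The following is a minimal sketch in plain Python; obclust, obnumb, and obline are the variable names given above, while the rows and the key_report helper are hypothetical.

```python
from collections import Counter

# Made-up rows using the OB file's ID variable names.
ob_rows = [
    {"obclust": 1, "obnumb": 10, "obline": 1},
    {"obclust": 2, "obnumb": 10, "obline": 1},
    {"obclust": 2, "obnumb": 11, "obline": 1},
]

def key_report(rows, keys):
    """Return (rows with a missing key, key combinations seen more than once)."""
    missing = sum(1 for r in rows if any(r.get(k) is None for k in keys))
    counts = Counter(tuple(r.get(k) for k in keys) for r in rows)
    duplicates = {k: n for k, n in counts.items() if n > 1}
    return missing, duplicates

# With all three variables, every row has a complete and unique ID.
missing, duplicates = key_report(ob_rows, ("obclust", "obnumb", "obline"))

# Without the cluster, households from different clusters collide.
_, dup_without_cluster = key_report(ob_rows, ("obnumb", "obline"))

print(missing, duplicates, dup_without_cluster)
```

A non-empty duplicates result on the full key would point to a problem in the file, whereas duplicates that appear only when the cluster is left out are expected, since household numbers repeat across clusters.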


We now have repositories of code written in Stata and SPSS available on GitHub. Please use these repositories as a resource for code for matching or calculating DHS indicators. The code repositories can be found at:

https://github.com/DHSProgram/DHS-Indicators-Stata
https://github.com/DHSProgram/DHS-Indicators-SPSS

Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24378 is a reply to message #24299] Thu, 05 May 2022 00:37
mlaradji
Tom Pullum,

Thank you for responding again. I miswrote the issue in my earlier reply, so I have corrected it and explained it further below.

Identifying variables in each dataset:

EGOB5AFL.DTA : household number (obnumb), line number (obline), cluster number (obclust)

EGOD5AFL.DTA : household number (wnumber and hnumber), line number for women only (wline), and no cluster number

EGPR5AFL.DTA : household number (hv002), line number (hvidx), and cluster number (hv001)

I am attempting to merge all 3 by first merging the OB dataset with the PR dataset on the household number, line number, and cluster number, and then merging the result with the OD dataset, ideally on the same variables. However, there isn't a cluster number in the OD dataset, and it has only a women's line number. I have redownloaded all the datasets above to double-check that it isn't just my version that is missing the variable. Could you help with this?
Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24381 is a reply to message #24378] Thu, 05 May 2022 10:54
mlaradji
Update: I was able to figure it out. The variable hpsu is the cluster number in the OD dataset, and the wline variable in the OD dataset, despite being labeled as the women's line number, contains both male and female observations. Using both of these, I was able to make the final dataset I wanted, with 12,008 observations.
Re: Individual IDs for 2008 Egyptian Hepatitis C Blood Sample Testing [message #24384 is a reply to message #24378] Thu, 05 May 2022 12:39
Janet-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

In EGOD5AFL.dta, hpsu=hv001; hnumber=hv002; wline=hvidx

In EGOB5AFL.dta, obclust=hv001; obnumb=hv002; obline=hvidx.

I think the problem is just that you did not realize that the PSU (primary sampling unit) and cluster are one and the same.
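
The variable correspondences listed above can be applied by renaming the OD and OB ID variables to the PR names and then joining all three files on the common key. This is a minimal sketch in plain Python under that assumption; the name mappings come from this thread, while the records, the od_value and ob_value columns, and the helper functions are hypothetical.

```python
# Name mappings taken from the correspondences listed above.
OD_TO_PR = {"hpsu": "hv001", "hnumber": "hv002", "wline": "hvidx"}
OB_TO_PR = {"obclust": "hv001", "obnumb": "hv002", "obline": "hvidx"}
ID_VARS = ("hv001", "hv002", "hvidx")

def rename(rows, mapping):
    """Rename keys according to mapping, leaving other columns unchanged."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]

def index_by_id(rows):
    """Index rows by the (cluster, household, line) triple."""
    return {tuple(r[k] for k in ID_VARS): r for r in rows}

# Made-up records; od_value and ob_value are placeholder columns.
pr_rows = [
    {"hv001": 1, "hv002": 10, "hvidx": 1, "age": 34},
    {"hv001": 1, "hv002": 10, "hvidx": 2, "age": 36},
]
od_rows = [{"hpsu": 1, "hnumber": 10, "wline": 2, "od_value": "x"}]
ob_rows = [{"obclust": 1, "obnumb": 10, "obline": 2, "ob_value": "y"}]

pr = index_by_id(pr_rows)
od = index_by_id(rename(od_rows, OD_TO_PR))
ob = index_by_id(rename(ob_rows, OB_TO_PR))

# Keep only individuals present in all three files (an inner join).
common = sorted(pr.keys() & od.keys() & ob.keys())
combined = [{**pr[k], **od[k], **ob[k]} for k in common]

print(combined)
```

An inner join is only one choice here; keeping unmatched rows from one file instead would require looping over that file's keys and filling in missing values for the others.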