Mon, 16 November 2020
So for my undergrad economics dissertation I am hoping to look at the impact of HIV on children's education or children's health in South Africa.

Really struggling to wrap my head around how to accomplish this for the household recode. Firstly, trying to merge the HIV dataset with the household recode is obviously difficult because the barcode for HIV tests is listed as multiple variables in the hh recode, but as a single variable in the HIV data. How would I go about merging these?

Furthermore, how would I run a regression on this hh recode to see the impact of a household having an HIV-positive member on the education level/health of children in the household, given that each household (as a single observation) has multiple sets of variables, one for each member of the household?

Thanks in advance!
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

This is a very ambitious project! The main problem you have is that the data probably do not include enough cases for you to have any chance of statistical significance.

Merging the AR file with other files is done with the household ID code, not the barcode. The merge can be tricky. The household ID code is a string variable that includes hv001 and hv002, which are numeric. Let us know if you still have difficulty with that step. hhid is contained within caseid in the IR and KR files.
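As a rough illustration of keying on the household rather than the barcode, here is a minimal Python sketch that collapses HIV test records to one flag per household. The variable names hivclust, hivnumb, hivline and hiv03 are assumed to be the cluster, household number, line number and test result in the AR file, and treating hiv03 == 1 as "positive" is also an assumption; check both against the codebook for your survey. The records are made-up toy data.

```python
# Toy AR (HIV test) records. Variable names and codes are assumptions
# based on the usual DHS layout; verify them in your survey's codebook.
ar_records = [
    {"hivclust": 1, "hivnumb": 3, "hivline": 1, "hiv03": 0},
    {"hivclust": 1, "hivnumb": 3, "hivline": 2, "hiv03": 1},
    {"hivclust": 1, "hivnumb": 7, "hivline": 1, "hiv03": 0},
    {"hivclust": 2, "hivnumb": 3, "hivline": 4, "hiv03": 0},
]


def household_hiv_flags(records, positive_codes=(1,)):
    """Map (cluster, household number) -> True if any tested member is positive."""
    flags = {}
    for rec in records:
        key = (rec["hivclust"], rec["hivnumb"])
        is_positive = rec["hiv03"] in positive_codes
        # Once a household has one positive member, the flag stays True.
        flags[key] = flags.get(key, False) or is_positive
    return flags


hh_flags = household_hiv_flags(ar_records)
print(hh_flags)
```

Using the numeric pair (cluster, household number) as the key sidesteps any padding issues in the string hhid. Households with no tested member simply do not appear in the result, so they should be treated as unknown rather than negative.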

You want to compare the child outcomes for children in a household with an HIV-positive person with the child outcomes for children in a household without an HIV-positive person. The health variables for children are in the KR file. School attendance is in the PR file, which has one record for each person in the household. I suppose I would break this into two analyses. One would be for children age 0-4, using the KR file with the HIV status of adults in the household (having the same values of v001 and v002) merged onto the children's records. The other would be for school-age children, using the PR file with the HIV status of adults in the household (having the same values of hv001 and hv002) merged onto the records of the school-age children.
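The school-age half of this could be sketched in Python roughly as follows. Here hh_flags stands in for a household-level HIV flag already derived from the AR file, in_school is a hypothetical placeholder for whichever attendance variable you choose, the 6-17 age range is an assumption, and all records are toy data. hv105 is taken to be the age of the household member.

```python
# Hypothetical household-level flag: (cluster, household) -> any HIV-positive member.
hh_flags = {(1, 3): True, (1, 7): False}

# Toy PR records; "in_school" is a placeholder, not a real DHS variable name.
pr_records = [
    {"hv001": 1, "hv002": 3, "hvidx": 3, "hv105": 9, "in_school": 1},
    {"hv001": 1, "hv002": 3, "hvidx": 4, "hv105": 12, "in_school": 0},
    {"hv001": 1, "hv002": 7, "hvidx": 2, "hv105": 8, "in_school": 1},
    {"hv001": 1, "hv002": 7, "hvidx": 3, "hv105": 35, "in_school": 0},  # adult
    {"hv001": 2, "hv002": 5, "hvidx": 2, "hv105": 10, "in_school": 1},  # no test in household
]


def attach_flag(children, flags, cluster_var, household_var):
    """Copy the household flag onto each child; drop children with no flag."""
    merged = []
    for child in children:
        key = (child[cluster_var], child[household_var])
        if key in flags:
            merged.append({**child, "hh_hiv_positive": flags[key]})
    return merged


def attendance_rate(rows, exposed):
    """Unweighted share of children in school within one exposure group."""
    group = [r["in_school"] for r in rows if r["hh_hiv_positive"] is exposed]
    return sum(group) / len(group) if group else None


school_age = [r for r in pr_records if 6 <= r["hv105"] <= 17]  # assumed age range
merged = attach_flag(school_age, hh_flags, "hv001", "hv002")

rate_exposed = attendance_rate(merged, True)
rate_unexposed = attendance_rate(merged, False)
print(rate_exposed, rate_unexposed)
```

The same attach_flag call should work for the KR file by passing "v001" and "v002" as the key names. The rates here are unweighted and ignore the survey design, so they only illustrate the shape of the merge, not a usable estimate.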

Your analysis should control for other characteristics of the household that may be associated with both HIV prevalence and the child outcomes.

Note that you do not know how long the adult has been HIV positive or how long they have been in the same household, although there are ways to check whether they are the mother or father of the child (using the line numbers given by hv112 and hv114). Good luck.
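A minimal Python sketch of the parent check, assuming hv112 and hv114 hold the household line numbers of the child's mother and father, that 0 means the parent is not listed in the household, and that you already have a set of HIV-positive adults identified by (cluster, household, line number). The positive_adults set and the child records are made-up toy data.

```python
# Hypothetical set of HIV-positive adults: (cluster, household, line number).
positive_adults = {(1, 3, 2)}

# Toy PR records for children; hv112/hv114 = mother's/father's line number,
# with 0 assumed to mean the parent is not listed in the household.
children = [
    {"hv001": 1, "hv002": 3, "hvidx": 3, "hv112": 2, "hv114": 1},
    {"hv001": 1, "hv002": 3, "hvidx": 4, "hv112": 0, "hv114": 1},
    {"hv001": 1, "hv002": 7, "hvidx": 2, "hv112": 1, "hv114": 0},
]


def parent_status(child, positives):
    """Report whether the child's co-resident mother or father tested positive."""
    household = (child["hv001"], child["hv002"])
    mother_line = child["hv112"]
    father_line = child["hv114"]
    return {
        "mother_positive": (mother_line > 0) and ((household + (mother_line,)) in positives),
        "father_positive": (father_line > 0) and ((household + (father_line,)) in positives),
    }


statuses = [parent_status(c, positive_adults) for c in children]
for child, status in zip(children, statuses):
    print(child["hv001"], child["hv002"], child["hvidx"], status)
```

A False here only means "not known to be positive": the parent may live elsewhere, may not have been eligible for testing, or may have declined the test.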
