The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging BR and PR data [message #24727] Wed, 29 June 2022 06:14
hvs0013
Hi,

I am trying to merge the PR and BR datasets for India. I used the variables v001, v002, and b16 in the BR file and hv001, hv002, and hvidx in the PR file as the merge keys. I also dropped the cases with b16=0 and b16=. before the merge. However, when I perform the merge, some observations from the BR file are not matched to the PR file. I understand that many observations in the PR file will not be matched to the BR file, and that children who have died or are not living in the household cannot be matched. But I am confused about why some of the remaining observations in the BR file are not matched to the PR file. Thank you
Re: Merging BR and PR data [message #24752 is a reply to message #24727] Wed, 06 July 2022 09:33
Janet-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

As has been posted several times on the forum, merges of different files from the India surveys always require that you include state, which is v025 (or hv025 or mv025). In the India surveys, the clusters (or EAs or PSUs) are numbered separately within each state. Just include v025 in the ID and you should be ok.
Re: Merging BR and PR data [message #24766 is a reply to message #24752] Thu, 07 July 2022 04:16
hvs0013
Hi,

Thanks for your response. I tried to include the state too, and the problem still persists. I actually used the variable v023 (state) instead of v025 (type of place of residence). I am still left with 13 unmatched observations (the same case as before).

[Updated on: Thu, 07 July 2022 04:17]

Re: Merging BR and PR data [message #24786 is a reply to message #24766] Mon, 11 July 2022 10:13
Janet-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

My apologies for a typo in my previous response. I should not have said v025, which is place of residence (1=urban, 2=rural). State is given by v024. The following merge will work. You need to remove from the BR file the cases with b16=0 or ".", which identify children who have died or left the household. The cases you want are those with _merge=3. You will want to include variables other than those I have included.

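* Birth recode (BR): one record per child in a birth history
use v0* b* using "...IABR7AFL.DTA", clear
* Drop children who have died or left the household (no household line number)
drop if b16==0 | b16==.
rename v024 state
rename v001 cluster
rename v002 hh
rename b16 line
sort state cluster hh line
save ...IABRtemp.dta, replace

* Household member recode (PR): one record per household member
use hv0* hvidx using "...IAPR7AFL.DTA", clear
rename hv024 state
rename hv001 cluster
rename hv002 hh
rename hvidx line
sort state cluster hh line

* Merge on the common ID; the using file is the temporary BR file saved above
merge state cluster hh line using e:\DHS\DHS_data\scratch\IABRtemp.dta
tab _merge
* _merge==3 marks children found in both files
keep if _merge==3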
Re: Merging BR and PR data [message #24805 is a reply to message #24786] Thu, 14 July 2022 01:54
hvs0013

Thanks for your response, and for your code. I tried it and the software says the following:

"variables state cluster hh line do not uniquely identify observations in the using data"

I am happy to send my output to show that I have used the exact same code. Thank you.
Re: Merging BR and PR data [message #24817 is a reply to message #24805] Fri, 15 July 2022 08:49
Janet-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

You have made a very interesting discovery. I ran the Stata lines that I sent you, but didn't even look for the warning "variables state cluster hh line do not uniquely identify observations in the using data" because I was so sure that the code was correct. I should have been more careful. That warning means that there are repeats or duplicates of some ID codes.

To find the duplicates, I ran these lines on IABRtemp.dta:

use e:\DHS\DHS_data\scratch\IABRtemp.dta, clear
gen n=1
collapse (sum) n, by(state cluster hh line)
tab n


Those lines give the number of cases with each ID code (for the merge). Here's what comes up:

. tab n

    (sum) n |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |  1,008,647       99.99       99.99
          2 |         58        0.01      100.00
------------+-----------------------------------
      Total |  1,008,705      100.00


Apparently there are 58 duplicated ID codes, each shared by two children, in a file of more than a million children. This is a data processing (DP) error. Such duplicates can arise for different reasons, but they should have been checked and resolved during data processing. I then enter the following line:

list state cluster hh line if n==2, table clean noobs nolabel

I won't reproduce the full list but here is the first duplicate:

state   cluster   hh   line
    1       834   26      9
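
The same check can also be sketched with Stata's built-in duplicates commands, which avoid collapsing the data. This is a minimal sketch, not the method used above: it assumes the temporary BR file saved earlier in the thread is still at the path shown, and the variable name dup is a hypothetical choice.

```stata
* Sketch only: reuses the temporary BR file saved earlier in the thread
use e:\DHS\DHS_data\scratch\IABRtemp.dta, clear

* Summarize how many records share their ID code with another record
duplicates report state cluster hh line

* dup (hypothetical name) = 0 for a unique ID code,
* 1 for each record in a duplicated pair
duplicates tag state cluster hh line, generate(dup)

* List every record whose ID code is shared with another record
sort state cluster hh line
list state cluster hh line if dup > 0, table clean noobs nolabel
```

Because the data are not collapsed, each duplicated ID code should appear twice in the listing, once for each child that carries it.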

Using this example, let's look at all the children in this household, that is, the cases in the BR file that have a code for b16. Here they are:

list v024 v001 v002 v003 b16 b3 b4 b8 if v024==1 & v001==834 & v002==26, table clean noobs

   v024   v001   v002   v003   b16     b3       b4   b8
jammu &    834     26      2     9   1301     male   11
jammu &    834     26      2     7   1191     male   20
jammu &    834     26      2     8   1174   female   21
jammu &    834     26      2     3   1132     male   25
jammu &    834     26      4     6   1428   female    0
jammu &    834     26      4     5   1363   female    6
jammu &    834     26      8    11   1336   female    8
jammu &    834     26      8    10   1311     male   10
jammu &    834     26      8     9   1287     male   12

There are 3 children age 18+ and 6 children under age 18. ("Child" just applies to someone in a birth history.) The problem is that in this list there are two children with b16=9: a boy age 11 whose mother has line number 2 and a boy age 12 whose mother has line number 8 (b8 is age and v003 is the mother's line number). (There's something else wrong here, because the mother with v003=2 has a daughter age 21 who has b16=8. A 21-year-old daughter of the woman on line 2 is not going to have a 12-year-old son.) Next, let's look at the household listing in the PR file and find these same individuals.

use "C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata\IAPR7AFL.DTA", clear

. list hv024 hv001 hv002 hvidx hv101 hv104 hv105 hv112 if hv024==1 & hv001==834 & hv002==26, table clean noobs nolabel

hv024   hv001   hv002   hvidx   hv101   hv104   hv105   hv112
    1     834      26       1       1       1      50       .
    1     834      26       2       2       2      45       .
    1     834      26       3       3       1      25       .
    1     834      26       4       4       2      20       .
    1     834      26       5       5       2       6       4
    1     834      26       6       5       2       0       4
    1     834      26       7       3       1      24       .
    1     834      26       8       3       2      30       .
    1     834      26       9       5       1      12       8
    1     834      26      10       5       1      10       8
    1     834      26      11       5       2       8       8
    1     834      26      12      10       2      14       0

Here are the relation to head codes for hv101:

. label list HV101
HV101:
1 head
2 wife or husband
3 son/daughter
4 son/daughter-in-law
5 grandchild
6 parent
7 parent-in-law
8 brother/sister
9 co-spouse
10 other relative
11 adopted/foster child
12 not related
13 niece/nephew by blood
14 niece/nephew by marriage
15 brother-in-law or sister-in-law
16 niece/nephew
17 domestic servant
98 don't know

hv112 is the line number of the mother if the child is under age 18. hv104 is sex, hv105 is age. It's pretty clear that this household has three mothers. First, the woman on line 2, age 45, has three grown children in the household: a son age 25 on line 3, whose wife age 20 is on line 4, a son age 24 on line 7, and a daughter age 30 on line 8. I see that the ages for lines 7 and 8 in the PR file do not agree with the ages for lines 7 and 8 in the BR file, but this does not imply a DP error. Priority would be given to the ages in the BR file because they come from the individual interview with the mother, but the values given in the household interview are retained, not overwritten. You should give priority to b8 for age, rather than hv105, following this merge.

The second mother in the household is the woman on line 4; her children are on lines 5 (a girl age 6) and 6 (a girl age 0). The third mother is the woman on line 8, whose children are on lines 9 (a boy age 12), 10 (a boy age 10), and 11 (a girl age 8). The last person in the household is a 14 year old girl who is an "other relative" of the household head but whose mother is not in the household.

The children of the woman on line 8 are on lines 9, 10, and 11 and have ages 12, 10, and 8 in both the BR and PR files. The children of the woman on line 4 have ages 0 and 6 in both the BR and PR files.

The PR file clearly shows that the child on line 9 is the 12-year-old son of the woman on line 8. This is the child at the bottom of the list from the BR file. The record at the TOP of that list, a boy age 11 who is also stated to have line 9 and whose mother is stated to be the woman on line 2, is the one in error. However, I don't have time to push further with this detective work. I just wanted to demonstrate that the merge command SHOULD work, and that the only reason Stata gives that warning is that the DP steps to reconcile such inconsistencies, which originate during data collection, did not extend quite far enough. 58 unresolved inconsistencies in such a large file is not serious. I recommend that you do the merge exactly as I first suggested. By saving cases with _merge==3, you will retain 58 children who probably genuinely were in the sample, and that's better than deleting them. Moreover, if you were to drop one child in each duplicate pair, which one would you drop? I will inform the DP staff of this issue.
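
For readers on Stata 11 or later, the recommended merge can be sketched with the newer merge syntax, which states the expected match type explicitly. This is a hedged sketch rather than the code used above: it assumes the PR file is unique on the four ID variables (otherwise merge 1:m stops with an error), it adds hv105 to the PR extract so the two age variables can be compared, it uses the group form of rename (Stata 12 or later), and the data paths are left elided exactly as in the thread.

```stata
* Sketch only: data paths are elided exactly as in the thread
use v0* b* using "...IABR7AFL.DTA", clear
drop if b16==0 | b16==.
rename (v024 v001 v002 b16) (state cluster hh line)
tempfile br
save `br'

* hv105 (age in the household schedule) is added for the comparison below
use hv0* hvidx hv105 using "...IAPR7AFL.DTA", clear
rename (hv024 hv001 hv002 hvidx) (state cluster hh line)

* 1:m: each PR person is assumed unique on the ID,
* while a few BR line numbers are repeated
merge 1:m state cluster hh line using `br'
tab _merge
keep if _merge==3

* Tabulate agreement between BR age (b8) and PR age (hv105)
compare b8 hv105
```

With merge 1:1 in place of merge 1:m, Stata would stop with an error on the duplicated ID codes instead of printing a note, so 1:m is the form that keeps both children in each duplicated pair.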
Re: Merging BR and PR data [message #24833 is a reply to message #24805] Mon, 18 July 2022 02:46
hvs0013
Thank you for your explanation! I understand this. I have two questions following your explanation:

(1) For the PR dataset, you mention "First, the woman on line 2, age 45, has three grown children in the household: a son age 25 on line 3, whose wife age 20 is on line 4, a son on age 24 on line 7, and a daughter age 30 on line 8. I see that the ages for lines 7 and 8 in the PR file do not agree with the ages for lines 7 and 8 in the BR file, but this does not imply a DP error." I am unable to understand how one can tell, for example, that the wife (age 20) of the son is on line 4, since the variable hv101 only gives each person's relationship to the head of the household and not the relationships among household members.

(2) For India, the birth recode datasets that I have access to are IABR74FL.dta (2015-2016) and IABR7BDT (the recently released 2019-2021). However, I do not have access to the dataset that you used in your example, IABR7AFL.DTA.
Re: Merging BR and PR data [message #24848 is a reply to message #24833] Wed, 20 July 2022 16:34
Janet-DHS
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:

I was using hv112, which is the line number of the mother within the household. For some other purposes you can use hv114, the line number of the father. These variables are coded for children under 18. In the BR file, the line number of the mother is given by v003.
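
As an illustration of how hv112 links a child to the mother within the PR file, here is a minimal sketch that attaches each mother's age to her children's records. It is not code from the DHS Program: the variable name mother_age is hypothetical, the PR file is assumed to be unique on state, cluster, household, and line number, and the data path is left elided as in the thread.

```stata
* Sketch only: path elided as in the thread; mother_age is a hypothetical name
* Step 1: build a lookup file of all household members, keyed by line number
use hv001 hv002 hv024 hvidx hv105 using "...IAPR7AFL.DTA", clear
rename hvidx hv112          // so the key matches the child's mother line number
rename hv105 mother_age
tempfile mothers
save `mothers'

* Step 2: keep children whose mother is listed in the household
use hv001 hv002 hv024 hvidx hv104 hv105 hv112 using "...IAPR7AFL.DTA", clear
keep if hv112 > 0 & hv112 < .

* Step 3: many children can point to one mother, hence m:1
merge m:1 hv024 hv001 hv002 hv112 using `mothers', keep(master match)

* Inspect the example household discussed in this thread
list hvidx hv105 hv112 mother_age if hv024==1 & hv001==834 & hv002==26, table clean noobs
```

Going by the PR listing shown earlier, the children on lines 5 and 6 should pick up the age of the woman on line 4 (20), and the children on lines 9 to 11 the age of the woman on line 8 (30). The same pattern could be applied to hv114 for fathers.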

IA7B is a revision of IA7A. I see that IA7B was released earlier this month, on July 3. The data files folder I was using had not yet been updated to IA7B. I doubt that IA7B differs from IA7A for this specific issue. You should use the latest version, IA7B.
Re: Merging BR and PR data [message #24870 is a reply to message #24848] Sat, 23 July 2022 04:33
hvs0013
Thank you for the explanation!