Mismatch When Merging KR and PR recode files due to disparities in age variables (Investigating unmatched records between KR files and PR recode files for children under 5)
Re: Mismatch When Merging KR and PR recode files due to disparities in age variables [message #29447 is a reply to message #29434]
Thu, 20 June 2024 10:36
Bridgette-DHS
Messages: 3230 Registered: February 2013
Senior Member
Following is a response from Senior DHS staff member, Tom Pullum:
Matching of records in different files should be done solely with ID codes. In the case of children, you match hv001 hv002 hvidx in the PR file with v001 v002 b16 in the KR file. There is no other reliable way to do a merge.
To get into the KR or BR files, a child must appear in the birth history of a woman in the IR file. Children who have died or are not living with the mother will appear in the KR/BR file but not in the PR file.
Children in the household whose mother is not in the household, because she has died or lives elsewhere, will appear in the PR file but not in the KR/BR file.
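The three outcomes described above (matched, in the PR file only, in the KR file only) can be illustrated with a small sketch. It is written in Python rather than Stata purely for illustration; the function name match_on_ids and the toy records are hypothetical, and the use of b16 = 0 for a child not listed in the household is an assumption to check against the recode documentation for your survey.

```python
def match_on_ids(pr_rows, kr_rows):
    """Match PR and KR records on ID codes only (never on age).

    PR key: (hv001, hv002, hvidx); KR key: (v001, v002, b16).
    KR keys are not assumed to be unique, because several children in
    one household may share a b16 that does not point to a household line.
    """
    pr_by_id = {(r["hv001"], r["hv002"], r["hvidx"]): r for r in pr_rows}
    matched, kr_only = [], []
    for kr in kr_rows:
        key = (kr["v001"], kr["v002"], kr["b16"])
        if key in pr_by_id:
            matched.append((pr_by_id[key], kr))
        else:
            kr_only.append(kr)  # e.g. child has died or lives elsewhere
    matched_keys = {(p["hv001"], p["hv002"], p["hvidx"]) for p, _ in matched}
    # e.g. child whose mother has died or is not in the household
    pr_only = [r for k, r in pr_by_id.items() if k not in matched_keys]
    return matched, pr_only, kr_only


# Toy records; all values are invented for illustration.
pr_rows = [
    {"hv001": 1, "hv002": 5, "hvidx": 3, "hv105": 2},
    {"hv001": 1, "hv002": 5, "hvidx": 4, "hv105": 4},  # mother not in household
]
kr_rows = [
    {"v001": 1, "v002": 5, "b16": 3, "b8": 1},
    {"v001": 1, "v002": 5, "b16": 0, "b8": 3},  # assumed: 0 = not listed in household
]
matched, pr_only, kr_only = match_on_ids(pr_rows, kr_rows)
```

In this toy example the matched child has hv105 = 2 but b8 = 1, and the records still pair up because only the ID codes are compared.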
The b variables are forced to be internally consistent during data processing. For example, in the older surveys b8 is calculated from cmc of birth and cmc of interview as "b8=int((v008-b3)/12)". In the newer surveys b8 takes day of interview and day of birth into account. To show in two steps, "age_in_days=mdy(v006,v016,v007)-mdy(b1,b17,b2)" and "b8=int(age_in_days/365.25)". (I am using the Stata date function mdy here.) The reported age of the child in the household survey, hv105, is ignored. It is not at all unusual for b8 and hv105 to differ, and when they differ, priority is given to b8. Note that the information in the birth histories is provided by the mother. The information in the household survey is provided by the household informant (whose line number is given by hv003). The household informant may be someone other than the mother and may be less informed than the mother. Even if the household informant is the mother, her responses in the birth history are considered to be more reliable than her responses in the household interview.
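As a rough illustration of why the two b8 formulas can disagree for the same child, here is a minimal Python sketch of both calculations. The function names and the example dates are hypothetical; the only assumptions beyond the formulas quoted above are the usual century month code definition, cmc = (year - 1900) * 12 + month, and that Stata's mdy(month, day, year) corresponds to Python's date(year, month, day).

```python
from datetime import date


def cmc(year, month):
    """Century month code: months elapsed since January 1900 (assumed definition)."""
    return (year - 1900) * 12 + month


def b8_from_cmc(v008, b3):
    """Older surveys: completed years from cmc of interview and cmc of birth."""
    return int((v008 - b3) / 12)


def b8_from_days(interview, birth):
    """Newer surveys: completed years from the exact dates of interview and birth."""
    age_in_days = (interview - birth).days
    return int(age_in_days / 365.25)


# Invented example: born 15 June 2020, interviewed 10 June 2024,
# i.e. five days before the child's fourth birthday.
birth = date(2020, 6, 15)      # date(b2, b1, b17)
interview = date(2024, 6, 10)  # date(v007, v006, v016)

age_cmc = b8_from_cmc(cmc(2024, 6), cmc(2020, 6))  # month precision only
age_days = b8_from_days(interview, birth)          # day precision
```

With these dates the month-based formula gives 4 and the day-based formula gives 3, so a one-year difference from an age reported in completed years can arise from the calculation method alone.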
Hope this is helpful.
Re: Mismatch When Merging KR and PR recode files due to disparities in age variables [message #29491 is a reply to message #29447]
Thu, 27 June 2024 12:56
RobertB
Messages: 4 Registered: June 2024
Member
Yes, this is very helpful, and I have successfully merged my data. Thanks for the detailed explanation.
My next step is to conduct analysis, and I am using data from KDHS 2003, KDHS 2008_9, KDHS 2014 and KDHS 2022. I am aware that I need to account for the weights and the complex survey design, and I have reviewed several posts in the forum on this topic. Earlier posts seem to show mixed views on whether to weight data in regression analysis, but judging by more recent questions, more people are now using the weights. I also see some published work that has not accounted for weights. In my case I intend to fit a multilevel logistic regression at the individual level.
Could you please confirm whether the current recommended practice is to include the weights and survey design? Additionally, is there DHS documentation that explains how to apply the weights if I combine data from the four surveys? From the forums it is not clear to me when to rescale the weights when pooling data across multiple surveys from one country.
Thank you.
Best wishes,
Robert B.
[Updated on: Fri, 28 June 2024 03:53]
Re: Mismatch When Merging KR and PR recode files due to disparities in age variables [message #29499 is a reply to message #29491]
Fri, 28 June 2024 08:24
Bridgette-DHS
Messages: 3230 Registered: February 2013
Senior Member
Following is a response from Senior DHS staff member, Tom Pullum:
DHS strongly recommends using weights, as well as the other svy adjustments for clusters and strata. If you include weights, the estimates become unbiased. The other adjustments provide robust standard errors because they take into account the stratified two-stage sample design.
When you pool the four surveys into a single large file, you need to construct a variable to distinguish the surveys, for example survey=1, 2, 3, 4. The clusters and strata have to be re-numbered to distinguish all four surveys. For example, you could have "egen clusterid=group(survey v001)" and something similar for strata. (The stratum code is v023 in the two most recent Kenya surveys but may be different in the earlier two.) The svyset statement would be something like "svyset clusterid [pweight=v005], strata(stratumid) singleunit(centered)". Note that in this statement v005 is NOT altered. This will be ok for all analyses I can think of EXCEPT for analyses that do not distinguish the surveys. For example, staying with the original weights would be questionable if you tried to calculate the mean of v201 (children ever born) in the four surveys, because that mean would be biased toward the largest of the four samples, and that's not desirable. But I would say that there is no reason to calculate such a mean. The mean of v201 (for example) should be calculated within each survey, but not for all surveys pooled, because you need a reference date for each estimate of v201.
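To make the re-numbering step concrete, here is a small Python sketch of what a grouping on (survey, cluster) does: every distinct combination receives its own consecutive integer, so cluster 1 of one survey is no longer confused with cluster 1 of another. The helper name group_ids, the column name "stratum" (standing in for whichever stratum code applies in each survey), and the toy rows are all hypothetical; this is an illustration of the idea, not a replacement for the Stata commands.

```python
def group_ids(rows, keys):
    """Number each distinct combination of `keys` 1..G in sorted order."""
    combos = sorted({tuple(r[k] for k in keys) for r in rows})
    lookup = {combo: i for i, combo in enumerate(combos, start=1)}
    return [lookup[tuple(r[k] for k in keys)] for r in rows]


# Toy pooled file; all values are invented for illustration.
pooled = [
    {"survey": 1, "v001": 1, "stratum": 1},
    {"survey": 1, "v001": 2, "stratum": 2},
    {"survey": 2, "v001": 1, "stratum": 1},  # same v001 as row 1, different survey
    {"survey": 2, "v001": 1, "stratum": 1},  # same cluster as the row above
]

clusterid = group_ids(pooled, ["survey", "v001"])
stratumid = group_ids(pooled, ["survey", "stratum"])
```

Here the first and third rows share v001 = 1 but receive different clusterid values because they come from different surveys, while the third and fourth rows share one.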
You can find more discussion of weights in the Guide to DHS Statistics (https://www.dhsprogram.com/Data/Guide-to-DHS-Statistics/index.cfm), in the FAQ for the user forum, and within the forum itself, if you search topics and keywords.