Appending Multi-phase Nigerian DHS Surveys [message #29961]
Fri, 30 August 2024 03:41
Oby
Messages: 1 Registered: August 2024
Member
Hello,
I'm working with multiple NDHS datasets from different survey years (2003, 2008, 2013, and 2018) and have encountered some challenges. I'd appreciate your guidance on the following issues:
1. Inconsistent Value Labels Across Years:
When attempting to append these datasets using bind_rows(), I receive warnings about conflicting value labels for certain variables.
Question: Should I convert these labelled variables to factors using a function like as_factor() before appending, or is there a better approach to standardize value labels across these different datasets? What is the best practice for ensuring that the labels are consistent before appending?
2. Creating a Survey Design with Different Sampling Designs:
Each of the four surveys has a different sampling design. After appending the datasets, I need to create a combined survey design object for analysis.
Questions: a.) How should I go about creating a survey design object that appropriately accounts for the different sampling designs across the four surveys?
b.) Are there other specific adjustments or considerations I need to make when combining these datasets for analysis?
Thank you for your assistance! I look forward to your advice on these issues.
Regards,
Oby
Re: Appending Multi-phase Nigerian DHS Surveys [message #29965 is a reply to message #29961]
Fri, 30 August 2024 15:35
Bridgette-DHS
Messages: 3152 Registered: February 2013
Senior Member
Following is a response from Senior DHS staff member, Tom Pullum:
There have been many forum postings on both of these topics. Please search with keywords.
There is no automated way to reconcile changes in coding. Sometimes there is just a change in the numbering, but often the differences are due to consolidation or division of categories and there is no simple way to recode. Sometimes there is a change in classification--for example, sources of water that are classified as "improved" in one survey may be "unimproved" in another survey. This is a complication when analyzing multiple surveys that cannot be avoided.
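For the simplest case mentioned above, where only the numbering changes between surveys, a manual recode could look roughly like the following sketch. Everything in it is made up for illustration: the variable src, the codes 1/2/3 and 10/20/30, and the category labels are hypothetical and are not actual DHS codes. The real mapping has to be worked out from each survey's codebook, and this approach does not help when categories were merged, split, or reclassified.

```stata
* Made-up data: the same three categories, numbered differently in two surveys
clear
input survey src
1 1
1 2
1 3
2 10
2 20
2 30
end

* Build one harmonized variable by hand, survey by survey
gen src_h = .
replace src_h = src if survey == 1
replace src_h = 1   if survey == 2 & src == 10
replace src_h = 2   if survey == 2 & src == 20
replace src_h = 3   if survey == 2 & src == 30

* Attach a single set of value labels to the harmonized variable
label define src_h 1 "category A" 2 "category B" 3 "category C"
label values src_h src_h

* Check that every original code landed in the intended category
tab src_h survey
```

Doing the recode into a new variable, rather than overwriting the original, keeps the survey-specific codes available for checking.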
The different sample designs are easier to handle. It is simplest to construct a categorical variable for "survey" that takes the values 1, 2, 3, 4, for example. The clusters are v001 (or v021). Then you construct a combined variable with "egen cluster_id=group(survey v001)". You need to do the same thing with the strata to get stratum_id. Then use cluster_id and stratum_id in the svyset command.
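Put together, those steps could be sketched as follows, with a small made-up dataset standing in for the appended NDHS files. The names stratum_id and wt and the outcome y are illustrative. Using v022 as the stratum variable is an assumption; the correct stratum variable differs between surveys and should be checked in each final report. The weight line relies on the usual DHS convention that v005 is stored multiplied by 1,000,000.

```stata
* Toy data standing in for the appended files (all values are made up)
clear
input survey v001 v022 v005 y
1 1 1 1200000 1
1 1 1 1200000 0
1 2 1  800000 1
1 3 2 1000000 0
1 4 2 1000000 1
2 1 1  900000 0
2 2 1 1100000 1
2 3 2 1000000 1
2 4 2 1000000 0
end

* Cluster and stratum numbers restart in every survey, so make them
* unique across surveys by grouping on survey plus the original code
egen cluster_id = group(survey v001)
egen stratum_id = group(survey v022)   // assumption: v022 holds the strata

* Rescale the DHS weight (stored multiplied by 1,000,000)
gen wt = v005/1000000

* Declare the combined design, then estimate separately by survey
svyset cluster_id [pweight=wt], strata(stratum_id)
svy: mean y, over(survey)
```

Because the surveys are independent samples, grouping on survey keeps cluster 1 of one survey from being treated as the same PSU as cluster 1 of another.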