The DHS Program User Forum
Discussions regarding The DHS Program data and results
Appending Multi-phase Nigerian DHS Surveys [message #29961] Fri, 30 August 2024 03:41
Oby
Hello,

I'm working with multiple NDHS datasets from different survey years (2003, 2008, 2013, and 2018) and have encountered some challenges. I'd appreciate your guidance on the following issues:

1. Inconsistent Value Labels Across Years:
When attempting to append these datasets using bind_rows(), I receive warnings about conflicting value labels for certain variables.
Question: Should I convert these labelled variables to factors using a function like as_factor() before appending, or is there a better approach to standardize value labels across these different datasets? What is the best practice for ensuring that the labels are consistent before appending?

2. Creating a Survey Design with Different Sampling Designs:
Each of the four surveys has a different sampling design. After appending the datasets, I need to create a combined survey design object for analysis.
Questions: a.) How should I go about creating a survey design object that appropriately accounts for the different sampling designs across the four surveys?
b.) Are there other specific adjustments or considerations I need to make when combining these datasets for analysis?

Thank you for your assistance! I look forward to your advice on these issues.

Regards,
Oby
Re: Appending Multi-phase Nigerian DHS Surveys [message #29965 is a reply to message #29961] Fri, 30 August 2024 15:35
Bridgette-DHS

Following is a response from Senior DHS staff member, Tom Pullum:

There have been many forum postings on both of these topics. Please search with keywords.

There is no automated way to reconcile changes in coding. Sometimes there is just a change in the numbering, but often the differences are due to consolidation or division of categories and there is no simple way to recode. Sometimes there is a change in classification--for example, sources of water that are classified as "improved" in one survey may be "unimproved" in another survey. This is a complication when analyzing multiple surveys that cannot be avoided.
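
Since the reconciliation has to be done by hand, one workable pattern is to write an explicit crosswalk for each survey that maps its original codes onto a common set of categories, and to stop with an error on any code the crosswalk does not cover. Below is a minimal sketch of that idea in plain Python. The variable name, codes, and categories are all invented for illustration and are not the actual NDHS codings; the real ones have to be read from each survey's documentation.

```python
# Hypothetical crosswalks: {survey year: {original code: harmonized category}}.
# Every code and category here is made up for illustration only.
CROSSWALK = {
    2003: {11: "piped", 21: "well", 22: "well", 31: "surface"},
    2008: {11: "piped", 12: "piped", 31: "well", 41: "surface"},
}

def harmonize(year, code):
    """Map one survey's original code to the common category.

    Raises KeyError for a code missing from the crosswalk, so an
    unreviewed category cannot slip silently into the appended file.
    """
    try:
        return CROSSWALK[year][code]
    except KeyError:
        raise KeyError(f"no crosswalk entry for code {code} in survey {year}") from None

def append_surveys(records):
    """Pool (year, code) pairs from all surveys into harmonized rows."""
    return [{"survey": year, "water": harmonize(year, code)} for year, code in records]

# Code 31 means different things in the two hypothetical surveys.
pooled = append_surveys([(2003, 21), (2003, 31), (2008, 31), (2008, 12)])
```

Note that the same numeric code can map to different categories depending on the survey, which is why the recoding should happen per survey before the files are appended rather than afterwards.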

The different sample designs are easier to handle. It is simplest to construct a categorical variable for "survey" that takes the values 1, 2, 3, 4, for example. The clusters are v001 (or v021). Then you construct a combined cluster variable with "egen cluster_id=group(survey v001)". You need to do the same thing with the strata to get stratum_id. Then use cluster_id and stratum_id in the svyset command.
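
To show why the combined identifiers are needed, here is a small plain-Python sketch of the grouping step. It assigns one integer to each distinct (survey, cluster) combination, numbered in sorted order, which is my understanding of what egen's group() does. The helper name `group_ids`, the toy rows, and the `stratum` column are assumptions for illustration only; the strata variable to use depends on the survey.

```python
def group_ids(rows, keys):
    """Assign 1, 2, 3, ... to each distinct combination of `keys`,
    numbering the combinations in sorted order."""
    combos = sorted({tuple(row[k] for k in keys) for row in rows})
    lookup = {combo: i for i, combo in enumerate(combos, start=1)}
    return [lookup[tuple(row[k] for k in keys)] for row in rows]

# Toy pooled data: cluster number 1 appears in both surveys, but it is a
# different physical cluster in each, so it must receive two different ids.
rows = [
    {"survey": 1, "v001": 1, "stratum": 1},
    {"survey": 1, "v001": 2, "stratum": 1},
    {"survey": 2, "v001": 1, "stratum": 1},
    {"survey": 2, "v001": 1, "stratum": 1},
]

cluster_id = group_ids(rows, ["survey", "v001"])     # combined cluster id
stratum_id = group_ids(rows, ["survey", "stratum"])  # combined stratum id
```

Without the survey component, the clusters numbered 1 in survey 1 and survey 2 would be treated as the same sampling unit, and the same would happen with the strata.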