The DHS Program User Forum
Discussions regarding The DHS Program data and results
Mismatch of number of clusters [message #22944] Wed, 09 June 2021 12:37
MrZ
Dear DHS - Team,

I'm trying to merge GPS data, household and children data files for a number of countries. I've noticed that sometimes there is a mismatch between the number of clusters (as given in v001 and hv001) across different survey files. For example, in the Namibia 2013 data, there are 550 clusters in the household file and 537 clusters in the children file. However, it may well be the case that some clusters do not contain any children, which would explain a slightly smaller number of clusters in the children file.

In addition, the number of locations in the GPS files seldom matches the number of clusters in the survey files, although the discrepancies are usually small. In some cases, however, the difference is large: for example, the Democratic Republic of Congo GPS file from 2013-14 holds 492 cluster locations, while the household file lists 536 clusters.

I am a bit worried that these "missing" clusters could cause a mismatch between GPS coordinates and survey information. Can we be reasonably sure that, e.g., "cluster no. 52" is the same cluster in all survey and GPS files, even when the numbers of clusters do not match exactly?

Thank you!
Re: Mismatch of number of clusters [message #22954 is a reply to message #22944] Thu, 10 June 2021 16:21
Bridgette-DHS

The following is a response from DHS Senior Sampling Specialist Mahmoud Elkasabi and DHS Geospatial Technologist Tom Fish:

There can be fewer clusters in some other files than there are in the HR and PR files just because of how the eligible cases are distributed. For example, as you say, in "the Namibia 2013 data, there are 550 clusters in the household file and 537 clusters in the children file". That's because there were 13 clusters with no children under five. This difference does not indicate an error.
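
For anyone who wants to confirm this on their own files, a minimal sketch of the check in Python follows. It assumes the cluster IDs have already been read into plain lists (hv001 from the household file, v001 from the children's file); the values below are toy data, not real DHS records, and the function name is only illustrative.

```python
def clusters_missing_from(reference_ids, other_ids):
    """Return the sorted cluster IDs found in reference_ids but not in other_ids."""
    return sorted(set(reference_ids) - set(other_ids))


# Toy data (hypothetical): one entry per record, so IDs repeat within a cluster.
hv001 = [1, 1, 2, 3, 3, 4]   # cluster IDs from a household file
v001 = [1, 3, 3, 4]          # cluster IDs from a children's file

# Clusters with households but no eligible children: expected, not an error.
only_in_household = clusters_missing_from(hv001, v001)

# Clusters in the children's file that the household file lacks: should be empty.
unexpected_in_children = clusters_missing_from(v001, hv001)

print(only_in_household)
print(unexpected_in_children)
```

If the second list is empty, every cluster in the children's file also exists in the household file, which is the pattern described above for Namibia 2013.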

For the question in your last paragraph, we are confident that the clusters in the DHS and GPS files are correctly matched. Someone from the GIS team will respond to your question about locations and clusters and the DRC example.

There are no discrepancies between the published GPS file and the recode; the GPS file lists 536 clusters, just like the recode. There are, however, 44 clusters whose locations we could not verify. These are placed at coordinates (0, 0) and are marked as missing (MIS). This does not affect the other clusters. We are sure that the 492 clusters with verified locations (GPS and GAZ) are in the correct places. Please note that all clusters have been displaced by a small amount to protect the privacy of our respondents. For more information on the process, please see Spatial Analysis Report 7 on The DHS Program's website.
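
As a rough illustration of how the missing locations might be handled before merging, here is a short Python sketch. The field names DHSCLUST, SOURCE, LATNUM and LONGNUM are assumptions about the GPS file layout and should be checked against your own file; the rows are toy data, and the function name is hypothetical. The key point is that the merge is keyed on the cluster ID rather than on row position.

```python
def split_gps_records(gps_rows):
    """Split GPS rows into verified locations and cluster IDs with missing ones.

    A row is treated as missing if SOURCE is "MIS" or both coordinates are 0.
    Field names are assumed, not taken from the thread.
    """
    verified = {}
    missing = []
    for row in gps_rows:
        at_origin = row["LATNUM"] == 0 and row["LONGNUM"] == 0
        if row["SOURCE"] == "MIS" or at_origin:
            missing.append(row["DHSCLUST"])
        else:
            verified[row["DHSCLUST"]] = (row["LATNUM"], row["LONGNUM"])
    return verified, missing


# Toy GPS rows (hypothetical values).
gps_rows = [
    {"DHSCLUST": 1, "SOURCE": "GPS", "LATNUM": -4.3, "LONGNUM": 15.3},
    {"DHSCLUST": 2, "SOURCE": "MIS", "LATNUM": 0.0, "LONGNUM": 0.0},
    {"DHSCLUST": 3, "SOURCE": "GAZ", "LATNUM": -11.7, "LONGNUM": 27.5},
]
households = [{"hv001": 1}, {"hv001": 2}, {"hv001": 3}]

verified, missing = split_gps_records(gps_rows)

# Join on cluster ID; clusters without a verified location get coords=None.
merged = [dict(h, coords=verified.get(h["hv001"])) for h in households]

print(missing)
print(merged)
```

Keeping the unverified clusters with `coords=None`, rather than dropping them, leaves the survey records intact while making sure the (0, 0) placeholder is never used as a real location.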


Re: Mismatch of number of clusters [message #22958 is a reply to message #22954] Fri, 11 June 2021 02:07
MrZ
Thanks for the elaborate and helpful answer!