The DHS Program User Forum
Discussions regarding The DHS Program data and results
Merging HR data and GE data [message #31869] Mon, 04 May 2026 04:19
Posted by: hwangha
Dear DHS Team,

I am currently working with DHS data and encountered a substantial mismatch when merging Peru household survey data with GPS data.

Specifically, I used:

Household data: PEHR51
GPS data: PEGE52

I merged the datasets on the cluster identifier, using hv001 (HR dataset) and DHSCLUST (GE dataset) as the key. The merge is many-to-one: many household records in HR map to a single cluster record in GE.

However, I observed that approximately 25% of clusters from the household data do not have corresponding entries in the GPS dataset.

In addition, the number of unique clusters differs between the datasets:

HR (hv001): 1,851 clusters
GE (DHSCLUST): 1,414 clusters
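
A minimal sketch of how such a mismatch can be tallied in both directions, assuming the cluster IDs from each file have already been read into plain Python lists. The function name `compare_cluster_ids` and the toy values are hypothetical and purely illustrative; they are not the actual Peru data, and this is not necessarily how the merge above was run.

```python
from collections import Counter


def compare_cluster_ids(hr_clusters, ge_clusters):
    """Summarise how HR cluster IDs (hv001) line up with GE cluster IDs (DHSCLUST).

    hr_clusters: one entry per household record (IDs repeat within a cluster).
    ge_clusters: one entry per cluster record in the GPS file.
    Both lists are assumed to hold IDs of the same type (e.g. all int).
    """
    hr_counts = Counter(hr_clusters)      # households per HR cluster
    hr_ids = set(hr_counts)
    ge_ids = set(ge_clusters)

    hr_only = sorted(hr_ids - ge_ids)     # in HR, no GPS record
    ge_only = sorted(ge_ids - hr_ids)     # in GE, no household records

    return {
        "hr_unique": len(hr_ids),
        "ge_unique": len(ge_ids),
        "matched": len(hr_ids & ge_ids),
        "hr_only": hr_only,
        "ge_only": ge_only,
        "share_hr_clusters_unmatched": len(hr_only) / len(hr_ids) if hr_ids else 0.0,
        "households_unmatched": sum(hr_counts[c] for c in hr_only),
    }


# Toy data (illustrative only): 8 household records in 4 clusters, 4 GPS records.
hr_clusters = [1, 1, 2, 2, 2, 3, 4, 4]
ge_clusters = [1, 2, 3, 5]

summary = compare_cluster_ids(hr_clusters, ge_clusters)
print(summary)
```

Reporting the unmatched share at both the cluster level and the household level helps, because the two can differ when cluster sizes vary. With the counts listed above, if every GE cluster matched an HR cluster, 1,851 - 1,414 = 437 HR clusters (about 23.6%) would be left unmatched, which is in line with the roughly 25% figure.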

I would like to understand the reason for this discrepancy.

Additionally, I observed a different pattern in some countries (e.g., India, Kenya, Mozambique, and the Philippines), where cluster IDs exist in the GPS dataset but do not have corresponding observations in the household dataset (using the same merge key).

Could you please clarify:

1. For Peru (PEHR51 / PEGE52), is it expected that a substantial number of clusters in the household dataset are not included in the GPS dataset?
2. Under what circumstances would clusters be excluded from the GPS dataset?
3. In the opposite case, why might cluster IDs exist in the GPS dataset without matching observations in the household dataset?
4. Are these discrepancies due to survey design, data processing, or GPS data availability?

Any clarification or references would be greatly appreciated.

Thank you!