The DHS Program User Forum
Discussions regarding The DHS Program data and results
What is the underlying criteria for merging datasets from various waves? [message #9968] Thu, 09 June 2016 08:47
sufi
Hello all,

I'm working on a large dataset that combines virtually all of the African DHS surveys.
I'm having trouble working out the exact criteria for merging the individual recode files with the male recode, HIV, and GPS files.
I understand that individuals are matched on cluster, household number, and person number. What I'm unsure about is whether that holds across the different files for the same country and survey year.
Take Cameroon 2011 as an example:
Would all of the Cameroon 2011 files merge on cluster, household number, and person number? I ask because the HIV data file is called CMAR61FL, while the male recode file is called CMMR60FL.
One has the number '61' and the other has '60'. Should they still merge?

I would appreciate any help and clarification.
Thanks in advance.
Re: What is the underlying criteria for merging datasets from various waves? [message #9989 is a reply to message #9968] Tue, 14 June 2016 10:28
Bridgette-DHS
Following is a response from Senior DHS Stata Specialist, Tom Pullum:


The fifth character of the file name is the phase of DHS (0 through 7, but 0 refers to phase 1; we are currently in phase 7). The sixth character is the version. It is usually 0 for the first version of the data file, when the survey is first released, and then if corrections or additions (almost always of negligible importance) are made, the version is updated to 1, 2, etc. CMAR61FL and CMMR60FL are completely compatible. The AR file had one update but the MR file did not.
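
For anyone who wants to check file names programmatically, the naming rule above could be sketched roughly as follows in Python. This is only an illustration, not a DHS tool: the names `DhsFileName`, `parse_dhs_name`, and `same_survey` are made up for this example, and the assumption that characters 1-2 are the country code and characters 3-4 the file type is inferred from the two file names in the question. Letter versions are deliberately not interpreted here.

```python
from typing import NamedTuple


class DhsFileName(NamedTuple):
    country: str    # characters 1-2 (assumed country code), e.g. "CM"
    file_type: str  # characters 3-4 (assumed file type), e.g. "AR" or "MR"
    phase: int      # derived from character 5
    version: str    # character 6, kept as text


def parse_dhs_name(name: str) -> DhsFileName:
    """Split a name such as 'CMAR61FL' into its parts."""
    stem = name.split(".")[0].upper()
    if len(stem) < 6 or not stem[4].isdigit():
        raise ValueError(f"unexpected DHS file name: {name!r}")
    # Per the reply above, a fifth character of 0 refers to phase 1.
    phase = 1 if stem[4] == "0" else int(stem[4])
    return DhsFileName(stem[:2], stem[2:4], phase, stem[5])


def same_survey(a: str, b: str) -> bool:
    """True when country and phase match and both versions are digits.

    Differing digit versions only mean one file was updated more often.
    """
    fa, fb = parse_dhs_name(a), parse_dhs_name(b)
    return (
        fa.country == fb.country
        and fa.phase == fb.phase
        and fa.version.isdigit()
        and fb.version.isdigit()
    )


print(same_survey("CMAR61FL", "CMMR60FL"))
```

With the two Cameroon files from the question, both names share country CM and phase 6 and differ only in the version digit, so the check treats them as the same survey.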

Sometimes this happens--one of the files from a survey has had more updates than another one of the files from the same survey. This should not affect a merge.
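
As a rough illustration of why the version number does not matter for the merge, here is a minimal Python sketch that joins male recode rows to HIV rows on cluster, household number, and person number. The rows are invented for the example, and the variable names (`mv001`, `mv002`, `mv003` for the male recode; `hivclust`, `hivnumb`, `hivline` for the HIV file; plus `mv012` and `hiv03`) are assumptions based on the usual DHS recode naming, so please check them against your own files. `merge_on_person` is a hypothetical helper, not part of any DHS software.

```python
# Invented rows for illustration only; real files hold many more variables.
male_rows = [
    {"mv001": 1, "mv002": 12, "mv003": 2, "mv012": 34},
    {"mv001": 1, "mv002": 15, "mv003": 1, "mv012": 41},
]
hiv_rows = [
    {"hivclust": 1, "hivnumb": 12, "hivline": 2, "hiv03": 0},
    {"hivclust": 2, "hivnumb": 3, "hivline": 1, "hiv03": 1},
]


def merge_on_person(left, left_keys, right, right_keys):
    """Left-join two lists of dicts on (cluster, household, person)."""
    # Index the right-hand rows by their identifying triple.
    index = {tuple(row[k] for k in right_keys): row for row in right}
    merged = []
    for row in left:
        key = tuple(row[k] for k in left_keys)
        match = index.get(key)
        combined = dict(row)
        if match is not None:
            combined.update(match)
        # Flag whether a matching row was found in the right-hand file.
        combined["matched"] = match is not None
        merged.append(combined)
    return merged


result = merge_on_person(
    male_rows, ("mv001", "mv002", "mv003"),
    hiv_rows, ("hivclust", "hivnumb", "hivline"),
)
for row in result:
    print(row)
```

The join uses only the three identifiers inside the data, never the file name, which is why a 61 file and a 60 file from the same survey can be combined. Keeping a `matched` flag makes it easy to see which men have no HIV record.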

If there was a second survey in the same phase of DHS, then the sixth character for the first version of the second survey will be A, which is updated to B, etc. If there was a third survey, then the sixth character for the first version of the third survey will be G, which is updated to H, etc.

Obviously the file naming system is sometimes awkward. It goes back to a time when the file name (before the dot) could be at most 8 characters long. That's no longer a restriction, and I expect the whole system will be revised before long.