Data name and data do not match (data title and actual data information don't match)
Re: Data name and data do not match [message #22995 is a reply to message #22989]
Mon, 21 June 2021 09:23
Bridgette-DHS
Messages: 3214 Registered: February 2013
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
The 5th character in the file name is the phase of the DHS coding manual and the 6th character is the "version" if there are updates. For example, when the files are first issued, the 5th-6th characters for a Phase 6 survey could have been "60". If there was some correction or update, the files were re-issued with "61". Next time, "62". Usually, ALL the files from a survey (HR, PR, IR, MR, CR, BR, KR) will have their file names updated, even if the update does not affect the specific file. Sometimes (more in the past than in the present), if a file is not affected, it will not be renamed. In the examples you give, the MR file must not have been affected, so that file was not re-issued or re-named.
This is annoying for all of us who use the data, but when it happens, you have to adapt. For example, if you have a combination of IR62 and MR61, you just have to treat the MR61 file as if it were labelled MR62. The two files come from the same survey.
When there are two surveys in the same phase of DHS, the 6th character will begin as a letter, and then any updates will be the next letter. For example (these may not be the actual letters that would be used), the first survey may be in a sequence 60, 61, 62, as described above, but a second survey would be 6H, 6I, 6J... Then if there is a third survey within the same phase of DHS, the sequence will be 6R, 6S, 6T...
This numbering system leaves much to be desired. It goes back to decades ago, when file names were restricted to 8 characters. Eventually, I am sure, a more flexible and intuitive naming system will take over.
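As a rough illustration of the naming convention described above, the sketch below splits a file name into its parts. Only the meaning of the 5th and 6th characters comes from the explanation in this thread; treating characters 1-2 as a country code and characters 3-4 as the file type, the `XX` country code, the `FL.DTA` ending, and the helper names `parse_dhs_name` and `differ_only_by_version` are all assumptions made for the example.

```python
from typing import NamedTuple


class DhsFileName(NamedTuple):
    country: str    # characters 1-2 (assumed: country code, e.g. "TD")
    file_type: str  # characters 3-4 (assumed: file type, e.g. "IR", "MR")
    phase: str      # character 5: phase of the DHS coding manual
    version: str    # character 6: release "version" (digit or letter)


def parse_dhs_name(name: str) -> DhsFileName:
    """Split a DHS-style file name into its first six characters."""
    stem = name.split(".")[0].upper()  # drop any extension such as .DTA
    if len(stem) < 6:
        raise ValueError(f"too short to be a DHS-style file name: {name!r}")
    return DhsFileName(stem[0:2], stem[2:4], stem[4], stem[5])


def differ_only_by_version(a: str, b: str) -> bool:
    """True when two names share country and phase but carry different versions."""
    fa, fb = parse_dhs_name(a), parse_dhs_name(b)
    return (fa.country, fa.phase) == (fb.country, fb.phase) and fa.version != fb.version


# "XX" is a placeholder country code, not a real survey.
ir = parse_dhs_name("XXIR62FL.DTA")
mr = parse_dhs_name("XXMR61FL.DTA")
print(ir)
print(mr)
print(differ_only_by_version("XXIR62FL.DTA", "XXMR61FL.DTA"))
```

A version mismatch flagged this way is only a prompt to check the download: an IR62/MR61 pair can come from the same survey, while a letter in the 6th position may point to a different survey within the same phase.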

Re: Data name and data do not match [message #22999 is a reply to message #22995]
Mon, 21 June 2021 21:25
JaneQuan
Messages: 11 Registered: June 2021
Member
Hi,
First of all, thank you so much for explaining the data (this is the issue in Column V of the attachment).
And yes, I have relabelled MR61 as MR62, and likewise other files, when combining them.
But what I really wanted to ask is why the data's filename does not match the actual data in the .dta file.
For example, for Ethiopia (Row 1), the filename is "Ethiopia: Standard DHS, 2011", but after I downloaded the data file and opened it in Stata, the survey year was 2003.
Also, based on the filename, Burundi should contain only one year of data, but the actual data contain two years.
Another example is Chad (Row 8): the survey phase in My Dataset Account is shown as VII (which can also be seen from the extracted filenames), but when you open the data in Stata, the survey phase shows as TD6.
Thank you^^

Re: Data name and data do not match [message #23040 is a reply to message #23039]
Wed, 30 June 2021 07:57
Bridgette-DHS
Messages: 3214 Registered: February 2013
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
DHS surveys are household-to-household surveys with teams of interviewers in sampled geographic areas, working over a period of several months. Sometimes the fieldwork all takes place within one calendar year. Sometimes, as with this survey, it straddles two consecutive years. "2010" or "2010-11" is just part of the label for the survey. You definitely do not need to separate the survey into two parts for the two calendar years. If, say, the data came from a birth registration system, and births are reported by calendar year, then you would want to separate the two years, but that would not be appropriate for survey data.

Re: Data name and data do not match [message #23043 is a reply to message #23040]
Wed, 30 June 2021 10:30
JaneQuan
Messages: 11 Registered: June 2021
Member
Dear Doctor Pullum,
Thank you for your reply, but I am still confused.
To quote your previous answer: "You definitely do not need to separate the survey into two parts for the two calendar years."
My response is that I didn't separate the data; the original data include both years, whereas according to the filename they should have only one year of data.
I fully understand that "sometimes, as with this survey, it straddles two consecutive years".
But the problem for me is that "year" is one of the variables in my analysis model, and since the sample sizes for the two consecutive years are quite different, maybe I should combine those data into one year, as the filename implies.
Regards.
PS. This confuses me because the years in the other filenames and in the actual data are consistent; the exceptions are only 5 countries among 40.
[Updated on: Wed, 30 June 2021 10:34]

Re: Data name and data do not match [message #23048 is a reply to message #23044]
Thu, 01 July 2021 02:02
JaneQuan
Messages: 11 Registered: June 2021
Member
Hi Dr. Pullum,
Thank you so much for the answer. So, if the survey was conducted over two consecutive years (that is, the fieldwork spanned both years), I should just select the one year of data that has relatively more interviewers.
I have one more question for this thread: how often is a Standard DHS survey conducted in African countries? Is it every five years?
Regards.
[Updated on: Thu, 01 July 2021 03:19]