The DHS Program User Forum: India » Probable errors in Data and its rectification

Re: Probable errors in Data and its rectification [message #20242 is a reply to message #20033]

Wed, 14 October 2020 15:11

Trevor-DHS
In response to Shekhar_GS,
1) It is not unusual to find errors in survey data. While the DHS Program does a fair amount of data cleaning for each survey, we do not check or edit every possible issue in the datasets, and for many issues we do not make corrections to the data. However, a lot of the data have been checked for consistency, both at the time of data collection and during editing, and many issues are resolved at those stages. The data are considered reasonably clean, but no survey data will ever be completely clean (if they are, then you know they have been fabricated).
2) Yes, it is expected that analysts will make rational decisions about the treatment of outliers in their analysis.
3) The data are generally pretty clean, but we do expect researchers to review them and decide for themselves whether their analysis requires further cleaning or some treatment of outliers.

The case you mention is indeed impossible, but it is 1 case in more than 112,000 and will have very little effect on your analysis. There are a few others that look questionable for similar reasons. A simple solution would be to exclude these cases from your analysis. Alternatively, you can make your own decisions about how to edit the data. Perhaps you don't believe that the person had 45 children, but they also said they had 25 boys and 20 girls. They also said that they had their first child at age 25 and that the youngest is 24, which is clearly impossible, but perhaps they misunderstood the question and gave the age of their oldest child rather than their youngest. This is just to note that there are lots of things to consider when making decisions about further cleaning. The simplest approach, and the easiest to defend, is to exclude extreme outliers from your analysis.
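As a rough illustration of the "exclude and document" approach, here is a minimal Python sketch that flags internally inconsistent records and sets them aside with the reason attached. Everything in it is an assumption for illustration: the field names are hypothetical and are not DHS recode variable names, the cutoff of 20 children is an arbitrary analyst choice, and the example records are made up rather than taken from any dataset.

```python
from dataclasses import dataclass

# Hypothetical record layout; these are NOT DHS variable names.
@dataclass
class Respondent:
    case_id: str
    age: int                  # current age in years
    children_total: int       # reported total children
    sons: int
    daughters: int
    age_at_first_birth: int
    youngest_child_age: int

# Assumed cutoff; choose and justify your own for a real analysis.
MAX_PLAUSIBLE_CHILDREN = 20

def flag_reasons(r):
    """Return a list of reasons why a record looks implausible (empty if none)."""
    reasons = []
    if r.children_total != r.sons + r.daughters:
        reasons.append("sex_split_mismatch")
    if r.children_total > MAX_PLAUSIBLE_CHILDREN:
        reasons.append("implausible_child_count")
    # The oldest child was born at the first birth, so the youngest
    # child cannot be older than that.
    oldest_child_age = r.age - r.age_at_first_birth
    if r.youngest_child_age > oldest_child_age:
        reasons.append("youngest_older_than_oldest")
    return reasons

def split_cases(records):
    """Split records into kept cases and (record, reasons) exclusions."""
    kept, excluded = [], []
    for r in records:
        reasons = flag_reasons(r)
        if reasons:
            excluded.append((r, reasons))
        else:
            kept.append(r)
    return kept, excluded

# Made-up records for illustration only.
EXAMPLE_RECORDS = [
    Respondent("ok-1", 40, 3, 2, 1, 22, 5),
    Respondent("bad-1", 30, 4, 1, 2, 20, 15),
    Respondent("bad-2", 45, 45, 25, 20, 20, 1),
]

if __name__ == "__main__":
    kept, excluded = split_cases(EXAMPLE_RECORDS)
    print("kept:", [r.case_id for r in kept])
    for r, reasons in excluded:
        print("excluded:", r.case_id, reasons)
```

Keeping the reasons alongside each excluded case makes it easy to report how many cases were dropped and why, which helps when defending the exclusion rule. Note that a record can be internally consistent (sons plus daughters equals the total) and still be implausible, which is why the count cutoff is a separate check.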

[Message index]

		Probable errors in Data and its rectification By: Shekhar_GS on Mon, 21 September 2020 03:27
		Re: Probable errors in Data and its rectification By: Trevor-DHS on Wed, 14 October 2020 15:11
