The DHS Program User Forum
Discussions regarding The DHS Program data and results
Missing data [message #1306] Fri, 07 February 2014 01:39
Dsisso
Messages: 6
Registered: February 2014
Location: Montréal,QC
Member
Hello everybody,
I am working on an immunization dataset in which I would like to impute missing values for the DTP1, DTP2 and DTP3 doses and to calculate the prevalence of unimmunized children for each vaccine dose. Has anyone run into the same problem? In Stata, I am able to create multiple imputed datasets, but I am having difficulty combining all of the imputed datasets into one that contains complete information for each observation. So far, as I understand it, I can only combine estimates (e.g., coefficients from a logistic regression, using Rubin's rules for combination), whereas I would rather generate categorical DTP1, DTP2 and DTP3 variables with the missing values filled in by the multiple imputation procedure.
Thank you for your help,
Dsisso
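Rather than merging the m imputed datasets into one, the usual route is to compute the quantity of interest (here, the prevalence of unimmunized children per dose) separately in each completed dataset and then combine those estimates. The minimal Python sketch below assumes a hypothetical layout in which each child is a dict with 0/1 flags `dtp1`, `dtp2`, `dtp3` (1 = received). It ignores the DHS sampling weights and survey design, which a real analysis would need.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical layout: one dict per child with 0/1 flags for each dose
# (1 = received, 0 = not received). After imputation no None values remain.
Child = Dict[str, Optional[int]]
DOSES = ("dtp1", "dtp2", "dtp3")


def unimmunized_prevalence(children: List[Child]) -> Dict[str, float]:
    """Share of children coded 0 (dose not received) for each dose,
    computed within a single completed dataset (unweighted)."""
    n = len(children)
    return {dose: sum(1 for c in children if c[dose] == 0) / n for dose in DOSES}


def per_imputation_estimates(completed: List[List[Child]]) -> Dict[str, List[float]]:
    """One prevalence estimate per completed dataset, for each dose."""
    estimates: Dict[str, List[float]] = {dose: [] for dose in DOSES}
    for dataset in completed:
        for dose, p in unimmunized_prevalence(dataset).items():
            estimates[dose].append(p)
    return estimates


def pooled_prevalence(completed: List[List[Child]]) -> Dict[str, float]:
    """Point estimate under Rubin's rules: the mean of the per-imputation estimates."""
    return {dose: mean(ps) for dose, ps in per_imputation_estimates(completed).items()}
```

The completed datasets are never stacked into a single "best" file; each one yields its own estimate, and only the estimates are combined. A standard error for the pooled prevalence would additionally need the between-imputation spread.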
Re: Missing data [message #1437 is a reply to message #1306] Wed, 26 February 2014 09:03
Bridgette-DHS
Messages: 3017
Registered: February 2013
Senior Member
We are working on a response to your posting.

Thanks
Re: Missing data [message #1463 is a reply to message #1437] Fri, 28 February 2014 10:20
Bridgette-DHS
Messages: 3017
Registered: February 2013
Senior Member
Following are comments from DHS Specialist, Tom Pullum:

DHS does some imputation or reconciliation of incomplete or conflicting dates or ages, for example in birth histories. We may shift out-of-range codes into a not-stated code, as with the height and weight measurements. But we never, or hardly ever, make imputations for missing data and we don't have a policy on how users should make such imputations.

With multiple imputation procedures the interest is in the coefficients produced by a model. The model is run many times, with alternative individual-level imputations, to optimize the values of the coefficients, but multiple imputation is not a device for identifying optimal individual-level imputed values.
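To make the point concrete: what multiple imputation delivers is a pooled coefficient and its standard error, not a set of individual-level values. A minimal Python sketch of Rubin's combining rules for a single coefficient follows; the function name `rubin_pool` is illustrative, and the inputs are assumed to be the coefficient and its squared standard error from each of the m imputed-data model runs.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def rubin_pool(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: the coefficient from each imputed dataset (m >= 2)
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = mean(estimates)        # pooled point estimate: average coefficient
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)
```

For example, coefficients of 0.40, 0.50 and 0.60, each with variance 0.01, pool to 0.50 with a standard error of about 0.153, larger than the 0.10 any single run would report.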
Re: Missing data [message #1511 is a reply to message #1463] Wed, 05 March 2014 12:09
Bridgette-DHS
Messages: 3017
Registered: February 2013
Senior Member
Following is another response from one of our DHS experts, Shea Rutstein.

In the DHS we do not impute whether each child received a vaccination. If a vaccine dose is not recorded on the child's immunization card, then the mother is asked whether the child received the vaccination. For DPT, the questions asked are in the attached file.

The age at which the vaccine is given is assumed to be the same as that for which dates have been given (an aggregate assignment, not individual, done during tabulation). Since the outcome of whether a child is given a vaccination is dichotomous, either logistic or probit regression is appropriate. The predicted value will be the probability that the child was given the vaccination. I would combine the several imputations to get the average probability for each child and then randomly select a number so that you assign 1 or 0 (got vaccination or not) according to the number selected. For example, if the probability of receiving DPT1 is 0.60, then a randomly selected digit from 0 to 5 would indicate that the vaccine was given, and a digit from 6 to 9 would indicate that it was not given.
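The two steps Shea describes, averaging each child's predicted probability over the imputations and then assigning 0 or 1 with a random digit, can be sketched in Python as follows. The function names are hypothetical, and the per-imputation probabilities are assumed to come from whatever logit or probit model was fitted in each imputed dataset.

```python
import random
from statistics import mean
from typing import List


def average_probabilities(per_imputation: List[List[float]]) -> List[float]:
    """per_imputation[k][i] is the predicted probability for child i in
    imputation k. Returns each child's probability averaged over imputations."""
    return [mean(child_probs) for child_probs in zip(*per_imputation)]


def assign_by_digit(prob: float, rng: random.Random) -> int:
    """Draw a digit 0-9; digits below round(10 * prob) mean 'vaccinated' (1).

    With prob = 0.60, digits 0-5 give 1 and digits 6-9 give 0.
    """
    digit = rng.randint(0, 9)  # randint includes both endpoints
    return 1 if digit < round(10 * prob) else 0
```

The digit draw works only at a resolution of 0.1; comparing `rng.random() < prob` is the same idea without rounding the probability to one decimal.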

Another way to go about it would be to use hot deck imputation according to the characteristics correlated with each vaccination, such as the child's sex, birth order, age, area of residence, province, place of birth, wealth quintile, etc., but the list of variables may be long.
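A minimal sketch of a cell-based hot deck, assuming hypothetical variable names: children are grouped into cells by the chosen characteristics, and each missing value is copied from a randomly chosen child in the same cell whose value was observed. This is a single-draw hot deck; repeating it with different seeds would give several imputed versions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Child = Dict[str, object]


def hot_deck(children: List[Child], target: str, keys: Sequence[str],
             seed: int = 0) -> List[Child]:
    """Fill missing (None) values of `target` from a random donor in the same cell.

    A cell is defined by the values of `keys` (e.g. sex, residence, wealth
    quintile). Children whose cell has no observed donor are left missing.
    """
    rng = random.Random(seed)
    donors: Dict[Tuple, List[object]] = defaultdict(list)
    for c in children:
        if c[target] is not None:
            donors[tuple(c[k] for k in keys)].append(c[target])

    filled = []
    for c in children:
        new = dict(c)  # copy so the input records are not modified
        if new[target] is None:
            pool = donors.get(tuple(c[k] for k in keys))
            if pool:
                new[target] = rng.choice(pool)
        filled.append(new)
    return filled
```

The length of the variable list matters here: every added characteristic splits the cells further, so more children end up in cells with no donor.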

As I understand it, Rubin's procedure for combining multiple imputations produces a more robust standard error by taking account of both the within-imputation variance and the variance between the estimates. In this case, I would take the calculated probability for each child from the estimating equation, randomly vary it by drawing a deviation from a normal distribution with that standard error, and then apply the adjusted probability for each child to a randomly selected number to determine whether the vaccine was given.
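The perturb-then-draw approach in the previous paragraph might look like the Python sketch below. Several details are assumptions not stated in the reply: `se` is taken to be the standard error of the child's predicted probability, the deviation is added on the probability scale (not the logit scale), and the result is clipped to [0, 1] so it stays a valid probability.

```python
import random
from typing import List


def perturbed_assignment(prob: float, se: float, rng: random.Random) -> int:
    """Perturb a predicted probability by a normal deviation with s.d. `se`,
    clip it to [0, 1], then draw a 0/1 outcome with that probability."""
    adjusted = prob + rng.gauss(0.0, se)
    adjusted = min(1.0, max(0.0, adjusted))  # assumption: clip to a valid probability
    return 1 if rng.random() < adjusted else 0


def impute_children(probs: List[float], ses: List[float], seed: int = 0) -> List[int]:
    """One imputed 0/1 vaccination status per child (hypothetical helper)."""
    rng = random.Random(seed)
    return [perturbed_assignment(p, s, rng) for p, s in zip(probs, ses)]
```

Running `impute_children` with several seeds gives several imputed versions of the dose variable, whose per-version estimates can then be combined.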

Let me know if this helps.

Shea
Re: Missing data [message #1523 is a reply to message #1511] Fri, 07 March 2014 01:55
Dsisso
Messages: 6
Registered: February 2014
Location: Montréal,QC
Member
Thank you all for your helpful replies. I will take them into consideration in resolving this situation. I have already tried the hot deck procedure, which proved less robust, and it seems more worthwhile to focus on estimates from regressions rather than on individual doses.
Many thanks
D. Sissoko