The DHS Program User Forum
Discussions regarding The DHS Program data and results
Imputation of missing data [message #29911] Wed, 21 August 2024 16:05
Ashlesha Pal
I am working on an analysis for which I need to generate women's empowerment scores based on the SWPER index of women's empowerment (Ewerling F, Raj A, Victora CG, Hellwig F, Coll CV, Barros AJ. SWPER Global: A survey-based women's empowerment index expanded from Africa to all low- and middle-income countries. J Glob Health. 2020 Dec;10(2):020343. doi: 10.7189/jogh.10.020434. PMID: 33274055; PMCID: PMC7699005.), using the IR files for India, Pakistan and Bangladesh. I am unable to generate these scores for one subpopulation because husband's education is missing or "don't know" for about 41 observations in that subpopulation. Is there a way to impute those values so that I do not lose those observations?
Re: Imputation of missing data [message #29921 is a reply to message #29911] Thu, 22 August 2024 12:28
Bridgette-DHS

Following is a response from Senior DHS staff member, Tom Pullum:

In this kind of a situation the researcher (you) must make a "judgment call" about what to do. I can suggest potential strategies. One possibility is to drop those cases. Naturally, we don't like to do that. Another would be to assign these men the mean or median or modal value for all men or for all men in some subpopulation that includes these men, such as their district. I would not recommend anything more complicated. Elaborate methods, such as multiple imputation, do exist, but with only 41 cases that would be a waste of effort.
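
As an illustration of the median option described above, here is a minimal Stata sketch. It is not part of the original reply and rests on several assumptions: that an IR file is in memory and already restricted to the partnered women in the analysis subpopulation, that husband's education is stored in v715 (years of education, with 98 for "don't know") as in the standard recode, and that v024 (region) is an acceptable grouping variable. The names hus_edu, hus_edu_median and hus_edu_imputed are made up for this example. The median here is unweighted.

```stata
* Sketch only. Assumes a DHS IR file is in memory and already restricted
* to the partnered women in the analysis subpopulation.
* v715 = husband's years of education (98 = don't know) in the standard recode.
* v024 (region) is an assumed grouping variable; substitute a district
* variable if the survey provides one.

* Flag the cases that will be imputed, so they can be reported in a footnote
gen byte hus_edu_imputed = missing(v715) | v715 == 98

* Copy the observed values only
gen hus_edu = v715 if !hus_edu_imputed

* Unweighted median of the observed values within each region
egen hus_edu_median = median(hus_edu), by(v024)

* Fill the flagged cases with their region's median
replace hus_edu = hus_edu_median if hus_edu_imputed

* Number of imputed cases
tabulate hus_edu_imputed
```

Keeping the flag variable makes it easy to state exactly how many cases were imputed, and to rerun the analysis with those cases excluded.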

Whatever you do, it would be good to include a comment or footnote describing it, so someone could potentially match your results. If you look at the tables in DHS final reports, you will sometimes find a footnote that says what was done with missing (distinct from Not Applicable) values.

You can also look at whether any estimates appear to change or differ, depending on how you handled such cases. With only 41 questionable cases, you will probably find that the results are not sensitive to whatever option you choose.
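
One way such a sensitivity check might look in Stata is sketched below. It is not from the original reply. It compares a single input, the difference between the woman's and her husband's years of education, under two options: dropping the problem cases and replacing them with the overall median. The sketch assumes an IR file restricted to the analysis subpopulation is in memory, that v133 (woman's years of education) has no special codes in that subpopulation, and that v715 uses 98 for "don't know". The variable names wt, hus_a, hus_b, diff_a and diff_b are hypothetical.

```stata
* Sketch only. Assumes a DHS IR file restricted to the analysis
* subpopulation is in memory, and that v133 has no special codes there.

* Standard DHS sampling weight
gen double wt = v005 / 1000000

* Option A: treat missing / "don't know" (98) husband's education as missing,
* so those cases drop out of any estimate that uses hus_a
gen hus_a = v715 if !missing(v715) & v715 != 98

* Option B: replace those cases with the overall (unweighted) median
quietly summarize hus_a, detail
gen hus_b = cond(missing(hus_a), r(p50), hus_a)

* Education difference in years (woman minus husband) under each option
gen diff_a = v133 - hus_a
gen diff_b = v133 - hus_b

* Compare the weighted means; similar estimates suggest low sensitivity
mean diff_a [pweight = wt]
mean diff_b [pweight = wt]
```

The same comparison can be repeated on the final empowerment scores or on any model estimate of interest.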

