The DHS Program User Forum: Dataset use in Stata

Imputation of missing data [message #29911]

Wed, 21 August 2024 16:05

Ashlesha Pal
Messages: 3
Registered: July 2024

Member

I am doing an analysis for which I need to generate women's empowerment scores based on the SWPER index of women's empowerment (Ewerling F, Raj A, Victora CG, Hellwig F, Coll CV, Barros AJ. SWPER Global: A survey-based women's empowerment index expanded from Africa to all low- and middle-income countries. J Glob Health. 2020 Dec;10(2):020343. doi: 10.7189/jogh.10.020434. PMID: 33274055; PMCID: PMC7699005.), using the IR files for India, Pakistan and Bangladesh. I am unable to generate these scores for one subpopulation because husband's education is missing or coded "don't know" for about 41 observations in that subpopulation. Is there a way to impute those values so that I do not lose those observations?

Re: Imputation of missing data [message #29921 is a reply to message #29911]

Thu, 22 August 2024 12:28

Bridgette-DHS
Messages: 3230
Registered: February 2013

Senior Member

Following is a response from Senior DHS staff member, Tom Pullum:

In this kind of a situation the researcher (you) must make a "judgment call" about what to do. I can suggest potential strategies. One possibility is to drop those cases. Naturally, we don't like to do that. Another would be to assign these men the mean or median or modal value for all men or for all men in some subpopulation that includes these men, such as their district. I would not recommend anything more complicated. Elaborate methods, such as multiple imputation, do exist, but with only 41 cases that would be a waste of effort.

Whatever you do, it would be good to include a comment or footnote describing it, so someone could potentially match your results. If you look at the tables in DHS final reports, you will sometimes find a footnote that says what was done with missing (distinct from Not Applicable) values.

You can also look at whether any estimates appear to change or differ, depending on how you handled such cases. With only 41 questionable cases, you will probably find that the results are not sensitive to whatever option you choose.
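A sensitivity check of this kind might look something like the sketch below, again in standard-library Python and under stated assumptions: the code 98 for "don't know", the variable name `husb_educ`, and the values are made up, and the simple unweighted mean stands in for whatever estimate is actually of interest. The idea is just to compute the same estimate under each handling option and look at how far apart the results are.

```python
from statistics import mean, median_low, mode

DONT_KNOW = 98  # assumption: hypothetical code for "don't know"

def is_missing(v):
    return v is None or v == DONT_KNOW

def observed(values):
    """Keep only the values that were actually reported."""
    return [v for v in values if not is_missing(v)]

def fill(values, replacement):
    """Replace every missing or "don't know" value with one fixed value."""
    return [replacement if is_missing(v) else v for v in values]

def sensitivity(values, estimator=mean):
    """Compute the same estimate under three ways of handling missing cases."""
    obs = observed(values)
    return {
        "drop": estimator(obs),
        "median": estimator(fill(values, median_low(obs))),
        "mode": estimator(fill(values, mode(obs))),
    }

# Made-up years of education, with two problem cases.
husb_educ = [10, 8, 8, 5, 12, 0, 8, 98, None, 10]

results = sensitivity(husb_educ)
spread = max(results.values()) - min(results.values())

for option, estimate in results.items():
    print(option, round(estimate, 3))
print("spread across options:", round(spread, 3))
```

If the spread is small relative to the precision of the estimate, the choice of option matters little, and a footnote stating which one was used is enough.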
