The DHS Program User Forum
Discussions regarding The DHS Program data and results
Missing values for anemia [message #9188] Fri, 19 February 2016 16:36
clarapd
Messages: 8
Registered: November 2015
Member
Hi everyone, I'm new to this forum.

I'm trying to use the prevalence of anemia among women of reproductive age in several Sub-Saharan countries in a regression.
First, I merged the individual dataset with the household dataset using v001 and v002. Now, to match the anemia data (ha57_x; I understand that each of these variables is linked to a woman of reproductive age in the household) to each woman, I'm using the variable v003 (respondent's line number) and ha0_x (index to the household schedule):
gen anemia = ha57_1 if ha0_1 == v003
replace anemia = ha57_2 if ha0_2 == v003 ...
I must be doing something wrong, because the number of missing values I get is always higher than in the survey report.
Can someone help me?
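A minimal Python sketch of the slot-matching approach described above (the thread's own code is Stata; this only illustrates the logic). It assumes each HR record is a dict holding ha0_x/ha57_x pairs; the helper name `find_anemia`, the cap `max_slots`, and the sample household are hypothetical. A woman whose line number appears in no ha0_x slot gets None, which corresponds to Stata leaving `anemia` as ".".

```python
def find_anemia(household, line_number, max_slots=20):
    """Return ha57_x for the slot whose ha0_x equals the woman's line number.

    household: one HR record as a dict (hypothetical layout).
    max_slots: hypothetical upper bound on the number of ha0_x/ha57_x pairs.
    Returns None (Stata's ".") when no slot matches.
    """
    for x in range(1, max_slots + 1):
        # Comparison, not assignment: Stata needs "==" here as well.
        if household.get(f"ha0_{x}") == line_number:
            return household.get(f"ha57_{x}")
    return None


# Hypothetical HR record: eligible women on household lines 2 and 5.
hh = {"hv001": 1, "hv002": 7, "ha0_1": 2, "ha57_1": 4, "ha0_2": 5, "ha57_2": 3}

print(find_anemia(hh, 5))  # matched in slot 2
print(find_anemia(hh, 9))  # no slot for line 9
```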

Many thanks,

Clara


Clara Pons Duran

[Updated on: Mon, 22 February 2016 11:00]


Re: Missing values for anemia [message #9234 is a reply to message #9188] Thu, 25 February 2016 18:15
Liz-DHS
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
A response from one of our experts, Dr. Tom Pullum:
Quote:

You are apparently merging an IR file with an HR file. The HR file has one very long record per household. It is much easier to merge the IR file with the PR file, which has one record per household member, as follows. Open the IR file first, then gen hv000=v000; gen hv001=v001; gen hv002=v002; gen hvidx=v003; sort hv000 hv001 hv002 hvidx; save temp.dta, replace. Then open the PR file; sort hv000 hv001 hv002 hvidx; merge 1:1 hv000 hv001 hv002 hvidx using temp.dta.
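The IR-to-PR merge above can be sketched in Python as a keyed left join. The variable names match the thread; the dict-based records, the function name `merge_ir_into_pr`, and the sample values (including the country code "BJ6") are hypothetical. Unlike Stata's `merge`, IR rows without a PR match (`_merge == 2`) are simply dropped in this sketch.

```python
def merge_ir_into_pr(pr_rows, ir_rows):
    """Attach each woman's IR record to her PR (household member) record.

    IR keys (v000, v001, v002, v003) play the role of the PR keys
    (hv000, hv001, hv002, hvidx), mirroring the gen/sort/merge steps.
    """
    ir_by_key = {}
    for w in ir_rows:
        key = (w["v000"], w["v001"], w["v002"], w["v003"])
        if key in ir_by_key:  # a 1:1 merge needs unique keys
            raise ValueError(f"duplicate IR key {key}")
        ir_by_key[key] = w

    merged = []
    for m in pr_rows:
        key = (m["hv000"], m["hv001"], m["hv002"], m["hvidx"])
        woman = ir_by_key.get(key)
        row = dict(m)
        if woman is not None:
            row.update(woman)
        # Like Stata's _merge: 3 = matched, 1 = PR (master) only.
        row["_merge"] = 3 if woman is not None else 1
        merged.append(row)
    return merged


# Hypothetical household with two members; only line 2 is an interviewed woman.
pr = [
    {"hv000": "BJ6", "hv001": 1, "hv002": 7, "hvidx": 1},
    {"hv000": "BJ6", "hv001": 1, "hv002": 7, "hvidx": 2},
]
ir = [{"v000": "BJ6", "v001": 1, "v002": 7, "v003": 2, "v457": 3}]

merged = merge_ir_into_pr(pr, ir)
for row in merged:
    print(row["hvidx"], row["_merge"], row.get("v457"))
```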

However, you do not even have to do this. v457 in the IR file is the woman's anemia level. It's not necessary to do ANY merging. For each woman, you can also get the anemia levels of her children (up to 6 children born in the past five years) with hw57_1 through hw57_6. Here the subscripts refer to the woman's children. In the HR file, the subscripts refer to the line number in the household survey.
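Reading the children's values needs no merge either. This Python sketch (hypothetical helper `children_anemia` and record layout) turns the six hw57_x slots of one IR record into (child index, code) pairs, skipping empty slots, while the woman's own level is read straight from v457.

```python
def children_anemia(woman, n_slots=6):
    """List (child index, hw57 code) pairs from one IR record.

    The subscript j in hw57_j indexes the woman's children (up to six
    born in the past five years), not household line numbers.
    Absent or None slots are skipped.
    """
    pairs = []
    for j in range(1, n_slots + 1):
        code = woman.get(f"hw57_{j}")
        if code is not None:
            pairs.append((j, code))
    return pairs


# Hypothetical IR record: the woman's own level is v457.
woman = {"v457": 4, "hw57_1": 3, "hw57_2": None, "hw57_3": 4}
print("woman:", woman["v457"], "children:", children_anemia(woman))
```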




Re: Missing values for anemia [message #9244 is a reply to message #9234] Fri, 26 February 2016 09:26
clarapd
Messages: 8
Registered: November 2015
Member
Thanks for your response; it is now very easy to generate this variable without merging.
However, I get the same results as before, and there is always a large number of missing values (.). Only a few are coded as 9.
I understand that all women of reproductive age (15-49) were tested for anemia.

Thanks for your help,

Clara


Clara Pons Duran
Re: Missing values for anemia [message #9261 is a reply to message #9244] Mon, 29 February 2016 08:38
Liz-DHS
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
Dr. Pullum asks:
Quote:

Can you give me a specific survey that has this problem? I just looked at Ghana 2008, more or less at random, and it does not have this problem. In general, "." should not be described as "missing" (although I sometimes do this myself!). It means "not applicable". Something in the skip pattern or in the eligibility for the question is responsible for this code. Tell me the survey and I should be able to reply right away.
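The distinction between "." and 9 can be made explicit when tabulating. In this Python sketch, None stands in for Stata's "."; the helper name, the category labels, and the assumption that valid results are coded 1-4 (severe, moderate, mild, not anemic, the usual DHS coding for v457) are assumptions to check against the survey's recode documentation.

```python
from collections import Counter


def classify_v457(code):
    """Label one anemia code, keeping '.' and 9 apart."""
    if code is None:
        return "not applicable"  # Stata ".": skipped by design or not eligible
    if code == 9:
        return "missing"  # eligible, but no usable result
    if code in (1, 2, 3, 4):
        return "valid"  # assumed coding: severe, moderate, mild, not anemic
    raise ValueError(f"unexpected code {code!r}")


codes = [4, None, 3, 9, None, 1]  # hypothetical values
counts = Counter(classify_v457(c) for c in codes)
print(counts)
```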


Re: Missing values for anemia [message #9262 is a reply to message #9261] Mon, 29 February 2016 09:04
clarapd
Messages: 8
Registered: November 2015
Member
For example, Benin 2011-2012. I see many (.) values in the anemia variable in the individual recode, and I understand that all women of reproductive age are eligible.

Thanks


Clara Pons Duran
Re: Missing values for anemia [message #9294 is a reply to message #9262] Fri, 04 March 2016 16:10
Liz-DHS
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
A response from Dr. Tom Pullum:
Quote:

I have talked with some people in the DHS office who are familiar with the Benin 2011-12 survey. They say that only a one-third subsample of women 15-49 were given the hemoglobin test. I see from the data file that there were 16,599 women and 5,513 had the test, which is almost exactly one-third. Of the 5,513 women, 454 have code "9", which is not part of the value label; these are the genuinely missing cases. The code ".", as I said before, means "not applicable". These numbers are unweighted.
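The figures quoted above can be checked directly. The inputs are the unweighted counts from this reply; everything else follows by subtraction and division.

```python
total_women = 16_599  # women 15-49 in the IR file (unweighted)
tested = 5_513        # women in the hemoglobin subsample
code_9 = 454          # in the subsample, but no valid result

not_applicable = total_women - tested  # the "." cases
valid = tested - code_9

print(f"share tested:  {tested / total_women:.3f}")
print(f"'.' cases:     {not_applicable}")
print(f"missing share: {code_9 / tested:.3f}")
print(f"valid results: {valid}")
```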

I have looked very briefly at the report and the questionnaire. I cannot see anything about the subsampling or eligibility for this test. But maybe I am missing it because my French isn't good enough.

Apart from the subsampling, I have to warn you that DHS had problems with the implementing agency for this survey. Height and weight data were collected but are omitted from the report because of evidence of poor quality. That's very unusual. The anemia data should probably also have been suppressed. I think there should be some more explicit warnings, but I believe the report does describe, somewhere, the field problems and concerns about quality. Appendix C includes tables that indicate relatively high levels of genuinely missing responses (indicated by "9", "99", etc., rather than ".").

I believe that Benin is a special case. If you find other surveys with high levels of ".", I hope the report will provide documentation of subsampling. Let me know if you have other related questions.

Re: Missing values for anemia [message #9312 is a reply to message #9294] Wed, 09 March 2016 06:42
clarapd
Messages: 8
Registered: November 2015
Member
Thanks for your help. I had not understood this because some questions are asked only of subsamples of the population; this is country-specific and is not explained in the questionnaires.

Now, knowing this, I'm performing the analysis for 29 different countries. When I perform the analysis at the country level, I understand that I need to use the individual weights even though only a subsample answered the questions. My question is: when I append all 29 datasets, do I need to rescale the weights so that each country is represented in proportion to its population size? And for the rescaling, I think I need to drop (or ignore) the (.) values so that the subsample represents the whole country population.
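A per-survey weighted prevalence that excludes both "." and 9 might look like the Python sketch below. It assumes the standard DHS convention that v005 carries six implied decimals (hence the division by 1,000,000) and that codes 1-3 mean anemic; the function name and records are hypothetical. Within one survey the division only restores the intended scale and does not change the ratio.

```python
def weighted_prevalence(women):
    """Weighted share anemic among women with a valid v457 result.

    Assumes v005 has six implied decimals and codes 1-3 mean anemic,
    4 not anemic. Records with None (".") or 9 are left out of both
    numerator and denominator.
    """
    num = den = 0.0
    for w in women:
        code = w["v457"]
        if code not in (1, 2, 3, 4):
            continue  # not applicable or missing
        weight = w["v005"] / 1_000_000
        den += weight
        if code in (1, 2, 3):
            num += weight
    return num / den if den else float("nan")


# Hypothetical records from one survey.
women = [
    {"v457": 1, "v005": 2_000_000},
    {"v457": 4, "v005": 1_000_000},
    {"v457": None, "v005": 5_000_000},
    {"v457": 9, "v005": 3_000_000},
]
print(round(weighted_prevalence(women), 3))
```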

Thank you very much.


Clara Pons Duran
Re: Missing values for anemia [message #9453 is a reply to message #9312] Tue, 29 March 2016 16:14
Liz-DHS
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
Do you still need assistance with this post?
Please let us know if you do.
Thanks!
Re: Missing values for anemia [message #9464 is a reply to message #9453] Wed, 30 March 2016 03:10
clarapd
Messages: 8
Registered: November 2015
Member
Yes. My only remaining question is whether I need to rescale the weights of the appended countries so that each country is represented in proportion to its population size, and, if so, whether I need to drop (or ignore) the (.) values when rescaling.

Thank you



Clara Pons Duran
Re: Missing values for anemia [message #9468 is a reply to message #9464] Wed, 30 March 2016 10:53
Liz-DHS
Messages: 1516
Registered: February 2013
Senior Member
Dear User,
A response from Dr. Tom Pullum:
Quote:

Are you planning to pool multiple surveys? I would not recommend that. It will be best to calculate the prevalence of anemia separately for each survey, and not to calculate an overall prevalence for all surveys pooled. The combination of surveys does not correspond with a well-defined population and the dates of the surveys are different. But if you do want to pool, I recommend that you give equal weight to each survey. If, instead of that, you weight in proportion to the size of the country, your pooled estimate will be overwhelmed by the largest one or two countries.
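If surveys are pooled despite these caveats, giving each survey equal weight amounts to rescaling each survey's weights (valid cases only) to a common total. In this hypothetical Python sketch every survey is rescaled to sum to 1, so the pooled estimate equals the simple mean of the survey-level prevalences; the function name, survey names, and weights are made up for illustration.

```python
def pooled_equal_weight(surveys):
    """Pooled prevalence with every survey given the same total weight.

    surveys: {name: [(weight, anemic), ...]} for valid cases only.
    Each survey's weights are rescaled to sum to 1.
    """
    num = den = 0.0
    for rows in surveys.values():
        total = sum(w for w, _ in rows)
        for w, anemic in rows:
            scaled = w / total
            den += scaled
            if anemic:
                num += scaled
    return num / den


# Hypothetical: survey A has prevalence 0.5, survey B 0.2 with a larger weight total.
surveys = {
    "A": [(1.0, True), (1.0, False)],
    "B": [(2.0, True)] + [(2.0, False)] * 4,
}
print(round(pooled_equal_weight(surveys), 3))  # mean of 0.5 and 0.2

# For contrast: pooling the raw weights lets survey B dominate.
raw_num = sum(w for rows in surveys.values() for w, anemic in rows if anemic)
raw_den = sum(w for rows in surveys.values() for w, _ in rows)
print(round(raw_num / raw_den, 3))
```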

If you want to weight in proportion to the country sizes (which I do not recommend) then you can use the population estimates for the number of children 0-4 (or other subpopulations) provided by the Population Division of the UN, for July 1 nearest to the mean or median date of the survey. The relevant number of cases in the survey would be the non-missing cases. That would also be the relevant number if you want to weight each survey equally.
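For the population-proportional alternative (again, not recommended in the reply), each survey's valid cases can be rescaled so that their weights sum to an external population figure, such as a UN Population Division estimate. The Python sketch below uses hypothetical function names, survey names, and population numbers.

```python
def rescale_to_population(rows, population):
    """Rescale one survey's valid-case weights to sum to `population`."""
    total = sum(w for w, _ in rows)
    factor = population / total
    return [(w * factor, anemic) for w, anemic in rows]


def pooled_prevalence(surveys, populations):
    """Pooled prevalence with surveys weighted by population size."""
    num = den = 0.0
    for name, rows in surveys.items():
        for w, anemic in rescale_to_population(rows, populations[name]):
            den += w
            if anemic:
                num += w
    return num / den


# Hypothetical surveys (prevalence 0.5 and 0.2) and population figures.
surveys = {
    "A": [(1.0, True), (1.0, False)],
    "B": [(2.0, True)] + [(2.0, False)] * 4,
}
populations = {"A": 1_000_000, "B": 9_000_000}
print(round(pooled_prevalence(surveys, populations), 3))
```

Giving every survey the same entry in `populations` turns this into equal weighting of surveys.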

The Stata commands for weighting in proportion to population size or for weighting each survey equally have already been posted on the forum, but let us know if you can't find them.

