Forum: India
 Topic: Districts as cluster-level for multi-level model
Districts as cluster-level for multi-level model
I would appreciate your expert guidance on my query. We usually use 'psu' as the cluster level in DHS data. In my case, the groups are too small when I use 'psu':
Group Variable |     #Groups    Minimum    Average    Maximum
           psu |     25,063          1        3.1         16

Since NFHS-4 is representative at the district level and we have to create a variable for the cluster weight anyway, I am wondering whether district can be used as the cluster level. I changed my weighting commands from psu to district, but as the output below shows, I do not get p-values or CIs.
*Rescaling of weights
	gen wt=v005/1000000
*Level 1 weights using scaling method 1: New weights sum to the effective district sample size, (sum of wt)^2 / (sum of wt^2)
	gen sqw = wt*wt 
	egen sumsqw = sum(sqw), by(sdistri) 
	egen sumw = sum(wt), by(sdistri) 
	gen pwt11 = wt*sumw/sumsqw 

* Survey setting
	gen wt2=1
	svyset sdistri, weight(wt2) strata(v023) , singleunit(centered) || _n, weight(pwt11)
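
For readers checking the arithmetic, the rescaling above multiplies each weight by sum(wt)/sum(wt^2) within its district. The following is a minimal Python sketch of that calculation, not the NFHS-4 data: the weights and the district labels "A" and "B" are made up, and the function name is hypothetical.

```python
from collections import defaultdict


def scale_level1_weights(weights, clusters):
    """Return w_i * sum(w) / sum(w^2), with both sums taken within each cluster."""
    sum_w = defaultdict(float)
    sum_w2 = defaultdict(float)
    for w, c in zip(weights, clusters):
        sum_w[c] += w
        sum_w2[c] += w * w
    return [w * sum_w[c] / sum_w2[c] for w, c in zip(weights, clusters)]


# Made-up weights for two hypothetical districts.
wt = [0.5, 1.5, 1.0, 2.0, 2.0]
district = ["A", "A", "A", "B", "B"]
pwt11 = scale_level1_weights(wt, district)

# Total of the scaled weights within each district.
total_a = sum(p for p, d in zip(pwt11, district) if d == "A")
total_b = sum(p for p, d in zip(pwt11, district) if d == "B")
print(round(total_a, 4), round(total_b, 4))
```

In this toy example the scaled weights in each district add up to (sum of wt)^2 / (sum of wt^2), which is usually smaller than the number of observations in the district when the weights are unequal.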

Number of strata   =     2,509                  Number of obs     =  1,538,126
Number of PSUs     =     2,509                  Population size   =  1,438,715
Subpop. no. obs   =     78,446
Subpop. size      =  73,653.12
Design df         =          0
F(   0,      0)   =          .
Prob > F          =          .

 y           Coef.    Std. Err.      t       P>t     [95% Conf. Interval]

_cons     -1.585093   .0192937   -82.16       .            .           .

var(_cons) .1527032   .0153514                             .           .

Note: 5 strata omitted because they contain no subpopulation members.
Note: Strata with single sampling unit centered at overall mean.

I am not sure what is going wrong and would appreciate any insight.
Thank you
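
One possible reading of the output above, offered as a sketch rather than a diagnosis: Stata computes the design degrees of freedom as the number of PSUs minus the number of strata, and the header reports 2,509 of each, which matches "Design df = 0". With zero degrees of freedom there is no t distribution to produce p-values or confidence intervals. The Python below only illustrates that count; the identifiers and toy data are hypothetical.

```python
from collections import defaultdict


def design_df(psu_ids, stratum_ids):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    psus_by_stratum = defaultdict(set)
    for psu, stratum in zip(psu_ids, stratum_ids):
        psus_by_stratum[stratum].add(psu)
    n_psus = sum(len(psus) for psus in psus_by_stratum.values())
    return n_psus - len(psus_by_stratum)


# Toy case 1: each stratum contains exactly one sampling unit (one district).
one_unit_per_stratum = design_df(
    psu_ids=["d1", "d1", "d2", "d2", "d3"],
    stratum_ids=["s1", "s1", "s2", "s2", "s3"],
)

# Toy case 2: several clusters nested within each stratum.
several_units_per_stratum = design_df(
    psu_ids=["c1", "c2", "c3", "c4", "c5"],
    stratum_ids=["s1", "s1", "s2", "s2", "s2"],
)

print(one_unit_per_stratum, several_units_per_stratum)
```

When every stratum holds a single sampling unit, as in the first toy case, the difference is zero regardless of how many observations there are.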

Forum: Other countries
 Topic: Myanmar 2020 data
Myanmar 2020 data
It looks like the 2015-2016 Myanmar DHS was completed in July 2016.

What is the anticipated date that the 2020 data will be available?
 Topic: Issues with Honduran DHS dataset 2011-12
Issues with Honduran DHS dataset 2011-12
Dear all,

We are having issues using the 2011-12 Honduran dataset.

In the flowchart (attached), n is 5,487: the 5,627 observations in the database minus the 140 women whose children have missing nutritional status values.

Using the tab1 command from the epicalc package in R, the number of women living in rural and urban areas is 3,878 and 1,749, respectively (attached). These add up to 5,627 women, which is n before excluding women whose children have missing nutritional status values. This command does not adjust for the survey design.

When we generated a 2x2 table of place of residence (urban/rural) by child stunting category with the survey package in R, adjusting for the survey design, n is 4,248 mothers with children and not the 5,487 defined in our flowchart (attached).

We also noticed that when we use svyby to generate prevalences, the results are difficult to interpret: they do not look like prevalences but more like integers (e.g. 1 and 2 for place of residence).

Please note the several commands we used to generate the above results:

1) To adjust for survey design:
violnutr_3R$wt <- violnutr_3R$samweight / 1000000
dhsdesign <- svydesign(ids = ~prisam, strata = ~stratasam, weights = ~wt, data = violnutr_3R)

-prisam is v021
-stratasam is v022
-samweight is d005 (weight for domestic violence module)

2) To generate a frequency of the variable place of residence


-violnutr_3R is the name of the dataset
-plares is the variable name for the places of residence (urban/rural)

3) To generate a 2x2 table (place of residence and child stunting category)

svyby(~chstunting_cat,~plares,dhsdesign, svymean, na.rm=TRUE)
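
A possible explanation for the integer-looking results, stated as an assumption about how chstunting_cat is stored: if a category variable is kept as a numeric code, a weighted mean is the average of the codes, not a prevalence. The Python sketch below uses made-up codes and weights (1 = not stunted, 2 = stunted is a hypothetical coding) to contrast the two calculations.

```python
def weighted_mean(values, weights):
    """Weighted average of numeric values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_proportions(codes, weights):
    """Weighted share of each category code."""
    total = sum(weights)
    shares = {}
    for code, w in zip(codes, weights):
        shares[code] = shares.get(code, 0.0) + w / total
    return shares


# Made-up data: hypothetical coding 1 = not stunted, 2 = stunted.
stunting_code = [1, 2, 2, 1, 2]
weight = [1.0, 1.0, 2.0, 0.5, 0.5]

mean_of_codes = weighted_mean(stunting_code, weight)
shares = weighted_proportions(stunting_code, weight)
print(mean_of_codes, shares)
```

The mean of the codes falls between 1 and 2 and is hard to read, while the per-category shares sum to 1 and can be read as prevalences. In the survey package, treating the variable as a factor is the usual way to get the second kind of result.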

We would appreciate your support in sorting out our issue.

Best regards,


 Topic: Mexico and urban/rural
Mexico and urban/rural
In the Mexico DHS, the variable v102 has four possible outcomes based on the number of people in the area, rather than the standard urban/rural. Is there a way of matching that variable or any of the other variables to urban/rural? The de facto variable, v134, is a copy of v102, so that does not help.

Claus Portner
Forum: Merging data files
 Topic: Are the merging commands and results correct
Are the merging commands and results correct
Dear all,

I need your help checking whether the commands below are valid.

I am merging the IR file into the PR file of the 2018 Nigeria Demographic and Health Survey. I want the socio-demographic and economic characteristics and the nutritional status (BMI) of women from the IR file to appear in the PR file, since the PR file contains both de jure and de facto household members and should be linkable to the IR file. I intend to examine, among other things, the relationship between maternal nutritional status and under-five children's nutritional status (wasting, underweight, etc.).

These are my commands:

//Preparing PR (household member) recode for merging
cd "C:\Users\USER\Documents\NG_2018_DHS_11292019_1323_116593\NGPR7ADT"
use "NGPR7AFL.DTA", clear
rename hv001 v001
rename hv002 v002
sort v001 v002 hvidx
duplicates list v001 v002 hvidx
save "C:\Users\USER\Desktop\Papers, Journals, project\Lawal\New project\PR.dta", replace

//Preparing IR (women's) recode for merging; each variable is listed once
cd "C:\Users\USER\Documents\NG_2018_DHS_11292019_1323_116593\NGIR7ADT"
use caseid v000 v001 v002 v003 v004 v131 v130 v717 v445 v106 v013 v190 v025 v024 using "NGIR7AFL.DTA", clear
rename v003 hvidx
sort v001 v002 hvidx
duplicates list v001 v002 hvidx
save "C:\Users\USER\Desktop\Papers, Journals, project\Lawal\New project\Data 2.dta", replace

//Merging IR into PR file
cd "C:\Users\USER\Desktop\Papers, Journals, project\Lawal\New project"
use "PR.dta", clear
merge 1:1 v001 v002 hvidx using "Data 2.dta"

//Keeping all PR cases (matched and unmatched)
keep if _merge==1 | _merge==3

I noticed that the total number of observations differs greatly between variables.
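
To make the merge logic concrete, here is a small Python sketch of a one-to-one merge on the keys v001, v002 and hvidx, using made-up records. The function name and the toy rows are hypothetical; the _merge codes follow Stata's convention (1 = master only, 2 = using only, 3 = matched).

```python
from collections import Counter


def merge_one_to_one(master, using, keys):
    """Mimic a 1:1 merge: every key combination must be unique in each file."""
    def index(rows):
        indexed = {}
        for row in rows:
            key = tuple(row[name] for name in keys)
            if key in indexed:
                raise ValueError(f"duplicate key {key}")
            indexed[key] = row
        return indexed

    m, u = index(master), index(using)
    merged = []
    for key in sorted(set(m) | set(u)):
        row = {**u.get(key, {}), **m.get(key, {})}  # master values win on overlap
        row["_merge"] = 3 if key in m and key in u else (1 if key in m else 2)
        merged.append(row)
    return merged


# Made-up PR-like file: four household members in two households.
pr = [
    {"v001": 1, "v002": 1, "hvidx": 1},
    {"v001": 1, "v002": 1, "hvidx": 2},
    {"v001": 1, "v002": 1, "hvidx": 3},
    {"v001": 1, "v002": 2, "hvidx": 1},
]
# Made-up IR-like file: only the two interviewed women, with a BMI value.
ir = [
    {"v001": 1, "v002": 1, "hvidx": 2, "v445": 2210},
    {"v001": 1, "v002": 2, "hvidx": 1, "v445": 1980},
]

merged = merge_one_to_one(pr, ir, ["v001", "v002", "hvidx"])
counts = Counter(row["_merge"] for row in merged)
print(counts)
```

In this toy data the household file has more rows than the women's file, so the variables that come from the women's file are filled only for the matched rows. That alone would make observation counts differ between variables after the merge.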

I will be glad to hear from anyone. Thank you very much.

Forum: General Data Questions
 Topic: DHS 8 HIV Questions
DHS 8 HIV Questions
**SORRY never mind I answered my own question--I didn't realize these are still DHS 7, not 8--my apologies!!**


First, as always, I must express my immense gratitude for this forum and to the users and experts of this DHS community who have helped me over the years.

My question today is about the new DHS 8 data for the HIV section in the women's and men's surveys. I looked at both the Zambia 2018 and Guinea 2018 data sets and could not find the variables for several of the new questions, specifically the questions on self-reported HIV status and the follow-up questions (about ART use and stigma) for respondents who report being positive.

Was this data not collected for these two countries?

Thanks in advance for any clarification that can be provided!


Forum: Ethiopia
 Topic: Re-message18741
Re-message18741
Dear DHS data experts,
I am using the KR file to replicate Table 11.3 (page 205) of the 2016 Ethiopia DHS final report, in order to obtain exclusive breastfeeding (EBF) and non-exclusive breastfeeding as reported there.
I have recoded the following variables:
 m4 (duration of breastfeeding): '95' (still breastfeeding, CBF) recoded as '1'; all other values recoded as '0' (not still breastfeeding/not breastfeeding, Non.BF)
 Breastfeeding and consuming plain water only (m4+v409), (BFPW)
 Breastfeeding and consuming non-milk liquids (m4+v410+v412c+v413), (BFNML)
 Breastfeeding and consuming other milk (m4+v411), (BFOM)
 Breastfeeding and consuming complementary food (m4+m39a), (BFCF)
 b5 (child is alive)
 b9 (child lives with whom = 0)
 b19 (current age of child in months < 24)
With these variables I obtained the results below, which are not consistent with Table 11.3.
Please tell me where I went wrong: in the variable extraction, or in the transformation and computation in SPSS?
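
As I understand the Guide to DHS Statistics, the breastfeeding status categories are mutually exclusive and hierarchical, so a child who receives several items is assigned to the highest applicable category. The Python sketch below shows that ordering under this assumption. The boolean flag names are hypothetical, and how each flag is derived from the recode variables listed above is not shown here.

```python
def breastfeeding_status(still_bf, water, non_milk_liquids, other_milk,
                         complementary_food):
    """Assign one mutually exclusive category, checking the highest level first."""
    if not still_bf:
        return "not breastfeeding"
    if complementary_food:
        return "breastfeeding and complementary foods"
    if other_milk:
        return "breastfeeding and other milk"
    if non_milk_liquids:
        return "breastfeeding and non-milk liquids"
    if water:
        return "breastfeeding and plain water only"
    return "exclusively breastfed"


# Made-up children: (still_bf, water, non_milk_liquids, other_milk, food)
children = [
    (True, False, False, False, False),
    (True, True, False, False, False),
    (True, True, True, False, False),
    (True, True, False, True, True),
    (False, True, False, False, True),
]
statuses = [breastfeeding_status(*child) for child in children]
for status in statuses:
    print(status)
```

Because each child lands in exactly one category, the category counts add up to the total number of children. Summing variables instead of applying a hierarchy can place one child in several categories, which is one way totals can drift from a published table.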

My results for Table 11.3 (page 205 of the final 2016 Ethiopia DHS report) are:
EDHS 1 2 3 4 5 6
0 5 month 50 1042 350 70 108 107 1092

Also, the total in the report is 1,185, which differs from the 1,092 I found.

I have also read the Guide to DHS Statistics (DHS-7) and many forum threads on EBF, but I could not find the commands for computing exclusive versus non-exclusive breastfeeding for the 2016 Ethiopia DHS. Other users asked the same question in 2017, but no reliable answer was given on the forum at that time.

1. Are the variables I identified correct, and is the calculation appropriate for the transformation in SPSS?
2. Could you give me a clue on how to extract the variables and prepare them for analysis in SPSS or Stata?
I look forward to your response as soon as possible. My outcome variable is exclusive versus non-exclusive breastfeeding among mothers, built from the selected variables in the 2016 Ethiopia DHS.
Best regards,

Forum: Kenya
 Topic: Strata for 1989 survey
Strata for 1989 survey
Dear Colleagues
I have a question about the 1989 Kenya DHS. The dataset does not have the strata variable V022. Do you know which variable I could use for strata in this case?

