My next question pertains to the sampling weights. As I understand it, only men and women aged 35-64 in the half of households selected for the men's questionnaire (i.e., 2,584 women and 2,163 men, unweighted) were eligible for the blood glucose measurements. Of those eligible, 1,937 women and 1,379 men consented to blood glucose testing and were asked about a diabetes diagnosis. I do not understand how Table 17.5 of the 2013 Namibia Final Report can show 2,621 women (2.6%) and 2,091 men (2.6%) as having self-reported diabetes. This is puzzling, because the counts also do not seem to reflect any use of de-normalized weights.

I am using SAS 9.4, and below is the syntax I used to reproduce the total number of people who self-reported having diabetes, by sex:

proc surveyfreq data=nm.NMPR61FL2;
   tables hv104*SH328;
   cluster Hv021;
   strata Hv023;
   weight newWeight; /* newWeight = hv028/1000000 */
run;

Any insights into this matter would be greatly appreciated.
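For context, here is a minimal sketch (in Python, not SAS) of why weighted counts computed with hv028/1000000 stay on the scale of the sample rather than the population. It assumes, as with DHS normalized weights generally, that the weights divided by 1,000,000 sum to roughly the number of sampled cases; the records and weight values below are hypothetical. Clustering and stratification affect standard errors, not these weighted counts, so they are left out.

```python
def weighted_counts(records, weight_key, key):
    """Sum normalized weights (raw weight / 1,000,000) for each value of key(record)."""
    totals = {}
    for rec in records:
        w = rec[weight_key] / 1_000_000  # normalized weight, like newWeight = hv028/1000000
        k = key(rec)
        totals[k] = totals.get(k, 0.0) + w
    return totals

# Hypothetical records with DHS-style integer weights (6 implied decimals).
# These raw weights were chosen so the normalized weights sum to the sample size (4).
records = [
    {"sex": "male", "diabetes": 1, "hv028": 1_250_000},
    {"sex": "male", "diabetes": 0, "hv028": 750_000},
    {"sex": "female", "diabetes": 0, "hv028": 1_100_000},
    {"sex": "female", "diabetes": 1, "hv028": 900_000},
]

by_sex = weighted_counts(records, "hv028", key=lambda r: r["sex"])
by_sex_diab = weighted_counts(records, "hv028", key=lambda r: (r["sex"], r["diabetes"]))
total = sum(by_sex.values())  # equals the number of records under normalization
print(by_sex, by_sex_diab, total)
```

Because normalized weights add up to the number of sampled cases, a weighted frequency table built this way reports counts in the thousands, close to the unweighted sample size, rather than national population totals; population-scale counts would require de-normalized weights.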


Is "weight_all = v005/1000000(Total number of households during each survey year/sample households in each survey)" a verbal description of a Stata command? If so, what command? It is not clear what you are doing to the weights other than dividing by 1,000,000, which by itself will not affect the results at all.

The "Population size" in the results has been distorted by svyset, particularly by the components other than the weight. You can ignore it.
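To illustrate the point about dividing by 1,000,000, here is a small Python sketch with a hypothetical 0/1 outcome and hypothetical raw weights. It shows that a weighted proportion is unchanged when every weight is divided by the same constant, while the sum of the weights (which is what a "population size" line reports) does change.

```python
def weighted_proportion(values, weights):
    """Weighted share of 1s in a 0/1 outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

y = [1, 0, 0, 1, 1]                                         # hypothetical outcome
v005 = [1_523_000, 987_000, 2_104_000, 760_000, 1_311_000]  # hypothetical raw weights
scaled = [w / 1_000_000 for w in v005]                      # the same weights divided by 1,000,000

p_raw = weighted_proportion(y, v005)
p_scaled = weighted_proportion(y, scaled)  # identical up to floating-point rounding

total_raw = sum(v005)       # sum of weights changes with the scaling...
total_scaled = sum(scaled)  # ...by exactly the factor 1,000,000
print(p_raw, p_scaled, total_raw, total_scaled)
```

The same invariance holds for weighted logistic regression: multiplying every weight by a constant multiplies the weighted log-likelihood by that constant, so the coefficient estimates do not move.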


But even after using the correct weight variable, I still have a problem: I am running a logistic regression model (CSLOGISTIC) to study risk factors for stunting, using the KR file of the Rwanda DHS 2014-2015.

The problem is that the sample design information table in the output reports only 859 valid and 2,679 invalid unweighted cases, so the sample used for the logistic regression is only 857 instead of 3,538!

Could you help me figure out where I am going wrong?

I am running the model in SPSS and have attached the output table.

Thank you for your help!

Hope
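One common source of "invalid" cases in complex-samples procedures is that a case is typically excluded when the outcome, any predictor, or a design variable (weight, stratum, cluster) is missing. As a diagnostic sketch only, written in Python rather than SPSS and using hypothetical variable names and records, the function below counts how many cases each variable knocks out and how many remain complete on all of them.

```python
def diagnose_missing(cases, variables):
    """Count cases missing each variable, and cases complete on all of them."""
    missing_by_var = {v: sum(1 for c in cases if c.get(v) is None) for v in variables}
    complete = [c for c in cases if all(c.get(v) is not None for v in variables)]
    return missing_by_var, len(complete)

# Hypothetical records: "stunted" is None when no valid height measurement exists
cases = [
    {"stunted": 1, "age_months": 20, "wt": 0.8, "strata": 3, "psu": 11},
    {"stunted": None, "age_months": 35, "wt": 1.1, "strata": 3, "psu": 11},
    {"stunted": 0, "age_months": None, "wt": 0.9, "strata": 4, "psu": 12},
    {"stunted": 0, "age_months": 48, "wt": 1.2, "strata": 4, "psu": 12},
]

missing, n_complete = diagnose_missing(cases, ["stunted", "age_months", "wt", "strata", "psu"])
print(missing, n_complete)
```

If one variable (for example the outcome) accounts for most of the missing cases, that points to where the drop from the full file to the analytic sample happens.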

I want to do a pooled analysis of BDHS 2007, 2011 and 2014 (KR file for children). As per forum discussions, I need to de-normalize the sampling weight for regression analysis of pooled data. I did this as follows:

gen wgt = weight_all  // weight_all = v005/1000000 * (total number of households in each survey year / sample households in each survey)
// append the three surveys' data
gen psu = cluster  // each survey's clusters are unique, e.g. 2007_1, 2011_1, and so on
svyset psu, weight(wgt) strata(strat) singleunit(centered) || _n  // each survey's strata are unique

When I fitted a weighted logistic regression after adjusting the weight this way, I got:

svy: logit y x
Number of strata = 63      Number of obs = 19,896
Number of PSUs = 1,561     Population size = 44,882,311

Could anyone please confirm whether this process is correct? Is the population size reliable?

Thank you very much.

Regards,

Aliza
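A minimal Python sketch of the de-normalization described in the post, assuming the factor is (total households in the survey year) / (sample households in that survey), applied separately to each survey before pooling. The household totals, sample sizes and records are hypothetical placeholders, not BDHS figures, and the records are treated as household-level so that the normalized weights in each survey sum to its sample size. The sketch also builds survey-specific cluster IDs like 2007_1, as in the post.

```python
# Hypothetical per-survey inputs: N = total households, n = sampled households
survey_totals = {2007: {"N": 30_000, "n": 3}, 2011: {"N": 36_000, "n": 2}}

# Hypothetical household records; v005 is the raw DHS weight (6 implied decimals)
records = [
    {"year": 2007, "v001": 1, "v005": 900_000},
    {"year": 2007, "v001": 1, "v005": 1_100_000},
    {"year": 2007, "v001": 2, "v005": 1_000_000},
    {"year": 2011, "v001": 1, "v005": 1_200_000},
    {"year": 2011, "v001": 1, "v005": 800_000},
]

for rec in records:
    t = survey_totals[rec["year"]]
    # normalized weight times (total households / sample households) for that survey
    rec["wgt"] = rec["v005"] / 1_000_000 * t["N"] / t["n"]
    # survey-specific cluster ID, so cluster 1 in 2007 and cluster 1 in 2011 stay distinct
    rec["psu"] = f"{rec['year']}_{rec['v001']}"

# Sum of de-normalized weights per survey
pop_by_survey = {}
for rec in records:
    pop_by_survey[rec["year"]] = pop_by_survey.get(rec["year"], 0.0) + rec["wgt"]
print(pop_by_survey)
```

Under these assumptions each survey's weights sum to its household total, so in a simple single-stage design the sum of all weights (what a survey command reports as population size) is the sum of those totals across the pooled surveys.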

I have pooled data from about 10 countries and am ready to generate new strata/cluster variables. Consistent with advice from this board, I created a new variable, "survey_year", which takes the values 1, 2, 3, 4 (to represent each of the survey rounds within a country).

Here are the two commands I intend to use:

egen cluster=group(survey_year v001)

egen stratum=group(survey_year v024 v025)

But I am not sure whether the survey_year variable uniquely identifies the survey rounds. For example, I have the following survey rounds and their corresponding "survey_year" values:

Congo2007: survey_year=1

Congo2013: survey_year=2

Gabon2012: survey_year=1

Ghana2003: survey_year=1

Ghana2014: survey_year=2

So, survey_year=1 can refer to different years. Does this matter, for generating new strata and cluster variables?

What if I create two variables: "survey_round" (values 1, 2, 3, 4) to capture the different survey rounds, and "survey_year" (the calendar years of the surveys: 2003, 2004, 2005, 2006, ... 2017)?

The new cluster and stratum variables could be:

egen cluster=group(survey_year survey_round v001)

egen stratum=group(survey_year survey_round v024 v025)

Would this new approach be OK?

Thanks - Yy
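To see why a within-country round number can cause trouble, here is a small Python sketch that mimics what Stata's `egen group()` does: it numbers each distinct combination of the listed variables 1, 2, 3, ... in sorted order. The `country` column and the records are hypothetical. Grouping on survey_year and v001 alone gives cluster 1 of Congo 2007 and cluster 1 of Gabon 2012 the same ID; adding a country identifier separates them.

```python
def group_ids(rows, keys):
    """Mimic Stata's egen group(): number distinct key combinations 1..k in sorted order."""
    combos = sorted({tuple(r[k] for k in keys) for r in rows})
    index = {combo: i + 1 for i, combo in enumerate(combos)}
    return [index[tuple(r[k] for k in keys)] for r in rows]

# Hypothetical pooled rows; "country" is an assumed country identifier column
rows = [
    {"country": "Congo", "survey_year": 1, "v001": 1},  # Congo 2007, cluster 1
    {"country": "Gabon", "survey_year": 1, "v001": 1},  # Gabon 2012, cluster 1
    {"country": "Congo", "survey_year": 2, "v001": 1},  # Congo 2013, cluster 1
]

without_country = group_ids(rows, ["survey_year", "v001"])             # first two rows collide
with_country = group_ids(rows, ["country", "survey_year", "v001"])     # all three distinct
print(without_country, with_country)
```

The same check applies to the stratum variable. Whether calendar year plus round is enough on its own depends on whether two countries in the pool were surveyed in the same year; including a country identifier in the grouping avoids relying on that.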

For years 1993-2008, you can use v022 for stratification.

For 2013, it looks like neither v022 nor v023 is coded correctly. Therefore, you need to create the stratification variable yourself. Fortunately, the stratification is well documented on page 197 of the final report https://dhsprogram.com/pubs/pdf/FR352/FR352.pdf.

You will need to create a new stratification variable based on the variables v025 and shprovin.

Another proxy for the stratification variable could be based on v025, v024 and shregion; however, this will result in 31 strata, 5 more than the original 26 strata. Below is the Stata code for the proxy version:

egen strata = group(v024 shregion v025)
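A quick sanity check after building strata is to count the distinct PSUs in each stratum, because strata with a single PSU need special handling (for example via Stata's `singleunit()` option in `svyset`). The Python sketch below builds one stratum per combination of province (shprovin) and urban/rural residence (v025), in the spirit of a Stata `egen ... group(shprovin v025)`, and flags singleton strata; the records are hypothetical, and the real strata should be checked against the table in the final report.

```python
from collections import defaultdict

def psus_per_stratum(rows):
    """Map each (shprovin, v025) stratum to its set of distinct PSUs (v021)."""
    strata = defaultdict(set)
    for r in rows:
        strata[(r["shprovin"], r["v025"])].add(r["v021"])
    return strata

# Hypothetical records: v025 = 1 urban, 2 rural; v021 = PSU number
rows = [
    {"shprovin": 1, "v025": 1, "v021": 101},
    {"shprovin": 1, "v025": 1, "v021": 102},
    {"shprovin": 1, "v025": 2, "v021": 103},
    {"shprovin": 2, "v025": 2, "v021": 201},
    {"shprovin": 2, "v025": 2, "v021": 202},
]

strata = psus_per_stratum(rows)
singletons = sorted(k for k, psus in strata.items() if len(psus) == 1)
print(len(strata), singletons)
```

Comparing the number of strata produced this way with the count documented in the report is a simple way to confirm the reconstruction before running `svyset`.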