I am working on a project that will use the 2017 Ghana Maternal Health Survey (GMHS). Most published articles I have come across that used these data did not apply the weights. I am aware that the unweighted and weighted sample sizes are the same for the individual (woman) data overall, but not within categories of characteristics or variables (which I guess may be due to sampling with probability proportional to size).

My first question is: is it right not to apply the weights? If not, why do I see so many p-values from chi-square tests of association? (My understanding is that a design-based F-statistic p-value is needed in this case.)

Also, do I need to apply the weights if I only intend to use the kids (children) file? I ask because, unlike recent GDHS files, the children file in the 2017 GMHS does not include weights.

Thanks.

We cannot provide much support for this survey. It only has "raw" data files, rather than standard recode files, because it was not a standard DHS survey.

We always recommend using weights in order to get unbiased estimates. We also recommend the svy adjustments for clustering and stratification in order to get robust estimates of standard errors. I do see that the CH file does not include weights. Ideally you would copy qweight into the CH file from the IQ file. The weights are the same for all women and children in the same cluster, so this is an easy merge.
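As a rough illustration of that merge, here is a minimal Python sketch using only the standard library. The records are invented stand-ins for the IQ and CH files, the weight values are made up, and `qcluster` is an assumed name for the cluster identifier (only `qweight` comes from the description above), so check the actual variable names in the raw files before adapting it.

```python
# Invented records standing in for the IQ (women) and CH (children) files.
# "qcluster" is an ASSUMED name for the cluster ID; the weight values are made up.
iq_rows = [
    {"qcluster": 1, "qweight": 1.25},
    {"qcluster": 1, "qweight": 1.25},
    {"qcluster": 2, "qweight": 0.8},
]
ch_rows = [
    {"qcluster": 1, "child_id": "a"},
    {"qcluster": 2, "child_id": "b"},
    {"qcluster": 2, "child_id": "c"},
]


def cluster_weights(rows, cluster_key="qcluster", weight_key="qweight"):
    """Return {cluster: weight}, checking that the weight is constant within each cluster."""
    lookup = {}
    for row in rows:
        cluster, weight = row[cluster_key], row[weight_key]
        # setdefault stores the first weight seen for a cluster and returns it afterwards.
        if lookup.setdefault(cluster, weight) != weight:
            raise ValueError(f"weight varies within cluster {cluster}")
    return lookup


def attach_weights(child_rows, lookup, cluster_key="qcluster", weight_key="qweight"):
    """Return copies of the child rows with their cluster's weight added."""
    merged = []
    for row in child_rows:
        if row[cluster_key] not in lookup:
            raise KeyError(f"no weight found for cluster {row[cluster_key]}")
        merged.append({**row, weight_key: lookup[row[cluster_key]]})
    return merged


ch_weighted = attach_weights(ch_rows, cluster_weights(iq_rows))
for row in ch_weighted:
    print(row)
```

The consistency check in `cluster_weights` is there because the merge relies on the weight being identical within a cluster; if it ever fails on the real data, that assumption needs a second look before the weights are copied across.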

The weights are needed because of variation in the sample size across strata and because of adjustments for nonresponse. They do not depend specifically on sampling PSUs with probability proportional to size, although that is a characteristic of DHS samples.

Some economists argue that weights are not needed, but that argument applies when you are more interested in significance tests than in estimates of means, proportions, etc. It is especially important to use weights if you want to compare estimates from successive surveys. Otherwise, differences between surveys may simply reflect changes in how the different strata were under- and over-sampled.
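The point about successive surveys can be sketched with a toy example. Everything here is hypothetical: two strata with equal population shares, the same true proportions in both surveys, and only the sample allocation changing. The weight is taken as population share divided by sample share, which ignores nonresponse adjustments, and the sketch covers point estimates only, not the svy-style standard errors.

```python
def make_sample(n_by_stratum, rate_by_stratum, pop_share):
    """Build (outcome, weight) pairs; weight = population share / sample share."""
    n_total = sum(n_by_stratum.values())
    sample = []
    for stratum, n in n_by_stratum.items():
        weight = pop_share[stratum] / (n / n_total)
        n_yes = round(n * rate_by_stratum[stratum])
        sample += [(1, weight)] * n_yes + [(0, weight)] * (n - n_yes)
    return sample


def unweighted(sample):
    """Plain sample proportion, ignoring the weights."""
    return sum(y for y, _ in sample) / len(sample)


def weighted(sample):
    """Weighted proportion: sum of weight * outcome over sum of weights."""
    return sum(y * w for y, w in sample) / sum(w for _, w in sample)


# Hypothetical setup: nothing changes in the population between the two surveys.
pop_share = {"urban": 0.5, "rural": 0.5}
rates = {"urban": 0.8, "rural": 0.4}  # true population proportion = 0.6

survey_a = make_sample({"urban": 100, "rural": 100}, rates, pop_share)
survey_b = make_sample({"urban": 150, "rural": 50}, rates, pop_share)  # urban oversampled

print(f"A unweighted {unweighted(survey_a):.2f}, weighted {weighted(survey_a):.2f}")
# A unweighted 0.60, weighted 0.60
print(f"B unweighted {unweighted(survey_b):.2f}, weighted {weighted(survey_b):.2f}")
# B unweighted 0.70, weighted 0.60
```

Without weights, the second survey appears to show an increase from 0.60 to 0.70 even though the population did not change; the weighted estimates agree at 0.60.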
