Secondly, the PDHS 2017-18 has 580 PSUs and 16 strata, while the PDHS 2012-13 has 500 PSUs and 10 strata. Would it be statistically okay to append both data files and conduct trend analysis?

Kindly guide!

Thank you.


The Pakistan surveys are not easily combined because of their differences. First, you cannot simply append the two surveys' data because the sampling weights are normalized, and the surveys differ in both coverage and normalization policy. The 2012-13 survey did not include the AJK and FATA regions, although these two regions are small and together represent only about 5% of Pakistan. Second, the 2017-18 survey had a separate normalization policy, requested by the Government of Pakistan: GB and AJK were normalized separately and independently. If you want to use the 2017-18 survey data for all of Pakistan, the weights must be denormalized in 3 parts, separately for GB, AJK, and the rest of Pakistan. We cannot help with that.

The recommendation is that you analyze trends using just the areas that were included in both the surveys. You can append the data files but note that the PSUs (clusters) in the two surveys were different.


So, using the appended data file, would it be correct to analyze trends for each province/region that was covered by both surveys, individually or with two provinces/regions combined, by using the Stata commands:

. svy: logit d105a i.sample if v024 == 1

. svy: logit d105a i.sample if inlist(v024, 1, 2)

* The newly created variable in the appended file "sample" will be coded as 0 if data from PDHS 2012-13, and 1 if data from PDHS 2017-18

* Variable v024 to be recoded as binary (0/1)

Finally, would it be statistically correct to analyze trends even if, for the same province/region, the number of clusters is different in the two surveys? Is there any rule-of-thumb when the number of clusters is different in two surveys?

Thank you.


I agree with the approach you describe. It's what I do when looking at trends. Regarding the question about clusters: these are census enumeration areas drawn at random from a sampling frame. The number of clusters within a region, as well as the specific clusters themselves, varies from one DHS survey to the next. The svyset specification builds in the adjustments for sample weights, clusters, and strata. Once you have run svyset and prefixed the estimation command with svy, you don't need any further adjustments related to the clusters.
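The svyset step mentioned above could be set up roughly as follows for the appended file. This is only a sketch under stated assumptions, not a tested recipe for the PDHS files: the file names are hypothetical, and it assumes that v021 holds the PSU, v022 the stratum, d005 the domestic violence weight, and that d105a equals 0 for "never" in both surveys. Check each of these against the recode documentation and value labels before relying on it.

```stata
* Sketch only: file names and variable choices below are assumptions.
use "PDHS2012_IR.dta", clear          // hypothetical file name
gen byte sample = 0                   // 0 = PDHS 2012-13
append using "PDHS2017_IR.dta"        // hypothetical file name
replace sample = 1 if missing(sample) // 1 = PDHS 2017-18

* Cluster and stratum codes restart in each survey, so make them unique.
egen psu_id     = group(sample v021)  // assumes v021 holds the PSU
egen stratum_id = group(sample v022)  // assumes v022 holds the stratum

* Assumes d005 is the domestic violence weight with 6 implied decimals.
gen double wt = d005 / 1000000

svyset psu_id [pweight = wt], strata(stratum_id)

* Assumes d105a == 0 means "never"; check the value labels in both files.
gen byte outcome = (d105a > 0) if !missing(d105a)

* Assumes code 1 in v024 refers to the same region in both surveys.
svy: logit outcome i.sample if v024 == 1
```

Grouping the PSU and stratum codes by survey keeps Stata from treating cluster 1 of 2012-13 and cluster 1 of 2017-18 as the same cluster. A possible alternative to the if qualifier is the subpopulation form, svy, subpop(if v024 == 1): logit outcome i.sample, which keeps the full design in the variance calculation.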


Using the approach described below for trend analysis (. svy: logit d105a i.sample if v024 == 1), is it possible for the 95% confidence intervals of the two proportions to overlap while the p-value for the trend is statistically significant? If not, what could be the explanation?

Thanks

You are correct. Two means (or proportions) can be significantly different even if their confidence intervals overlap. The reason is that the standard error of a difference is less than the sum of the two standard errors, while the widths of the confidence intervals are proportional to the individual standard errors. In most situations (specifically, when the two estimates are independent), the standard error of a difference is equal to the square root of the sum of the squares of the two standard errors. It's like a right triangle, in which the hypotenuse is always shorter than the sum of the other two sides. Check a statistics text on tests and confidence intervals for a difference.
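A small worked example may make this concrete. The numbers are made up for illustration and are not PDHS estimates: suppose the two surveys give independent proportions of 0.20 and 0.27, each with a standard error of 0.02.

```latex
\begin{align*}
\text{CI}_1 &= 0.20 \pm 1.96 \times 0.02 = (0.161,\ 0.239) \\
\text{CI}_2 &= 0.27 \pm 1.96 \times 0.02 = (0.231,\ 0.309) \\
SE_{\text{diff}} &= \sqrt{SE_1^2 + SE_2^2} = \sqrt{0.02^2 + 0.02^2} \approx 0.0283 \\
z &= \frac{0.27 - 0.20}{0.0283} \approx 2.47, \qquad p \approx 0.013
\end{align*}
```

The two intervals overlap between 0.231 and 0.239, yet the difference is significant at the 5% level, because 0.0283 is smaller than the sum of the two standard errors (0.04).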
