* do e:\DHS\programs\U5_mortality\child_mortality_ci_21March2013_do.txt cd e:\DHS\programs\U5_mortality set logtype text log using child_mortality_ci_21March2013_log.txt, replace set more off /* Tom Pullum, tpullum@icfi.com, Nov. 30, 2011, then updated This program estimates 95% confidence intervals for the DHS child mortality rates produced by the log probability model in child_mortality_do_date.txt. In all of the following, "standard error" refers to an estimated standard error and "confidence interval" refers to a 95% confidence interval The input file for this program is an output file produced by the child_mortality program. This program will require some modification if there is a departure from the basic 8 age intervals normally used by DHS, for months 0, 1-2, 3-5, 6-11, 12-23, 24-35, 36-47, 48-59. The file "q_output_logprob...dta" produced by child_mortality_do_date.txt includes the following variables Interval Months q std. error Low end of 95% ci High end of 95% ci 1 0 q_month_0 sterr1 L_q_month_0 H_q_month_0 2 1-2 q_month_1to2 sterr2 L_q_month_1to2 H_q_month_1to2 3 3-5 q_month_3to5 sterr3 L_q_month_3to5 H_q_month_3to5 4 6-11 q_month_6to11 sterr4 L_q_month_6to11 H_q_month_6to11 5 12-23 q_year_1 sterr5 L_q_year_1 H_q_year_1 6 24-35 q_year_2 sterr6 L_q_year_2 H_q_year_2 7 36-47 q_year_3 sterr7 L_q_year_3 H_q_year_3 8 48-59 q_year_4 sterr8 L_q_year_4 H_q_year_4 In this list, sterr1 is the standard error of the natural log of q_month_0, and NOT of q_month_0 itself, for example, and L_q_month_0 = L_q_month_0/exp(1.96*sterr1) and H_q_month_0 = H_q_month_0*exp(1.96*sterr1) These confidence intervals are adjusted for the weights, clustering, and stratification (if possible), using the svyset command. The q's are estimated with a log probability model (GLM with binomial error and log link) and match the standard DHS estimates exactly. These eight basic rates are converted to a set of five standard rates: neonatal = q_month_0 prob_1q0 = 1-(1-q_month_0)*(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11) postneonatal = prob_1q0-neonatal prob_4q1 = 1-(1-q_year_1)*(1-q_year_2)*(1-q_year_3)*(1-q_year_4) prob_5q1 = 1-(1-prob_1q0)*(1-prob_4q1) = 1-(1-q_month_0)*(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11)* (1-q_year_1)*(1-q_year_2)*(1-q_year_3)*(1-q_year_4) The ci for the neonatal rate is simply (L_q_month_0 , H_q_month_0) The ci for the postneonatal rate is complicated by the fact that the postneonatal rate is calculated as a difference between the two other rates, and those rates are not at all independent of each other. I would recommend calculating pseudo-postnatal = 1-(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11), then calculating a ci for pseudo-postneonatal using the methods described below, and then re-centering that ci around the point estimate of the postneonatal rate. To summarize, we need confidence intervals for the following compound rates: prob_1q0 = 1-(1-q_month_0)*(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11) pseudo postneonatal = 1-(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11) prob_4q1 = 1-(1-q_year_1)*(1-q_year_2)*(1-q_year_3)*(1-q_year_4) prob_5q1 = 1-(1-q_month_0)*(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11)* (1-q_year_1)*(1-q_year_2)*(1-q_year_3)*(1-q_year_4) The following procedure is proposed for estimating these confidence intervals. Alternative approaches are definitely possible. My procedure is simply to "map" confidence intervals for basic q's into a confidence interval for a function of those q's. The logs of the 8 basic rates are the coefficients estimated by the log probability model. They are labeled lnq1 through lnq8. For example, lnq1=log(q_month_0), etc. Refer to 1q0=IMR as just q, without a subscript. Then q=1-(1-q1)*(1-q2)*(1-q3)*(1-q4)=1-p1*p2*p3*p4, where pi=1-qi (i=1,2,3,4) or p=p1*p2*p3*p4, where p=1-q. Taking logs, lnp=lnp1+lnp2+lnp3+lnp4, where lnpi=log(pi) (i=1,2,3,4) and lnp=log(p) It appears (evidence can be provided) that the q's (and p's) for the age intervals are close to being independent. Their covariances are very small, and sometimes positive and sometimes negative, so if they were included, they would tend to cancel out. I think it will be a good approximation to say that the variances are related simply as var(lnp)=var(lnp1)+var(lnp2)+var(lnp3)+var(lnp4) Therefore, for the IMR: begin with the a confidence interval for ln(qi), which is given in the Stata output as the ci for the coefficient in the log probability model. Step 1--exponentiate this ci to get a ci for qi (i=1,2,3,4) (already calculated) 2--subtract from 1 to get a confidence interval for pi=1-qi (the low and high ends will reverse) 3--log this ci to get a ci for ln(pi) 4--subtract L from H and divide by 2(1.96) to get an se for ln(pi) 5--square to get variance of ln(pi) 6--add up the variances of ln(p1), ln(p2), ln(p3), ln(p4) 7--take the square root of the sum; this will be the est. se. of ln(1-IMR) 8--use this se to calculate a ci for ln(1-IMR) 9--exponentiate this ci to get a ci for 1-IMR 10--subtract from 1 to get a ci for IMR (the low and high ends will reverse) */ ********************************************************* program define make_ci_1q0 *list L_q_month*, table clean *list H_q_month*, table clean * Step 1 gen L_q1=L_q_month_0 gen L_q2=L_q_month_1to2 gen L_q3=L_q_month_3to5 gen L_q4=L_q_month_6to11 gen H_q1=H_q_month_0 gen H_q2=H_q_month_1to2 gen H_q3=H_q_month_3to5 gen H_q4=H_q_month_6to11 *list L_q1-L_q4, table clean *list H_q1-H_q4, table clean * Step 2 gen L_p1=1-H_q1 gen L_p2=1-H_q2 gen L_p3=1-H_q3 gen L_p4=1-H_q4 gen H_p1=1-L_q1 gen H_p2=1-L_q2 gen H_p3=1-L_q3 gen H_p4=1-L_q4 *list L_p*, table clean *list H_p*, table clean * Step 3 gen L_lnp1=log(L_p1) gen L_lnp2=log(L_p2) gen L_lnp3=log(L_p3) gen L_lnp4=log(L_p4) gen H_lnp1=log(H_p1) gen H_lnp2=log(H_p2) gen H_lnp3=log(H_p3) gen H_lnp4=log(H_p4) *list L_lnp*, table clean *list H_lnp*, table clean * Step 4 gen selnp1=(H_lnp1-L_lnp1)/(2*1.96) gen selnp2=(H_lnp2-L_lnp2)/(2*1.96) gen selnp3=(H_lnp3-L_lnp3)/(2*1.96) gen selnp4=(H_lnp4-L_lnp4)/(2*1.96) *list selnp*, table clean * Step 5 gen varlnp1=selnp1^2 gen varlnp2=selnp2^2 gen varlnp3=selnp3^2 gen varlnp4=selnp4^2 *list varlnp*, table clean * Step 6 gen varln1q0=varlnp1+varlnp2+varlnp3+varlnp4 *list varln1q0, table clean * Step 7 gen se_ln_one_minus_1q0=sqrt(varln1q0) *list se_ln_one_minus_1q0, table clean * Step 8 gen ln_one_minus_1q0=log(1-prob_1q0) gen L_ln_one_minus_1q0=ln_one_minus_1q0-1.96*se_ln_one_minus_1q0 gen H_ln_one_minus_1q0=ln_one_minus_1q0+1.96*se_ln_one_minus_1q0 *list L_ln_one_minus_1q0 H_ln_one_minus_1q0, table clean * Step 9 gen L_one_minus_1q0=exp(L_ln_one_minus_1q0) gen H_one_minus_1q0=exp(H_ln_one_minus_1q0) *list L_one_minus_1q0 H_one_minus_1q0, table clean * Step 10 gen L_1q0=1-H_one_minus_1q0 gen H_1q0=1-L_one_minus_1q0 list lw uw variable value L_1q0 prob_1q0 H_1q0, table clean end ********************************************************* program define make_ci_4q1 * same as make_ci_1q0 but with 4 age intervals of one year each * Step 1 gen L_q5=L_q_year_1 gen L_q6=L_q_year_2 gen L_q7=L_q_year_3 gen L_q8=L_q_year_4 gen H_q5=H_q_year_1 gen H_q6=H_q_year_2 gen H_q7=H_q_year_3 gen H_q8=H_q_year_4 * Step 2 gen L_p5=1-H_q5 gen L_p6=1-H_q6 gen L_p7=1-H_q7 gen L_p8=1-H_q8 gen H_p5=1-L_q5 gen H_p6=1-L_q6 gen H_p7=1-L_q7 gen H_p8=1-L_q8 * Step 3 gen L_lnp5=log(L_p5) gen L_lnp6=log(L_p6) gen L_lnp7=log(L_p7) gen L_lnp8=log(L_p8) gen H_lnp5=log(H_p5) gen H_lnp6=log(H_p6) gen H_lnp7=log(H_p7) gen H_lnp8=log(H_p8) * Step 4 gen selnp5=(H_lnp5-L_lnp5)/(2*1.96) gen selnp6=(H_lnp6-L_lnp6)/(2*1.96) gen selnp7=(H_lnp7-L_lnp7)/(2*1.96) gen selnp8=(H_lnp8-L_lnp8)/(2*1.96) * Step 5 gen varlnp5=selnp5^2 gen varlnp6=selnp6^2 gen varlnp7=selnp7^2 gen varlnp8=selnp8^2 * Step 6 gen varln4q1=varlnp5+varlnp6+varlnp7+varlnp8 * Step 7 gen se_ln_one_minus_4q1=sqrt(varln4q1) * Step 8 gen ln_one_minus_4q1=log(1-prob_4q1) gen L_ln_one_minus_4q1=ln_one_minus_4q1-1.96*se_ln_one_minus_4q1 gen H_ln_one_minus_4q1=ln_one_minus_4q1+1.96*se_ln_one_minus_4q1 * Step 9 gen L_one_minus_4q1=exp(L_ln_one_minus_4q1) gen H_one_minus_4q1=exp(H_ln_one_minus_4q1) * Step 10 gen L_4q1=1-H_one_minus_4q1 gen H_4q1=1-L_one_minus_4q1 list lw uw variable value L_4q1 prob_4q1 H_4q1, table clean end ********************************************************* program define make_ci_5q0 * same as make_ci_1q0 but with 8 age intervals * skip over steps 1-5 * Step 6 gen varln5q0=varlnp1+varlnp2+varlnp3+varlnp4+varlnp5+varlnp6+varlnp7+varlnp8 * Step 7 gen se_ln_one_minus_5q0=sqrt(varln5q0) * Step 8 gen ln_one_minus_5q0=log(1-prob_5q0) gen L_ln_one_minus_5q0=ln_one_minus_5q0-1.96*se_ln_one_minus_5q0 gen H_ln_one_minus_5q0=ln_one_minus_5q0+1.96*se_ln_one_minus_5q0 * Step 9 gen L_one_minus_5q0=exp(L_ln_one_minus_5q0) gen H_one_minus_5q0=exp(H_ln_one_minus_5q0) * Step 10 gen L_5q0=1-H_one_minus_5q0 gen H_5q0=1-L_one_minus_5q0 list lw uw variable value L_5q0 prob_5q0 H_5q0, table clean end ********************************************************* program define make_ci_postneonatal * same as make_ci_1q0 but with age intervals 2-4 only * strategy: construct an interval around the q for months 1-11, * then re-center on the postneonatal rate * skip over steps 1-5 gen prob_pnn=1-(1-q_month_1to2)*(1-q_month_3to5)*(1-q_month_6to11) * Step 6 gen varlnpnn=varlnp2+varlnp3+varlnp4 * Step 7 gen se_ln_one_minus_pnn=sqrt(varlnpnn) * Step 8 gen ln_one_minus_pnn=log(1-prob_pnn) gen L_ln_one_minus_pnn=ln_one_minus_pnn-1.96*se_ln_one_minus_pnn gen H_ln_one_minus_pnn=ln_one_minus_pnn+1.96*se_ln_one_minus_pnn * Step 9 gen L_one_minus_pnn=exp(L_ln_one_minus_pnn) gen H_one_minus_pnn=exp(H_ln_one_minus_pnn) * Step 10 gen L_pnn=1-H_one_minus_pnn gen H_pnn=1-L_one_minus_pnn list lw uw variable value L_pnn prob_pnn H_pnn, table clean gen L_pnn_true=postneonatal-(prob_pnn-L_pnn) gen H_pnn_true=postneonatal+(H_pnn-prob_pnn) list lw uw variable value L_pnn prob_pnn H_pnn, table clean list lw uw variable value L_pnn_true postneonatal H_pnn_true, table clean end ********************************************************* ********************************************************* *EXECUTION BEGINS HERE quietly make_ci_1q0 quietly make_ci_4q1 quietly make_ci_5q0 quietly make_ci_postneonatal list lw uw variable value L_q_month_0 q_month_0 H_q_month_0, table clean list lw uw variable value L_pnn_true postneonatal H_pnn_true, table clean list lw uw variable value L_1q0 prob_1q0 H_1q0, table clean list lw uw variable value L_4q1 prob_4q1 H_4q1, table clean list lw uw variable value L_5q0 prob_5q0 H_5q0, table clean replace L_q_month_0=0 if L_q_month_0<0 replace L_pnn_true =0 if L_pnn_true<0 replace L_1q0 =0 if L_1q0<0 replace L_4q1 =0 if L_4q1<0 replace L_5q0 =0 if L_5q0<0 list lw uw variable value L_q_month_0 q_month_0 H_q_month_0, table clean list lw uw variable value L_pnn_true postneonatal H_pnn_true, table clean list lw uw variable value L_1q0 prob_1q0 H_1q0, table clean list lw uw variable value L_4q1 prob_4q1 H_4q1, table clean list lw uw variable value L_5q0 prob_5q0 H_5q0, table clean save, replace /* Compare with naive estimate The following approach makes no adjustments for weights, clustering, stratification, segmentation of the time interval, etc. In general, the estimated standard error of a proportion p is sqrt[p(1-p)/n], where n is the denominator of the proportion. Here we have R=d/E, where a so-called rate is acually an estimated probability, with d deaths in the numerator and exposure or risk E in the denominator. It can be shown that in this case the usual formula can be written as s.e. = R*sqrt[(1-R)/d] . Go to the q_output_division....dta file and get d=died1+died2+died3+died4 and R=IMR=prob_1q0. Calculate the lower and upper ends of the 95% confidence interval and compare with the approach used above. use results_with_ci_ET51.dta, clear gen d=died1+died2+died3+died4 gen IMR=prob_1q0 gen se_IMR=IMR*sqrt((1-IMR)/d) gen L_IMR=IMR-1.96*se_IMR gen H_IMR=IMR+1.96*se_IMR list lw uw variable value L_IMR IMR H_IMR, table clean * Similarly for other rates The naive estimates are much narrower than the robust confidence intervals, partly because they do not take the design effect into account, but also because the DHS estimates are constructed from shorter segments with more uncertainty. */