This is outside the scope of the forum, but I'll suggest what MAY be going on. Two conventions for normalizing the weights may be in conflict. In the page you attached, the weights are normalized to sum to one. That's one convention. However, with pweights (are you using pweights?) Stata automatically normalizes the weights to have a MEAN of 1. So far as I know, with pweights it is actually impossible to override that. Stata does that so the weighted and unweighted totals will match, which is usually desirable from a sampling perspective.

I don't have time to do this, but if I did, I would go over the algebra of the fractional weights to confirm mathematically that the mean should be 0.5 under the first convention and then see what happens when a small set of maybe 10 numbers is analyzed with Stata.
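As a starting point for that check, here is a minimal sketch with 10 made-up observations. It assumes the fractional rank is r_i = (sum of w_j for j < i) + w_i/2, with the weights normalized to sum to one; the simulated data and all variable names are hypothetical. Algebraically, the WEIGHTED mean of r_i is 0.5 for any ordering, because sum_i w_i*r_i = (1/2)*(sum_i w_i)^2 = 1/2. The unweighted mean generally is not 0.5 unless all weights are equal, so it is worth checking which of the two is being reported.

```stata
* Toy check with simulated data (all names and values are illustrative)
clear
set seed 12345
set obs 10
gen double score = rnormal()            // stand-in for the wealth score
gen double wt    = 0.5 + runiform()     // stand-in for the sampling weight

* Fractional rank with weights normalized to sum to one
sort score
quietly summarize wt
gen double wi      = wt / r(sum)
gen double cusum   = sum(wi)
gen double wj      = cond(_n == 1, 0, cusum[_n-1])
gen double rank_CE = wj + 0.5*wi

* Unweighted mean: generally not 0.5 unless all weights are equal
summarize rank_CE

* Weighted mean: 0.5 up to rounding error
summarize rank_CE [aweight = wi]
```

If the second summarize gives 0.5 and the first does not, the discrepancy is about how the mean is computed and not about the rank itself.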

Another option would be to ask one of the authors of the World Bank report. Sorry we can't be more helpful.

]]>

Recently I posted a program to merge all the data for children, mothers, and fathers. It may go beyond what you need, but it combines the data in the BR (and KR), PR, IR, and MR files with children age 0-17 as cases. For the NFHS-4 and -5 surveys, which had only a 1/6 subsample of men, you will lose a lot of children and mothers in order to get the fathers, but yes, you will still have a representative sample. The estimates will be unbiased. And because these surveys were so large, you will still have a large sample.

If you want to compare the significance of effects for the mothers and fathers, you need to be careful. To take a simple example, say you wanted to look at the effects of maternal and paternal education on child survival. If you use the full sample to estimate the maternal effect, and the 1/6 sample to estimate the paternal effect, both coefficients will be unbiased, and comparable. However, even if the effects were equal, the t or z score for the mothers would be about sqrt(6), or roughly 2.45, times as large as the one for the fathers, with much more potential to be statistically significant. You'd have to take that difference in statistical power into account if you inferred that the mother's education was significant, but the father's education was NOT.
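The factor comes from the standard error shrinking with the square root of the sample size. A rough sketch, assuming equal true effects, similar variances and design effects in the two samples, and a fathers' sample that is 1/6 of the mothers' sample of size n (the symbols are introduced here for illustration only):

```latex
\[
z = \frac{\hat\beta}{\operatorname{SE}(\hat\beta)}, \qquad
\operatorname{SE}(\hat\beta) \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{z_{\text{mother}}}{z_{\text{father}}} \approx \sqrt{\frac{n}{n/6}} = \sqrt{6} \approx 2.45 .
\]
```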

]]>

I have checked with Senior DHS Researcher, Sara Riese, and we agree that you could use a two-level model with women at level 1 and region at level 2, with 1 as the level 2 weight, because regions were not sampled. If the SPA you are using was not a census, and the facilities were sampled, then there were probably different sampling fractions for different facility types. The facility weights are proportional to the inverse of the sampling fractions for the respective facility types. The client weight accounts for the selection of the facility.

Let us know if you have further questions. For your preliminary analysis, a simplified version of svyset will be ok, but you want to be confident in the final version.

]]>

I am attaching a screenshot of the content that I am referring to, which states that the mean of the fractional rank would be exactly 0.5 using the commands that I applied.

My analysis includes a semiparametric extension of the Wagstaff index comparing the values of indices across different districts. ]]>

Just to make sure I understood correctly: suppose I merge the data from the KR, MR, and IR files so that I have the information on the parents of each child, and then do my analysis. Although the sample size would drop drastically, to almost 1/6 of the total sample, I can generate tables and run regressions by applying the mv005 weights. Will the results still be representative? Or does representativeness not matter for causal inference?

Any help is appreciated. Thank you.]]>

The first issue I see is that children are linked with their mothers, and you cannot be sure that the mother's current partner is the child's father. To confirm that, you have to work with hv113-hv114 in the PR file, which identifies the father if he is alive and in the same household. Second, often there is subsampling of men for the interview with men. For example, in the NFHS-4 and -5, only 1/6 of men were interviewed. For these reasons, a study that includes the effect of the father's characteristics on the child's health and welfare can be challenging. (Important, but challenging.)
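The linkage through hv114 could be sketched roughly as follows. This is only an illustration under stated assumptions: the file names are hypothetical, hv114 is taken to be the father's line number with 0 meaning he is not in the household, and cluster, household, and line number (mv001, mv002, mv003) are assumed to identify men uniquely in the MR file. Some surveys need additional identifiers, so check the result of the merge.

```stata
* Hypothetical file names; substitute the PR and MR files for your survey
use "PR.dta", clear

* Keep children age 0-17 whose father is alive and listed in the same household
* (hv114 is assumed to hold the father's line number, with 0 = not in household)
keep if hv105 <= 17
keep if hv113 == 1 & hv114 > 0 & !missing(hv114)

* Build the MR identifiers (cluster, household, line number) from the PR variables
gen mv001 = hv001
gen mv002 = hv002
gen mv003 = hv114

* Several children can share one father, so the merge is many-to-one.
* Only children whose father was actually interviewed will be kept.
merge m:1 mv001 mv002 mv003 using "MR.dta", keep(match) nogenerate
```

With a subsample of men, most children will not match, which is the loss of cases described above.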

It is recommended that if you have a table, regression, etc., that includes variables from the survey of men, even if it also includes variables from the survey of women, you should use mv005 for the weight. The reason is that nonresponse is higher for men than for women. This would apply even if you are not using the CR file. It applies if you do any merge with the MR file and are including in your command any variables from the MR file.

At the same time it should be said that the estimates will not be very sensitive to which weight you use. Your conclusions will probably be robust with respect to the choice of weight. It's just considered to be "best practice."
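In Stata, using the men's weight might look like the sketch below. It assumes the merged file keeps mv005 along with the men's design variables mv021 (PSU) and mv023 (strata); the correct stratification variable differs across surveys, so confirm it in the survey documentation. The outcome and covariate names are placeholders, not variables in the recode files.

```stata
* Men's sampling weight; DHS weights are stored with six implied decimal places
gen double wt_men = mv005/1000000

* Declare the design; mv021 and mv023 are ASSUMED to be the PSU and strata variables
svyset mv021 [pweight = wt_men], strata(mv023) singleunit(centered)

* Placeholder model mixing a variable from the men's file with one from the women's file
svy: regress child_outcome father_educ mother_educ
```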

]]>

I spent some time looking into your question but can't provide much help. Here are some thoughts.

First, wealth scores such as sv271 are household-specific and are constructed with the HR file. Then in the PR and other individual-level files they are exactly the same for everyone in the same household. When you calculate the fractional rank, using the PR file, you are basically dividing the household's rank by the number of people in the household. I don't know why you would do that. It would seem better to me to use the HR file and skip the calculation of the fractional rank.

Second, I don't know why you would expect the mean of the fractional rank to be 0.5. Is there a mathematical reason for this? Your formula for the fractional rank is not clear to me but I don't see a mathematical reason why the mean would be 0.5.

]]>

where j varies from 0 to i-1. To compute this, I applied the following commands, but the mean of the fractional rank is not exactly 0.5; it is 0.4857. sv271 is the wealth index factor score for state-level studies. My study is on the Indian state of Punjab. For your reference, I am following the World Bank document "Analyzing health equity using household survey data".

sort sv271s

egen raw_rank=rank(sv271s), unique

sort raw_rank

qui sum wgt_shweight_PR

gen wi = wgt_shweight_PR/r(sum)

gen cusum = sum(wi)

gen wj= cusum[_n-1]

replace wj=0 if wj==.

gen rank_CE=wj+0.5*wi

Here wgt_shweight_PR is generated so that its mean is equal to 1. Below are the commands used to normalize the weights in the PR file:

gen unwtd=1000000

total unwtd shweight

matrix B=e(b)

matrix list B

scalar sfactor=B[1,1]/B[1,2]

scalar list sfactor

gen shweight_PR=round(sfactor*shweight)

gen wgt_shweight_PR= shweight_PR/1000000

Please let me know where I am going wrong.

]]>

I have a much clearer picture now, and your advice will be taken into consideration.

]]>

My concerns are:

1. Can we use this sample and still get results that are representative?

2. Should the men's weights or the women's weights be used? As mentioned earlier, men have lower response rates than women.

Kindly help me out.]]>