This query goes beyond what is appropriate for DHS staff to answer. We can answer questions about the datasets or about matching DHS estimates, but we cannot give advice on model specification. I hope other forum users can help.


Regards

Preshit

For our analysis, we are using information on women, their husbands, and their children. We are pooling this information across countries from different regions, so we are using denormalised weights as instructed. Right now we are only using the women's denormalised weight. Should we be doing anything else, given that we are also combining information about men (the women's husbands)?

Thank you for your help!
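The denormalisation step mentioned in this post can be illustrated with a small Python sketch (the thread itself uses Stata). It assumes the commonly cited DHS approach: v005 carries six implied decimals, and each woman's weight is scaled by the survey's female population aged 15-49 divided by the number of women interviewed. The function name and the population figure in the example are hypothetical, and the sketch does not address the husbands' weights asked about above.

```python
# Sketch: de-normalizing DHS women's weights (v005) before pooling surveys.
# Assumptions (not stated in the thread): v005 has six implied decimals, and
# de-normalized weight = (v005 / 1e6) * female_pop_15_49 / n_interviewed.

def denormalize_weights(v005_values, female_pop_15_49):
    """Return de-normalized weights for the women of ONE survey."""
    n_interviewed = len(v005_values)
    if n_interviewed == 0:
        raise ValueError("survey has no interviewed women")
    scale = female_pop_15_49 / n_interviewed
    return [(w / 1_000_000) * scale for w in v005_values]


if __name__ == "__main__":
    # Hypothetical survey: two women, 10,000 women aged 15-49 in the population.
    print(denormalize_weights([500_000, 1_500_000], 10_000))  # [2500.0, 7500.0]
```

Because each survey's weights then sum to roughly its female population, larger countries carry proportionally more weight in pooled estimates.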


I just merged the IR and PR files for the survey. There are 699,686 women in the IR file and all of them are in the PR file. The correlation between v005 and hv005 is 0.9982. Switching from v005 to hv005 for your weights cannot affect the problem you are having. I doubt that the problem has anything to do with the weights. Have you tried the decomposition without using weights at all? That's something to check.
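The merge check and weight correlation described above can be sketched in Python as follows. It assumes the usual DHS link keys between the IR and PR files (v001=hv001 cluster, v002=hv002 household, v003=hvidx line number); representing rows as plain dicts is purely for illustration.

```python
import math

# Sketch: confirm every IR woman is found in the PR file, then correlate
# the IR weight (v005) with the PR weight (hv005) for the matched women.
# Assumed link keys: v001=hv001, v002=hv002, v003=hvidx.

def match_ir_to_pr(ir_rows, pr_rows):
    """Return (list of (v005, hv005) pairs, list of unmatched IR rows)."""
    pr_index = {(r["hv001"], r["hv002"], r["hvidx"]): r for r in pr_rows}
    pairs, unmatched = [], []
    for r in ir_rows:
        pr = pr_index.get((r["v001"], r["v002"], r["v003"]))
        if pr is None:
            unmatched.append(r)
        else:
            pairs.append((r["v005"], pr["hv005"]))
    return pairs, unmatched


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```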

In the PR file, the RSBY variable is sh55d. It is binary (0/1). I have not used the Fairlie decomposition, but I see that it is specifically for binary outcome variables. The caste variables, sh35 and sh36, have 4 and 5 categories, respectively, giving 4x5=20 combinations, and 20x2=40 combinations with sh55d. None of those 40 cells is empty, so your problem is not due to empty cells.
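The empty-cell check above can be sketched in Python. The category codes are hypothetical stand-ins for the 4 levels of sh35, the 5 levels of sh36, and the 0/1 values of sh55d; the real codes should be taken from the recode manual.

```python
from collections import Counter
from itertools import product

# Sketch: list every (sh35, sh36, sh55d) combination with zero observations.
# Level codes below are hypothetical placeholders.
SH35_LEVELS = (1, 2, 3, 4)
SH36_LEVELS = (1, 2, 3, 4, 5)
SH55D_LEVELS = (0, 1)


def empty_cells(rows):
    """Return the cross-tabulation cells that contain no observations."""
    counts = Counter((r["sh35"], r["sh36"], r["sh55d"]) for r in rows)
    return [
        cell
        for cell in product(SH35_LEVELS, SH36_LEVELS, SH55D_LEVELS)
        if counts[cell] == 0
    ]


if __name__ == "__main__":
    n_cells = len(SH35_LEVELS) * len(SH36_LEVELS) * len(SH55D_LEVELS)
    print(n_cells)  # 40
```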

Is the Fairlie method designed for categorical predictors? Could that be the problem?

My recommendation, whenever you have an analytical problem, is that you simplify the data setup as much as possible. But a bigger question is whether you need to apply the Fairlie method, or any other decomposition method, in this setup. You can easily identify which categories or combinations of sh35 and sh36 have high enrollment in RSBY and which ones have low enrollment, just using cross-tabulations. If you want to account for other variables, you can use logit regression. I wonder what other forum users would suggest.
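The cross-tabulation suggested above, the weighted share enrolled in RSBY within each sh35 x sh36 cell, can be sketched in Python. The weight field name "wt" is hypothetical and stands in for whichever survey weight is used.

```python
from collections import defaultdict

# Sketch: weighted RSBY enrollment rate (sh55d == 1) by sh35 x sh36 cell.
# "wt" is a hypothetical field name for the survey weight.

def enrollment_by_cell(rows, weight_key="wt"):
    """Return {(sh35, sh36): weighted share with sh55d == 1}."""
    enrolled = defaultdict(float)
    total = defaultdict(float)
    for r in rows:
        cell = (r["sh35"], r["sh36"])
        total[cell] += r[weight_key]
        enrolled[cell] += r[weight_key] * r["sh55d"]
    return {cell: enrolled[cell] / total[cell] for cell in total if total[cell] > 0}


if __name__ == "__main__":
    rows = [
        {"sh35": 1, "sh36": 2, "sh55d": 1, "wt": 1.0},
        {"sh35": 1, "sh36": 2, "sh55d": 0, "wt": 3.0},
    ]
    print(enrollment_by_cell(rows))  # {(1, 2): 0.25}
```

Sorting the resulting rates shows directly which combinations have high or low enrollment, before any decomposition is attempted.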


My analysis is confined to observations that are present in the IR file, i.e., eligible women. I now wish to conduct my decomposition analysis using sample weights. I have gone through previous forum posts, web pages, and YouTube videos; however, I am still not clear which weights to use in my case. The closest answer I could find is at the following link (a screenshot is attached as well): https://userforum.dhsprogram.com/index.php?t=msg&goto=16014&S=Google

As suggested by the statisticians at The DHS Program in the above link, I used the weight from the IR file (v005), since my model includes a few variables from the IR dataset. However, I am getting odd results with these weights: the share of the group difference explained by the variables in the model goes above 100%, which doesn't make any sense. My question is whether the pweight created from v005 is still the correct weight in my case, or whether I should be using a different weight for the analysis, e.g., hv005?

Thank you so much in advance for your help.

About the model

My dependent variable is neghaz (the negative of height-for-age (cm/months)), which is continuous. My regression specification includes several control variables, including squared terms and interaction terms. The specification also includes variables calculated at the PSU level (mean employment rate in the cluster, etc.) and at the country level (GDP, average life expectancy, etc.). I have already de-normalized the weights.

Issue

I am trying to estimate the following three-level hierarchical model (respondents <- clusters <- surveys):

mixed neghaz $controlset [pw=weight] || psu: || survey:

Survey represents each of the 95 samples in the data.

The model failed to converge. I then tried a null model, which also failed to converge. I do not understand why the null model fails to converge when there are 95 surveys and every survey has at least 300 clusters.

I also tried the null model after converting neghaz into a dichotomous variable, stunted, which takes the value 1 if the child is stunted, and fitting it with xtmelogit. Convergence failed again. Can somebody help me understand why this is happening and how to fix it?

Afterwards, I ran two-level models separately, one with PSUs and one with surveys. Both models worked with the full controlset. However, the standard errors differed between the two models.

ICC for the model with PSU random effects: 0.98

ICC for the model with survey random effects: 0.02
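For reference, the ICCs quoted above come from the estimated variance components. A minimal Python sketch of that arithmetic for a linear random-intercept model follows; the three-level version follows the nested-model convention (as Stata's estat icc reports it, to the best of my knowledge), where the PSU-level ICC includes the survey variance. The variance values in the tests are hypothetical.

```python
# Sketch: intraclass correlations from variance components of a linear
# random-intercept model.
# Two-level:   ICC = var_cluster / (var_cluster + var_residual)
# Three-level (surveys > PSUs > children), nested convention:
#   survey ICC            = var_survey / total
#   PSU (within survey)   = (var_survey + var_psu) / total

def icc_two_level(var_cluster, var_residual):
    return var_cluster / (var_cluster + var_residual)


def icc_three_level(var_survey, var_psu, var_residual):
    total = var_survey + var_psu + var_residual
    return {
        "survey": var_survey / total,
        "psu_within_survey": (var_survey + var_psu) / total,
    }
```

Comparing the two-level results with a three-level decomposition shows how much of the PSU-level correlation is actually shared at the survey level.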

Can I safely neglect the survey random effects in this case?

Is there any other way of combining the survey effects along with the PSU random effects?

I also tried a model with only survey fixed effects (i.survey with ordinary OLS). However, the standard errors were different again. Which model should I use in such a case?

Sorry for the long post.


I have some follow-up questions. Sorry for the trouble, but this is my first time working with pooled data.

1. My analysis includes countries with widely varying numbers of observations. For example, the Indian DHS-4 has around 200,000 valid observations, while some African countries may have only a few thousand. Is the weight perweight adjusted for, or applicable to, such an analysis? I read in a post on this forum that combining bigger surveys with smaller ones might give biased coefficients.

2. Regarding the clustering, I got two different recommendations. As you can see, Tom and Mahmoud suggested using a fixed-effects model by adding "i.survey" to my regression specification. On the other hand, you have suggested a multilevel model. Can you please elaborate on which model is better suited to this analysis and how I should decide between them? I am sorry, but I have never used multilevel models before.


Thanks for the question. Tom and Mahmoud's answer works with the regular DHS files. Since you also posted this query on the IPUMS DHS user forum, here is a similar response that uses the already-integrated IPUMS-DHS data.

1. Can I directly use the idhspsu variable to create the required variables for the surveys where cluster and PSU are not the same?

Yes.

2. Do I need to use the idhsstrata variable with the svyset command?

Yes. svyset would still produce weighted estimates if you do not specify the strata, but the standard errors will be wrong.

To weight IPUMS-DHS data in Stata, the command is:

svyset [pw=perweight], psu(idhspsu) strata(idhsstrata)

This establishes the weights in Stata; they are then applied to relevant commands by putting "svy:" at the beginning, such as:

svy: regress y x

svy: mean y, over(x)

3. If yes, how do I deal with missing strata information?

This forum has information on how to construct strata variables when they are missing. Fundamentally, it depends on the sampling design, which you can find in the appendices to the final reports. If the sample was stratified by urban/rural area (typical), you can replace the strata variable (idhsstrata) with the urban/rural variable (urban).
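The urban/rural fallback for missing strata can be sketched in Python. The field names "sample", "idhsstrata", and "urban" mirror IPUMS-DHS variable names, but the row layout and the helper are hypothetical. Pairing the fallback code with the sample keeps urban=1 in one survey from being treated as the same stratum as urban=1 in another.

```python
# Sketch: choose a stratum key, falling back to urban/rural when idhsstrata
# is missing (represented here as None or an absent field).
# The key includes "sample" so fallback strata stay unique across surveys.

def stratum_key(row):
    if row.get("idhsstrata") is not None:
        return ("idhsstrata", row["idhsstrata"])
    return ("urban", row["sample"], row["urban"])
```

If a survey's design stratified by region as well as urban/rural, the fallback key would need the region too; the sketch covers only the urban/rural case described above.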

4. Can I directly use the weight perweight for this analysis?

Yes.

5. In one post on this forum, I read that in multi-country analysis the data must be clustered at the country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels, i.e., at the country level and then at the individual PSU level?

Whether it's necessary to cluster at the country level, the cluster level, or both depends on how much of the variation in your dependent variable is explained by these spatial variations. You can calculate this by running a null model, e.g.:

melogit y || idhspsu:

estat icc

If the rho is large (greater than 0.15 or so), then a mixed or multilevel model is appropriate. I've seen people cluster at the country, region, and PSU level. These days, the PSU level seems to be more common.
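For a random-intercept logistic null model, the rho reported on the latent scale uses the logistic residual variance, pi^2/3. A minimal Python sketch of that formula and of the 0.15 rule of thumb mentioned above follows; the function names are hypothetical, and the threshold is a heuristic, not a formal test.

```python
import math

# Sketch: latent-variable ICC (rho) for a random-intercept logistic model.
# Level-1 residual variance of the standard logistic distribution = pi^2 / 3.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def latent_icc(var_u):
    """rho = var_u / (var_u + pi^2 / 3), var_u = random-intercept variance."""
    return var_u / (var_u + LOGISTIC_RESIDUAL_VAR)


def multilevel_suggested(var_u, threshold=0.15):
    """Apply the rough 0.15 rule of thumb from the answer above."""
    return latent_icc(var_u) > threshold
```

For example, a random-intercept variance of 1.0 gives rho of about 0.23, above the rule-of-thumb cut-off.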

If the analysis combines only a few countries, then a dummy variable for each country except one is probably the best approach, and there would be no need to cluster at the country level. To cluster at multiple levels, here are the commands:

regress
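The "dummy variable for each country except one" approach can be illustrated in Python: one 0/1 indicator per country, with a reference country left out. Country codes are hypothetical; in Stata, factor-variable notation such as i.country creates these indicators and omits a base level automatically.

```python
# Sketch: 0/1 country indicators with one reference country omitted.

def country_dummies(countries, reference):
    """Return (indicator names, one row of 0/1 indicators per observation)."""
    levels = sorted(set(countries) - {reference})
    rows = [[1 if c == level else 0 for level in levels] for c in countries]
    return levels, rows


if __name__ == "__main__":
    levels, rows = country_dummies(["IN", "KE", "NG", "KE"], reference="IN")
    print(levels)  # ['KE', 'NG']
    print(rows)    # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

Observations from the reference country have all indicators equal to 0, so each country coefficient is read relative to that country.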


1. For certain countries, the cluster and PSU variables are not the same. In such cases, which variable should I use to create the PSU-level variables?

2. Do I need to specify the strata variable when using the svyset command in Stata? If yes, how do I deal with missing strata information?

3. I have already de-normalized the weights as suggested in earlier posts on this forum. Do I need to re-normalize the weights before using them? If yes, how shall I do it?

4. In one post on this forum I read that in multi country analysis data must be clustered at country level. Do I need to do that for this analysis? If yes, how do I cluster data at two different levels i.e., country level and then individual PSU level?
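On question 3, if re-normalization of pooled, de-normalized weights is wanted, one common way is to rescale them so they average 1 (sum to the pooled sample size). A minimal Python sketch follows; it does not settle whether re-normalization is needed. Rescaling all weights by one common constant leaves single-level pweighted point estimates (e.g., svy: regress coefficients) unchanged.

```python
# Sketch: rescale pooled weights so they sum to the pooled sample size.

def renormalize(weights):
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n = len(weights)
    return [w * n / total for w in weights]


if __name__ == "__main__":
    print(renormalize([2.0, 6.0]))  # [0.5, 1.5]
```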

Someone else might prefer random effects for surveys, especially if there are MANY surveys in the analysis. That would require a three-level hierarchical model (respondents / clusters / surveys).

When combining surveys, the strata and cluster codes must be unique. For example, you do not want cluster 1 in survey 1 to be confused with cluster 1 in survey 2. You could use "egen cluster_id=group(survey hv021)" and "egen stratum_id=group(survey hv022)".
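The egen group() step above can be mirrored in Python: each distinct (survey, hv021) pair gets a consecutive integer, numbered in sorted order starting at 1, so the same cluster number in two surveys receives two different IDs. The row layout and helper name are hypothetical.

```python
# Sketch: Python analogue of Stata's "egen cluster_id = group(survey hv021)".

def group_ids(rows, keys):
    """Assign 1, 2, ... to each distinct combination of `keys`, in sorted order."""
    combos = sorted({tuple(r[k] for k in keys) for r in rows})
    index = {combo: i + 1 for i, combo in enumerate(combos)}
    return [index[tuple(r[k] for k in keys)] for r in rows]


if __name__ == "__main__":
    rows = [
        {"survey": 1, "hv021": 1},
        {"survey": 2, "hv021": 1},
        {"survey": 1, "hv021": 2},
    ]
    print(group_ids(rows, ["survey", "hv021"]))  # [1, 3, 2]
```

The same helper with ["survey", "hv022"] gives the unique stratum IDs.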
