Interpolated surfaces [message #24651]
Tue, 14 June 2022 12:58
David34
Messages: 20 Registered: March 2022
Member
Hello,
Am I correct that in order to do GIS analysis I would need to first calculate the weighted (after svy-setting in Stata) proportion of my variable of interest e.g. prevalence of variable xyz (coded as: 1=yes/0=no), and then with cluster number and geographic coordinate data, merge in a statistical program?
So, once I have the weighted proportion of my variable of interest, I calculate proportions by cluster (PSU) number and then join the proportions with the cluster, using the following code:
svy: proportion variableXYZ, over(v021)
Thank you
Re: Interpolated surfaces [message #24801 is a reply to message #24790]
Wed, 13 July 2022 09:31
Janet-DHS
Messages: 921 Registered: April 2022
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
The following Stata lines should get you started. I suggest restricting to children who are living with their mothers (b9=0) because the reporting is much more accurate for them. For the child illness variables (h11, h22, h31), treat responses 1 or 2 as "yes". The label for h11 is usually H11 but sometimes it is h11; "describe h11" will give you the label. You do not need to do anything more with the weights; svyset and svy will re-normalize them. Other variables of interest will be dropped in the "collapse" but you can keep cluster-level variables like v005, v024, v025, etc., by adding them into the collapse statement after "(first)". I have included "diarrhea" in variable names because you might want to do something similar for other illnesses or outcomes and you need a notation to distinguish between them. You could use "h11" rather than "diarrhea". Hope this works for you.
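* Inspect h11 and its value label (the label name may be H11 or h11)
describe h11
tab h11
label list H11
* Flag children living with their mother (b9==0), by diarrhea status
gen nch_diarrhea_yes=1 if b9==0 & (h11==1 | h11==2)
gen nch_diarrhea_no =1 if b9==0 & h11==0
* One record per cluster: keep v005 and sum the child counts
collapse (first) v005 (sum) nch_diarrhea*, by(v001)
* Denominator, cluster proportion, and cluster weight
gen nch_diarrhea=nch_diarrhea_yes + nch_diarrhea_no
gen prop_diarrhea=nch_diarrhea_yes/nch_diarrhea
gen wt_diarrhea=v005*nch_diarrhea
summarize
histogram prop_diarrhea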
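As a minimal sketch of the point about keeping cluster-level variables, the collapse step could carry region (v024) and place of residence (v025) along with v005. This assumes the individual-level children's file is in memory and has not yet been collapsed; it is a variant of the lines above, not an addition to them.

```stata
* Sketch only: assumes the uncollapsed children's file is in memory.
* Count children living with their mother, by diarrhea status.
gen nch_diarrhea_yes=1 if b9==0 & (h11==1 | h11==2)
gen nch_diarrhea_no =1 if b9==0 & h11==0

* v024 and v025 are constant within a cluster, so (first) keeps
* their cluster value alongside v005.
collapse (first) v005 v024 v025 (sum) nch_diarrhea*, by(v001)

* Quick check that the cluster-level variables survived the collapse.
tabulate v025
```

Any other variable that takes a single value per cluster can be added after "(first)" in the same way.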
Re: Interpolated surfaces [message #24803 is a reply to message #24801]
Wed, 13 July 2022 17:14
David34
Messages: 20 Registered: March 2022
Member
Thank you so very much indeed for the generous help with the Stata code; I am very grateful! I have a few follow-up questions, please:
1. Could you help me better understand the statement "You do not need to do anything more with the weights; svyset and svy will re-normalize them"? Does that mean I would need to apply svyset and svy after running this Stata code to get the weighted percentage of diarrhea at the cluster level?
2. After running this Stata code, I ended up with the variables v001, v005, nch_diarrhea_yes, nch_diarrhea_no, nch_diarrhea, prop_diarrhea, and wt_diarrhea. I am unsure which variable I would need to merge with the GPS coordinates. To have weighted percentages/proportions of diarrhea at the cluster (v001) level, should I use the newly created variable 'prop_diarrhea' and merge the GPS coordinates to those clusters? Or do I first need to svyset the data file to get the weighted proportions/percentages of diarrhea? If so, how do I do that in Stata?
3. Am I correct that the variable 'nch_diarrhea' was created to correctly calculate the proportion of diarrhea at the cluster level, but that 'prop_diarrhea' is probably not the weighted proportion of diarrhea at the cluster level?
4. In this Stata code, is there a need to divide the weight by 1000000, i.e. v005 / 1000000? Or is it not necessary?
5. The newly created variable 'wt_diarrhea' gives the weighted number of children at the cluster level. However, for creating an interpolated surface, I would only need the weighted proportion/percentage of children with diarrhea at the cluster level. Am I correct here?
Thanks again, and please guide.
[Updated on: Wed, 13 July 2022 17:25]
Re: Interpolated surfaces [message #24834 is a reply to message #24803]
Mon, 18 July 2022 10:08
Janet-DHS
Messages: 921 Registered: April 2022
Senior Member
Following is a response from DHS Research & Data Analysis Director, Tom Pullum:
The new (collapsed) file will have clusters as units. The two crucial variables are prop_diarrhea and wt_diarrhea: prop_diarrhea is the outcome variable and wt_diarrhea is the weight for each cluster. The weight is the product of the sampling weight for the cluster (every child in a cluster has the same value of v005) and the number of children for whom the individual-level outcome could be assessed. You don't need to do anything else with the weights, such as divide by 1000000. If you stay within Stata and use pweights, Stata will renormalize so the mean weight is 1 for each unit (i.e. for each cluster).
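A short sketch of how the cluster weight might be used within Stata, assuming the collapsed cluster-level file from the earlier code is in memory. The variable wt_diarrhea_scaled is a hypothetical name introduced only to illustrate that rescaling the weight by a constant should not change the weighted point estimate.

```stata
* Sketch only: assumes the collapsed file (one record per cluster) is in memory.
* Weighted mean of the cluster proportions, using wt_diarrhea as a pweight.
mean prop_diarrhea [pweight=wt_diarrhea]

* Hypothetical rescaled weight: dividing by a constant should give the same
* point estimate, which is why dividing v005 by 1000000 is not needed here.
gen wt_diarrhea_scaled = wt_diarrhea/1000000
mean prop_diarrhea [pweight=wt_diarrhea_scaled]
```

Clusters with no eligible children have a missing prop_diarrhea and a weight of zero, so they drop out of both estimates.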
I would expect you to carry along, in the collapse, other cluster-level variables such as place of residence (the same for everyone in a cluster), perhaps something related to wealth quintile, education level of the mothers, etc. You could have additional outcome variables, such as fever or cough, but because they would have different numbers of children in the denominator, they would have their own weights.
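One way this could look for a second outcome, sketched under the assumption that fever is taken from h22 with the same coding rule as h11 (1 or 2 treated as "yes") and that the uncollapsed children's file is in memory. The loop is only a convenience; the point is that each outcome gets its own denominator and therefore its own weight.

```stata
* Sketch only: start from the uncollapsed children's file.
gen nch_diarrhea_yes=1 if b9==0 & (h11==1 | h11==2)
gen nch_diarrhea_no =1 if b9==0 & h11==0
gen nch_fever_yes   =1 if b9==0 & (h22==1 | h22==2)
gen nch_fever_no    =1 if b9==0 & h22==0

* One record per cluster, carrying v005 and residence (v025).
collapse (first) v005 v025 (sum) nch_*, by(v001)

* Separate denominator, proportion, and weight for each outcome.
foreach x in diarrhea fever {
    gen nch_`x'  = nch_`x'_yes + nch_`x'_no
    gen prop_`x' = nch_`x'_yes / nch_`x'
    gen wt_`x'   = v005 * nch_`x'
}
summarize prop_* wt_*
```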
You would use the cluster GIS codes to merge with a file of cluster-level characteristics that are external to the DHS data. The clusters as geographic data points would be the basis for the interpolated surface. Good luck.
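A hedged sketch of the merge step follows. It assumes the DHS GPS file has already been converted to a Stata file, here given the hypothetical name gps_clusters.dta, with the cluster number in DHSCLUST and coordinates in LATNUM and LONGNUM, and that the collapsed file was saved under the hypothetical name cluster_diarrhea.dta. The rule for clusters without valid coordinates, (0,0), is also an assumption that should be checked against the documentation for the GPS file in use.

```stata
* Sketch only: file names are hypothetical; variable names are assumed
* to follow the DHS GPS dataset (DHSCLUST, LATNUM, LONGNUM).
use "gps_clusters.dta", clear
rename DHSCLUST v001
keep v001 LATNUM LONGNUM

* One record per cluster in both files, so a 1:1 merge on v001.
merge 1:1 v001 using "cluster_diarrhea.dta"
tabulate _merge
keep if _merge==3
drop _merge

* Assumption: clusters without verified coordinates are coded (0,0).
drop if LATNUM==0 & LONGNUM==0

save "cluster_diarrhea_gps.dta", replace
```

The resulting file has one point per cluster with prop_diarrhea and wt_diarrhea attached, which is the form an interpolation tool would typically expect.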