The DHS Program User Forum
Discussions regarding The DHS Program data and results
Re: Weighting data after merging survey rounds with different levels of representation [message #11576 is a reply to message #11575] Wed, 11 January 2017 21:20
jswindle
Hello Tom and Bridgett,

Thank you for your very helpful and prompt reply.

I ran the code you shared and got some results that surprised me. When I ran the final command of "tab strata survey, table clean" I got an error message saying that I could not use those options. When I instead ran "tab strata survey", I got these interesting results:


tab strata survey

group(survey strata_temp)   survey=1   survey=2   survey=3   Total

1 226 0 0 226
2 416 0 0 416
3 607 0 0 607
4 320 0 0 320
5 17 0 0 17
6 253 0 0 253
7 190 0 0 190
8 470 0 0 470
9 27 0 0 27
10 447 0 0 447
11 767 0 0 767
12 174 0 0 174
13 556 0 0 556
14 172 0 0 172
15 438 0 0 438
16 433 0 0 433
17 611 0 0 611
18 187 0 0 187
19 459 0 0 459
20 195 0 0 195
21 340 0 0 340
22 800 0 0 800
23 105 0 0 105
24 121 0 0 121
25 450 0 0 450
26 331 0 0 331
27 186 0 0 186
28 193 0 0 193
29 28 0 0 28
30 188 0 0 188
31 435 0 0 435
32 185 0 0 185
33 239 0 0 239
34 67 0 0 67
35 22 0 0 22
36 591 0 0 591
37 193 0 0 193
38 787 0 0 787
39 95 0 0 95
40 614 0 0 614
41 285 0 0 285
42 0 420 0 420
43 0 283 0 283
44 0 47 0 47
45 0 850 0 850
46 0 40 0 40
47 0 732 0 732
48 0 81 0 81
49 0 693 0 693
50 0 263 0 263
51 0 690 0 690
52 0 78 0 78
53 0 625 0 625
54 0 31 0 31
55 0 789 0 789
56 0 101 0 101
57 0 705 0 705
58 0 307 0 307
59 0 403 0 403
60 0 42 0 42
61 0 735 0 735
62 0 230 0 230
63 0 3,553 0 3,553
64 0 0 92 92
65 0 0 754 754
66 0 0 825 825
67 0 0 318 318
68 0 0 33 33
69 0 0 789 789
70 0 0 35 35
71 0 0 786 786
72 0 0 60 60
73 0 0 718 718
74 0 0 45 45
75 0 0 821 821
76 0 0 32 32
77 0 0 781 781
78 0 0 138 138
79 0 0 650 650
80 0 0 76 76
81 0 0 832 832
82 0 0 480 480
83 0 0 646 646
84 0 0 53 53
85 0 0 723 723
86 0 0 55 55
87 0 0 746 746
88 0 0 44 44
89 0 0 786 786
90 0 0 41 41
91 0 0 823 823
92 0 0 127 127
93 0 0 668 668
94 0 0 197 197
95 0 0 755 755
96 0 0 29 29
97 0 0 706 706
98 0 0 42 42
99 0 0 778 778
100 0 0 70 70
101 0 0 747 747
102 0 0 81 81
103 0 0 737 737
104 0 0 63 63
105 0 0 831 831
106 0 0 37 37
107 0 0 782 782
108 0 0 35 35
109 0 0 767 767
110 0 0 90 90
111 0 0 761 761
112 0 0 66 66
113 0 0 723 723
114 0 0 85 85
115 0 0 778 778
116 0 0 137 137
117 0 0 746 746

Total 13,220 11,698 23,020 47,938


The part of these results that I found surprising is that the number of strata per survey varies in a strange way. There are 41 categories for 2000, 22 categories for 2004, and 54 categories for 2010. The result for 2010 makes sense: there were 27 districts, and stratifying by urban/rural gives 54. The result for 2004, I believe, comes from 11 district categories stratified by urban/rural. Those 11 categories are the ten largest districts, which were sampled in a representative manner, plus one big catch-all for the other 17 districts, hence the huge total of 3,553 respondents in the catch-all rural category (at least that is my guess). The 2000 results are perplexing. From what I can gather in the final report for the 2000 Malawi DHS, the sampling was done in the same manner as the 2004 survey, so I'm not sure why there are 41 categories here. Thoughts?
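A quick way to confirm these counts, and to see where the odd total of 41 comes from, might be the sketch below. It assumes the merged file contains the variables survey and strata shown in the table, the standard DHS urban/rural variable v025, and the per-round variable strata_temp named in the table header (its definition is not shown in this post, so that part is a guess). Since 41 is odd, it cannot come from a set of district categories fully crossed with urban/rural, so at least one cell is presumably empty or the 2000 strata were built differently.

```stata
* Assumes survey is coded 1/2/3 and strata is the pooled stratum id from the table.
* onestratum flags one observation per distinct survey-stratum combination.
egen byte onestratum = tag(survey strata)
tabulate survey if onestratum        // Freq. = number of strata in each round

* Assumes strata_temp is the per-round variable that strata was built from.
* Cross it with urban/rural (v025) for the first round to look for empty cells.
tabulate strata_temp v025 if survey == 1, missing
```

If a district category shows respondents in only one of the urban/rural columns, that would account for an odd number of strata.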

Once I have calculated the strata correctly, would the rest of this code (pasted below) work to appropriately survey set the data?

generate weight = v005/1000000
egen clusters=group(survey v021), label
svyset clusters [pweight=weight], strata(strata) singleunit(centered)

Or would you simply do:

generate weight = v005/1000000
svyset [pweight=weight], psu(v021) strata(strata)
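One thing that may help in choosing between the two is checking whether the same v021 value occurs in more than one survey round. The following is only a sketch, assuming the merged file holds survey and v021; the helper variables onepair and rounds are names made up for this check.

```stata
* Assumes the merged file holds survey (1/2/3) and the DHS cluster number v021.
* onepair flags one observation per distinct survey-cluster combination.
egen byte onepair = tag(survey v021)

* rounds = number of survey rounds in which each v021 value appears.
bysort v021: egen byte rounds = total(onepair)
tabulate rounds if onepair

* After running either svyset command, summarize the declared design:
* strata, PSUs per stratum, and observations per PSU.
svydescribe
```

If any cluster number shows up in two or three rounds, the group(survey v021) version at least guarantees that every cluster has its own identifier in the pooled file.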

In case it is relevant for deciding how to svyset the data, my ultimate goal is to fit a three-level mixed-effects model, with the districtyear and district variables as the higher levels.
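To make the intended structure concrete, the model I have in mind would look roughly like the sketch below. The names outcome and covariate are placeholders, not variables in the DHS files, and the sketch leaves weights out entirely: svyset settings only take effect under the svy prefix, so how (or whether) to weight this model is a separate question.

```stata
* Hypothetical names: outcome and covariate are placeholders.
* district and districtyear are the grouping variables described above.
* Levels are listed from highest to lowest: respondents are nested in
* districtyear, which is nested in district. No weights are applied here.
mixed outcome covariate || district: || districtyear:
```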

A final issue I am facing if I fit this sort of mixed-effects model is whether the 2000 and 2004 data from the 17 districts that were not sampled sufficiently to be representative could be appropriately incorporated into such a model. I realize that is outside the purview of the DHS surveys, but I'm guessing you have faced these types of issues before in your own research. Any thoughts?

Thank you kindly,
Jeff
 