Stratification and sampling in Haiti [message #10888] |
Wed, 28 September 2016 17:20 |
acolombo
Messages: 5 Registered: September 2016
Dear all,
I have some questions about the stratification and sampling strategy used for the 2005-06 DHS in Haiti, and about the conclusions I can draw from it. It looks a little different from the procedures I am used to.
The message will be a bit long, and I apologize in advance, but I believe the context and details are essential. Everything is based on the Household Recode dataset (hthr52dt).
In Haiti there are 10 departments (ADM1) and one metropolitan area (urban only). Stratification occurs at this level: each department is divided into urban and rural strata, for a total of 21 strata. The appendix of the final report explains (translated from the French):
"The EMMUS-IV sample is a nationally representative stratified sample drawn in two stages. The eleven departments are stratified into urban and rural parts to form the sampling strata. The Metropolitan Area has only an urban part. Thus, a total of 21 sampling strata were created. The first-stage sample was drawn independently within each stratum, and the second-stage sample was drawn independently within each primary sampling unit selected at the first stage."
The sampling strategy is the following. There are two stages. In the first stage, a total of 339 clusters are selected from the strata with probability proportional to the number of households they contain. This means that within each stratum, highly populated clusters are oversampled. In the second stage, a fixed number of households is picked by systematic sampling with equal probability: 26 households from each urban cluster and 34 households from each rural cluster.
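As a rough illustration of the two-stage design described above, here is a minimal Python sketch. The stratum, the cluster sizes, and the function names are hypothetical; it assumes systematic selection at both stages and is not the procedure actually used to draw the EMMUS-IV sample.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n cluster indices with probability proportional to size."""
    total = sum(sizes)
    step = total / n
    start = rng.random() * step
    picks, cum, i = [], 0.0, 0
    for k in range(n):
        target = start + k * step
        # Advance to the cluster whose cumulative size range contains target.
        while i < len(sizes) - 1 and cum + sizes[i] <= target:
            cum += sizes[i]
            i += 1
        picks.append(i)
    return picks

def systematic_equal(n_units, take, rng):
    """Pick `take` household indices at a fixed interval from a random start."""
    step = n_units / take
    start = rng.random() * step
    return [min(int(start + k * step), n_units - 1) for k in range(take)]

rng = random.Random(2016)
# Hypothetical rural stratum: 40 clusters with 60 to 300 households each.
sizes = [rng.randint(60, 300) for _ in range(40)]
n_clusters, take = 8, 34

clusters = pps_systematic(sizes, n_clusters, rng)
sample = {c: systematic_equal(sizes[c], take, rng) for c in clusters}

# Overall inclusion probability of a household in a selected cluster:
# (n_clusters * size / total) * (take / size)
total = sum(sizes)
probs = [(n_clusters * sizes[c] / total) * (take / sizes[c]) for c in clusters]
```

In this idealized setting the cluster size cancels out, so every household in the stratum ends up with the same overall inclusion probability, `n_clusters * take / total`. That holds only if the size measure used at the first stage equals the number of households listed at the second stage, which is an assumption of the sketch.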
Now the questions:
1. Why, if there are 21 strata, does the variable hv022 (sample stratum number) take values from 1 to 163?
2. Given this sampling framework, can I:
- infer the proportion of urban population per region (by computing the share of urban households in one region)?
- estimate the proportion of people with access to electricity for
  - the whole country
  - each region
  - the urban and rural population in each region
I would be really grateful if you could help me, as I have been racking my brain over this for several days.
Thanks,
Andrea
[Updated on: Wed, 28 September 2016 17:21]
Re: Stratification and sampling in Haiti [message #10913 is a reply to message #10899] |
Wed, 05 October 2016 06:02 |
Bridgette-DHS
Messages: 3199 Registered: February 2013
Here is a response from Trevor Croft and Tom Pullum:
Your first question is why, if there are 21 strata, the variable hv022 (sample stratum number) takes values from 1 to 163. Here is a more complete answer. DHS previously used a procedure of constructing implicit strata (163 of them for this survey) by pairing clusters (or, in some cases, grouping three). These implicit strata were constructed within the explicit strata (the 21 strata) and were used to calculate sampling errors. DHS stopped using this procedure some years ago, but the dataset still includes the constructed implicit strata variable.
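To make the idea of implicit strata concrete, here is a minimal Python sketch of one way clusters could be grouped within explicit strata: consecutive pairs, with the last group taking three clusters when the count is odd. The grouping rule, the function name, and the cluster IDs are assumptions for illustration, not the documented DHS procedure.

```python
def implicit_strata(clusters_by_stratum):
    """Map each cluster id to an implicit-stratum number.

    Clusters are paired in list order within each explicit stratum;
    if a stratum has an odd number of clusters, the last group has three.
    """
    mapping, next_id = {}, 1
    for stratum in sorted(clusters_by_stratum):
        clusters = clusters_by_stratum[stratum]
        n_groups = max(len(clusters) // 2, 1)
        for pos, cluster in enumerate(clusters):
            # The leftover cluster of an odd-sized stratum joins the last pair.
            group = min(pos // 2, n_groups - 1)
            mapping[cluster] = next_id + group
        next_id += n_groups
    return mapping

# Hypothetical explicit strata with 4 and 5 selected clusters.
example = {1: [101, 102, 103, 104], 2: [201, 202, 203, 204, 205]}
strata = implicit_strata(example)
```

As a consistency check on the numbers in this thread: if every group were a pair or a triple, 339 clusters in 163 groups would correspond to 150 pairs and 13 triples, since 2 × 150 + 3 × 13 = 339.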
Your question about representativeness below the national level comes up often. At the stratum level, there isn't really an issue: in essence, a separate sample has been drawn within each stratum. Small strata tend to be over-sampled (and, conversely, large strata under-sampled) so that there are enough cases to make good estimates of key indicators. "Representative" has two dimensions: bias and statistical uncertainty. Stratum-level estimates are unbiased and have reasonable standard errors. So yes, you can compare strata with one another as you described (the urban and rural parts of the same region), but you should check for statistical significance. If you go below the stratum level, for example to the second administrative level, generically called districts, the estimates are still unbiased, but the standard errors go way up. It is very important to report standard errors for these lower-level estimates, just as you would for categories of a covariate at the national level. If you compare two districts within the same region, it can be difficult to find a statistically significant difference because both estimates have high standard errors.
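For readers who want to see what "estimate plus standard error" looks like under this kind of design, here is a minimal Python sketch of a weighted proportion with a Taylor-linearized standard error for a stratified cluster sample. The rows are made up. With the real household recode, the inputs would typically come from hv005 (weight), hv021 (cluster), a stratum variable, and hv206 (electricity), but treat that mapping as an assumption to check against the recode manual. In practice a survey package would be used rather than hand-written code.

```python
from collections import defaultdict
from math import sqrt

def weighted_proportion(rows):
    """rows: iterable of (stratum, psu, weight, y) with y in {0, 1}.

    Returns (estimate, standard error) using the linearized variance
    of a ratio estimator, with PSUs nested in strata.
    """
    rows = list(rows)
    w_total = sum(w for _, _, w, _ in rows)
    p = sum(w * y for _, _, w, y in rows) / w_total

    # Weighted residual totals per PSU within each stratum.
    z = defaultdict(lambda: defaultdict(float))
    for h, psu, w, y in rows:
        z[h][psu] += w * (y - p)

    var = 0.0
    for psus in z.values():
        n_h = len(psus)
        if n_h < 2:
            continue  # a single-PSU stratum contributes nothing in this sketch
        mean = sum(psus.values()) / n_h
        var += n_h / (n_h - 1) * sum((v - mean) ** 2 for v in psus.values())
    return p, sqrt(var) / w_total

# Made-up data: two strata, two clusters each, two households per cluster.
rows = [
    (1, "A", 1.0, 1), (1, "A", 1.0, 0),
    (1, "B", 1.0, 1), (1, "B", 1.0, 1),
    (2, "C", 2.0, 0), (2, "C", 2.0, 0),
    (2, "D", 2.0, 1), (2, "D", 2.0, 0),
]
estimate, se = weighted_proportion(rows)
```

Passing only the rows of a single stratum gives the stratum-level estimate. For groups that cut across strata or clusters, such as districts, a proper subpopulation (domain) analysis in a survey package is the safer route.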