Weights and Strata for Pooled Samples [message #19761]
Mon, 10 August 2020 12:27
Yawo
Hello,
I've used IPUMS to create a combined dataset for men and women for 23 African countries.
As the listing below shows, the weights are not constant within each PSU as they should be. In fact we have two different weights within each PSU, one for males and one for females.
list sample sex weight weight2 psupool stratapool in 115431/115435
        +----------------------------------------------------------------+
        | sample            sex    weight   weight2   psupool   strata~l |
        |----------------------------------------------------------------|
115431. | Lesotho 2014     male   .515554         1      4753        336 |
115432. | Lesotho 2014   female    .48581         1      4753        350 |
115433. | Lesotho 2014   female    .48581         1      4753        350 |
115434. | Lesotho 2014     male   .515554         1      4753        336 |
115435. | Lesotho 2014   female    .48581         1      4753        350 |
        +----------------------------------------------------------------+
This issue has been the source of the "weights not constant within PSU" error I have been receiving when running svy: melogit models.
However, I thought I could adjust the strata and the PSU by taking gender into account. Doing so results in constant weights within each PSU, as the code and listing below show.
egen psupool= group(idhspsu sex)
egen stratapool= group(strata sex)
list sample sex idhspid idhspsu idhsstrata psupool stratapool weight in 10/15
    +---------------------------------------------------------------------------------------------------+
    | sample           sex   idhspid               idhspsu   idhsstrata   psupool   strata~l     weight |
    |---------------------------------------------------------------------------------------------------|
10. | Angola 2015     male   2401 00010001 3    2401000001   2401000018         1        321    .979475 |
11. | Angola 2015     male   2401 00010012 3    2401000001   2401000018         1        321    .979475 |
12. | Angola 2015     male   2401 00010026 1    2401000001   2401000018         1        321    .979475 |
13. | Angola 2015   female   2401 00010001 02   2401000001    240100018         2         18   1.001989 |
14. | Angola 2015   female   2401 00010002 03   2401000001    240100018         2         18   1.001989 |
    |---------------------------------------------------------------------------------------------------|
15. | Angola 2015   female   2401 00010002 02   2401000001    240100018         2         18   1.001989 |
    +---------------------------------------------------------------------------------------------------+
My questions:
1. Is it necessary to adjust the strata, the PSU, and even the weights when we append male and female data into one file?
2. If so, is my <egen> code the right way to readjust the strata? If not, is there another way?
3. I was able to run my melogit models successfully after applying the adjustment to the strata and the PSU, but I want to be sure the adjustment is valid; otherwise I will have to revert to running separate models for men and women.
thanks - cY
Re: Weights and Strata for Pooled Samples [message #19762 is a reply to message #19761]
Mon, 10 August 2020 14:57
Trevor-DHS
Unfortunately, you can't just merge the datasets together without adjusting the sample weights. The sample weights are relative weights and are normalized separately to the total number of women and the total number of men in the sample. For example, in the Lesotho DHS 2014, 6621 women were interviewed but only 2931 men (and that includes men age 15-59, not just 15-49). Appendix A states that "In addition, in a subsample of households (every second household), all men age 15-59 who were usual residents of the households or stayed in the households on the night before the interview were eligible for interview". If you used the merged data without adjusting the weights, you would be assuming that there were more than twice as many women as men in Lesotho!
Typically this adjustment is made by applying one constant factor to the weights for women and another to the weights for men. Each factor is calculated by taking an estimate of the female or male population age 15-49 in Lesotho from an external source, such as the UN's World Population Prospects or census data, and dividing it by the survey's total sample size for women or for men age 15-49.
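As a rough illustration of that calculation, a minimal Stata sketch might look like the following. The names sample, sex, and weight come from the listing earlier in this thread. The variables age and popext are assumptions: popext is a hypothetical variable you would create yourself, holding the external population estimate (age 15-49) for each sample-by-sex cell. Treat this as a starting point, not a definitive procedure.

```stata
* Sketch only. popext is a HYPOTHETICAL variable that you must create first:
* the external population estimate (age 15-49) for each sample-by-sex cell.
* The variable name age is also an assumption; check your own extract.
confirm numeric variable popext

* Flag respondents in the common age range (men may have been interviewed up to 59)
gen byte in1549 = inrange(age, 15, 49)

* Unweighted number of respondents age 15-49 in each sample-by-sex cell
bysort sample sex: egen long ncell = total(in1549)

* Constant factor per cell: external population / survey sample size
gen double adjfactor = popext / ncell

* Rescaled weight for the pooled file; left missing outside age 15-49
gen double weight_adj = weight * adjfactor if in1549
```

Because the factor is computed by sample and by sex, the same lines would also cover a file that pools several countries, provided popext is filled in for every cell.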
The same issue applies when you pool data from multiple surveys or multiple countries: each has its own weights, which will need adjusting.
As for the code adjusting the PSU and strata, I don't know melogit, but I cannot think of another solution if you need to include both women and men in the same model. Can you run separate models for women and for men? Otherwise I don't have a better option.
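If it helps, one way to confirm that the weights are constant within every grouped PSU across the whole file, and not just in the few rows listed, might be the check below. It is only a sketch and assumes the variable names psupool and weight used earlier in this thread; wt_varies and psu_tag are new names introduced here for illustration.

```stata
* Sketch only: psupool and weight are the names used earlier in this thread.
* Within each PSU, sort by weight and compare the smallest and largest values.
* A PSU with any missing weight is also flagged, because missing sorts last.
bysort psupool (weight): gen byte wt_varies = (weight[1] != weight[_N])

* Tag one row per PSU so the count below is in PSUs, not respondents
egen byte psu_tag = tag(psupool)

* Number of PSUs in which the weight is not constant (0 is what you want)
count if psu_tag & wt_varies
```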