Sample Weight for Merged Dataset [message #23736]
Sat, 20 November 2021 16:39
Helen
Messages: 2 Registered: November 2021
Member
Hi, I would appreciate your expert advice on which sample weight, strata, and cluster variables should be used to create an SPSS Complex Samples analysis plan file. I have merged the male and female datasets from the eight sub-Saharan countries that I am including in my study, and I merged the sample weight, strata, and PSU variables for the male and female datasets. I am worried that there is no difference between the population estimate and the unweighted counts from Complex Samples Frequencies. Here is the syntax I used to create the Complex Samples plan file. I have also attached the SPSS output for the frequencies.
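* Check the source weights and the gender indicator before combining them.
Frequencies Variables=V005 MV005 GENDER.
* Build one weight variable from V005 (women's file) and MV005 (men's file).
Compute totalsampleweight=0.
If (GENDER=0) totalsampleweight=V005.
If (GENDER=1) totalsampleweight=MV005.
Execute.
Frequencies Variables=totalsampleweight V005 MV005 GENDER.
* DHS weights are stored with six implied decimal places.
COMPUTE WGT= totalsampleweight/1000000.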
* Analysis Preparation Wizard.
CSPLAN ANALYSIS
/PLAN FILE='D:\Angola\Angola_Complex_samples_File2.csaplan'
/PLANVARS ANALYSISWEIGHT=WGT
/SRSESTIMATOR TYPE=WR
/PRINT PLAN
/DESIGN STRATA=V023Merged CLUSTER=V021Merged
/ESTIMATOR TYPE=WR.
Re: Sample Weight for Merged Dataset [message #23749 is a reply to message #23747]
Tue, 23 November 2021 15:38
Helen
Messages: 2 Registered: November 2021
Member
Hi,
Thank you very much for the prompt response. Yes, I was referring to the weighted counts. The similar counts in the SPSS output now make more sense.
To clarify further, I plan to examine outcomes for all respondents and also carry out a disaggregated analysis for males and females. So, I renamed the variables of interest with similar data characteristics in both data files, created a gender variable in each dataset, and then merged (appended?) the files. Yes, I would appreciate more information about renumbering the cluster and stratum variables in SPSS. I am also curious to learn what options are available for handling the sample weights.
Take care.
Re: Sample Weight for Merged Dataset [message #23753 is a reply to message #23749]
Wed, 24 November 2021 13:11
Bridgette-DHS
Messages: 3199 Registered: February 2013
Senior Member
Following is another response from DHS Research & Data Analysis Director, Tom Pullum:
First, regarding the re-numbering of v021 and v023. Here's a clumsy way to do it. Say that your surveys are numbered 1 through 8, and that the codes for v021 and v023 are always less than 1,000 (you have to check that, and modify if there are exceptions). Then you can construct new variables, cluster_id and stratum_id, defined with cluster_id=survey*1000+v021 and stratum_id=survey*1000+v023. These new variables will have unique numbers for the clusters and strata in the pooled data file.
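The renumbering rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SPSS syntax: the function name `make_pooled_id`, the `base` parameter, and the sample records are hypothetical, and the sketch assumes surveys are numbered 1 through 8 and that every v021 and v023 code is below 1,000.

```python
def make_pooled_id(survey: int, code: int, base: int = 1000) -> int:
    """Return survey*base + code, which is unique across the pooled surveys."""
    # The rule only works if every original code is smaller than base.
    if not 0 <= code < base:
        raise ValueError(f"code {code} does not fit below base {base}; use a larger base")
    return survey * base + code

# Hypothetical records: (survey number, v021 cluster code, v023 stratum code).
records = [(1, 12, 3), (2, 12, 3), (8, 745, 21)]

# Each record becomes (cluster_id, stratum_id).
pooled = [(make_pooled_id(s, v021), make_pooled_id(s, v023)) for s, v021, v023 in records]
print(pooled)  # [(1012, 1003), (2012, 2003), (8745, 8021)]
```

Cluster 12 in survey 1 and cluster 12 in survey 2 now carry different identifiers, which is the point of the exercise. If any code is 1,000 or larger, the function raises an error rather than silently producing duplicate identifiers, and a larger base (for example 10,000) would be needed.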
If you leave the weights as they are now, then the total weight for each survey will just be the total sample size for that survey. This is probably the least acceptable thing to do, because sample sizes are determined by many different arbitrary considerations, such as the budget for the survey. There are two alternatives. One is to multiply v005 by a scaling factor such that the total weight for a survey becomes proportional to the population of the country at the time of the survey. The UN Population Division website gives population estimates. But if you do this, you will find that the results are dominated by the largest country, sometimes overwhelmingly. The second alternative is to re-weight with a survey-specific factor that gives equal weight to each country. I personally prefer this but not everyone does.
There is a larger problem with pooled estimates. The surveys were done at different times and we almost never cover all the countries in a region or even a sub-region. Within DHS, we only construct a pooled estimate with many countries when studying something that is relatively rare within the individual countries. (Pooling successive surveys from the same country is more defensible.)
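The two re-weighting alternatives described above both come down to multiplying each survey's weights by a survey-specific factor: the desired total weight for that survey divided by its current total weight. A minimal Python sketch of that arithmetic follows. The function name `rescale_weights`, the example weights, and the population figures are all made up for illustration; real population values would have to be looked up for each survey year.

```python
from collections import defaultdict

def rescale_weights(rows, targets):
    """Scale weights so each survey's total weight equals targets[survey].

    rows: list of (survey, weight) pairs, with weight already divided by 1,000,000.
    targets: dict mapping survey number to the desired total weight for that survey.
    """
    totals = defaultdict(float)
    for survey, weight in rows:
        totals[survey] += weight
    # One factor per survey, so relative weights within a survey are unchanged.
    return [(survey, weight * targets[survey] / totals[survey]) for survey, weight in rows]

# Made-up weights for two surveys of different sizes.
rows = [(1, 0.5), (1, 1.5), (2, 1.0), (2, 1.0), (2, 2.0)]

# Equal weight per country: every survey's weights sum to 1.
equal = rescale_weights(rows, {1: 1.0, 2: 1.0})

# Population-proportional: totals equal made-up population figures.
populations = {1: 30_000_000, 2: 10_000_000}
by_population = rescale_weights(rows, populations)
```

With the population-proportional targets, survey 1 carries three times the total weight of survey 2, which shows how a large country can dominate a pooled estimate.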
A colleague, Mahmoud Elkasabi, points out that men are often subsampled, and they often have a different age range than women. You need to take that into account. If the men have been subsampled, you need to scale up mv005. For example, if it's a 50% subsample, multiply mv005 by 2. If you explore the forum, you should be able to find old postings that discuss all these issues. Good luck.
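The subsample adjustment mentioned above amounts to dividing mv005 by the subsampling fraction. A small Python sketch of that arithmetic, with a hypothetical function name and a made-up weight value, and assuming the fraction is taken from the documentation for the survey in question:

```python
def scale_up_subsample(mv005: float, subsample_fraction: float) -> float:
    """Scale a men's weight up by the inverse of the subsampling fraction."""
    if not 0 < subsample_fraction <= 1:
        raise ValueError("subsample_fraction must be greater than 0 and at most 1")
    return mv005 / subsample_fraction

# A 50% subsample doubles the weight (made-up mv005 value).
print(scale_up_subsample(1_250_000, 0.5))  # 2500000.0
```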