When to use iweight and pweight in stata [message #13156]
Fri, 29 September 2017 09:15
chikhungulana
Messages: 7 Registered: April 2016 Location: Southampton
Hello
I recently found differences in statistical significance between my SPSS and Stata analyses. One possible explanation is that I am using different types of weights. In SPSS I simply select "weight by" with the Sampleweight variable, but in Stata I was using svyset [pweight=Sampleweight]. I see that the DHS manual recommends one to use iweight, but how can one use iweight for a cross-tabulation and for regression modelling?
Your help will be greatly appreciated.
Kind regards
Lana
Re: When to use iweight and pweight in stata [message #13189 is a reply to message #13156]
Mon, 02 October 2017 14:27
Bridgette-DHS
Messages: 3199 Registered: February 2013
Following is a response from Senior DHS Stata Specialist, Tom Pullum:
My rule is to always use pweight if it is accepted. Unfortunately there are some commands in Stata, such as tabulate and summarize, that will not accept pweight. Those commands will accept iweights, and for them I will use, say, iweight=v005/1000000. The division by 1,000,000 will give weights with an average value of 1. But if you want to use tabulate with an option such as chi2, you can't. Even if you use svyset and pweight, you cannot do tabulate and chi2. So far as I know, virtually all of the estimation commands will accept pweights. (There are some esoteric exceptions and I expect them to evolve to accept pweights in the future.)
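A minimal sketch of this rule, assuming a DHS recode file is open. Only v005 comes from the discussion above; outcome and group are placeholder variable names, and wt is a name introduced here for the rescaled weight.

```stata
* Sketch only: "outcome" and "group" are placeholder names, not DHS variables.
* Rescale the DHS sample weight as described above.
gen wt = v005/1000000

* tabulate and summarize do not accept pweight, so use iweight there.
tabulate outcome group [iweight=wt], column
summarize outcome [iweight=wt]

* Estimation commands accept pweight directly.
logit outcome i.group [pweight=wt]
```

The tabulate and summarize lines give weighted percentages and means only; they do not produce a test of association.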
You say that "the DHS manual recommends one to use iweight". Which manual are you referring to? I cannot find that recommendation.
I hope other users will add suggestions.
Re: When to use iweight and pweight in stata [message #13197 is a reply to message #13193]
Mon, 02 October 2017 18:33
Reduced-For(u)m
Messages: 292 Registered: March 2013
You are clustering your standard errors in Stata; I can tell because your output reports the number of clusters. You probably set that up with your "svyset" command (the printout does not show how you set up svyset, but I'm guessing you set the PSU as a "cluster", which is what the DHS recommends).
But I'll bet your SPSS code does not do this. Essentially, "clustering" allows your confidence intervals and p-values to account for within-group correlations in the error terms (because people in the same village are more similar to each other than two total strangers are). The result is that you have to inflate your standard errors (make your p-values bigger) to account for the within-cluster similarities among people; otherwise you get p-values that are much too small relative to the "true" value.
So I think the difference is that in Stata you are (correctly) allowing for correlations among error terms and within-cluster heteroskedasticity (via your svyset command that references the PSUs) and this is generating (appropriately) larger standard errors and p-values than your SPSS code which is treating each observation as independent.
I'd look up "clustered standard errors" in SPSS and add that to your code (clustering at the PSU level). I suspect that at that point your p-values will become much closer (that is, the SPSS estimates will look more like the Stata estimates). I don't actually know how to do it in SPSS, but it should be relatively straightforward.
Again - this is not about weighting variables; those only affect point estimates. This is about accounting for clustering in your standard error/p-value calculations.
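A rough way to see this in Stata is to fit the same weighted model with and without clustering and compare the results side by side. This is only a sketch: outcome and group are placeholder variable names, v005 is the DHS sample weight, and v021 is assumed to be the PSU identifier in your file.

```stata
* Sketch only: "outcome" and "group" are placeholders; v021 is assumed to be the PSU.
gen wt = v005/1000000

* Weights only: each observation is treated as its own cluster.
logit outcome i.group [pweight=wt]
estimates store weighted_only

* Same weights, with standard errors clustered at the PSU level.
logit outcome i.group [pweight=wt], vce(cluster v021)
estimates store clustered

* Coefficients should match; standard errors and p-values will usually differ.
estimates table weighted_only clustered, b se p
```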
Re: When to use iweight and pweight in stata [message #13261 is a reply to message #13199]
Tue, 10 October 2017 10:34
Bridgette-DHS
Messages: 3199 Registered: February 2013
A response from DHS Stata Specialist, Tom Pullum:
Below I will insert some Stata lines that you can run after you have opened a KR file. If you want to adjust for weights, clustering, and stratification in a table, the best way (in my opinion!) is with logit (if one of the variables is binary) or mlogit (if both variables have more than two categories). You cannot get a chi square, but you can get the p-value for an F, which will be an equivalent test of the significance of the association. Note that the correspondence is with a likelihood ratio chi-square, rather than a Pearson chi-square. The correspondences between the following approaches with Stata add to my confidence in how Stata handles weights. You could check whether you get the same correspondences with SPSS.
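* Stunting indicator: 0 if hw70 is below 600, then overwritten with 1 if hw70 is below -200
gen stunted=.
replace stunted=0 if hw70<600
replace stunted=1 if hw70<-200
* Age from b8; set to missing if v008-b3 (age in months, from century month codes) is less than 6
gen age=b8
replace age=. if v008-b3<6
* 1. Unweighted table with a likelihood ratio chi-square
tab stunted age, lrchi2
scalar pvalue=r(p_lr)
scalar list pvalue
* 2. Unweighted logit; the p-value is for the overall model test
logit stunted i.age
scalar pvalue=e(p)
scalar list pvalue
* 3. Logit adjusted for weights, clustering, and stratification; the p-value is for an F test
svyset v021 [pweight=v005], strata(v022) singleunit(centered)
svy: logit stunted i.age
scalar pvalue=e(p)
scalar list pvalue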
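For the case mentioned above where both variables have more than two categories, the same pattern could be followed with mlogit. This is a sketch under assumptions, not a tested recipe: rowvar and colvar are placeholder names for the two categorical variables, and the svyset line simply repeats the one above.

```stata
* Sketch only: "rowvar" and "colvar" are placeholder names for two categorical variables.
svyset v021 [pweight=v005], strata(v022) singleunit(centered)

* Multinomial logit of one variable on the other, adjusted for the survey design.
svy: mlogit rowvar i.colvar

* p-value for the overall model test of association.
scalar pvalue=e(p)
scalar list pvalue
```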