When to use iweight and pweight in stata [message #13156] 
Fri, 29 September 2017 09:15 
chikhungulana

Hello
I recently encountered differences in statistical significance between my SPSS and Stata analyses. One possible explanation is that I am using different types of weights. In SPSS I simply weight cases by the Sampleweight variable, but in Stata I was using svyset [pweight=Sampleweight]. I see that the DHS manual recommends using iweight, but how can one use iweight for cross-tabulation and regression modelling?
Your help will be greatly appreciated.
Kind regards
Lana



Re: When to use iweight and pweight in stata [message #13189 is a reply to message #13156] 
Mon, 02 October 2017 14:27 
BridgetteDHS

Following is a response from Senior DHS Stata Specialist, Tom Pullum:
My rule is to always use pweight if it is accepted. Unfortunately, some commands in Stata, such as tabulate and summarize, will not accept pweight. Those commands will accept iweights, and for them I use, say, iweight=v005/1000000. Dividing by 1,000,000 gives weights with an average value of 1. But if you want to use tabulate with an option such as chi2, you can't. Even if you use svyset and pweight, you cannot do tabulate with chi2. As far as I know, virtually all of the estimation commands will accept pweights. (There are some esoteric exceptions, and I expect them to evolve to accept pweights in the future.)
You say that "the DHS manual recommends one to use iweight". Which manual are you referring to? I cannot find that recommendation.
I hope other users will add suggestions.
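The arithmetic behind iweight=v005/1000000 can be sketched in Python. DHS stores v005 as an integer with six implied decimal places, so dividing by 1,000,000 recovers the actual weight. The weights and outcomes below are made up for illustration, and the helper names are hypothetical. The sketch shows that rescaling all weights by a constant leaves a weighted proportion unchanged; it only changes the weighted total, which stays on the scale of the sample size after division.

```python
# Hypothetical helpers; v005 values below are invented, not taken from a real DHS file.
def normalize_dhs_weights(v005):
    """Divide raw v005 values by 1,000,000, as in iweight=v005/1000000."""
    return [w / 1_000_000 for w in v005]


def weighted_mean(values, weights):
    """Weighted mean; for a 0/1 outcome this is a weighted proportion."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data for four children
v005 = [1_250_000, 750_000, 1_000_000, 1_000_000]
stunted = [1, 0, 1, 0]

weights = normalize_dhs_weights(v005)
print(sum(weights) / len(weights))       # average normalized weight: 1.0
print(weighted_mean(stunted, v005))      # proportion with raw weights: 0.5625
print(weighted_mean(stunted, weights))   # same proportion after rescaling: 0.5625
print(sum(weights))                      # weighted total on the scale of n: 4.0
```

With the raw v005 values the weighted total would be 4,000,000 instead of 4, which is why the rescaled version is more convenient for commands that report weighted counts.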






Re: When to use iweight and pweight in stata [message #13197 is a reply to message #13193] 
Mon, 02 October 2017 18:33 
ReducedFor(u)m


You are clustering your standard errors in Stata, because your output reports the number of clusters. You probably set that up with your svyset command (the printout doesn't show how you set up svyset, but I'm guessing you set the PSU as the cluster, which is what DHS recommends).
But I'll bet your SPSS code does not do this. Essentially, clustering allows your confidence intervals and p-values to account for within-group correlation in the error terms (because people in the same village are more similar to each other than two total strangers are). The result is that you have to inflate your standard errors (making your p-values bigger) to account for the within-cluster similarities among people; otherwise you get p-values that are much too small relative to the "true" value.
So I think the difference is that in Stata you are (correctly) allowing for correlation among error terms and within-cluster heteroskedasticity (via your svyset command that references the PSUs), and this is generating (appropriately) larger standard errors and p-values than your SPSS code, which treats each observation as independent.
I'd look up "clustered standard errors" in SPSS and add that to your code (clustering at the PSU level); I suspect your p-values will then become much closer (that is, the SPSS estimates will look more like the Stata estimates). I don't actually know how to do it in SPSS, but it should be relatively straightforward.
Again, this is not about the weighting variable, which only affects point estimates. This is about accounting for clustering in your standard error and p-value calculations.
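Why clustering widens standard errors can be sketched in Python for the simplest case, an unweighted mean (a proportion for a 0/1 outcome). This is an illustration under stated assumptions, not the exact estimator used by Stata's svy commands or by SPSS: it uses the textbook linearization (sandwich) variance with a C/(C-1) small-sample factor, ignores weights and strata, and the data and function names are hypothetical.

```python
import math
from collections import defaultdict


def naive_se(y):
    """SE of a mean that treats every observation as independent."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)


def cluster_se(y, cluster):
    """Cluster-robust (linearization) SE of a mean with a C/(C-1) correction."""
    n = len(y)
    ybar = sum(y) / n
    # Add up the residuals within each cluster (e.g. each PSU)
    totals = defaultdict(float)
    for v, c in zip(y, cluster):
        totals[c] += v - ybar
    n_clusters = len(totals)
    var = n_clusters / (n_clusters - 1) * sum(t ** 2 for t in totals.values()) / n ** 2
    return math.sqrt(var)


# Hypothetical 0/1 outcome for 12 people in 4 PSUs; people in the same PSU tend to be alike
y = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
psu = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]

print(round(naive_se(y), 4))         # 0.1508
print(round(cluster_se(y, psu), 4))  # 0.2152
```

Because outcomes are similar within PSUs, the cluster-robust standard error is larger than the naive one, which is the pattern described in the reply above.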




Re: When to use iweight and pweight in stata [message #13261 is a reply to message #13199] 
Tue, 10 October 2017 10:34 
BridgetteDHS

A response from DHS Stata Specialist, Tom Pullum:
Below are some Stata lines that you can run after you have opened a KR file. If you want to adjust for weights, clustering, and stratification in a table, the best way (in my opinion!) is with logit (if one of the variables is binary) or mlogit (if both variables have more than two categories). You cannot get a chi-square, but you can get the p-value for an F, which is an equivalent test of the significance of the association. Note that the correspondence is with a likelihood-ratio chi-square rather than a Pearson chi-square. The correspondences between the following approaches in Stata add to my confidence in how Stata handles weights. You could check whether you get the same correspondences with SPSS.
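* Stunting indicator: hw70 is height-for-age z-score x 100; codes of 600 and above are excluded
gen stunted=.
replace stunted=0 if hw70<600
replace stunted=1 if hw70<-200
* Child's age in completed years; set to missing if younger than 6 months (v008-b3 = age in months)
gen age=b8
replace age=. if v008-b3<6
* Unweighted table with the likelihood-ratio chi-square
tab stunted age, lrchi2
scalar pvalue=r(p_lr)
scalar list pvalue
* Unweighted logit: the overall LR test p-value should match the table above
logit stunted i.age
scalar pvalue=e(p)
scalar list pvalue
* Design-based version: PSU (v021), sample weight (v005), strata (v022)
svyset v021 [pweight=v005], strata(v022) singleunit(centered)
svy: logit stunted i.age
scalar pvalue=e(p)
scalar list pvalue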
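The correspondence between the tabulation's likelihood-ratio chi-square and the unweighted logit test can be checked by hand. Below is a minimal Python sketch, assuming a two-way table of unweighted counts (the counts are hypothetical) and a logit with one indicator per age group, as with i.age; it does not reproduce the weighted svy: logit F test. Because that saturated logit reproduces each age group's observed proportion, its likelihood-ratio statistic equals G^2 exactly, while the Pearson statistic is close but not identical.

```python
import math


def lr_chi2(table):
    """Likelihood-ratio chi-square (G^2) for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if obs > 0:
                g2 += 2 * obs * math.log(obs / expected)
    return g2


def pearson_chi2(table):
    """Pearson chi-square for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


def logit_lr(not_stunted, stunted):
    """LR test of a logit with one dummy per age group against the constant-only model."""
    def loglik(successes, failures, p):
        return successes * math.log(p) + failures * math.log(1 - p)

    p0 = sum(stunted) / (sum(stunted) + sum(not_stunted))
    ll_null = sum(loglik(s, f, p0) for s, f in zip(stunted, not_stunted))
    # With i.age, the fitted probability in each group is that group's observed proportion
    ll_full = sum(loglik(s, f, s / (s + f)) for s, f in zip(stunted, not_stunted))
    return 2 * (ll_full - ll_null)


# Hypothetical counts by age group (columns); rows are stunted=0 and stunted=1
not_stunted = [30, 25, 20]
stunted = [10, 15, 20]
table = [not_stunted, stunted]

print(round(lr_chi2(table), 4))                 # 5.4115
print(round(logit_lr(not_stunted, stunted), 4)) # 5.4115
print(round(pearson_chi2(table), 4))            # 5.3333
```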



