Performing a manual backward stepwise logistic regression in Stata [message #10711]
Mon, 05 September 2016 02:22
npolle
Messages: 6 Registered: August 2016 Location: MOMBASA, KENYA
Member

I am new to Stata and am interested in using the 2011 Uganda Demographic and Health Survey to determine the prevalence of disability and its associated risk factors. I need help performing a manual backward stepwise logistic regression in Stata. I have read that to do this, I first need to run the full model (with all covariates) and then test each variable for statistical significance at p<0.05, starting with the bottom variable.
Specifically, I need help testing the variables for statistical significance. This is how I approached it; however, the test command does not run:
*GENERATING THE SUBPOPULATION
generate overfive=.
replace overfive = 1 if hv105>=5 & hv105<96 & hv103==1
replace overfive = 0 if overfive!=1
*CONDUCTING MULTIVARIATE LOGISTIC REGRESSION
*Where 'onedisability' is the variable for difficulty in at least one functional area
svy, subpop (overfive): logistic onedisability ///
hv104 ///
hv106 ///
hv025 ///
hv270 ///
hv024
test hv024
Thank you
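One way to carry out the manual procedure described above is sketched below, under stated assumptions. Two things may explain why test hv024 did not run: the posted code has no svyset line, and svy commands stop with an error until the survey design is declared, which leaves no estimates for test to use; also, the /// continuation markers work only in a do-file, not in the Command window. The sketch enters the categorical covariates with i. and tests each one jointly with testparm, because test hv024 on a region code entered as a single continuous term would only test a linear trend across region codes. It assumes the usual DHS PR-file design variables (PSU hv021, stratum hv022, weight hv005); the stratum variable in particular should be checked against the recode documentation for this survey, and wt is a hypothetical new variable name.

```stata
* Declare the survey design first; svy commands fail without it.
* Assumed DHS PR-file design variables: PSU hv021, stratum hv022, weight hv005
gen wt = hv005/1000000
svyset hv021 [pweight=wt], strata(hv022)

* Full model: i. marks each covariate as categorical
svy, subpop(overfive): logistic onedisability i.hv104 i.hv106 i.hv025 i.hv270 i.hv024

* Joint Wald test of all region dummies (the bottom variable);
* testparm accepts i. notation and tests every dummy of hv024 at once
testparm i.hv024
display "p-value for hv024: " r(p)

* If that p-value is >= 0.05, drop hv024, refit, and test the next
* variable up the list; stop when every remaining variable has p < 0.05
svy, subpop(overfive): logistic onedisability i.hv104 i.hv106 i.hv025 i.hv270
testparm i.hv270
display "p-value for hv270: " r(p)
```

Rescaling hv005 by 1,000,000 does not change the estimates, so using hv005 directly as the pweight, as the reply below does, gives the same results.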

Re: Performing a manual backward stepwise logistic regression in Stata [message #10728 is a reply to message #10711]
Tue, 06 September 2016 12:44
Bridgette-DHS
Messages: 3230 Registered: February 2013
Senior Member

Following is a response from Senior DHS Stata Specialist, Tom Pullum:
I have revised some of your Stata code below. Rather than manual stepwise selection, I would use the Stata command "stepwise". This command has some limitations. It does not accept "i." notation for categorical variables, so you must form sets of dummies with "xi". Then, if one of the dummies is dropped from the model, the corresponding category has in effect been merged with the reference category. It also cannot be used with the svy prefix, but it DOES work with the pweight and cluster specifications within an estimation command, so all you are missing is the stratum adjustment and the subpop restriction, which you can add after you have selected the final model. In the example, pr(.1) sets a p-value of .1 as the threshold for retention: terms that are not significant at that level are removed from the model one at a time. This number can be changed, and I would repeat the procedure with various values ranging from, say, .5 to .05. I personally prefer logit to logistic (logit reports coefficients, logistic reports odds ratios), but I have used logistic below since that is what you had.
use e:\DHS\DHS_data\PR_files\UGPR60FL.dta
generate overfive=.
replace overfive = 1 if hv105>=5 & hv105<96 & hv103==1
replace overfive = 0 if overfive!=1
* construct onedisability
describe sh24-sh29
tab1 sh24-sh29,m
* flag codes 2-4 on each item as 1 and other codes as 0; system missing stays missing
foreach v in sh24 sh25 sh26 sh27 sh28 sh29 {
    gen `v'r=0
    replace `v'r=1 if `v'>=2 & `v'<=4
    replace `v'r=. if `v'==.
}
egen onedisability=rowtotal(sh24r-sh29r), missing
replace onedisability=1 if onedisability>1 & onedisability<.
tab onedisability,m
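* Optional cross-check (sketch; onedis_check is a hypothetical temporary name):
* egen's rowmax() builds the same indicator in one step, giving 1 if any item
* is flagged, 0 if all non-missing items are 0, and missing only if all six
* items are missing, so the assert below should pass
egen onedis_check = rowmax(sh24r-sh29r)
assert onedis_check == onedisability
drop onedis_check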
*CONDUCTING MULTIVARIATE LOGISTIC REGRESSION
*Where 'onedisability' is the variable for difficulty in at least one functional area
* stepwise cannot be combined with svyset/svy. Use pweight and cluster instead, and do not
* add strata and subpop until you get to the final model.
* stepwise cannot handle i. for categorical variables, so form sets of dummies with xi
xi, prefix(v_) i.hv024 i.hv025 i.hv104 i.hv106 i.hv270
logistic onedisability v_* [pweight=hv005], cluster(hv021)
stepwise, pr(.1): logistic onedisability v_* [pweight=hv005], cluster(hv021)
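The two follow-up steps described in the reply, trying several removal thresholds and then adding the stratum adjustment and subpop to the selected model, could be sketched as follows. This is illustrative only: it assumes the v_* dummies created by the xi line above, treats hv022 as the stratum variable (an assumption; check the recode documentation for this survey), and uses local macro names (kept, cons) chosen purely for illustration. In the optional line, terms in parentheses ask stepwise to keep or drop each set of dummies as a unit, so a categorical variable is not split by merging some of its categories into the reference category.

```stata
* Repeat the selection at several removal thresholds
foreach p in .5 .2 .1 .05 {
    display as text _n "==== stepwise, pr(`p') ===="
    stepwise, pr(`p'): logistic onedisability v_* [pweight=hv005], cluster(hv021)
}

* Optional: parentheses make stepwise treat each set of dummies as one term
stepwise, pr(.1): logistic onedisability (v_hv024*) (v_hv106*) (v_hv270*) ///
    v_hv025* v_hv104* [pweight=hv005], cluster(hv021)

* Collect the covariates retained by the most recent stepwise run
local kept : colnames e(b)
local cons _cons
local kept : list kept - cons
display "`kept'"

* Refit the selected model with the full design: strata and subpop
* (stratum variable hv022 is an assumption)
svyset hv021 [pweight=hv005], strata(hv022)
svy, subpop(overfive): logistic onedisability `kept'
```

Whichever threshold you settle on, run that stepwise command last so that e(b) holds the model you want to refit.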