The DHS Program User Forum: India » k-fold cross-validation for logistic regression in R (stuck using cross-validation with the surveyCV package)


k-fold cross-validation for logistic regression in R [message #26199]

Thu, 16 February 2023 03:53


dhivvyajp@am.amrita.edu
Messages: 4
Registered: January 2023

Member

Dear Experts,

I am working with a DHS-7 dataset. I was able to fit a logistic regression model using a 70% training / 30% testing split. When I tried to use k-fold cross-validation instead of the 70:30 split, I came across the surveyCV package, but I get the following error. Kindly let me know how I can fix this issue.
> set.seed(2023)
> svylogistic <- svyglm(formula = InternetUsage ~ RuralOrUrban + AgeGroup + WealthIndex + SchoolingCompleted + Religion + Caste + MaritalStatus + Occupation + Gender + Literacy + OwnsMobile, design = my_design, family = quasibinomial())
> cv.svyglm(svylogistic, nfolds = 3, na.rm = FALSE)
Error in if (clusterID %in% c("0", "1")) { : the condition has length > 1


Re: k-fold cross-validation for logistic regression in R [message #26205 is a reply to message #26199]

Thu, 16 February 2023 08:25

Bridgette-DHS
Messages: 3043
Registered: February 2013

Senior Member

Following is a response from Senior DHS staff member, Tom Pullum:

If you were using Stata, I would want to be sure that your svyset command includes "singleunit(centered)" (or one of the other singleunit() options: missing, certainty, or scaled). The India surveys are huge, but when you divide the sample into folds, some strata may end up with only one PSU, so singleunit may still be needed. I hope R has something equivalent to singleunit (I do not use R myself). Beyond that, I have no suggestions. Other users may be able to help.
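For R users, the closest counterpart to Stata's singleunit() in the survey package is the global option survey.lonely.psu; its "adjust" setting centers a single-PSU stratum at the grand mean, which is what singleunit(centered) does. The sketch below uses a made-up toy data frame (the columns stratum, psu, wt and y are hypothetical) in which stratum 3 has only one PSU, the situation that can appear once a sample is split into folds. It shows the default "fail" behaviour stopping with an error and "adjust" returning a finite standard error.

```r
library(survey)

# Hypothetical toy data: strata 1 and 2 have two PSUs each, stratum 3 has one.
set.seed(1)
toy <- data.frame(
  stratum = rep(c(1, 1, 2, 2, 3), each = 10),
  psu     = rep(1:5, each = 10),
  wt      = runif(50, 0.5, 2),
  y       = rnorm(50)
)
des <- svydesign(id = ~psu, strata = ~stratum, weights = ~wt, data = toy)

# "fail" (the package default): variance estimation stops at the lone PSU.
options(survey.lonely.psu = "fail")
err <- tryCatch(svymean(~y, des), error = function(e) e)
print(conditionMessage(err))

# "adjust": the lone PSU is centered at the grand mean, like singleunit(centered).
options(survey.lonely.psu = "adjust")
est <- svymean(~y, des)
print(est)
```

Other accepted values include "remove", "certainty" and "average"; which one is appropriate for a given analysis is a judgment call.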


Re: k-fold cross-validation for logistic regression in R [message #26409 is a reply to message #26199]

Fri, 17 March 2023 03:45

dhivvyajp@am.amrita.edu
Messages: 4
Registered: January 2023

Member

I also tried the following, but I was not able to fix the error.

> cv.svy(train, formulae = "InternetUsage ~ RuralOrUrban + AgeGroup + WealthIndex + SchoolingCompleted + Religion + Caste + MaritalStatus + Occupation + Gender + Literacy + OwnsMobile", method = "logistic", nfolds = 3, strataID = train$strata, clusterID = train$Cluster, nest = T, weightsID = train$samplewt)
....................................Error in .subset2(x, i, exact = exact) : no such index at level 1
> cv.svy(train, formulae = "InternetUsage ~ RuralOrUrban + AgeGroup + WealthIndex + SchoolingCompleted + Religion + Caste + MaritalStatus + Occupation + Gender + Literacy + OwnsMobile", method = "logistic", nfolds = 3, strataID = train$strata, clusterID = train$Cluster, nest = F, weightsID = train$samplewt)
....................................Error in .subset2(x, i, exact = exact) : no such index at level 1
> cv.svyglm(svylogistic, nfolds = 3, na.rm = FALSE)
....................................Error in if (clusterID %in% c("0", "1")) { : the condition has length > 1

Can anyone help me fix this error?
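Both cv.svy() calls pass the design columns themselves (train$strata, train$Cluster, train$samplewt). surveyCV's documentation describes strataID, clusterID and weightsID as the names of variables in Data, given as strings, so passing vectors is a likely cause of the .subset2() error. For cv.svyglm(), the package's examples build the svydesign object with one-sided formulas (id = ~..., strata = ~..., weights = ~...); if my_design was created by passing vectors such as mydata$Cluster, rebuilding it with formulas is worth trying. This is an assumption, not a confirmed fix for the DHS-7 data. The simulated train data frame and the predictors x1 and x2 below are hypothetical stand-ins that only reuse the column names from the post.

```r
library(survey)
library(surveyCV)

# Hypothetical simulated stand-in for `train`: 4 strata x 6 clusters x 25 people.
set.seed(2023)
n_clus <- 24
train <- data.frame(
  strata   = rep(rep(1:4, each = 6), each = 25),
  Cluster  = rep(seq_len(n_clus), each = 25),
  samplewt = rep(runif(n_clus, 0.5, 2), each = 25),
  x1       = rnorm(n_clus * 25),
  x2       = rbinom(n_clus * 25, 1, 0.4)
)
train$InternetUsage <- rbinom(nrow(train), 1,
                              plogis(-0.5 + 0.8 * train$x1 + 0.5 * train$x2))

# 1) cv.svy(): give the design columns by NAME (strings), not as train$... vectors.
cv_fit <- cv.svy(train, formulae = "InternetUsage ~ x1 + x2",
                 method = "logistic", nfolds = 3,
                 strataID = "strata", clusterID = "Cluster",
                 nest = TRUE, weightsID = "samplewt")
print(cv_fit)

# 2) cv.svyglm(): build the design with one-sided formulas, then fit and cross-validate.
my_design <- svydesign(id = ~Cluster, strata = ~strata, weights = ~samplewt,
                       data = train, nest = TRUE)
svylogistic <- svyglm(InternetUsage ~ x1 + x2, design = my_design,
                      family = quasibinomial())
cv_glm <- cv.svyglm(svylogistic, nfolds = 3)
print(cv_glm)
```

With the real data, the full formula from the post would replace InternetUsage ~ x1 + x2.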


Re: k-fold cross-validation for logistic regression in R [message #26502 is a reply to message #26409]

Mon, 27 March 2023 08:05

Bridgette-DHS
Messages: 3043
Registered: February 2013

Senior Member

Following is a response from DHS Senior Analysis & Research Manager, Shireen Assaf:

# install and load the packages you need
install.packages("survey")
library(survey)

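# setting your survey design
# To identify the survey design, you need three variables: the sampling weight, the PSU (cluster), and the strata

# creating the sampling weight variable (v005 is stored with six implied decimal places)
IRdata$wt <- IRdata$v005 / 1000000

# v021 = primary sampling unit (PSU), v022 = sample stratum; the one-sided formulas name columns of IRdata
mysurvey <- svydesign(id = ~v021, strata = ~v022, weights = ~wt, data = IRdata, nest = TRUE)

# strata with a single PSU: "adjust" centers them at the grand mean, like Stata's singleunit(centered)
options(survey.lonely.psu = "adjust")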

# now you can use the svy* functions in the survey package with the "mysurvey" design object. Check the survey package documentation for the functions available.

# for example:

# weighted table of v313 (current contraceptive method type, i.e., FP use). You can also use svyby, svyglm, etc.

svytable(~v313, mysurvey)
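If surveyCV keeps failing, design-based k-fold cross-validation can also be written directly with the survey package. The sketch below is a hypothetical helper, svy_kfold_logit(), that is not part of survey or surveyCV: it assigns whole PSUs to folds within each stratum (so no PSU is split between training and test data), refits svyglm() with quasibinomial() on the other folds, and scores the held-out fold with a survey-weighted log-loss. The simulated data, the column names and the choice of log-loss are assumptions for illustration; surveyCV's own fold assignment and loss may differ. Strata with fewer PSUs than folds can leave a lone PSU in a training set, hence the survey.lonely.psu setting.

```r
library(survey)
options(survey.lonely.psu = "adjust")

# Hypothetical helper: PSU-level k-fold CV for a survey-weighted logistic model.
svy_kfold_logit <- function(data, formula, psu, strata, weight, k = 3, seed = 2023) {
  set.seed(seed)
  data$.fold <- NA_integer_
  # Spread each stratum's PSUs evenly across the k folds.
  for (s in unique(data[[strata]])) {
    in_s  <- data[[strata]] == s
    psus  <- unique(data[[psu]][in_s])
    folds <- sample(rep_len(seq_len(k), length(psus)))
    data$.fold[in_s] <- folds[match(data[[psu]][in_s], psus)]
  }
  yname  <- all.vars(formula)[1]
  losses <- numeric(k)
  for (f in seq_len(k)) {
    train <- data[data$.fold != f, ]
    test  <- data[data$.fold == f, ]
    des <- svydesign(id = reformulate(psu), strata = reformulate(strata),
                     weights = reformulate(weight), data = train, nest = TRUE)
    fit <- svyglm(formula, design = des, family = quasibinomial())
    p <- as.numeric(predict(fit, newdata = test, type = "response", se.fit = FALSE))
    p <- pmin(pmax(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    y <- test[[yname]]
    w <- test[[weight]]
    # Survey-weighted mean log-loss on the held-out fold.
    losses[f] <- -sum(w * (y * log(p) + (1 - y) * log(1 - p))) / sum(w)
  }
  losses
}

# Usage on hypothetical simulated data (4 strata x 6 PSUs x 25 people).
set.seed(2023)
sim <- data.frame(strata   = rep(1:4, each = 150),
                  Cluster  = rep(1:24, each = 25),
                  samplewt = rep(runif(24, 0.5, 2), each = 25),
                  x1       = rnorm(600))
sim$InternetUsage <- rbinom(600, 1, plogis(-0.5 + 0.8 * sim$x1))
cv_loss <- svy_kfold_logit(sim, InternetUsage ~ x1, psu = "Cluster",
                           strata = "strata", weight = "samplewt", k = 3)
print(cv_loss)
print(mean(cv_loss))
```

Averaging the per-fold losses gives one cross-validated score per model, so competing formulas can be compared on the same folds by keeping the seed fixed.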

[Updated on: Mon, 27 March 2023 08:05]

