scikit-learn Cookbook(Second Edition)
上QQ阅读APP看书,第一时间看更新

Getting ready

Suppose we wanted to choose between the support vector classifier and the logistic regression classifier. We cannot measure their performance on the unavailable test set.

What if, instead, we:

  • Forgot about the test set for now?
  • Split the training set into two parts, one to train on and one to test the training?

Split the training set into two parts using the train_test_split function used in previous sections:

from sklearn.model_selection import train_test_split
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

X_train_2 consists of 75% of the X_train data, while X_test_2 is the remaining 25%. y_train_2 is 75% of the target data, and matches the observations of X_train_2. y_test_2 is 25% of the target data present in y_train.

As you might have expected, you have to use these new splits to choose between the two models: SVC and logistic regression. Do so by writing a predictive program.