How to Save Gradient Boosting Models with XGBoost in Python

XGBoost can be used to create some of the most performant models for tabular data using the gradient boosting algorithm.

Once trained, it is often a good practice to save your model to file for later use in making predictions new test and validation datasets and entirely new data.

In this post you will discover how to save your XGBoost models to file using the standard Python pickle API.

After completing this tutorial, you will know:

How to save and later load your trained XGBoost model using pickle.
How to save and later load your trained XGBoost model using joblib.

Let’s get started.

How to Save Gradient Boosting Models with XGBoost in Python
Photo by Keoni Cabral, some rights reserved.

The Algorithm that is Winning Competitions
...XGBoost for fast gradient boosting

XGBoost With Python Mini Course XGBoost is the high performance implementation of gradient boosting that you can now access directly in Python.

Your PDF Download and Email Course.

FREE 7-Day Mini-Course on
XGBoost With Python

Download Your FREE Mini-Course

Download your PDF containing all 7 lessons.

Daily lesson via email with tips and tricks.

Serialize Your XGBoost Model with Pickle

Pickle is the standard way of serializing objects in Python.

You can use the Python pickle API to serialize your machine learning algorithms and save the serialized format to a file, for example:

# save model to file
pickle.dump(model, open("pima.pickle.dat", "wb"))

Later you can load this file to deserialize your model and use it to make new predictions, for example:

# load model from file
loaded_model = pickle.load(open("pima.pickle.dat", "rb"))

The example below demonstrates how you can train a XGBoost model on the Pima Indians onset of diabetes dataset, save the model to file and later load it to make predictions.

The full code listing is provided below for completeness.

# Train XGBoost model, save to file using pickle, load and make predictions
from numpy import loadtxt
import xgboost
import pickle
from sklearn import cross_validation
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model no training data
model = xgboost.XGBClassifier()
model.fit(X_train, y_train)
# save model to file
pickle.dump(model, open("pima.pickle.dat", "wb"))

# some time later...

# load model from file
loaded_model = pickle.load(open("pima.pickle.dat", "rb"))
# make predictions for test data
y_pred = loaded_model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Running this example saves your trained XGBoost model to the pima.pickle.dat pickle file in the current working directory.

pima.pickle.dat

After loading the model and making predictions on the training dataset, the accuracy of the model is printed.

Accuracy: 77.95%

Serialize XGBoost Model with joblib

Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.

The Joblib API provides utilities for saving and loading Python objects that make use of NumPy data structures, efficiently. It may be a faster approach for you to use with very large models.

The API looks a lot like the pickle API, for example, you may save your trained model as follows:

# save model to file
joblib.dump(model, "pima.joblib.dat")

You can later load the model from file and use it to make predictions as follows:

# load model from file
loaded_model = joblib.load("pima.joblib.dat")

The example below demonstrates how you can train an XGBoost model for classification on the Pima Indians onset of diabetes dataset, save the model to file using Joblib and load it at a later time in order to make predictions.

# Train XGBoost model, save to file using joblib, load and make predictions
from numpy import loadtxt
import xgboost
from sklearn.externals import joblib
from sklearn import cross_validation
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model no training data
model = xgboost.XGBClassifier()
model.fit(X_train, y_train)
# save model to file
joblib.dump(model, "pima.joblib.dat")

# some time later...

# load model from file
loaded_model = joblib.load("pima.joblib.dat")
# make predictions for test data
y_pred = loaded_model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Running the example saves the model to file as pima.joblib.dat in the current working directory and also creates one file for each NumPy array within the model (in this case two additional files).

pima.joblib.dat
pima.joblib.dat_01.npy
pima.joblib.dat_02.npy

After the model is loaded, it is evaluated on the training dataset and the accuracy of the predictions is printed.

Accuracy: 77.95%

Want to Systematically Learn How To Use XGBoost?

You can develop and evaluate XGBoost models in just a few lines of Python code. You need:

>> XGBoost With Python

Take the next step with 15 self-study tutorial lessons.

Covers building large models on Amazon Web Services, feature importance, tree visualization, hyperparameter tuning, and much more...

Ideal for machine learning practitioners already familiar with the Python ecosystem.

Bring XGBoost To Your Machine Learning Projects

Summary

In this post you discovered how to serialize your trained XGBoost models and later load them in order to make predictions.

Specifically, you learned:

How to serialize and later load your trained XGBoost model using the pickle API.
How to serialize and later load your trained XGBoost model using the joblib API.

Do you have any questions about serializing your XGBoost models or about this post? Ask your questions in the comments and I will do my best to answer.

The post How to Save Gradient Boosting Models with XGBoost in Python appeared first on Machine Learning Mastery.

How to Save Gradient Boosting Models with XGBoost in Python

The Algorithm that is Winning Competitions
...XGBoost for fast gradient boosting

FREE 7-Day Mini-Course on
XGBoost With Python

Serialize Your XGBoost Model with Pickle

Serialize XGBoost Model with joblib

Want to Systematically Learn How To Use XGBoost?

Summary

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List

The Algorithm that is Winning Competitions...XGBoost for fast gradient boosting

FREE 7-Day Mini-Course on XGBoost With Python

Serialize Your XGBoost Model with Pickle

Serialize XGBoost Model with joblib

Want to Systematically Learn How To Use XGBoost?

Summary

Trending Articles

The Algorithm that is Winning Competitions
...XGBoost for fast gradient boosting

FREE 7-Day Mini-Course on
XGBoost With Python