XGBoost is a popular implementation of gradient boosting because of its speed and performance. Internally, XGBoost represents every problem as a regression predictive modeling problem that takes only numerical values as input. If your data is in a different form, it must first be converted into the expected format.
In this post, you will discover how to prepare your data for gradient boosting with the XGBoost library in Python.
After reading this post, you will know:
- How to encode string output variables for classification.
- How to prepare categorical input variables using one hot encoding.
- How to automatically handle missing data with XGBoost.
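As a rough preview of these three steps, here is a minimal sketch using scikit-learn's LabelEncoder and OneHotEncoder. The toy data and variable names are made up for illustration and are not taken from the datasets used later in the post. The sketch only prepares the arrays; it assumes you would then pass them to an XGBoost model.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data: one categorical input, one numeric input with a gap,
# and string class labels.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
sizes = np.array([[1.0], [np.nan], [3.5], [2.0]])
labels = np.array(["yes", "no", "yes", "no"])

# 1. Encode the string class labels as integers.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# 2. One hot encode the categorical input column.
#    The encoder returns a sparse matrix by default, so convert it to dense.
onehot_encoder = OneHotEncoder(handle_unknown="ignore")
colors_encoded = onehot_encoder.fit_transform(colors).toarray()

# 3. Leave missing numeric values as NaN. XGBoost treats NaN as missing
#    by default, so no imputation is done here.
X = np.hstack([colors_encoded, sizes])

# X and y are now fully numeric and could be passed to a model,
# for example xgboost.XGBClassifier().fit(X, y).
print(X.shape, y)
```

The one hot encoded columns come first and the numeric column last; the NaN in the numeric column is kept on purpose rather than filled in.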
Let’s get started.
All code can be found HERE