1. Install Python 3.7
You need to download
Windows x86-64 executable installer. Check ‘add to PATH’ when downloading.
2. Set OpenCL Environments
- Make sure you have nvidia gpu and have nvidia graphic drivers. (https://www.nvidia.com/Download/index.aspx
- Install Visual Studio 2017. From Visual Studio installer, check ‘Desktop development with C++’ and install. (https://visualstudio.microsoft.com)
- Install CUDA 10.0. (https://developer.nvidia.com/cuda-10.0-download-archive) Latest version is 10.1 but since Tensorflow supports 10.0(https://www.tensorflow.org/install/gpu), download 10.0.
- Install latest cuDNN for CUDA 10.0. (https://developer.nvidia.com/cudnn) You have to login for Nvidia to download it. You need to unzip and move folder named ‘cuda’ to C:\tools
- Reboot your computer.
3. Set LightGBM GPU Environments
- Install Cmake. (https://cmake.org/download/)
- Install Boost Binaries ‘boost_1_69_0-msvc-14.1-64.exe’. (https://bintray.com/boostorg/release/boost-binaries/1.69.0#files)
4. Set PATH
Add these to system PATH.
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\extras\CUPTI\libx64
- C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include
5. Create VirtualEnv
- Go to cmd, type
pip install virtualenv.
virtualenv gputo create virtual environment called ‘gpu’.
- Type ‘call gpu\scripts\activate’ to activate environment.
6. Install Packages
At your env-activated cmd, do the following.
pip install numpy pandas matplotlib seaborn scikit-learn tqdm jupyter tensorflow-gpu catboost xgboost optuna multiprocess category_encoders tables.
pip install lightgbm --install-option=--gputo install lightgbm gpu version.
What are these packages for?
- NumPy: NumPy is the fundamental package for scientific computing with Python.
- Pandas: Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
- Matplotlib: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
- seaborn: Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- scikit-learn: Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
- tqdm: tqdm displays progress bar when doing loops.
- jupyter: jupyter notebook makes it a lot easier to code in data science.
- TensorFlow: TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.
- LightGBM: LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is fast and accurate.
- CatBoost: CatBoost is a machine learning method based on gradient boosting over decision trees. It is really fast on GPUs.
- XGBoost: XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It was once used by many kagglers, but is diminishing due to arise of LightGBM and CatBoost.
- Optuna: Optuna is a define-by-run bayesian hyperparameter optimization framework. It supports multiprocessing and pruning when searching.
- multiprocess: It is a lot easier to use this libarary rather than defualt ‘multiprocessing’.
- Category Encoders: A set of scikit-learn-style transformers for encoding categorical variables into numeric with different techniques.
- Tables: This enables pandas to save and load dataframes or series with hdf format.