pycaret
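pycaret is the lazy person's toolkit for machine learning. Compared with other open-source machine learning libraries, pycaret is an alternative low-code library that can replace hundreds of lines of code with just a few words. In essence it bundles several machine learning libraries and frameworks, such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy, and so on.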

For example, a few years ago, comparing several sklearn estimators took code like this:
# Regression problem
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import model_selection
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.svm import SVR, LinearSVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, BayesianRidge, SGDRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, ExtraTreesRegressor
from sklearn.kernel_ridge import KernelRidge

# Candidate estimators, each paired with a short display name
models = []
models.append(('DecisionTree', DecisionTreeRegressor()))
models.append(('Ridge', Ridge()))
models.append(('Lasso', Lasso()))
models.append(('EN', ElasticNet(alpha=0.001, max_iter=10000)))
models.append(('BayesianRidge', BayesianRidge()))
models.append(('SVM', SVR()))
models.append(('KNeighbors', KNeighborsRegressor()))
models.append(('NN', MLPRegressor()))
models.append(('GBoosting', GradientBoostingRegressor()))
models.append(('RF', RandomForestRegressor()))
models.append(('ExtraTrees', ExtraTreesRegressor()))
models.append(('SGD', SGDRegressor(max_iter=1000, tol=1e-3)))
models.append(('Kernel_Ridge', KernelRidge(alpha=0.6, kernel='polynomial', degree=2, coef0=2.5)))
models.append(('LR_SVR', LinearSVR()))
models.append(('LR', LinearRegression()))

def compare_scores_mae(models, X, y):
    cv_means = []
    cv_std = []
    names = []
    for name, model in models:
        # 10-fold cross-validation, scored by negative mean absolute error
        kfold = model_selection.KFold(n_splits=10)
        cv_results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring='neg_mean_absolute_error', n_jobs=10)
        cv_means.append(cv_results.mean())
        cv_std.append(cv_results.std())
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)
    cv_res = pd.DataFrame({"CrossValMeans": cv_means, "CrossValerrors": cv_std, "Algorithm": names})
    # Horizontal bar chart of mean scores, with the fold-to-fold std as error bars
    # (x and y are passed by keyword because recent seaborn versions require it)
    g = sns.barplot(x="CrossValMeans", y="Algorithm", data=cv_res, palette="Set3", orient="h", **{'xerr': cv_std})
    g.set_xlabel("negative MAE")
    g.set_title("Cross validation scores")
    return cv_res
Yes, this is exactly how I did it before I met pycaret.
pycaret wraps all of that code 📦 into a single function: compare_models()
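To make the contrast concrete, here is a minimal sketch of what the pycaret side might look like. It assumes a reasonably recent pycaret release in which pycaret.regression exposes setup() and compare_models(), and in which compare_models() returns the best model. The helper name compare_regressors and its arguments are made up for illustration, and older releases may prompt you to confirm the inferred column types during setup().

```python
def compare_regressors(df, target):
    """Cross-validate pycaret's regression models on df and return the best one.

    df     : a pandas DataFrame that includes the target column
    target : name of the column to predict
    (Hypothetical helper for illustration, not part of pycaret.)
    """
    # Imported inside the function so this file still loads when pycaret
    # is not installed yet.
    from pycaret.regression import setup, compare_models

    # setup() initialises the experiment: train/test split, preprocessing, etc.
    # session_id fixes the random seed so runs are repeatable.
    setup(data=df, target=target, session_id=42)

    # compare_models() trains and cross-validates the available estimators
    # and prints a score table; sort="MAE" ranks that table by mean absolute error.
    return compare_models(sort="MAE")
```

Calling compare_regressors(my_dataframe, "price") would then stand in for the whole compare_scores_mae() routine, with my_dataframe and "price" being placeholders for your own data.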

Installation
The official docs describe installing via pip
# installing for the first time
pip install pycaret
# if you have installed beta version in past, run the below code to upgrade
pip install --upgrade pycaret
# Run the below code in your notebook to check the installed version
from pycaret.utils import version
version()
or via conda:
# create a conda environment
conda create --name yourenvname python=3.6
# activate environment
conda activate yourenvname
# install pycaret
pip install pycaret
# create notebook kernel connected with the conda environment
python -m ipykernel install --user --name yourenvname --display-name "display-name-here"
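If you are on a colab or kaggle instance, just use !pip. But if you have ever installed Python or R packages, you know it may not be that simple: on macOS, installing llvmlite and LightGBM threw all sorts of unexpected errors and the installation failed. It took me about an hour to get pycaret installed on macOS (between you and me, installing on a kaggle instance works without any trouble).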
llvmlite
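Installing with pip kept failing with errors related to python setup_tools, and along the way I found that the llvmlite package needs cmake. I installed cmake with brew, but it still didn't work. Finally, in a corner of GitHub, I found that the easy_install command easily fixes the problem of llvmlite not installing on python3.8, so I gave it a try (I was using a conda environment with python3.5) and the problem went away. I have no idea why: pip fails, easy_install works.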

brew install cmake
easy_install llvmlite
LightGBM
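LightGBM is one of the strongest tree-based models. Since it is one of the models pycaret includes, installing pycaret also requires installing LightGBM. Installing LightGBM on Windows is easy (it is developed by Microsoft itself): just use pip, the installer that ships with python. On a Mac, installing with pip runs into errors, so you need to build the native LightGBM library from source.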
pip uninstall lightgbm
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
export CXX=g++-8 CC=gcc-8
mkdir build ; cd build
cmake ..
make -j4
If you find you don't have gcc-8, install it with brew. As far as I remember, cmake is needed here as well.
brew install gcc@8
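After an install this bumpy, it helps to confirm which of the troublesome packages actually made it into the current environment. Here is a small sketch using only the standard library. It assumes Python 3.8 or newer (where importlib.metadata is available), and the helper name installed_versions is just for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pycaret", "llvmlite", "lightgbm")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            # Reads the package metadata without importing the package itself,
            # so a broken native build does not crash the check.
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Because this only looks at metadata, a package can show a version here and still fail at import time. It is a first check, not proof that the build works.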
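Final advice
It is best not to mix conda and pip installs.
Don't upgrade pip. After upgrading, you will feel like you need to reinstall python.
After the upgrade, running pip gives this: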
  File "F:\anaconda\envs\emotion\lib\site-packages\pkg_resources\__init__.py", line 2331, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "F:\anaconda\envs\emotion\lib\site-packages\pip\_internal\__init__.py", line 42, in <module>
    from pip._internal import cmdoptions
  File "F:\anaconda\envs\emotion\lib\site-packages\pip\_internal\cmdoptions.py", line 16, in <module>
    from pip._internal.index import (
ImportError: cannot import name 'FormatControl'
As a bonus, here is how to downgrade:
https://pypi.org/project/pip/19.1.1/#files

Manually download the second file, extract it, and run the following in its directory
python setup.py install





