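We've prepared a Docker image that lets you quickly spin up a new PostgreSQL database with PostgresML already installed. It also includes some Scikit toy datasets, so you can easily experiment with PostgresML workflows without having to import your own data.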
Installation
Installing PostgresML on macOS
- Install Docker for OS X.
- Clone the repo:
    git clone git@github.com:postgresml/postgresml.git
- Start the Dockerized services. PostgresML will run on port 5433, in case you already have Postgres running:
    cd postgresml && docker-compose up
- Connect to Postgres in the container with PostgresML installed:
    psql postgres://postgres@localhost:5433/pgml_development
- Validate your installation:
    SQL
    SELECT pgml.version();
    Output
    pgml_development=# SELECT pgml.version();
     version
    ---------
     2.0.0
    (1 row)
Installing PostgresML on Linux
- Install Docker for Linux. Some package managers (e.g. Ubuntu/Debian) additionally require the docker-compose package to be installed separately.
- Clone the repo:
    git clone git@github.com:postgresml/postgresml.git
- Start the Dockerized services. PostgresML will run on port 5433, in case you already have Postgres running:
    cd postgresml && docker-compose up
- Connect to Postgres in the container with PostgresML installed:
    psql postgres://postgres@localhost:5433/pgml_development
- Validate your installation:
    SQL
    SELECT pgml.version();
    Output
    pgml_development=# SELECT pgml.version();
     version
    ---------
     2.0.0
    (1 row)
Installing PostgresML on Windows
- Install Docker for Windows. If you are installing inside Windows Subsystem for Linux, use the Linux instructions instead.
- Clone the repo:
    git clone git@github.com:postgresml/postgresml.git
- Start the Dockerized services. PostgresML will run on port 5433, in case you already have Postgres running:
    cd postgresml && docker-compose up
- Connect to Postgres in the container with PostgresML installed:
    psql postgres://postgres@localhost:5433/pgml_development
- Validate your installation:
    SQL
    SELECT pgml.version();
    Output
    pgml_development=# SELECT pgml.version();
     version
    ---------
     2.0.0
    (1 row)
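The same connection URL appears in the instructions for all three platforms, so it can help to see what each part of it means. The following is a minimal sketch that uses only the Python standard library to split the URL into user, host, port, and database. The helper name `describe` is hypothetical and introduced purely for illustration; it is not part of PostgresML.

```python
from urllib.parse import urlsplit

# Connection URL copied from the installation steps above.
URL = "postgres://postgres@localhost:5433/pgml_development"


def describe(url: str) -> dict:
    """Split a Postgres connection URL into its named parts (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "user": parts.username,               # role used to log in
        "host": parts.hostname,               # where the container is reachable
        "port": parts.port,                   # 5433, to avoid clashing with a local Postgres
        "database": parts.path.lstrip("/"),   # database name follows the slash
    }


if __name__ == "__main__":
    for key, value in describe(URL).items():
        print(f"{key}: {value}")
```

If you change the published port in your own setup, only the port component of the URL needs to change.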
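Quick Start with PostgresML
Here is a simple PostgresML workflow to get you started. We'll import a Scikit dataset, train a couple of models on it, and make real-time predictions, all using only SQL.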
- Import the digits dataset:
    SQL
    SELECT * FROM pgml.load_dataset('digits');
    Output
    pgml=# SELECT * FROM pgml.load_dataset('digits');
    INFO: num_features: 64, num_samples: 1797, feature_names: ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]
     table_name  | rows
    -------------+------
     pgml.digits | 1797
    (1 row)
- Train an XGBoost model:
    SQL
    SELECT * FROM pgml.train(
        'My First PostgresML Project',
        task => 'classification',
        relation_name => 'pgml.digits',
        y_column_name => 'target',
        algorithm => 'xgboost',
        hyperparams => '{ "n_estimators": 25 }'
    );
    Output
    pgml=# SELECT * FROM pgml.train('My First PostgresML Project', task => 'classification', relation_name => 'pgml.digits', y_column_name => 'target', algorithm => 'xgboost', hyperparams => '{ "n_estimators": 25 }' );
    INFO: Snapshotting table "pgml.digits", this may take a little while...
    INFO: Snapshot of table "pgml.digits" created and saved in "pgml"."snapshot_1"
    INFO: Dataset { num_features: 64, num_labels: 1, num_rows: 1797, num_train_rows: 1348, num_test_rows: 449 }
    INFO: Training Model { id: 15, algorithm: xgboost, runtime: rust }
    INFO: Hyperparameter searches: 1, cross validation folds: 1
    INFO: Hyperparams: { "n_estimators": 25 }
    INFO: Metrics: { "f1": 0.88522536, "precision": 0.8835865, "recall": 0.88687027, "accuracy": 0.8841871, "mcc": 0.87189955, "fit_time": 0.44059604, "score_time": 0.005983766 }
               project           |      task      | algorithm | deployed
    -----------------------------+----------------+-----------+----------
     My first PostgresML project | classification | xgboost   | t
    (1 row)
- Train a LightGBM model:
    SQL
    SELECT * FROM pgml.train(
        'My First PostgresML Project',
        task => 'classification',
        relation_name => 'pgml.digits',
        y_column_name => 'target',
        algorithm => 'lightgbm'
    );
    Output
    pgml=# SELECT * FROM pgml.train('My First PostgresML Project', task => 'classification', relation_name => 'pgml.digits', y_column_name => 'target', algorithm => 'lightgbm' );
    INFO: Snapshotting table "pgml.digits", this may take a little while...
    INFO: Snapshot of table "pgml.digits" created and saved in "pgml"."snapshot_18"
    INFO: Dataset { num_features: 64, num_labels: 1, num_rows: 1797, num_train_rows: 1348, num_test_rows: 449 }
    INFO: Training Model { id: 16, algorithm: lightgbm, runtime: rust }
    INFO: Hyperparameter searches: 1, cross validation folds: 1
    INFO: Hyperparams: {}
    INFO: Metrics: { "f1": 0.91579026, "precision": 0.915012, "recall": 0.9165698, "accuracy": 0.9153675, "mcc": 0.9063865, "fit_time": 0.27111048, "score_time": 0.004169579 }
               project           |      task      | algorithm | deployed
    -----------------------------+----------------+-----------+----------
     My first PostgresML project | classification | lightgbm  | t
    (1 row)
- Run real-time inference on a few data points:
    SQL
    SELECT target, pgml.predict('My First PostgresML Project', image) AS prediction
    FROM pgml.digits
    LIMIT 5;
    Output
    pgml=# SELECT target, pgml.predict('My First PostgresML Project', image) AS prediction FROM pgml.digits LIMIT 5;
     target | prediction
    --------+------------
          0 |          0
          1 |          1
          2 |          2
          3 |          3
          4 |          4
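To make the metrics in the two training logs easier to read, here is a small Python sketch that copies the logged precision, recall, and F1 values and recomputes F1 as the harmonic mean of precision and recall. The logged F1 values appear consistent with that formula, although how PostgresML averages per-class scores internally is an assumption here and not something the logs state. The names `LOGGED` and `f1_from` are hypothetical and exist only for this illustration.

```python
# Metrics copied from the XGBoost and LightGBM training logs above.
LOGGED = {
    "xgboost": {"f1": 0.88522536, "precision": 0.8835865, "recall": 0.88687027},
    "lightgbm": {"f1": 0.91579026, "precision": 0.915012, "recall": 0.9165698},
}


def f1_from(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Compare the logged F1 with the value recomputed from precision and recall.
for name, metrics in LOGGED.items():
    recomputed = f1_from(metrics["precision"], metrics["recall"])
    print(f"{name}: logged f1={metrics['f1']:.4f}, recomputed f1={recomputed:.4f}")

# Pick the run with the higher logged F1 score.
best = max(LOGGED, key=lambda name: LOGGED[name]["f1"])
print(f"higher F1: {best}")
```

On these numbers the LightGBM run scores higher on F1 than the XGBoost run, even though no hyperparameters were supplied for it.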
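PostgresML automatically performs the following common machine learning tasks:
- Snapshots the data so the experiment is reproducible
- Splits the dataset into train and test sets
- Trains and validates the model
- Saves it into the model store (a Postgres table)
- Loads it and caches it during inference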
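The train/test split in the list above can be checked against the row counts reported in the training logs (1797 rows, 1348 for training, 449 for testing). The sketch below assumes a 25% holdout with the test size rounded down; that fraction is inferred from the logged counts and is not stated in this guide. The function `holdout_split` is a hypothetical helper, not a PostgresML API.

```python
import math

# Row counts copied from the "Dataset { ... }" line of the training logs.
NUM_ROWS = 1797
LOGGED_TRAIN_ROWS = 1348
LOGGED_TEST_ROWS = 449


def holdout_split(num_rows: int, test_fraction: float) -> tuple[int, int]:
    """Return (train_rows, test_rows) for a simple holdout split.

    The test set size is rounded down, which is an assumption made for this sketch.
    """
    test_rows = math.floor(num_rows * test_fraction)
    return num_rows - test_rows, test_rows


# Assumed 25% holdout; this reproduces the counts seen in the logs.
train_rows, test_rows = holdout_split(NUM_ROWS, 0.25)
print(train_rows, test_rows)  # prints "1348 449"
```

Because every model in a project is scored on the held-out rows of its snapshot, metrics such as accuracy and F1 reflect data the model did not see during training.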
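Dashboard
The Dashboard app is available at https://localhost:8000. You can use it to write experiments in Jupyter-style notebooks, manage projects, and visualize the datasets used by PostgresML.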

Original title: PostgresML - Quick Start w/ Docker
Original link: https://postgresml.org/user_guides/setup/quick_start_with_docker/




