DuckDB
duckdb.hpp[1] 是 DuckDB 对外提供的 C++ 头文件,主要对 4 个头文件进行包装,方便用户使用。
#pragma once
#include "duckdb/main/connection.hpp"
#include "duckdb/main/database.hpp"
#include "duckdb/main/query_result.hpp"
#include "duckdb/main/loadable_extension.hpp"
database.hpp[2] 头文件中定义了 duckdb::DuckDB[3] 和 duckdb::DatabaseInstance[4]。
DuckDB 中持有一个 std::shared_ptr<DatabaseInstance>
智能指针。
DatabaseInstance
是主要的数据库实现。其中封装了 StorageManager[5]、Catalog[6]、FileSystem[7]、TransactionManager[8]、TaskScheduler[9]、ObjectCache[10]、ConnectionManager[11]。
创建 DuckDB 的时候传两个可选的参数:
path
:数据持久化保存的文件路径。如果path == nullptr
,则 DuckDB 的数据只会保存在内存中,不进行持久化。duckdb::DBConfig[12] *config:打开/创建 DuckDB 实例的配置。 设置查询时的最大并行线程数: PRAGMA threads=20;设置内存限制(根据我的测试,这个限制并不靠谱): PRAGMA memory_limit='1GB';DuckDB 的一些配置可以在运行时通过 pragmas 语句[13]修改。比如:
Connection
创建 DuckDB 实例后,需要通过 duckdb::Connection[14] 来访问 DuckDB,执行 SQL 语句。Connection
其实是“模拟”了一个数据库连接,实际上它并没有网络连接。
duckdb::Connection
的声明在 connection.hpp[15] 头文件,其中最重要的是几个 Query
接口,用于直接执行 SQL 语句:
//! Issues a query to the database and returns a QueryResult. This result can be either a StreamQueryResult or a
//! MaterializedQueryResult. The result can be stepped through with calls to Fetch(). Note that there can only be
//! one active StreamQueryResult per Connection object. Calling SendQuery() will invalidate any previously existing
//! StreamQueryResult.
DUCKDB_API unique_ptr<QueryResult> SendQuery(const string &query);
//! Issues a query to the database and materializes the result (if necessary). Always returns a
//! MaterializedQueryResult.
DUCKDB_API unique_ptr<MaterializedQueryResult> Query(const string &query);
//! Issues a query to the database and materializes the result (if necessary). Always returns a
//! MaterializedQueryResult.
DUCKDB_API unique_ptr<MaterializedQueryResult> Query(unique_ptr<SQLStatement> statement);
DuckDB 还支持 prepared statements:
// prepared statements
template <typename... Args>
unique_ptr<QueryResult> Query(const string &query, Args... args) {
vector<Value> values;
return QueryParamsRecursive(query, values, args...);
}
//! Prepare the specified query, returning a prepared statement object
DUCKDB_API unique_ptr<PreparedStatement> Prepare(const string &query);
//! Prepare the specified statement, returning a prepared statement object
DUCKDB_API unique_ptr<PreparedStatement> Prepare(unique_ptr<SQLStatement> statement);
可以参考 connection.hpp
,查看更多的接口和功能。
QueryResult
DuckDB 的 query 返回结果是一个 duckdb::QueryResult[16] 对象。QueryResult
是一个虚基类,子类型有 2 个:
MaterializedQueryResult[17]:完整的结果集。 StreamQueryResult[18]:流式结果集,可以通过 Fetch
方法逐个获取。
Extension
DuckDB 基于 UDF 实现了一些 extension,主要有 5 个:
ICUExtension[19] ParquetExtension[20] TPCHExtension[21] FTSExtension[22] HTTPFsExtension[23]
这些 extension 大概率不会是学习 DuckDB 的重点,后面有遇到再说吧。
例子
我们来看看 C++ 的例子,代码路径 duckdb/examples/embedded-c++[24]。切换到这个路径下,执行 make
,就可以编译和运行这个例子。
这个例子非常简单:
创建一个默认的 DuckDB 实例—— db
。创建一个“连接”这个 db
的 Connection 对象——con
。通过这个 con
执行一个建表语句。通过这个 con
执行一个 insert 语句。通过这个 con
执行一个 select 语句。打印 select
语句的执行结果。
#include "duckdb.hpp"
using namespace duckdb;
int main() {
DuckDB db(nullptr);
Connection con(db);
con.Query("CREATE TABLE integers(i INTEGER)");
con.Query("INSERT INTO integers VALUES (3)");
auto result = con.Query("SELECT * FROM integers");
result->Print();
}
下面,我们就简单修改一下,让这个示例更完整一下。测试的 csv 文件:
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
示例代码:
#include "duckdb.hpp"
#include "duckdb/main/appender.hpp"
#include <iostream>
#include <memory>
using namespace duckdb;
void Usage(const char *program) {
std::cout << "Usage: " << program << " <data_file> <csv_file>" << std::endl;
}
int main(int argc, char **argv) {
if (argc != 3) {
Usage(argv[0]);
exit(-1);
}
std::string data_file = argv[1];
std::string csv_file = argv[2];
DuckDB db(data_file);
Connection con(db);
{
// 从 CSV 文件读取数据
std::string sql = "SELECT * FROM read_csv_auto('" + csv_file + "')";
auto result = con.Query(sql);
std::cout << sql << std::endl;
result->Print();
}
{
// 从 CSV 文件创建表格和导入数据
std::string sql = "CREATE TABLE ontime AS SELECT * FROM read_csv_auto('" + csv_file + "')";
auto result = con.Query(sql);
result->Print();
std::string sql_select = "SELECT * FROM ontime";
result = con.Query(sql_select);
std::cout << sql_select << std::endl;
result->Print();
}
{
// 创建表和后从 csv 导入数据
std::string sql_create_table = "CREATE TABLE ontime2(flightdate DATE, uniquecarrier VARCHAR, origincityname "
"VARCHAR, destcityname VARCHAR)";
auto result = con.Query(sql_create_table);
result->Print();
std::string sql_copy_data = "COPY ontime2 FROM '" + csv_file + "' ( DELIMITER '|', HEADER )";
result = con.Query(sql_copy_data);
result->Print();
std::string sql_select = "SELECT * FROM ontime2";
result = con.Query(sql_select);
std::cout << sql_select << std::endl;
result->Print();
}
{
// Insert 数据
std::string sql_insert = "INSERT INTO ontime VALUES('2021-05-23', 'AA', 'Shang Hai, SH', 'Guang Zhou, GD')";
auto result = con.Query(sql_insert);
result->Print();
std::string sql_select = "SELECT * FROM ontime";
result = con.Query(sql_select);
std::cout << sql_select << std::endl;
result->Print();
}
{
// 通过 Appender 写入数据
// https://duckdb.org/docs/data/appender
// https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/appender.hpp
Appender appender(con, "ontime");
appender.AppendRow(Value::DATE(2021, 5, 24), "AA", "Shang Hai, SH", "Shen Zhen, GD");
appender.Close();
std::string sql_select = "SELECT * FROM ontime";
auto result = con.Query(sql_select);
std::cout << sql_select << std::endl;
result->Print();
}
}
这个例子只要展示了各种向 DuckDB 写入数据的方式[25]:
导入 csv 文件 通过 insert 语句写入数据 通过 appender 写入数据 DuckDB 还支持导入 parquet 格式的数据(上面的例子中没展示)。
DuckDB 支持比较完整的 SQL 功能,这里就不再多说了,具体可以参考官网文档[26]。
参考资料
duckdb.hpp: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb.hpp
[2]database.hpp: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/database.hpp
[3]duckdb::DuckDB: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/database.hpp#L63
[4]duckdb::DatabaseInstance: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/database.hpp#L25
[5]StorageManager: https://github.com/duckdb/duckdb/blob/v0.2.6/src/storage/storage_manager.cpp
[6]Catalog: https://github.com/duckdb/duckdb/blob/v0.2.6/src/catalog/catalog.cpp
[7]FileSystem: https://github.com/duckdb/duckdb/blob/v0.2.6/src/common/file_system.cpp
[8]TransactionManager: https://github.com/duckdb/duckdb/blob/v0.2.6/src/transaction/transaction_manager.cpp
[9]TaskScheduler: https://github.com/duckdb/duckdb/blob/v0.2.6/src/parallel/task_scheduler.cpp
[10]ObjectCache: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/storage/object_cache.hpp
[11]ConnectionManager: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/connection_manager.hpp
[12]duckdb::DBConfig: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/config.hpp#L28
[13]pragmas 语句: https://duckdb.org/docs/sql/pragmas
[14]duckdb::Connection: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/connection.hpp#L33
[15]connection.hpp: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/connection.hpp
[16]duckdb::QueryResult: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/query_result.hpp#L24
[17]MaterializedQueryResult: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/materialized_query_result.hpp#L17
[18]StreamQueryResult: https://github.com/duckdb/duckdb/blob/v0.2.6/src/include/duckdb/main/stream_query_result.hpp#L20
[19]ICUExtension: https://github.com/duckdb/duckdb/tree/v0.2.6/extension/icu
[20]ParquetExtension: https://github.com/duckdb/duckdb/tree/v0.2.6/extension/parquet
[21]TPCHExtension: https://github.com/duckdb/duckdb/tree/v0.2.6/extension/tpch
[22]FTSExtension: https://github.com/duckdb/duckdb/tree/v0.2.6/extension/fts
[23]HTTPFsExtension: https://github.com/duckdb/duckdb/tree/v0.2.6/extension/httpfs
[24]duckdb/examples/embedded-c++: https://github.com/duckdb/duckdb/tree/v0.2.6/examples/embedded-c%2B%2B
[25]DuckDB 写入数据的方式: https://duckdb.org/docs/data/overview
[26]官网文档: https://duckdb.org/docs/




