1. clickhouse local
1.1 Test data
[root@19c01 clickhouse]# cat wine_data.csv
14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065,0
13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050,0
13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185,0
14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480,0
13.24,2.59,2.87,21,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735,0
14.2,1.76,2.45,15.2,112,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450,0
14.39,1.87,2.45,14.6,96,2.5,2.52,0.3,1.98,5.25,1.02,3.58,1290,0
14.06,2.15,2.61,17.6,121,2.6,2.51,0.31,1.25,5.05,1.06,3.58,1295,0
14.83,1.64,2.17,14,97,2.8,2.98,0.29,1.98,5.2,1.08,2.85,1045,0
13.86,1.35,2.27,16,98,2.98,3.15,0.22,1.85,7.22,1.01,3.55,1045,0
14.1,2.16,2.3,18,105,2.95,3.32,0.22,2.38,5.75,1.25,3.17,1510,0
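As a quick cross-check of the aggregations below, plain awk can reproduce them on this excerpt. Note that only the first 11 rows are shown here, while the wine_data.csv on disk has more rows, which is why clickhouse local later reports sum(col1) = 2314.11 rather than the excerpt's 153.64. A minimal sketch (the /tmp path is just for illustration):

```shell
# awk cross-check of sum(col1)/max(col1) on the 11-row excerpt above.
# The full wine_data.csv has more rows, so clickhouse local's sum of
# 2314.11 below is computed over more data than this excerpt.
cat > /tmp/wine_excerpt.csv <<'EOF'
14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065,0
13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050,0
13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185,0
14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480,0
13.24,2.59,2.87,21,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735,0
14.2,1.76,2.45,15.2,112,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450,0
14.39,1.87,2.45,14.6,96,2.5,2.52,0.3,1.98,5.25,1.02,3.58,1290,0
14.06,2.15,2.61,17.6,121,2.6,2.51,0.31,1.25,5.05,1.06,3.58,1295,0
14.83,1.64,2.17,14,97,2.8,2.98,0.29,1.98,5.2,1.08,2.85,1045,0
13.86,1.35,2.27,16,98,2.98,3.15,0.22,1.85,7.22,1.01,3.55,1045,0
14.1,2.16,2.3,18,105,2.95,3.32,0.22,2.38,5.75,1.25,3.17,1510,0
EOF
sum=$(awk -F, '{s+=$1} END{printf "%.2f", s}' /tmp/wine_excerpt.csv)
max=$(awk -F, '$1>m{m=$1} END{print m}' /tmp/wine_excerpt.csv)
echo "sum(col1)=$sum max(col1)=$max"
# prints: sum(col1)=153.64 max(col1)=14.83
```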
1.2 Command
[root@19c01 clickhouse]# clickhouse local --help
usage: clickhouse-local [initial table definition] [--query <query>]
clickhouse-local allows to execute SQL queries on your data files via single command line call. To do so, initially you need to define your data source and its format. After you can execute your SQL queries in usual manner.
There are two ways to define initial table keeping your data. Either just in first query like this:
CREATE TABLE <table> (<structure>) ENGINE = File(<input-format>, <file>);
Either through corresponding command line parameters --table --structure --input-format and --file.
Main options:
--help produce help message
-V [ --version ] print version information and exit
--version-clean print version in machine-readable format and exit
-C [ --config-file ] arg config-file path
-q [ --query ] arg query; can be specified multiple times (--query "SELECT 1" --query "SELECT
2"...)
--queries-file arg file path with queries to execute; multiple files can be specified
(--queries-file file1 file2...)
-n [ --multiquery ] If specified, multiple queries separated by semicolons can be listed after
--query. For convenience, it is also possible to omit --query and pass the
queries directly after --multiquery.
-m [ --multiline ] If specified, allow multiline queries (do not send the query on Enter)
-d [ --database ] arg database
--query_kind arg (=initial_query) One of initial_query/secondary_query/no_query
--query_id arg query_id
--history_file arg path to history file
--stage arg (=complete) Request query processing up to specified stage:
complete,fetch_columns,with_mergeable_state,with_mergeable_state_after_aggregati
on,with_mergeable_state_after_aggregation_and_limit
--progress [=arg(=tty)] (=default) Print progress of queries execution - to TTY: tty|on|1|true|yes; to STDERR
non-interactive mode: err; OFF: off|0|false|no; DEFAULT - interactive to TTY,
non-interactive is off
-A [ --disable_suggestion ] Disable loading suggestion data. Note that suggestion data is loaded
asynchronously through a second connection to ClickHouse server. Also it is
reasonable to disable suggestion if you want to paste a query with TAB
characters. Shorthand option -A is for those who get used to mysql client.
-t [ --time ] print query execution time to stderr in non-interactive mode (for benchmarks)
--echo in batch mode, print query before execution
1.3 Tests
Still fretting over how to load file data into a database? Just pipe it in through stdin:
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv| clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --query "select sum(col1) from table";
2314.11
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv| clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --query "select max(col1),max(col2),max(col3) from table";
14.83 5.8 3.23
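The 14-column --structure string is tedious to retype. A small loop can generate it; this is just a convenience sketch, using the same column types as the commands above:

```shell
# Build the --structure argument programmatically: cols 5, 13 and 14 are
# UInt64 in the examples above, everything else is Float64.
struct=$(for i in $(seq 1 14); do
  case $i in 5|13|14) t=UInt64 ;; *) t=Float64 ;; esac
  printf 'col%d %s,' "$i" "$t"
done)
struct=${struct%,}   # drop the trailing comma
echo "$struct"
# then e.g.:
#   cat wine_data.csv | clickhouse local --structure "$struct" \
#     --input-format CSV --query "select sum(col1) from table"
```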
Still wishing the Linux command line had Excel?
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv|clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --query "select max(col1),max(col2),max(col3) from table" --query "select count(*) from table";
14.83 5.8 3.23
0
Is the second query wrong? Not really: the table is fed from stdin, and the first query drains the piped data, so the second query runs against an already-empty table and count(*) returns 0.
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv|clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --query "select max(col1),max(col2),max(col3) from table" --query "select min(col1),min(col2) from table";
14.83 5.8 3.23
0 0
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv|clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --multiquery "select max(col1),max(col2),max(col3) from table; select max(col1),min(col2) from table";
14.83 5.8 3.23
0 0
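The zeros above are ordinary pipe semantics rather than a clickhouse bug: the first query consumes stdin, so every later query in the same invocation sees an empty table. A minimal demonstration, no clickhouse needed:

```shell
# Two consumers of one pipe: the first cat reads everything,
# the second gets nothing -- exactly what happens to query #2 above.
printf 'a\nb\nc\n' | {
  first=$(cat)    # drains the pipe (like the first --query)
  second=$(cat)   # nothing left (like the second --query)
  echo "first=${#first} second=${#second}"
}
# prints: first=5 second=0
```

Writing the piped data to a file and passing it with --file (as in the external-table example below) lets every query see the full data.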
[root@19c01 clickhouse]# cat /clickhouse/wine_data.csv|clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --input-format "CSV" --query "create table winner engine=MergeTree() order by col1 as select * from table" --database=t1;
[root@19c01 clickhouse]# clickhouse-client
ClickHouse client version 23.12.2.59 (official build).
Connecting to localhost:9000 as user default.
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.12.2.
19c01 :) use t1;
USE t1
Query id: 5139b9be-49d7-4879-ae69-328dd1f66727
Ok.
0 rows in set. Elapsed: 0.001 sec.
19c01 :) show tables;
SHOW TABLES
Query id: 85294b27-cd0e-4582-bdaf-831dc4018500
┌─name───┐
│ t1 │
│ test │
│ winner │
└────────┘
3 rows in set. Elapsed: 0.001 sec.
19c01 :) select * from winner;
SELECT *
FROM winner
Query id: 310a6160-d545-4ff1-be23-72f125aa2e7f
┌──col1─┬───────────────col2─┬───────────────col3─┬─col4─┬─col5─┬───────────────col6─┬───────────────col7─┬─col8─┬───────────────col9─┬────col10─┬──────────────col11─┬──────────────col12─┬─col13─┬─col14─┐
│ 14.23 │ 1.71 │ 2.43 │ 15.6 │ 127 │ 2.8 │ 3.06 │ 0.28 │ 2.29 │ 5.64 │ 1.04 │ 3.92 │ 1065 │ 0 │
│ 13.2 │ 1.78 │ 2.14 │ 11.2 │ 100 │ 2.65 │ 2.76 │ 0.26 │ 1.28 │ 4.38 │ 1.05 │ 3.4 │ 1050 │ 0 │
│ 13.16 │ 2.36 │ 2.67 │ 18.6 │ 101 │ 2.8 │ 3.24 │ 0.3 │ 2.81 │ 5.68 │ 1.03 │ 3.17 │ 1185 │ 0 │
│ 14.37 │ 1.95 │ 2.5 │ 16.8 │ 113 │ 3.85 │ 3.49 │ 0.24 │ 2.18 │ 7.8 │ 0.86 │ 3.45 │ 1480 │ 0 │
Still wrestling with Oracle external tables?
clickhouse local --structure "col1 Float64,col2 Float64,col3 Float64,col4 Float64,col5 UInt64,col6 Float64,col7 Float64,col8 Float64,col9 Float64,col10 Float64,col11 Float64,col12 Float64,col13 UInt64,col14 UInt64" --file='/clickhouse/wine_data.csv' --input-format "CSV" --query "select * from test" --database=t1 --table=test;
14.23 1.71 2.43 15.6 127 2.8 3.06 0.28 2.29 5.64 1.04 3.92 1065 0
13.2 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05 3.4 1050 0
13.16 2.36 2.67 18.6 101 2.8 3.24 0.3 2.81 5.68 1.03 3.17 1185 0
14.37 1.95 2.5 16.8 113 3.85 3.49 0.24 2.18 7.8 0.86 3.45 1480 0
13.24 2.59 2.87 21 118 2.8 2.69 0.39 1.8199999999999998 4.32 1.04 2.93 735 0
14.2 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.75 1.05 2.85 1450 0
14.39 1.87 2.45 14.6 96 2.5 2.52 0.3 1.98 5.25 1.02 3.58 1290 0
14.06 2.15 2.61 17.6 121 2.6 2.51 0.31 1.25 5.05 1.06 3.58 1295 0
14.83 1.6400000000000001 2.17 14 97 2.8 2.98 0.29 1.98 5.2 1.08 2.85 1045 0
13.86 1.35 2.27 16 98 2.98 3.15 0.22 1.85 7.22 1.01 3.55 1045 0
14.1 2.16 2.3 18 105 2.95 3.32 0.22 2.38 5.75 1.25 3.17 1510 0
14.12 1.48 2.32 16.8 95 2.2 2.43 0.26 1.5699999999999998 5 1.17 2.82 1280 0
13.75 1.73 2.41 16 89 2.6 2.76 0.29 1.81 5.6 1.15 2.9 1320 0
14.75 1.73 2.39 11.4 91 3.1 3.69 0.43 2.81 5.4 1.25 2.73 1150 0
14.38 1.87 2.38 12 102 3.3 3.64 0.29 2.96 7.5 1.2 3 1547 0
13.63 1.81 2.7 17.2 112 2.85 2.91 0.3 1.46 7.3 1.28 2.88 1310 0
14.3 1.92 2.7199999999999998 20 120 2.8 3.14 0.33 1.97 6.2 1.07 2.65 1280 0
13.83 1.5699999999999998 2.62 20 115 2.95 3.4 0.4 1.72 6.6 1.13 2.57 1130 0
14.19 1.5899999999999999 2.48 16.5 108 3.3 3.93 0.32 1.8599999999999999 8.7 1.23 2.82 1680 0
2. clickhouse client
2.1 Usage
[root@19c01 clickhouse]# clickhouse client --help
Main options:
--help produce help message
-V [ --version ] print version information and exit
--version-clean print version in machine-readable format and exit
-C [ --config-file ] arg config-file path
-q [ --query ] arg query; can be specified multiple times (--query "SELECT 1" --query "SELECT
2"...)
--queries-file arg file path with queries to execute; multiple files can be specified
(--queries-file file1 file2...)
-n [ --multiquery ] If specified, multiple queries separated by semicolons can be listed after
--query. For convenience, it is also possible to omit --query and pass the
queries directly after --multiquery.
-m [ --multiline ] If specified, allow multiline queries (do not send the query on Enter)
-d [ --database ] arg database
--query_kind arg (=initial_query) One of initial_query/secondary_query/no_query
--query_id arg query_id
--history_file arg path to history file
--stage arg (=complete) Request query processing up to specified stage:
complete,fetch_columns,with_mergeable_state,with_mergeable_state_after_aggregati
on,with_mergeable_state_after_aggregation_and_limit
--progress [=arg(=tty)] (=default) Print progress of queries execution - to TTY: tty|on|1|true|yes; to STDERR
non-interactive mode: err; OFF: off|0|false|no; DEFAULT - interactive to TTY,
non-interactive is off
-A [ --disable_suggestion ] Disable loading suggestion data. Note that suggestion data is loaded
asynchronously through a second connection to ClickHouse server. Also it is
reasonable to disable suggestion if you want to paste a query with TAB
characters. Shorthand option -A is for those who get used to mysql client.
-t [ --time ] print query execution time to stderr in non-interactive mode (for benchmarks)
--echo in batch mode, print query before execution
--verbose print query and other debugging info
--log-level arg log level
--server_logs_file arg put server logs into specified file
--suggestion_limit arg (=10000) Suggestion limit for how many databases, tables and columns to fetch.
-f [ --format ] arg default output format
-E [ --vertical ] vertical output format, same as --format=Vertical or FORMAT Vertical or \G at
end of command
--highlight arg (=1) enable or disable basic syntax highlight in interactive command line
--ignore-error do not stop processing in multiquery mode
...
2.2 Examples
Just use it like the mysql client (note the flag is --user, not --username):
[root@19c01 clickhouse]# clickhouse client --host=172.16.220.201 --port=9000 --username=cy
Code: 552. DB::Exception: Unrecognized option '--username=cy'. (UNRECOGNIZED_ARGUMENTS)
[root@19c01 clickhouse]# clickhouse client --host=172.16.220.201 --port=9000 --user=cy
ClickHouse client version 23.12.2.59 (official build).
Connecting to 172.16.220.201:9000 as user cy.
Password for user (cy):
Connecting to 172.16.220.201:9000 as user cy.
Connected to ClickHouse server version 23.12.2.
19c01 :)
3. clickhouse benchmark
3.1 Syntax
[root@19c01 clickhouse]# clickhouse benchmark --help
Usage: clickhouse [options] < queries.txt
Allowed options:
--help produce help message
-q [ --query ] arg query to execute
-c [ --concurrency ] arg (=1) number of parallel queries
-d [ --delay ] arg (=1) delay between intermediate reports in seconds (set 0 to disable reports)
--stage arg (=complete) request query processing up to specified stage:
complete,fetch_columns,with_mergeable_state,with_mergeable_state_after_aggregati
on,with_mergeable_state_after_aggregation_and_limit
-i [ --iterations ] arg (=0) amount of queries to be executed
-t [ --timelimit ] arg (=0) stop launch of queries after specified time limit
-r [ --randomize ] randomize order of execution
-h [ --host ] arg list of hosts
--port arg list of ports
--roundrobin Instead of comparing queries for different --host/--port just pick one random
--host/--port for every query and send query to it.
--cumulative prints cumulative data instead of data per interval
-s [ --secure ] Use TLS connection
-u [ --user ] arg (=default)
--password arg
--quota_key arg
--database arg (=default)
--stacktrace print stack traces of exceptions
--confidence arg (=5) set the level of confidence for T-test [0=80%, 1=90%, 2=95%, 3=98%, 4=99%,
5=99.5%(default)
--query_id arg
--max-consecutive-errors arg (=0) set number of allowed consecutive errors
--ignore-error continue testing even if a query fails
3.2 Tests
[root@19c01 clickhouse]# clickhouse benchmark --host=172.16.220.201 --port=9000 --user=cy --database=t1 --password=oracle <bench.sql
Loaded 2 queries.
Queries executed: 11 (550.000%).
172.16.220.201:9000, queries: 11, QPS: 10.309, RPS: 481050810.203, MiB/s: 9008.492, result RPS: 10.309, result MiB/s: 0.000.
0.000% 0.069 sec.
10.000% 0.080 sec.
20.000% 0.080 sec.
30.000% 0.083 sec.
40.000% 0.094 sec.
50.000% 0.099 sec.
60.000% 0.103 sec.
70.000% 0.105 sec.
80.000% 0.112 sec.
90.000% 0.112 sec.
95.000% 0.112 sec.
99.000% 0.112 sec.
99.900% 0.112 sec.
99.990% 0.112 sec.
Queries executed: 23 (1150.000%).
172.16.220.201:9000, queries: 12, QPS: 11.265, RPS: 525624332.440, MiB/s: 10025.489, result RPS: 11.265, result MiB/s: 0.001.
0.000% 0.068 sec.
10.000% 0.068 sec.
20.000% 0.070 sec.
30.000% 0.072 sec.
40.000% 0.073 sec.
50.000% 0.089 sec.
60.000% 0.098 sec.
70.000% 0.102 sec.
80.000% 0.102 sec.
90.000% 0.102 sec.
95.000% 0.102 sec.
99.000% 0.140 sec.
99.900% 0.140 sec.
99.990% 0.140 sec.
Queries executed: 33 (1650.000%).
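The percentile table reads as "p% of queries finished within t seconds". A nearest-rank sketch of the same math, using latencies like those in the report above (clickhouse-benchmark's exact interpolation may differ):

```shell
# Nearest-rank percentile over a list of per-query latencies (seconds).
times='0.068 0.068 0.070 0.072 0.073 0.089 0.098 0.102 0.102 0.102'
pct() {  # pct <fraction> <space-separated values>
  printf '%s\n' $2 | sort -n | awk -v p="$1" \
    '{a[NR]=$1} END{i=int(p*NR); if (i<1) i=1; print a[i]}'
}
echo "p50=$(pct 0.50 "$times") p90=$(pct 0.90 "$times")"
# prints: p50=0.073 p90=0.102
```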
4. clickhouse compressor
4.1 Syntax
[root@19c01 clickhouse]# clickhouse compressor --help
Usage: clickhouse [options] < INPUT > OUTPUT
Usage: clickhouse [options] INPUT OUTPUT
Allowed options:
-h [ --help ] produce help message
--input INPUT input file
--output OUTPUT output file
-d [ --decompress ] decompress
--offset-in-compressed-file arg (=0) offset to the compressed block (i.e. physical file offset)
--offset-in-decompressed-block arg (=0) offset to the decompressed block (i.e. virtual offset)
-b [ --block-size ] arg (=1048576) compress in blocks of specified size
--hc use LZ4HC instead of LZ4
--zstd use ZSTD instead of LZ4
--deflate_qpl use deflate_qpl instead of LZ4
--codec arg use codecs combination instead of LZ4
--level arg compression level for codecs specified via flags
--none use no compression instead of LZ4
--stat print block statistics of compressed data
--stacktrace print stacktrace of exception
4.2 Tests
Compress:
[root@19c01 test]# du -sh *
4.0K bench.sql
26M test.csv
12K wine_data.csv
[root@19c01 test]# clickhouse compressor --input=test.csv --output=out.csv --hc
[root@19c01 test]# du -sh *
4.0K bench.sql
7.0M out.csv
26M test.csv
12K wine_data.csv
Decompress:
[root@19c01 test]# clickhouse compressor --input=out.csv --output=test2.csv --decompress
[root@19c01 test]# du -sh *
4.0K bench.sql
7.0M out.csv
26M test2.csv
26M test.csv
12K wine_data.csv
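The sizes match, but it is worth verifying that the round trip is byte-for-byte lossless; cmp does that. A self-contained sketch of the pattern (gzip stands in for clickhouse compressor here so it runs anywhere; apply the same cmp step to test.csv and test2.csv above):

```shell
# Compress, decompress, and verify the bytes are identical.
printf '14.23,1.71,2.43\n13.2,1.78,2.14\n' > /tmp/orig.csv
gzip -c  /tmp/orig.csv     > /tmp/orig.csv.gz
gzip -dc /tmp/orig.csv.gz  > /tmp/roundtrip.csv
cmp -s /tmp/orig.csv /tmp/roundtrip.csv && echo "round trip OK"
# prints: round trip OK
```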
5. clickhouse format
5.1 Syntax
[root@19c01 test]# clickhouse format --help
Usage: clickhouse [options] < query
Allowed options:
--query arg query to format
-h [ --help ] produce help message
--hilite add syntax highlight with ANSI terminal escape sequences
--oneline format in single line
-q [ --quiet ] just check syntax, no output on success
-n [ --multiquery ] allow multiple queries in the same file
--obfuscate obfuscate instead of formatting
--backslash add a backslash at the end of each line of the formatted query
--allow_settings_after_format_in_insert Allow SETTINGS after FORMAT, but note, that this is not always safe
--seed arg seed (arbitrary string) that determines the result of obfuscation
--max_query_size arg The maximum number of bytes of a query string parsed by the SQL parser. Data in the VALUES clause of INSERT queries is
processed by a separate stream parser (that consumes O(1) RAM) and not affected by this restriction.
--max_parser_depth arg Maximum parser depth (recursion depth of recursive descend parser).
5.2 Tests
[root@19c01 test]# clickhouse format --hilite --query "create table winner engine=MergeTree() order by col1 as select * from table";
CREATE TABLE winner
ENGINE = MergeTree
ORDER BY col1 AS
SELECT *
FROM table
A very handy tool.
6. clickhouse-obfuscator
An obfuscator: it reads an input table and produces an output table that retains some properties of the input but contains different data, making it possible to publish near-real production data for benchmarking.
6.1 Syntax
[root@19c01 test]# clickhouse obfuscator --help
Simple tool for table data obfuscation.
It reads input table and produces output table, that retain some properties of input, but contains different data.
It allows to publish almost real production data for usage in benchmarks.
Options:
--help produce help message
-S [ --structure ] arg structure of the initial table (list of column and type names)
--input-format arg input format of the initial table data
--output-format arg default output format
--seed arg seed (arbitrary string), must be random string with at least 10 bytes length; note that a seed for each column is derived
from this seed and a column name: you can obfuscate data for different tables and as long as you use identical seed and
identical column names, the data for corresponding non-text columns for different tables will be transformed in the same
way, so the data for different tables can be JOINed after obfuscation
--limit arg if specified - stop after generating that number of rows; the limit can be also greater than the number of source dataset
- in this case it will process the dataset in a loop more than one time, using different seeds on every iteration,
generating result as large as needed
--silent arg (=0) don't print information messages to stderr
--save arg save the models after training to the specified file. You can use --limit 0 to skip the generation step. The file is
using binary, platform-dependent, opaque serialization format. The model parameters are saved, while the seed is not.
--load arg load the models instead of training from the specified file. The table structure must match the saved file. The seed
should be specified separately, while other model parameters are loaded.
--order arg (=5) order of markov model to generate strings
--frequency-cutoff arg (=5) frequency cutoff for markov model: remove all buckets with count less than specified
--num-buckets-cutoff arg (=0) cutoff for number of different possible continuations for a context: remove all histograms with less than specified
number of buckets
--frequency-add arg (=0) add a constant to every count to lower probability distribution skew
--frequency-desaturate arg (=0) 0..1 - move every frequency towards average to lower probability distribution skew
--determinator-sliding-window-size arg (=8) size of a sliding window in a source string - its hash is used as a seed for RNG in markov model
Example:
clickhouse --seed "$(head -c16 /dev/urandom | base64)" --input-format TSV --output-format TSV --structure 'CounterID UInt32, URLDomain String, URL String, SearchPhrase String, Title String' < stats.tsv
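The note above about identical seeds producing JOIN-compatible output boils down to the transform being a deterministic function of (seed, value). A toy illustration of that property only — cksum as a stand-in hash, nothing like the real markov-model transform:

```shell
# Same (seed, value) -> same token, so two tables obfuscated with the
# same seed can still be joined on such a column. Toy stand-in only.
obf() { printf '%s:%s' "$1" "$2" | cksum | awk '{print $1}'; }
a=$(obf myseed alice)
b=$(obf myseed alice)   # same seed, same value
echo "deterministic=$([ "$a" = "$b" ] && echo yes || echo no)"
# prints: deterministic=yes
```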
7. clickhouse-odbc-bridge
A simple HTTP server that acts as a proxy for ODBC drivers. The main motivation is that a segfault or other bug inside an ODBC implementation could otherwise crash the whole clickhouse-server process.
The tool works over HTTP rather than via pipes, shared memory, or TCP because:
- it is simpler to implement
- it is simpler to debug
- jdbc-bridge can be implemented the same way
Unlike the tools above, this command has no space-separated form; only the hyphenated binary exists.
7.1 Syntax
[root@19c01 test]# clickhouse-odbc-bridge --help
usage: clickhouse-odbc-bridge --http-port <port>
HTTP-proxy for odbc requests
--http-port=http-port port to listen
--listen-host=listen-host hostname or address to
listen, default
127.0.0.1
--http-timeout=http-timeout http timeout for
socket, default 180
--max-server-connections=max-server-connections max connections to
server, default 1024
--keep-alive-timeout=keep-alive-timeout keepalive timeout,
default 30
--http-max-field-value-size=http-max-field-value-size max http field value
size, default 1048576
--log-level=log-level sets log level, default
info
--log-path=log-path log path for all logs,
default console
--err-log-path=err-log-path err log path for all
logs, default no
--stdout-path=stdout-path stdout log path,
default console
--stderr-path=stderr-path stderr log path,
default console
--help produce this help
message
--daemon Run application as a
daemon.
--umask=mask Set the daemon's umask
(octal, e.g. 027).
--pidfile=path Write the process ID of
the application to
given file.