在11gR2发布之前,已经从一些Beta测试中知道,新版本中包含了列式压缩(Columnar Compression)的新特性,这一特性是被期待已久的,Sybase IQ产品尤以列式存储而著称,现在Oracle也在列式压缩上引入了类似技术。
显然对于具有大量重复值的Column,按照列来压缩可以极大缩减存储空间的耗用,这是Oracle在压缩技术上的又一增强,也是为了在细粒度解决用户问题的一个改进。
我们先看看此前Beta文档(文档号E10881-01)中的描述:
Columnar compression is a new feature being introduced in Oracle Database 11g
Release 2 (11.2). Using columnar compression, data can be compressed during bulk
load operations.
使用列式压缩,在批量加载数据时,可以将数据压缩。
During the load process, data is transformed into a column-oriented
format and then compressed using a set of new compression algorithms. Different
levels of compression can be specified. Oracle Database selects the compression
algorithm according to the desired level of compression. Data transformation is
handled by Oracle Database internally and no application changes are required to use
columnar compression.
这一改变对应用透明,Oracle可以根据用户的定义来决定压缩级别和算法,选择列式压缩的数据将以基于列的格式存储。
Columnar compression provides significantly higher compression ratios as compared
to previously available compression features of Oracle Database. It also has a higher
CPU cost both at load time and at access time. Due to higher compression ratios, data
consumes less space in the buffer cache, providing I/O benefits that offset the CPU
costs.
列式压缩的压缩率可能相当高,这也是以CPU换IO的做法之一,在节省了IO的同时会消耗更多的CPU资源。
The overhead is relatively low for full table scans than for single row access.
All database operations (SQL queries, indexing, DMLs, DDLs, as well as features like
partitioning) work transparently against tables compressed using columnar
compression. This makes it possible to grow databases to large sizes and to store and
query that data at a fraction of cost.
在正式发布的文档上,也可以找到相应的描述:
During bulk-load operations, Oracle Database will compress the data
being loaded when it is beneficial for performance. For small segments
with very little data, no compression will occur even if you specify
it. Oracle Database handles data transformation and compression
internally and requires no application changes to use compression.
Additional compression technologies, including hybrid columnar
compression, are available with Oracle Exadata Storage Server. See the
Oracle Exadata documentation for more information.
在这里,Oracle声明,列式压缩的特性,仅在Exadata存储上可用,这一特性显然对数据仓库大有益处。
在创建数据表时,可以通过增加新的COMPRESS FOR QUERY或者COMPRESS FOR ARCHIVE来启用混合列式压缩(hybrid columnar compression)特性,文档说明如下:
When you specify
With hybrid columnar compression, data can be compressed during bulk
load operations. During the load process, data is transformed into a
column-oriented format and then compressed. Oracle Database uses a
compression algorithm appropriate for the level you specify. In
general, the higher the level, the greater the compression ratio.
Hybrid columnar compression can result in higher compression ratios, at
a greater CPU cost. Therefore, this form of compression is recommended
for data that is not frequently updated.
显然对于具有大量重复值的Column,按照列来压缩可以极大缩减存储空间的耗用,这是Oracle在压缩技术上的又一增强,也是为了在细粒度解决用户问题的一个改进。
我们先看看此前Beta文档(文档号E10881-01)中的描述:
Columnar compression is a new feature being introduced in Oracle Database 11g
Release 2 (11.2). Using columnar compression, data can be compressed during bulk
load operations.
使用列式压缩,在批量加载数据时,可以将数据压缩。
During the load process, data is transformed into a column-oriented
format and then compressed using a set of new compression algorithms. Different
levels of compression can be specified. Oracle Database selects the compression
algorithm according to the desired level of compression. Data transformation is
handled by Oracle Database internally and no application changes are required to use
columnar compression.
这一改变对应用透明,Oracle可以根据用户的定义来决定压缩级别和算法,选择列式压缩的数据将以基于列的格式存储。
Columnar compression provides significantly higher compression ratios as compared
to previously available compression features of Oracle Database. It also has a higher
CPU cost both at load time and at access time. Due to higher compression ratios, data
consumes less space in the buffer cache, providing I/O benefits that offset the CPU
costs.
列式压缩的压缩率可能相当高,这也是以CPU换IO的做法之一,在节省了IO的同时会消耗更多的CPU资源。
The overhead is relatively low for full table scans than for single row access.
All database operations (SQL queries, indexing, DMLs, DDLs, as well as features like
partitioning) work transparently against tables compressed using columnar
compression. This makes it possible to grow databases to large sizes and to store and
query that data at a fraction of cost.
在正式发布的文档上,也可以找到相应的描述:
During bulk-load operations, Oracle Database will compress the data
being loaded when it is beneficial for performance. For small segments
with very little data, no compression will occur even if you specify
it. Oracle Database handles data transformation and compression
internally and requires no application changes to use compression.
No special installation is required to configure this feature.
However, in order to use this feature, the database compatibility
parameter must be set to 11.2.0 or higher.
Note:
Additional compression technologies, including hybrid columnar
compression, are available with Oracle Exadata Storage Server. See the
Oracle Exadata documentation for more information.
在这里,Oracle声明,列式压缩的特性,仅在Exadata存储上可用,这一特性显然对数据仓库大有益处。
在创建数据表时,可以通过增加新的COMPRESS FOR QUERY或者COMPRESS FOR ARCHIVE来启用混合列式压缩(hybrid columnar compression)特性,文档说明如下:
When you specify
COMPRESS FOR QUERY or COMPRESS FOR ARCHIVE, you enable hybrid columnar compression.With hybrid columnar compression, data can be compressed during bulk
load operations. During the load process, data is transformed into a
column-oriented format and then compressed. Oracle Database uses a
compression algorithm appropriate for the level you specify. In
general, the higher the level, the greater the compression ratio.
Hybrid columnar compression can result in higher compression ratios, at
a greater CPU cost. Therefore, this form of compression is recommended
for data that is not frequently updated.
COMPRESS FOR QUERY is useful in data warehousing environments. Valid values are LOW and HIGH, with HIGH providing a higher compression ratio. The default is HIGH.
COMPRESS FOR ARCHIVE uses higher compression ratios than COMPRESS FOR QUERY, and is useful for compressing data that will be stored for long periods of time. Valid values are LOW and HIGH, with HIGH providing the highest possible compression ratio. The default is LOW.
新的语法结构是这样的:
{ COMPRESS [ BASIC
| FOR { OLTP
| { QUERY | ARCHIVE } [ LOW | HIGH ]
}
]
| NOCOMPRESS
}
同样这个错误提示让很多人为之一震,Exadata据说目前国内尚未有用户引入:
ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
Cause: An attempt was made to use hybrid columnar compression on unsupported storage.
Action: Create this table in a tablespace residing on Exadata storage or use a different compression type.
那么也就是说,这一吸引人的特性离我们还有点距离。
-The End-
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




