What's Vertica Database?

原创 Quanwen Zhao 2021-12-13

8483

Preface
Origin
History
Architecture Basics
Enterprise Mode vs Eon Mode
Learning Path

Preface

I recalled a simple and relaxed scene chatting with my friend in his residence a couple of years ago (seems like in 2018), at that time he was responsible for maintaining some Oracle Databases of China Mobile that is one of most popular and famous three telecom operators in China. I'm not sure what's the reason (probably we discussed something about data analysis) that he unconsciously mentioned a data analysis software that named HP Vertica. Yes, I heard about this software from him for the first time. Perhaps this thing has passed over the years.

Earlier this month occasionally I discovered the free download resource about Vertica 11.0.x Community Edition from modb (as well-known as I'm the big fans of modb). Since the version is 11.0.x then it should be the latest version at least I think. The primary aim is to drive and encouraged us to test, learn and communicate with each other in order to expanding and improving entire Vertica ecosystem. Based on those links I successively opened the official website and documentation page, the following is 2 number of relevant screenshots.

[Back to TOC]

Origin

It appears the English word "Vertical" immediately in my brain when I've heard of Vertica Database for the first time. Of course, on one hand they just have a bit difference between "Vertical" and "Vertica", on the other hand the antonym/opposite English word of "Vertical" is "Horizontal". Naturally I can imagine that there exists the concept of a table on the RDBMS (Relational DataBase Management System) in IT database industry due to the pairs of these two English words because I'm an Oracle DBA in China. Typically a two dimension table is comprised of multiple of rows and columns, the row is horizontal in the direction from left to right and the column is vertical in the direction from up to down. It's the usual concept that all of data from the table has been stored with the row format by RDBMS, e.g. Oracle, MySQL, Microsoft SQL Server, PostgreSQL and other relational databases.

Nevertheless the data from the table in Vertica is stored with the column format. I've mentioned previously that the column of the table is vertical. I think that it's most possible and important reason about the origin of Vertica database as well. You know, although, Vertica official didn't say like that. Thus it only represents my opinion rather than official standpoint of Vertica.

[Back to TOC]

History

It's almost hard to find out about the history introduction of Vertica Database, however, which's from the official website or official documentation home page. So I searched the keyword Vertica on Wikipedia (The Free Encyclopedia). Amazing! Here's the full content about Vertica. You can spend a little time reading it. Hence just quoted a brief introduction as follows:

Vertica Systems is an analytic database management software company. Vertica was founded in 2005 by database researcher Michael Stonebraker and Andrew Palmer. Palmer was the founding CEO.

Lynch joined as chairman and CEO in 2010 and was responsible for Vertica's acquisition by Hewlett Packard in March 2011. The acquisition expanded the HP Software portfolio for enterprise companies and the public sector group. As part of the Micro Focus-Hewlett Packard Enterprise merger, Vertica joined Micro Focus in September, 2017.

Of course, here's a related screenshot.

If you check Downloads, Patches and Software/Licenses in underneath of the tab with Services & Support on official website it'll automatically skip to the MicroFocus who is the parent company of Vertica currently.

Now I have to mention that the company founder Ph.D. Michael Stonebraker (2014 Turing Award) who also once created Ingres and Postgres, you're able to see the section C-Store and Vertica about his personal introduction on Wikipedia. The following is a little quotation (you can find the full dissertation from here as well):

C-Store and Vertica

In the C-Store project, started in 2005, Stonebraker, along with colleagues from Brandeis, Brown, MIT, and University of Massachusetts Boston, developed a parallel, shared-nothing column-oriented DBMS for data warehousing. By dividing and storing data in columns, C-Store is able to perform less I/O and get better compression ratios than conventional database systems that store data in rows.

In 2005, Stonebraker co-founded Vertica to commercialize the technology behind C-Store.

I've also just noticed that the version of Vertica that started from 8.1.x to 11.0.x, the more earlier version has never been cited from the official documentation page. Here's the corresponding screenshot:

Even if the official documentation of 8.1.x has been published on 6/2/2021, so it's difficult to say that exactly which year the 8.1.x has been released on but absolutely not on 2021 because the entire 2021 Vertica never probably released the full versions from 8.1.x to 11.0.x. Taking a look at this screenshot captured by me as below:

But about the history introduction of Oracle Database is very continuous and centralized whatever it's from Wikipedia or its Official Documentation. Meanwhile here's some critical events of Vertica Database that is quoted from Wikipedia.

In late 2011, the Vertica Analytics Platform Community Edition was made available for free with certain limitations, such as a maximum of one terabyte of raw data, three-node (servers) cluster, and community-based support.

In 2018, Vertica introduced Vertica in Eon Mode, a separation of compute and storage architecture, which is available on Amazon, Google, and Microsoft clouds.

Recent versions of Vertica, 10.1.1 and 11 have introduced Docker containerization and Kubernetes Statefulsets support, with tested containerized versions now released on Dockerhub and Github. This allows more automated deployment, testing, elastic scaling, and broader deployment support.

At the Vertica Unify event in July, 2021, Vertica Accelerator, a SAAS (Software as a Service) version of Vertica, was announced initially on Amazon AWS only. Vertica Accelerator differs from similar analytics database services like Snowflake, Amazon Redshift, and Google BigQuery.

[Back to TOC]

Architecture Basics

There's three my most favorite architecture basics of Vertica, it includes Column Storage, Compression and Clustering. Thus I've just quoted some brief introduction in proper order from official documentation 11.0.x. But you can see more basics from here.

Vertica stores data in a column format so it can be queried for best performance. Compared to row-based storage, column storage reduces disk I/O making it ideal for read-intensive workloads. Vertica reads only the columns needed to answer the query. For example:

SELECT avg(price) FROM tickstore WHERE symbol = 'AAPL' and date = '5/31/13';

The column store usually reads only three columns but the row store reads all columns. The following is a nice image about unsorted-data and sorted-data (notice: the three columns marking with green color rectangle box).

Using compression, Vertica stores more data and uses less hardware than other databases. Vertica uses several different compression methods and automatically chooses the best one for the data being compressed.
Compression allows a column store to occupy substantially less storage than a row store. In a column store, every value stored in a projection column has the same data type. This greatly facilitates compression, particularly in sorted columns. In a row store, each value of a row can have a different data type, resulting in a much less effective use of compression. The efficient storage methods that Vertica uses for your database also lets you maintain more historical data in physical storage.

The following shows compression using sorting and cardinality:

Clustering supports scaling and redundancy. You can scale your database cluster by adding more nodes, and you can improve reliability by distributing and replicating data across your cluster.

Column data gets distributed across nodes in a cluster, so if one node becomes unavailable the database continues to operate. When a node is added to the cluster, or comes back online after being unavailable, it automatically queries other nodes to update its local data.

Here's a relevant picture of three nodes that balanced distributing data (A, B and C) in a cluster.

[Back to TOC]

Enterprise Mode vs Eon Mode

We can create a Vertica database in one of two modes: Enterprise Mode or Eon Mode. These two number of modes allow us to control how and where the database stores the data. Centainly, each mode has its own obvious advantages.

In an Enterprise Mode of Vertica database, the physical architecture is designed to move data as close as possible to computing resources. The data in an Enterprise Mode database is spread among all of the nodes within the database. Ideally, the data is evenly distributed to ensure that each node has an equal amount of the analytic workload. The following picture reveals its architecture mode.

The architecture of Eon Mode is distinctly different from enterprise mode, it separates the computational resources from the storage layer of your database. This separation gives you the ability to store your data in a single location. You can elastically vary the number of nodes connected to that location according to your computational needs. Adjusting the size of your cluster won't interrupt analytic workloads. Yes, a real innovation thought plays an import role in the design of Vertica database. Its architecture concept diagram is as below:

Actually according to the relative volume of computer nodes and communal storage we can imagine that there are two types of Eon modes (Heavy Storage, Minimal Compute and Small Storage, Heavy Compute) as follows.

[Back to TOC]

Learning Path

As an Oracle DBA with many years working experience I think that one of the most important learning paths about Vertica database is still to read official documentation and solve your confusion by Vertica Forum. Here's two number of screenshots.