
Fig. 2. Computer vs. datar comparison in terms of architecture: (a) Computer Architecture, (b) Datar Architecture.
Mainstream computer architecture is divided into five parts:
input, storage, computation, control and output, with
computation at the center. On closer inspection, a BDMS is much
the same as a computer: it consists of (data) input, (data) storage,
(data) computation (query/analysis), (data) control
(transaction/recovery) and (data) output (visualization), with data
storage at the center. In other words, a computer can be viewed as a
Fast Computation Processing System (FCPS); likewise, a BDMS
can be called a datar, as it focuses on data. Fig. 2 illustrates
the similarities and differences between a computer and a datar
in terms of architecture. Fig. 2 (a) shows the five core
components of a computer in separate rectangles, while
Fig. 2 (b) shows the five corresponding parts of a datar. A
computer and a datar share similar functionalities but differ in
emphasis: computation for the former, data storage for the latter.
B. Concept of Datar
Definition (Datar) A datar is a set of coherent
software/systems based on a unified architecture that can manage
(big) data pluggably, automatically and intelligently with
specific functionalities, where the specific functionalities are
the input, storage, computation, control and output of the (big)
data. A datar features Interactive Interface Clients,
Pluggable Engines Configuration, Automatic Dataflow on Job
Pipelines and Intelligent Self-driving System Management,
all based on the unified framework. In this paper, we implement
a datar with these features as biggy, a data-storage-centered
datar implementation.
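To make the idea of Pluggable Engines Configuration concrete, the following is a minimal sketch of how five interchangeable engines might be wired into one job pipeline. It is not the biggy implementation: the `Datar` class, the `plug` and `run` methods, the `Engine` signature and the toy engines are all hypothetical names introduced here for illustration. Treating data control as just another pipeline stage is also a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The five functionalities named in the definition, in pipeline order.
ROLES = ("input", "storage", "computation", "control", "output")

# Hypothetical engine signature: a list of records in, a list of records out.
Engine = Callable[[List[str]], List[str]]


@dataclass
class Datar:
    """Toy registry holding one pluggable engine per role (illustrative only)."""
    engines: Dict[str, Engine] = field(default_factory=dict)

    def plug(self, role: str, engine: Engine) -> None:
        # Replace the engine for one role without touching the others.
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.engines[role] = engine

    def run(self, records: List[str]) -> List[str]:
        # Pass the records through the five engines in a fixed order.
        missing = [r for r in ROLES if r not in self.engines]
        if missing:
            raise RuntimeError(f"no engine plugged for: {missing}")
        for role in ROLES:
            records = self.engines[role](records)
        return records


def build_toy_datar() -> Datar:
    """Assemble a datar from trivial stand-in engines (assumptions, not real systems)."""
    d = Datar()
    d.plug("input", lambda rs: [r.strip() for r in rs])        # ingest and clean
    d.plug("storage", lambda rs: list(rs))                     # materialize a copy
    d.plug("computation", lambda rs: [r.upper() for r in rs])  # toy analysis step
    d.plug("control", lambda rs: [r for r in rs if r])         # drop empty records
    d.plug("output", lambda rs: sorted(rs))                    # ordered presentation
    return d
```

In this sketch, swapping the computation engine is a single `plug("computation", ...)` call and leaves the other four roles untouched, which is the kind of loose coupling the definition asks for.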
A datar, i.e., a full-function BDMS, consists of five parts:
data input, data storage, data computation, data control and
data output. Compared with the computation-centered
computer, a datar is data-centered. We take AsterixDB [10], a
new, full-function BDMS, as an example. Data input is
how data gets into the system. In AsterixDB, the data feed is
a built-in mechanism that allows new data to be continuously
ingested into the system from external sources, incrementally
populating the datasets and their associated indexes [16]. Data
storage is how the data is stored in the system and how the
indexes are built. In AsterixDB, data and indexes are stored
using an LSM structure [17]. Data computation is how
valuable information is mined from stored data. Many methods
can be applied, such as the popular in-memory computation
framework Spark on AsterixDB [18]. The execution
of data processing is also part of data analysis, as with Hyracks
[19] in AsterixDB. Data control is how data is controlled while
it is processed; it differs from that of traditional database
systems, which enforce strict ACID properties. Another important
aspect of a datar is data output, e.g., visualization. Cloudberry
(see footnote 3) is a research prototype that supports interactive
analytics and visualization of large amounts of spatial-temporal
data using AsterixDB. Because AsterixDB has all these features, it
is an ideal single system for explaining the five main components
of a BDMS. The key drawback of taking AsterixDB as the BDMS
is that it is a strongly coupled system, which is not suitable
for the varied and dynamic requirements of real big data
scenarios. Nor is it easy for developers to combine
it with newly emerging engines. Datar is proposed as a
unified framework for building one's own BDMS more flexibly.
C. Contributions of Datar
With the development of Internet services, data content is
growing rapidly, and we face the challenge of handling
such big data. Data system research has entered a new
era, which shifts traditional concepts from row-based stores
to column-based stores, from disk-based queries to in-memory
analysis, and from ACID properties to the CAP theorem.
Big data shows great value in real applications, and challenges
arise with it. Various tools and systems have been proposed and
developed to tackle these challenges with different emphases.
In this paper, we describe the BDMS from a new perspective, that
of computer architecture, and propose a unified framework, datar.
We focus on the system architecture of the BDMS
and break it down into five main components for elaboration.
The envisioned datar is implemented as biggy with favorable
features. The key contributions can be summarized as follows:
• We review current big data management systems by five
core components and state our contributions.
3 http://cloudberry.ics.uci.edu/