
DBOS: A Proposal for a Data-Centric Operating System

The DBOS Committee∗

dbos-project@googlegroups.com

Abstract

Current operating systems are complex systems that were designed before today’s computing

environments. This makes it difficult for them to meet the scalability, heterogeneity, availability,

and security challenges in current cloud and parallel computing environments. To address

these problems, we propose a radically new OS design based on data-centric architecture: all

operating system state should be represented uniformly as database tables, and operations on

this state should be made via queries from otherwise stateless tasks. This design makes it

easy to scale and evolve the OS without whole-system refactoring, inspect and debug system

state, upgrade components without downtime, manage decisions using machine learning, and

implement sophisticated security features. We discuss how a database OS (DBOS) can improve

the programmability and performance of many of today’s most important applications and

propose a plan for the development of a DBOS proof of concept.

1 Introduction

Current operating systems have evolved over the last forty years into complex overlapping

code bases [70, 4, 51, 57], which were architected for very different environments than exist

today. The cloud has become a preferred platform, for both decision support and online serving

applications. Serverless computing supports the concept of elastic provision of resources, which is

very attractive in many environments. Machine learning (ML) is causing many applications to be

redesigned, and future operating systems must intimately support such applications. Hardware

is becoming massively parallel and heterogeneous. These “sea changes” make it imperative to

rethink the architecture of system software, which is the topic of this paper.

Mainstream operating systems (OSs) date from the 1980s and were designed for the hardware

platforms of 40 years ago, consisting of a single processor, limited main memory and a small

set of runnable tasks. Today’s cloud platforms contain hundreds of thousands of processors,

heterogeneous computing resources (including CPUs, GPUs, FPGAs, TPUs, SmartNICs, and

so on) and multiple levels of memory and storage. These platforms support millions of active

users that access thousands of services. Hence, the OS must deal with a scale problem of 10^5 or 10^6

more resources to manage and schedule. Managing OS state is a much bigger problem

than 40 years ago in terms of both throughput and latency, as thousands of services must

communicate to respond in near real-time to a user’s click [21, 5].

∗ DBOS committee members in alphabetic order: Michael Cafarella (MIT CSAIL), David DeWitt (MIT CSAIL), Vijay Gadepally (MIT LLSC), Jeremy Kepner (MIT LLSC), Christos Kozyrakis (Stanford University), Tim Kraska (MIT CSAIL), Michael Stonebraker (MIT CSAIL), and Matei Zaharia (Stanford University).

arXiv:2007.11112v1 [cs.OS] 21 Jul 2020

Forty years ago, there was little thought about parallelism. After all, there was only one

processor. Now it is not unusual to run Map-Reduce or Apache Spark jobs with thousands of

processes using millions of threads [13]. Stragglers creating long tails inevitably result from

substantial parallelism and are the bane of modern systems: incredibly costly and nearly impossible

to debug [21].

Forty years ago programmers typically wrote monolithic programs that ran to completion and

exited. Now, programs may be coded in multiple languages, make use of libraries of services (like

search, communications, databases, ML, and others), and may run continuously with varying

load. As a result, debugging has become much more complex and involves a flow of control in

multiple environments. Debugging such a network of tasks is a real challenge, not considered

forty years ago.

Forty years ago, there was little to no thought about privacy and fraud. Now, GDPR [73]

dictates system behavior for Personally Identifiable Information (PII) on systems that are under

continuous attack. Future systems should build in support for such constructs. Moreover, there

are many cases of bad actors doctoring photos or videos, and there is no chain of provenance to

automatically record and facilitate exposure of such activity.

Machine learning (ML) is quickly becoming central to all large software systems. However,

ML is typically bolted onto the top of most systems as an afterthought. Application and system

developers struggle to identify the right data for ML analysis and to manage synchronization,

ordering, freshness, privacy, provenance, and performance concerns. Future systems should

directly support and enable AI applications and AI introspection, including first-order support

for declarative semantics for AI operations on system data.

In our opinion, serverless computing will become the dominant cloud architecture. One

does not need to spin up a virtual machine (VM), which will sit idle when there is no work

to do. Instead, one should use an execution environment like Amazon Lambda. Lambda is an

efficient task manager that encourages one to divide up a user task into a pipeline of

several-to-many subtasks¹. Resources are allocated to a task when it is running, and no resources are

consumed at other times. In this way, there are no dedicated VMs; instead there is a collection

of short-running subtasks. As such, users only pay for the resources that they consume and

their applications can scale to thousands of functions when needed. We expect that Lambda

will become the dominant cloud environment unless the cloud vendors radically modify their

pricing algorithms. Lambda will cause many more tasks to exist, creating a more expansive

task management problem.

Lastly, “bloat” has wreaked havoc on elderly OSs, and the pathlength of common operations

such as sending a message and reading bytes from a file is now uncompetitively expensive. One

key reason for the bloat is the uncontrolled layering of abstractions. Having a clean, declarative

way of capturing and operating on operating system state can help reduce that layering.
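
To make the idea of declarative operations on OS state concrete, the following is a minimal sketch of a scheduling step written as queries over tables. It is an illustration under stated assumptions, not the design proposed by the authors: the `workers` and `tasks` tables, their columns, and the `schedule_one` function are hypothetical names chosen for this example, and an in-memory SQLite database stands in for whatever DBMS a real system would use.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from the DBOS proposal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workers (
        worker_id INTEGER PRIMARY KEY,
        kind      TEXT,     -- e.g. 'cpu' or 'gpu'
        busy      INTEGER   -- 0 = free, 1 = occupied
    );
    CREATE TABLE tasks (
        task_id   INTEGER PRIMARY KEY,
        needs     TEXT,     -- kind of worker the task requires
        state     TEXT,     -- 'pending' or 'running'
        worker_id INTEGER REFERENCES workers(worker_id)
    );
    INSERT INTO workers VALUES (1, 'cpu', 0), (2, 'gpu', 0), (3, 'cpu', 1);
    INSERT INTO tasks VALUES (10, 'gpu', 'pending', NULL),
                             (11, 'cpu', 'pending', NULL),
                             (12, 'cpu', 'pending', NULL);
""")


def schedule_one(conn):
    """One stateless scheduling step: all state lives in the tables."""
    with conn:  # run the read and both updates as a single transaction
        row = conn.execute("""
            SELECT t.task_id, w.worker_id
            FROM tasks t JOIN workers w ON w.kind = t.needs
            WHERE t.state = 'pending' AND w.busy = 0
            ORDER BY t.task_id, w.worker_id
            LIMIT 1
        """).fetchone()
        if row is None:
            return None  # nothing can be placed right now
        task_id, worker_id = row
        conn.execute("UPDATE workers SET busy = 1 WHERE worker_id = ?",
                     (worker_id,))
        conn.execute(
            "UPDATE tasks SET state = 'running', worker_id = ? WHERE task_id = ?",
            (worker_id, task_id))
        return task_id, worker_id


# Keep scheduling until no pending task matches a free worker.
assignments = []
while (placed := schedule_one(conn)) is not None:
    assignments.append(placed)

# Inspecting system state is just another query.
pending = [r[0] for r in conn.execute(
    "SELECT task_id FROM tasks WHERE state = 'pending' ORDER BY task_id")]
print(assignments, pending)
```

Because the scheduler holds no state of its own between calls, it could in principle be restarted or replaced without losing track of tasks, and the same tables can be queried for debugging or monitoring.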

These changed circumstances dictate that system software should be reconsidered. In this

proposal, we explore a radically different design for operating systems that we believe will

scale to support the performance, management and security challenges of modern computing

workloads: a data-centric architecture for operating systems built around clean separation of

¹ In this paper, we will use Lambda as an exemplar of any resource allocation system that supports “pay only for what you use.”
