DBOS: A Proposal for a Data-Centric Operating System
The DBOS Committee
dbos-project@googlegroups.com
Abstract
Current operating systems are complex systems that were designed before today’s computing
environments. This makes it difficult for them to meet the scalability, heterogeneity, availability,
and security challenges in current cloud and parallel computing environments. To address
these problems, we propose a radically new OS design based on data-centric architecture: all
operating system state should be represented uniformly as database tables, and operations on
this state should be made via queries from otherwise stateless tasks. This design makes it
easy to scale and evolve the OS without whole-system refactoring, inspect and debug system
state, upgrade components without downtime, manage decisions using machine learning, and
implement sophisticated security features. We discuss how a database OS (DBOS) can improve
the programmability and performance of many of today’s most important applications and
propose a plan for the development of a DBOS proof of concept.
1 Introduction
Current operating systems have evolved over the last forty years into complex overlapping
code bases [70, 4, 51, 57], which were architected for very different environments than exist
today. The cloud has become a preferred platform for both decision support and online serving
applications. Serverless computing supports the concept of elastic provision of resources, which is
very attractive in many environments. Machine learning (ML) is causing many applications to be
redesigned, and future operating systems must intimately support such applications. Hardware
is becoming massively parallel and heterogeneous. These “sea changes” make it imperative to
rethink the architecture of system software, which is the topic of this paper.
Mainstream operating systems (OSs) date from the 1980s and were designed for the hardware
platforms of 40 years ago, consisting of a single processor, limited main memory and a small
set of runnable tasks. Today’s cloud platforms contain hundreds of thousands of processors,
heterogeneous computing resources (including CPUs, GPUs, FPGAs, TPUs, SmartNICs, and
so on) and multiple levels of memory and storage. These platforms support millions of active
users that access thousands of services. Hence, the OS must deal with a scale problem of $10^5$
or $10^6$ more resources to manage and schedule. Managing OS state is a much bigger problem
than 40 years ago in terms of both throughput and latency, as thousands of services must
communicate to respond in near real-time to a user’s click [21, 5].
DBOS committee members in alphabetic order: Michael Cafarella (MIT CSAIL), David DeWitt (MIT CSAIL),
Vijay Gadepally (MIT LLSC), Jeremy Kepner (MIT LLSC), Christos Kozyrakis (Stanford University), Tim Kraska
(MIT CSAIL), Michael Stonebraker (MIT CSAIL), and Matei Zaharia (Stanford University).
arXiv:2007.11112v1 [cs.OS] 21 Jul 2020
Forty years ago, there was little thought about parallelism. After all, there was only one
processor. Now it is not unusual to run Map-Reduce or Apache Spark jobs with thousands of
processes using millions of threads [13]. Stragglers creating long tails inevitably result from
substantial parallelism and are the bane of modern systems: incredibly costly and nearly impossible
to debug [21].
Forty years ago programmers typically wrote monolithic programs that ran to completion and
exited. Now, programs may be coded in multiple languages, make use of libraries of services (like
search, communications, databases, ML, and others), and may run continuously with varying
load. As a result, debugging has become much more complex and involves a flow of control in
multiple environments. Debugging such a network of tasks is a real challenge, not considered
forty years ago.
Forty years ago there was little to no thought about privacy and fraud. Now, GDPR [73]
dictates system behavior for Personally Identifiable Information (PII) on systems that are under
continuous attack. Future systems should build in support for such constructs. Moreover, there
are many cases of bad actors doctoring photos or videos, and there is no chain of provenance to
automatically record and facilitate exposure of such activity.
Machine learning (ML) is quickly becoming central to all large software systems. However,
ML is typically bolted onto the top of most systems as an afterthought. Application and system
developers struggle to identify the right data for ML analysis and to manage synchronization,
ordering, freshness, privacy, provenance, and performance concerns. Future systems should
directly support and enable AI applications and AI introspection, including first-order support
for declarative semantics for AI operations on system data.
In our opinion, serverless computing will become the dominant cloud architecture. One
does not need to spin up a virtual machine (VM), which will sit idle when there is no work
to do. Instead, one should use an execution environment like Amazon Lambda. Lambda is an
efficient task manager that encourages one to divide up a user task into a pipeline of
several-to-many subtasks¹. Resources are allocated to a task when it is running, and no resources are
consumed at other times. In this way, there are no dedicated VMs; instead there is a collection
of short-running subtasks. As such, users only pay for the resources that they consume and
their applications can scale to thousands of functions when needed. We expect that Lambda
will become the dominant cloud environment unless the cloud vendors radically modify their
pricing algorithms. Lambda will cause many more tasks to exist, creating a more expansive
task management problem.
Lastly, “bloat” has wreaked havoc on elderly OSs, and the path lengths of common operations
such as sending a message and reading bytes from a file are now uncompetitively long. One
key reason for the bloat is the uncontrolled layering of abstractions. Having a clean, declarative
way of capturing and operating on operating system state can help reduce that layering.
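As a rough illustration of what "declarative" OS state could look like, the sketch below keeps task and worker state in database tables and performs one scheduling step purely through queries. It is only a toy under stated assumptions: the table names (`workers`, `tasks`), their columns, the "most free slots first" policy, and the helper functions `make_state` and `schedule_one` are all hypothetical, and SQLite is used merely as a convenient stand-in for whatever DBMS a real DBOS would use.

```python
import sqlite3

def make_state():
    """Create an in-memory database holding hypothetical OS state tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE workers (id INTEGER PRIMARY KEY, kind TEXT, free_slots INTEGER);
        CREATE TABLE tasks   (id INTEGER PRIMARY KEY, kind TEXT, worker_id INTEGER);
    """)
    db.executemany("INSERT INTO workers VALUES (?, ?, ?)",
                   [(1, "cpu", 2), (2, "gpu", 1), (3, "cpu", 4)])
    db.executemany("INSERT INTO tasks (id, kind) VALUES (?, ?)",
                   [(10, "cpu"), (11, "gpu"), (12, "cpu")])
    return db

def schedule_one(db):
    """One stateless scheduling step: all state is read from and written to tables."""
    row = db.execute("""
        SELECT t.id, w.id
        FROM tasks t JOIN workers w ON w.kind = t.kind
        WHERE t.worker_id IS NULL AND w.free_slots > 0
        ORDER BY t.id, w.free_slots DESC, w.id
        LIMIT 1
    """).fetchone()
    if row is None:
        return None  # nothing left to place
    task_id, worker_id = row
    with db:  # commit both updates as a single transaction
        db.execute("UPDATE tasks SET worker_id = ? WHERE id = ?",
                   (worker_id, task_id))
        db.execute("UPDATE workers SET free_slots = free_slots - 1 WHERE id = ?",
                   (worker_id,))
    return task_id, worker_id

db = make_state()
placements = []
while (p := schedule_one(db)) is not None:
    placements.append(p)
print(placements)
```

Because the scheduler holds no state of its own, the current placement can be inspected with an ordinary query (for example `SELECT * FROM tasks`), and the policy can be changed by editing the `ORDER BY` clause rather than restructuring code.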
These changed circumstances dictate that system software should be reconsidered. In this
proposal, we explore a radically different design for operating systems that we believe will
scale to support the performance, management and security challenges of modern computing
workloads: a data-centric architecture for operating systems built around clean separation of
¹ In this paper, we will use Lambda as an exemplar of any resource allocation system that supports “pay only for
what you use.”