迈向 Streaming Warehouse
• Streaming Warehouse API: FLIP-282 [2] 在 Flink SQL 中引入了新的 Delete 和 Update API,它们可以在 Batch 模式下工作。在此基础上,外部存储系统比如 Flink Table Store 可以通过这些新的 API 实现行级删除和更新。同时对 ALTER TABLE 语法进行了增强,包括 ADD/MODIFY/DROP 列、主键和 watermark 的能力,这些增强使得用户更容易维护元数据。
• Batch 性能优化: 在 Flink 1.17 中,批处理作业的执行在性能、稳定性和可用性方面都得到了显着改进。就性能而言,通过策略优化和算子优化,如新的 join-reorder 算法和自适应的本地哈希聚合优化、Hive 聚合函数改进以及混合 shuffle 模式优化,这些改进带来了 26% 的 TPC-DS 性能提升。就稳定性而言,Flink 1.17 预测执行可以支持所有算子,自适应的批处理调度可以更好的应对数据倾斜场景。就可用性而言,批处理作业所需的调优工作已经大大减少。自适应的批处理调度已经默认开启,混合 shuffle 模式现在可以兼容预测执行和自适应批处理调度,同时所需的各种配置都进行了简化。
批处理
预测执行
自适应批处理调度器
混合 Shuffle 模式
• 混合 Shuffle 模式现在支持自适应批调度器和预测执行。
• 混合 Shuffle 模式现在支持重用中间数据,这带来了显着的性能改进。
TPC-DS

SQL Client/Gateway
SQL API
Hive 兼容
流处理
Streaming SQL 语义完善
== Optimized Physical Plan With Advice ==...advice[1]: [WARNING] The column(s): day(generated by non-deterministic function: CURRENT_TIMESTAMP ) can not satisfy the determinism requirement for correctly processing update message('UB'/'UA'/'D' in changelogMode, not 'I' only), this usually happens when input node has no upsertKey(upsertKeys=[{}]) or current node outputs non-deterministic update messages. Please consider removing these non-deterministic columns or making them deterministic by using deterministic functions.
== Optimized Physical Plan With Advice ==...advice[1]: [ADVICE] You might want to enable local-global two-phase optimization by configuring ('table.optimizer.agg-phase-strategy' to 'AUTO').
此外 Flink 1.17 还修复了多个可能影响数据正确性的 plan 优化问题,如:FLINK-29849 [11] , FLINK-30006 [12] , 和 FLINK-30841 [13] 等。
Watermark 对齐增强
Streaming FileSink 扩展
Checkpoint 改进
Checkpoint 耗时 | 增量大小 | 数据回放量 | cpu 使用量 | 网络流量 | |||
| 最大值 | 平均值 | 最大值 | 平均值 | ||||
开启 GIC vs 关闭 GIC | -79.5% | -95% | -86.7% | -39.3% | -46.8% | -53% | -67.7% |
表格-2: 在 WordCount 中开启 GIC 后的开销
| 最大处理性能 | 全量 Checkpoint 大小 | |
| 开启 GIC vs 关闭 GIC | -2% | +30% |
Unaligned Checkpoint (UC) 可以大大提高反压下 Checkpoint 的完成率。之前版本的 UC 会写入过多的小文件,进一步可能会导致 HDFS 的 namenode 负载过高。社区在 1.17 版本中解决了该问题,使 UC 在生产环境中更加可用。
RocksDBStateBackend 升级
1. 支持在 Apple 芯片上构建 FRocksDB Java
2. 通过避免昂贵的 ToString() 操作提高 Compaction Filter 的性能
3. 升级 FRocksDB 的 ZLIB 版本,避免 Memory Corruption
Calcite 升级
其他
PyFlink
性能监控 Benchmark
Task 级别火焰图

通用的令牌机制
升级说明
贡献者列表
https://developer.aliyun.com/article/851771?spm=a2c6h.12873639.article-detail.6.4ff859e2A1PFCU
[2] FLIP-282:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235838061
[6] OutputFormat Sink
https://github.com/apache/flink/blob/release-1.17/flink-core/src/main/java/org/apache/flink/api/common/io/OutputFormat.java
[7] 混合 Shuffle:
https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/ops/batch/batch_shuffle/#hybrid-shuffle
[8] HiveModule:
https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/connectors/table/hive/hive_functions/
[9] PLAN_ADVICE:
https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/dev/table/sql/explain/#explaindetails
[10] NDU(非确定性更新):
https://nightlies.apache.org/flink/flink-docs-release-1.17/zh/docs/dev/table/concepts/determinism/#3-%E6%B5%81%E4%B8%8A%E7%9A%84%E7%A1%AE%E5%AE%9A%E6%80%A7
[11] FLINK-29849:
https://issues.apache.org/jira/browse/FLINK-29849
[12] FLINK-30006:
https://issues.apache.org/jira/browse/FLINK-30006
[13] FLINK-30841:
https://issues.apache.org/jira/browse/FLINK-30841
[14] FLIP-182:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
[15] FLIP-217:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits
[16] FileSink:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/filesystem/#file-sink
[17] 性能测评文章:
https://mp.weixin.qq.com/s/8662I8knfYTUMQ-3plqUKQ
[18] REST API:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/rest_api/#jobs-jobid-checkpoints-1
[19] FLINK-30836:
https://issues.apache.org/jira/browse/FLINK-30836
[20] state.backend.rocksdb.memory.fixed-per-tm:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#state-backend-rocksdb-memory-fixed-per-tm
[21] Calcite:
https://calcite.apache.org/
[22] CALCITE-4325:
https://issues.apache.org/jira/browse/CALCITE-4325
[23] CALCITE-4352:
https://issues.apache.org/jira/browse/CALCITE-4352
[24] #flink-dev-benchmarks:
https://apache-flink.slack.com/archives/C0471S0DFJ9
[25] Speed Center:
http://codespeed.dak8s.net:8000
[26] Benchmark's wiki:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847
[27] FLIP-272:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
[28] FLIP-211:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework
[29] Release Notes:
https://nightlies.apache.org/flink/flink-docs-release-1.17/release-notes/flink-1.17/
往期精选


点击「阅读原文」,查看更多技术内容~







