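Biography
Received bachelor's and master's degrees from Peking University in 2002 and 2005, respectively, and a Ph.D. from the National University of Singapore in 2011. Main research interests include distributed data, big data processing, and AI-driven data analytics. Has published more than 60 papers in top-tier and major database journals, including ACM Computing Surveys, The VLDB Journal (VLDBJ), and IEEE Transactions on Knowledge and Data Engineering (TKDE), and at international conferences including SIGMOD, VLDB, ICDE, and SIGIR. The paper on epiC, a parallel big data processing framework, won the VLDB 2014 Best Paper Award, and the paper on the distributed database BestPeer++ was nominated for Best Paper at ICDE 2012. The paper "Distributed data management using MapReduce" is used as course material in big data courses at institutions including the University of Waterloo and the University of Athens. The applicant has served on the program committees of many well-known international conferences (VLDB 2010, ICDE 2011, CIKM 2011, VLDB 2014, ICDE 2014, SIGMOD 2014, SIGMOD 2015, VLDB 2015, VLDB 2016, VLDB 2017, CIKM 2017, ICDE 2018, VLDB 2018, VLDB 2019, VLDB 2022, KDD 2021, KDD 2022), and was Web Chair of VLDB 2014, Tutorial Chair of APWeb 2015, and an invited speaker at APWeb 2011. The yzStack big data platform, which the applicant developed on top of epiC, has been deployed in projects for the Zhejiang Provincial Department of Finance, China Southern Power Grid (extra-high-voltage transmission), and Hangzhou Customs, among others. The applicant's big data research has received the First Prize of the Ministry of Education Science and Technology Progress Award in 2016 (4/10), the Special Prize of the Chinese Institute of Electronics Science and Technology Progress Award in 2019 (6/15), the First Prize of the Chinese Institute of Electronics Science and Technology Progress Award in 2020 (3/15), and the First Prize of the Ministry of Education Science and Technology Progress Award in 2020 (9/10). In 2020, the applicant was selected as a Young Top-Notch Talent under the National Ten Thousand Talents Program.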
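Teaching and Courses
2018-2019 Big Data Analytics
2017-2018 Principles of Operating Systems, Operating Systems Practice
2016-2017 Principles of Operating Systems, Operating Systems Practice
2015-2016 Principles of Operating Systems
2014-2015 Principles of Operating Systems
2013-2014 Principles of Operating Systems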
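Research Output
Sai Wu (伍赛). An information collection method for a microblog-based crowdsourcing question-answering system. Invention patent, 2016: 2/6
Sai Wu (伍赛). A method for computing user-event relevance based on content and context enhancement. Invention patent, 2017: 4/6
Sai Wu (伍赛). A distributed method for structured processing of network information. Invention patent, 2017: 2/5
Journal Papers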
-
Zhifei Pang, Sai Wu, Haichao Huang, Zhouzhenyan Hong, Yuqing Xie: AQUA+: Query Optimization for Hybrid Database-MapReduce System. Knowl. Inf. Syst. 63(4): 905-938 (2021)
-
Sai Wu, Zhifei Pang, Gang Chen, Yunjun Gao, Cenjiong Zhao, Shili Xiang: NEIST: A Neural-Enhanced Index for Spatio-Temporal Queries. IEEE Trans. Knowl. Data Eng. 33(4): 1659-1673 (2021)
-
Jingtian Zhang, Lidan Shou, Sai Wu, Gang Chen, Ke Chen: A two-phase approach for unexpected pattern mining. Expert Syst. Appl. 141 (2020)
-
Dongxiang Zhang, Rui Cao, Sai Wu: Information fusion in visual question answering: A Survey. Inf. Fusion 52: 268-280 (2019)
-
Zunlei Feng, Zhenyun Yu, Yongcheng Jing, Sai Wu, Mingli Song, Yezhou Yang, Junxiao Jiang: Interpretable Partitioned Embedding for Intelligent Multi-item Fashion Outfit Composition. ACM Trans. Multim. Comput. Commun. Appl. 15(2s): 61:1-61:20 (2019)
-
Xinyuan Luo, Sai Wu, Wei Wang, Lidan Shou: Tuning the granularity of parallelism for distributed graph processing. Distributed and Parallel Databases 35(2): 117-148 (2017)
-
Xiaoling Gu, Sai Wu, Pai Peng, Lidan Shou, Ke Chen, Gang Chen: CSIR4G: An effective and efficient cross-scenario image retrieval model for glasses. Inf. Sci. 417: 310-327 (2017)
-
Dongxiang Zhang, Long Guo, Liqiang Nie, Jie Shao, Sai Wu, Heng Tao Shen: Targeted Advertising in Public Transportation Systems with Quantitative Evaluation. ACM Trans. Inf. Syst. 35(3): 20:1-20:29 (2017)
-
Dawei Jiang, Sai Wu, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Jun Xu: epiC: an extensible and scalable system for processing Big Data. VLDB J. 25(1): 3-26 (2016)
-
Gang Chen, Sai Wu, Yuan Wang: The Evolvement of Big Data Systems: From the Perspective of an Information Security Application. Big Data Research 2(2): 65-73
-
Sai Wu, Gang Chen, Ke Chen, Feng Li, Lidan Shou: HM: A Column-Oriented MapReduce System on Hybrid Storage. IEEE Trans. Knowl. Data Eng. 27(12): 3304-3317 (2015)
-
Dingyu Yang, Jian Cao, Sai Wu, Jie Wang: Progressive online aggregation in a distributed stream system. Journal of Systems and Software 102: 146-157 (2015)
-
Xianke Zhou, Sai Wu, Gang Chen, Lidan Shou: kNN processing with co-space distance in SoLoMo systems. Expert Syst. Appl. 41(16): 6967-6982 (2014)
-
Xianke Zhou, Sai Wu, Chun Chen, Gang Chen, Shanshan Ying: Real-time recommendation for microblogs. Inf. Sci. 279: 301-325 (2014)
-
Feng Li, Beng Chin Ooi, M. Tamer Özsu, Sai Wu: Distributed data management using MapReduce. ACM Comput. Surv. 46(3): 31 (2014)
-
Gang Chen, Tianlei Hu, Dawei Jiang, Peng Lu, Kian-Lee Tan, Hoang Tam Vo, Sai Wu: BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform. IEEE Trans. Knowl. Data Eng. 26(6): 1316-1331 (2014)
-
Lidan Shou, Sai Wu: Supporting Efficient Social Media Search in Cyber-Physical Web. IEEE Data Eng. Bull. 36(3): 83-90 (2013)
-
Sai Wu, Xiaoli Wang, Sheng Wang, Zhenjie Zhang, Anthony K.H. Tung: K-Anonymity for Crowdsourcing Database. IEEE Transactions on Knowledge and Data Engineering, 15 May 2013 (preprint)
-
Gang Chen, Sai Wu, Jingbo Zhou, Anthony K.H. Tung: Automatic Itinerary Planning for Traveling Services. IEEE Transactions on Knowledge and Data Engineering, 13 March 2013 (preprint)
-
Gang Chen, Ke Chen, Dawei Jiang, Beng Chin Ooi, Lei Shi, Hoang Tam Vo, Sai Wu: E3: an Elastic Execution Engine for Scalable Data Processing. JIP 20(1): 65-76 (2012)
-
Dalie Sun, Sai Wu, Shouxu Jiang, Jianzhong Li: Approximate Aggregations in Structured P2P Networks. IEEE Trans. Knowl. Data Eng. 23(11): 1748-1752 (2011)
Conference Papers
-
Jue Wang, Ke Chen, Lidan Shou, Sai Wu, Gang Chen: Effective Slot Filling via Weakly-Supervised Dual-Model Learning. AAAI 2021: 13952-13960
-
Xiu Tang, Sai Wu, Gang Chen, Jinyang Gao, Wei Cao, Zhifei Pang: A Learning to Tune Framework for LSH. ICDE 2021: 2201-2206
-
Junbo Zhao, Mingfeng Ou, Linji Xue, Yunkai Cui, Sai Wu, Gang Chen: Joining datasets via data augmentation in the label space for neural networks. ICML 2021: 12686-12696
-
Hao Huang, Yanan Peng, Ting Gan, Weiping Tu, Ruiting Zhou, Sai Wu: Metric Learning via Penalized Optimization. KDD 2021: 656-664
-
Wentao Hu, Dongxiang Zhang, Dawei Jiang, Sai Wu, Ke Chen, Kian-Lee Tan, Gang Chen: AUDITOR: A System Designed for Automatic Discovery of Complex Integrity Constraints in Relational Databases. SIGMOD Conference 2020: 2697-2700
-
Dongxiang Zhang, Yuyang Nie, Sai Wu, Yanyan Shen, Kian-Lee Tan: Multi-Context Attention for Entity Matching. WWW 2020: 2634-2640
-
Jingtian Zhang, Sai Wu, Zeyuan Tan, Gang Chen, Zhushi Cheng, Wei Cao, Yusong Gao, Xiaojie Feng: S3: A Scalable In-memory Skip-List Index for Key-Value Store. Proc. VLDB Endow. 12(12): 2183-2194 (2019)
-
Zhifei Pang, Sai Wu, Dongxiang Zhang, Yunjun Gao, Gang Chen: NAD: Neural Network Aided Design for Textile Pattern Generation. CIKM 2019: 2081-2084
-
Keyu Yang, Yunjun Gao, Rui Ma, Lu Chen, Sai Wu, Gang Chen: DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces. ICDE 2019: 1346-1357
-
Dongxiang Zhang, Long Guo, Xiangnan He, Jie Shao, Sai Wu, Heng Tao Shen: A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution. ICDE 2018: 713-724
-
Zhifei Pang, Sai Wu, Gang Chen, Ke Chen, Lidan Shou: FlashView: An Interactive Visual Explorer for Raw Data. PVLDB 10(12): 1869-1872 (2017)
-
Sai Wu, Mengdan Zhang, Gang Chen, Ke Chen: A New Approach to Compute CNNs for Extremely Large Images. CIKM 2017: 39-48
-
Sai Wu, Weichao Ren, Chengchao Yu, Gang Chen, Dongxiang Zhang, Jingbo Zhu: Personal recommendation using deep recurrent neural networks in NetEase. ICDE 2016: 1218-1229
-
Xiaoling Gu, Lidan Shou, Pai Peng, Ke Chen, Sai Wu, Gang Chen: iGlasses: A Novel Recommendation System for Best-fit Glasses. SIGIR 2016: 1109-1112
-
Chang Yao, Divyakant Agrawal, Gang Chen, Beng Chin Ooi, Sai Wu: Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases. SIGMOD Conference 2016: 1119-1134
-
Yuxin Zheng, Qi Guo, Anthony K. H. Tung, Sai Wu: LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index. SIGMOD Conference 2016: 2023-2037
-
Zhenhua Wang, Ping He, Lidan Shou, Ke Chen, Sai Wu, Gang Chen: Toward the New Item Problem: Context-Enhanced Event Recommendation in Event-Based Social Networks. ECIR 2015: 333-338
-
Sai Wu, Gang Chen, Xianke Zhou, Zhenjie Zhang, Anthony K. H. Tung, Marianne Winslett: PABIRS: A data access middleware for distributed file systems. ICDE 2015: 113-124
-
Xiaoling Gu, Pai Peng, Mengwen Li, Sai Wu, Lidan Shou, Gang Chen: Cross-Scenario Eyeglasses Retrieval via EGYPT Model. ICMR 2015: 463-466
-
Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Sai Wu: epiC: an Extensible and Scalable System for Processing Big Data. PVLDB 7(7): 541-552 (2014) (Best Paper Award)
-
Sai Wu, Gang Chen, Ke Chen, Lidan Shou, Hui Cao, He Bai: yzStack: Provisioning Customizable Solution for Big Data. PVLDB 7(13): 1778-1783 (2014)
-
Pai Peng, Lidan Shou, Ke Chen, Gang Chen, Sai Wu: The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos. SIGIR 2014: 707-716
-
Pai Peng, Lidan Shou, Ke Chen, Gang Chen, Sai Wu: The Knowing Camera: Recognizing Places-of-Interest in SmartPhone Photos (Poster), SIGIR 2013
-
Xianke Zhou, Ke Chen, Sai Wu, Bingbing Zhang: Crowd-Answering System via Microblogging. ICDE (DEMO) 2013
-
Peng Lu, Sai Wu, Lidan Shou, Kian-Lee Tan: An Efficient and Compact Indexing Scheme for Large-scale Data Store. ICDE 2013
-
Sai Wu, Vibhore Kumar, Kun-Lung Wu, Beng Chin Ooi: Parallelizing stateful operators in a distributed stream processing system: how, should you and how much? DEBS 2012: 278-289
-
Chen Liu, Sai Wu, Shouxu Jiang, Anthony K. H. Tung: Cross Domain Search by Exploiting Wikipedia. ICDE 2012: 546-557
-
Gang Chen, Tianlei Hu, Dawei Jiang, Peng Lu, Kian-Lee Tan, Hoang Tam Vo, Sai Wu: BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform. ICDE 2012: 582-593
-
Xuan Liu, Meiyu Lu, Beng Chin Ooi, Yanyan Shen, Sai Wu, Meihui Zhang: CDAS: A Crowdsourcing Data Analytics System. CoRR abs/1207.0143 (2012)
-
Xuan Liu, Meiyu Lu, Beng Chin Ooi, Yanyan Shen, Sai Wu, Meihui Zhang: CDAS: A Crowdsourcing Data Analytics System. PVLDB 5(10): 1040-1051 (2012)
-
Sai Wu, F. Li, S. Mehrotra, B. C. Ooi: Query Optimization for Massively Parallel Data Processing. ACM Symposium on Cloud Computing (SOCC). 2011.
-
Yu Cao, Chun Chen, Fei Guo, Dawei Jiang, Yuting Lin, Beng Chin Ooi, Hoang Tam Vo, Sai Wu, Quanqing Xu: ES2: A cloud data storage system for supporting both OLTP and OLAP. ICDE 2011: 291-302
-
Chun Chen, Feng Li, Beng Chin Ooi, Sai Wu: TI: an efficient indexing mechanism for real-time search on tweets. SIGMOD Conference 2011: 649-660
-
Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chin Ooi, Sai Wu: Llama: leveraging columnar storage for scalable join processing in the MapReduce framework. SIGMOD Conference 2011: 961-972
-
Gang Chen, Hoang Tam Vo, Sai Wu, Beng Chin Ooi, M. Tamer Özsu: A Framework for Supporting DBMS-like Indexes in the Cloud. PVLDB 4(11): 702-713 (2011)
-
Jinbao Wang, Sai Wu, Hong Gao, Jianzhong Li, Beng Chin Ooi: Indexing multi-dimensional data in a cloud system. SIGMOD Conference 2010: 591-602
-
Sai Wu, Beng Chin Ooi, Kian-Lee Tan: Continuous sampling for online aggregation over multiple queries. SIGMOD Conference 2010: 651-662
-
Chun Chen, Gang Chen, Dawei Jiang, Beng Chin Ooi, Hoang Tam Vo, Sai Wu, Quanqing Xu: Providing Scalable Database Services on the Cloud. WISE 2010: 1-19
-
Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun-Lung Wu: Efficient B-tree Based Indexing for Cloud Data Processing. PVLDB 3(1): 1207-1218 (2010)
-
Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010)
-
Quang Hieu Vu, Mihai Lupu, Sai Wu: SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks. Euro-Par 2009: 498-510
-
Sai Wu, Quang Hieu Vu, Jianzhong Li, Kian-Lee Tan: Adaptive Multi-join Query Processing in PDBMS. ICDE 2009: 1239-1242
-
Sai Wu, Kun-Lung Wu: An Indexing Framework for Efficient Retrieval on the Cloud. IEEE Data Eng. Bull. 32(1): 75-82 (2009)
-
Sai Wu, Shouxu Jiang, Beng Chin Ooi, Kian-Lee Tan: Distributed Online Aggregation. PVLDB 2(1): 443-454 (2009)
-
Dalie Sun, Sai Wu, Jianzhong Li, Anthony K. H. Tung: Skyline-join in distributed databases. ICDE Workshops 2008: 176-181
-
Sai Wu, Jianzhong Li, Beng Chin Ooi, Kian-Lee Tan: Just-in-time query retrieval over partially indexed data on structured P2P overlays. SIGMOD Conference 2008: 279-290
-
Sai Wu, Hong Gao, Bei Yu: Supporting High Dimensional Range Queries in Peer-to-Peer Systems. DBISP2P 2007, Austria