Talk Title 1: Data Management in Microservices: State of the Practice, Challenges, and Research Directions
Talk Title 2: Fast Search-By-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests
Speaker Affiliation: University of Copenhagen
Speaker Bio: Yongluan Zhou is a professor in the Department of Computer Science (DIKU) at the University of Copenhagen, where he leads the Data Management Systems Lab (DMS Lab). He also heads the MSc in Computer Science at DIKU. Prior to his current position, he worked as an Associate Professor at the University of Southern Denmark (SDU) and as a postdoc at the Ecole Polytechnique Fédérale de Lausanne (EPFL). He earned his Ph.D. in Computer Science from the National University of Singapore (NUS). His research interests span database systems and distributed systems, with a recent focus on scalable event-driven systems. He has authored more than 80 peer-reviewed research articles in international journals and conference proceedings. He serves on the EDBT Executive Board and the SSDBM Steering Committee and has chaired various international conferences, including DEBS 2022, SSDBM 2022, and EDBT 2020. He has also served on the Program Committees of many other international conferences, including SIGMOD, VLDB, ICDE, EDBT, CIKM, and SSDBM.
Abstract 1: Microservices have become a popular architectural style for data-driven applications, given their ability to functionally decompose an application into small and autonomous services to achieve scalability, strong isolation, and specialization of database systems to the workloads and data formats of each service. Despite the accelerating industrial adoption of this architectural style, an investigation of the state of the practice and of the challenges practitioners face regarding data management in microservices is lacking. To bridge this gap, the work presented in this talk comprises a systematic literature review of representative articles reporting the adoption of microservices, an analysis of a set of popular open-source microservice applications, and an online survey that cross-validates the findings of the previous steps against the perceptions and experiences of over 120 experienced practitioners and researchers.
Through this process, the study categorizes the state of the practice of data management in microservices and identifies several foundational challenges that cannot be solved by software engineering practices alone, but instead require system-level support to alleviate the burden imposed on practitioners. The talk discusses the shortcomings of state-of-the-art database systems with regard to microservices and concludes by devising a set of features for microservice-oriented database systems.
Abstract 2: The vast amounts of data collected in various domains pose great challenges to modern data exploration and analysis. To find “interesting” objects in large databases, users typically define a query using positive and negative example objects and train a classification model to identify the objects of interest in the entire data catalog. However, this approach requires a scan of all the data to apply the classification model to each instance in the data catalog, making it prohibitively expensive for large-scale databases that serve many users and queries interactively. This talk proposes a novel framework for such search-by-classification scenarios that allows users to interactively search for target objects by specifying queries through a small set of positive and negative examples. Unlike previous approaches, the proposed framework can rapidly answer such queries at low cost without scanning the entire database. The framework is based on an index-aware construction scheme for decision trees and random forests that transforms the inference phase of these classification models into a set of range queries, which in turn can be executed efficiently by leveraging multidimensional indexing structures. The experiments show that queries over large data catalogs with hundreds of millions of objects can be processed in a few seconds using a single server, compared to the hours needed by classical scanning-based approaches.
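The core idea in Abstract 2, turning decision-tree inference into range queries, can be illustrated with a minimal sketch. It is not the speaker's implementation: the `Node` class, the helper names, and the toy data are all hypothetical, and the linear filter in `range_query` only stands in for a real multidimensional index (for example a k-d tree or R-tree). The sketch also does not cover the index-aware construction scheme itself; it shows only that each positive leaf of a tree corresponds to an axis-aligned box, so that classifying the whole catalog reduces to one range query per positive leaf.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# A box is one (low, high) pair per dimension: low exclusive, high inclusive.
Box = List[Tuple[float, float]]


@dataclass
class Node:
    """Hypothetical decision-tree node; leaves carry a label, inner nodes a split."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None   # branch taken when x[feature] > threshold
    label: Optional[int] = None


def predict(node: Node, x: Sequence[float]) -> int:
    """Classical per-object inference, used here only as the scan baseline."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def positive_boxes(node: Node, box: Box) -> List[Box]:
    """Collect the axis-aligned boxes of all leaves labelled 1."""
    if node.label is not None:
        return [list(box)] if node.label == 1 else []
    lo, hi = box[node.feature]
    left_box, right_box = list(box), list(box)
    left_box[node.feature] = (lo, min(hi, node.threshold))
    right_box[node.feature] = (max(lo, node.threshold), hi)
    return positive_boxes(node.left, left_box) + positive_boxes(node.right, right_box)


def range_query(points: Sequence[Sequence[float]], box: Box) -> List[int]:
    """Stand-in for an index lookup: a real system would not scan all points."""
    return [i for i, p in enumerate(points)
            if all(lo < v <= hi for v, (lo, hi) in zip(p, box))]


def search(tree: Node, points: Sequence[Sequence[float]], dims: int) -> List[int]:
    """Answer the classification query as a union of range queries."""
    whole_space: Box = [(-math.inf, math.inf)] * dims
    hits = set()
    for box in positive_boxes(tree, whole_space):
        hits.update(range_query(points, box))
    return sorted(hits)


# Toy tree: positive iff x0 > 0.5 and x1 <= 0.3.
tree = Node(feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(feature=1, threshold=0.3,
                       left=Node(label=1), right=Node(label=0)))
points = [(0.2, 0.1), (0.7, 0.2), (0.9, 0.8), (0.6, 0.3), (0.5, 0.1)]
hits = search(tree, points, dims=2)
scan = [i for i, p in enumerate(points) if predict(tree, p) == 1]
print(hits, hits == scan)
```

For a random forest, the same extraction would be applied to each tree and the per-tree results combined according to the voting rule; how the talk's framework does this efficiently is not specified in the abstract.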