Olga Papaemmanouil

Assistant Professor
Department of Computer Science
Brandeis University

Email: {olga}@cs.brandeis.edu
Phone: +1-781-736-2716
Fax: +1-781-736-2741
Address: Department of Computer Science, MS 018
415 South St, Waltham, MA 02454, USA
Volen 139

  • Learning-based Cost and Performance Management for Cloud Databases. This is one of the first projects to leverage machine learning techniques to offer cost and performance management services for cloud databases. Our solutions tackle resource provisioning and workload management (dispatching queries to available virtual machines (VMs) and scheduling queries on these VMs), aiming to minimize the monetary cost of executing database workloads on the cloud while respecting application-defined performance goals (SLAs). Our work uses a number of machine learning techniques to capture the complex interplay between cost and performance. We rely on supervised learning to automatically learn decision models and recommend strategies for VM provisioning and for scheduling batch workloads. We also use reinforcement learning to offer low-cost online scheduling and resource scaling (up/down) solutions that automatically adapt to query arrival rates and resource availability. In contrast with existing work, our approach is decoupled from notoriously inaccurate performance prediction models. Our solutions are the first to generate strategies customized to an application's workload characteristics and performance goals. This work is supported by an NSF CAREER Award.

        Related publications

  • A Learning-based Service for Cost and Performance Management of Cloud Databases (Demonstration), Ryan Marcus, Sofiya Semenova, Olga Papaemmanouil. In Proceedings of 33rd IEEE International Conference on Data Engineering (ICDE 2017). [pdf]

  • Releasing Cloud Databases from the Chains of Predictions Models. Ryan Marcus, Olga Papaemmanouil. In Proceedings of the 8th Biennial Conference on Innovative Data Systems Research (CIDR 2017). [pdf]

  • WiSeDB: A Learning-based  Workload Management Advisor for Cloud Databases, Ryan Marcus, Olga Papaemmanouil. In Proceedings of the Very Large Databases Endowment (PVLDB 2016). Volume 9, Issue 10, pages 780-791. [pdf]

  • Workload Management for Cloud Databases via Machine Learning, Ryan Marcus, Olga Papaemmanouil. In Proceedings of 7th International Workshop on Cloud Data Management (CloudDM 2016). [pdf]

  • XCloud: Extensible Performance Management for Cloud Data Services (Abstract), Olga Papaemmanouil. In Proceedings of 7th Biennial Conference on Innovative Data Systems Research (CIDR 2015). [pdf]

  • Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction, Jennie Duggan, Olga Papaemmanouil, Ugur Cetintemel, Eli Upfal. In Proceedings of 17th International Conference on Extending Database Technology (EDBT 2014). [pdf]

  • SLA-driven Workload Management for Cloud Databases, Dimokritos Stamatakis, Olga Papaemmanouil. In Proceedings of 6th International Workshop on Cloud Data Management (CloudDB 2014). [pdf]

  • Supporting Extensible Performance SLAs for Cloud Databases, Olga Papaemmanouil. In Proceedings of the International Workshop on Data Management in the Cloud (DMC 2012). [pdf]

  • Performance Prediction for Concurrent Database Workloads, Jennie Rogers, Ugur Cetintemel, Olga Papaemmanouil, Eli Upfal. In Proceedings of the 30th ACM Special Interest Group on Management of Data (SIGMOD 2011). [pdf]

  • A Generic Auto-Provisioning Framework for Cloud Databases, Jennie Rogers, Olga Papaemmanouil, Ugur Cetintemel. In Proceedings of the 5th International Workshop on Self-Managing Database Systems (SMDB 2010). [pdf]

  • Automatic Interactive Data Exploration. Our work in this area aims to assist users in discovering interesting data sets within big and complex exploration spaces. To achieve that, we leverage machine learning techniques to derive insights from huge and complex datasets and automatically steer the user towards data areas of interest. Our research proposes a unique "exploration-by-example" approach in which the system collects user feedback on strategically selected data samples and uses this feedback to train a user model that predicts user interests. This is an iterative approach: at each round we leverage the user model to identify promising areas to explore and sample further. Our work realizes active learning models and offers new database optimizations for the new setting of interactive data exploration, aiming to provide effective exploration (i.e., accurately modeling user interests) as well as highly interactive performance (i.e., reduced user wait time and exploration overhead). This work is supported by an NSF award.

        Related publications

  • Interactive Data Exploration via Machine Learning Models, O. Papaemmanouil, Y. Diao, K. Dimitriadou, L. Peng. In Proceedings of IEEE Data Engineering Bulletin (invited paper), Volume 39, Issue 4, pages 21-30, December 2016. [pdf]

  • AIDE: An Active Learning-based Approach for Interactive Data Exploration. Kyriaki Dimitriadou, Olga Papaemmanouil, and Yanlei Diao. IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 28, Issue 11, pages 2842 - 2856, November 2016. [pdf]

  • CourseNavigator: An Interactive System for Learning Path Exploration, Zhan Li, Olga Papaemmanouil, Georgi Koutrika. In Proceedings of 3rd International Workshop on Exploratory Search in Databases and the Web (ExploreDB  2016). [pdf]

  • AIDE: An Automatic User Navigation Service for Interactive Data Exploration (Demonstration), Yanlei Diao, Kyriaki Dimitriadou, Zhan Li, Wenzhao Liu, Olga Papaemmanouil, Kemi Peng, Liping Peng. In Proceedings of 41st International Conference on Very Large Databases (VLDB 2015). [pdf]

  • Overview of Data Exploration Techniques (Tutorial), Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri. In Proceedings of 34th ACM Special Interest Group on Management of Data (SIGMOD 2015). [pdf] [slides] (the slides cover part 1 (User Interaction) and part 2 (Middleware Optimizations))

  • Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration, Kyriaki Dimitriadou, Olga Papaemmanouil, Yanlei Diao. In Proceedings of 33rd ACM Special Interest Group on Management of Data (SIGMOD 2014). [pdf]

  • Interactive Data Exploration based on User Relevance Feedback, Kyriaki Dimitriadou, Olga Papaemmanouil, Yanlei Diao. In Proceedings of 9th International Workshop on Self-Managing Database Systems (SMDB 2014). [pdf]

  • Query Steering for Interactive Data Exploration. Ugur Cetintemel, Mitch Cherniack, Justin DeBrabant, Yanlei Diao, Kyriaki Dimitriadou, Alex Kalinin, Olga Papaemmanouil, Stan Zdonik. In Proceedings of the 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013). [pdf]
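
The iterative loop described for the Automatic Interactive Data Exploration project (sample, collect feedback, update the user model, sample again) can be illustrated with a minimal Python sketch. This is not the AIDE implementation: the one-dimensional interval used as the "user model", the grid-then-boundary sampling rule, and the names explore_by_example and ask_user are all assumptions made purely for illustration.

```python
def explore_by_example(data, ask_user, rounds=15, per_round=10):
    """Toy exploration loop over distinct numeric values.

    data      -- iterable of distinct numbers (the exploration space)
    ask_user  -- hypothetical callback: returns True if a value is relevant
    Returns (lo, hi, n_labeled): the learned interval of interest and the
    number of samples the user had to label.
    """
    labeled = {}      # value -> relevance feedback collected so far
    lo = hi = None    # current user model: predicted relevant interval

    for _ in range(rounds):
        unlabeled = [x for x in sorted(data) if x not in labeled]
        if not unlabeled:
            break

        if lo is None:
            # No relevant sample seen yet: probe an evenly spaced grid.
            step = max(1, len(unlabeled) // per_round)
            batch = unlabeled[::step][:per_round]
        else:
            # Model exists: sample the points nearest to its boundaries,
            # where the prediction is least certain.
            unlabeled.sort(key=lambda x: min(abs(x - lo), abs(x - hi)))
            batch = unlabeled[:per_round]

        # Collect user feedback on the selected samples.
        for x in batch:
            labeled[x] = ask_user(x)

        # Retrain the (very simple) user model from all feedback so far.
        positives = [x for x, relevant in labeled.items() if relevant]
        if positives:
            lo, hi = min(positives), max(positives)

    return lo, hi, len(labeled)


if __name__ == "__main__":
    # Simulated user who is interested in values between 400 and 450.
    result = explore_by_example(list(range(1000)), lambda x: 400 <= x <= 450)
    print(result)
```

In this toy setting the simulated user labels only a fraction of the 1000 values, which mirrors the stated goal of modeling user interests accurately while keeping the user's labeling effort low. A realistic system would replace the interval with a learned classifier over multiple attributes.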

  • Devel-Op: Engineering Query Optimizers. In this project we explore the design, development, and evaluation of a development environment that facilitates the engineering of system-specific optimizer designs. Within this context, our group focuses on the design of benchmarking tools for query optimizers. Using statistical analysis tools, we identified (a) metrics for assessing the quality of the optimizer (i.e., the quality of the execution plan) and (b) predictors of the optimizer's efficiency (i.e., the resources required to generate the execution plan). Both were implemented in a benchmarking toolkit that assists developers in assessing the quality of an optimizer end-to-end. This work is supported by an NSF award.

        Related publications

  • OptMark: A Toolkit for Benchmarking Query Optimizers, Zhan Li, Olga Papaemmanouil, Mitch Cherniack. In Proceedings of 25th ACM International Conference on Information and Knowledge Management (CIKM 2016). [pdf] [long version]

  • Devel-Op: An Optimizer Development Environment (Demonstration), Zhibo Peng, Mitch Cherniack, Olga Papaemmanouil. In Proceedings of 30th IEEE International Conference on Data Engineering (ICDE  2014). [pdf]

  • A Development Environment for Query Optimizers, Olga Papaemmanouil, Nga Tran, Mitch Cherniack. In Proceedings of the 3rd International Workshop on Testing Database Systems (DBTest 2010). [pdf]

Past Projects

  • XPORT: XPORT is a general-purpose infrastructure that provides the core functionalities of large-scale stream processing and dissemination applications. It can be extended to support diverse processing logic, stream types, and performance targets. Given these specifications, it automatically creates and optimizes a data stream acquisition, processing, and overlay network. Its optimization is driven by metric-independent operations, which refine the structure of the overlay network and efficiently distribute processing across the network.

  • SemCast: SemCast investigates efficient content-based data filtering and dissemination over conventional multicast channels. SemCast splits input data streams into multiple pieces and spreads the pieces across multiple multicast channels for delivery. This approach eliminates the need for content-based filtering and routing at interior nodes of the overlay.

  • Borealis: Borealis is a distributed stream processing engine developed by Brandeis University, Brown University, and MIT. It deploys a network of cooperating Borealis stream engines, distributes query processing across multiple machines, and maintains integrity and correct operation as the network is dynamically mutated.