Database Research Group

Faculty of Science, Ontario Tech University

Research Statement

Our group supports research in different aspects of data analytics and query processing platforms. This includes, but is not limited to:

  • Database technology such as novel indexing structures and query processing techniques
  • Applications of database systems in data science and related fields such as machine learning
  • Applications of statistical learning and machine learning in the context of data processing
  • Novel data models and their unique query processing challenges

Our missions are:

  1. Advance the field of databases and deliver benefits to government and industry partners.
  2. Provide a high-quality training environment for graduate and undergraduate students.

Current Members

Ken Pu

Associate Professor

Andrei Stoica

Master's student, 2018

Michael Valdron

Master's student, thesis defended (January 2021)

Jude Arokiam

Master's student, 2019

Limin Ma

Master's student, 2020

Active Projects

Deep Data Understanding

We apply deep learning to give machines a semantic understanding of open data sets. Open data repositories often have limited or missing schemas and metadata, which makes them difficult to use as part of a larger data processing pipeline. Recent advances in text understanding have demonstrated the potential of deep neural networks. We are investigating novel neural network architectures that can derive meaningful semantic descriptions of a wide range of public data sets.

Topic Analysis of the DBLP Dataset

DBLP contains over 2.5 million citations in computer science and related fields, from 1960 to the present. We have developed temporal topic modeling techniques to discover latent topics and their temporal trends.

Data Driven Constraint Programming

Constraint (SAT) solvers have been a cornerstone of artificial intelligence. Coupled with optimizers, they can perform planning and decision making with a high degree of automation. In this project, we are interested in integrating databases with high-performance constraint solvers in order to perform data-driven constraint solving. The expected benefit of this research is methods and systems for iterative and interactive planning that involve both large datasets and complex constraints and objectives.
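As a rough illustration of what data-driven constraint solving can look like, the sketch below keeps the data in a relational database, uses a SQL join to narrow down the candidate values for each decision variable, and then runs a small backtracking search with a cost objective. It is a toy under stated assumptions, not the group's system: the `worker` and `task` tables, their contents, and the functions `build_demo_db`, `candidate_domains`, and `solve` are all hypothetical, and a plain Python search stands in for a real constraint solver.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical in-memory database of workers and tasks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE worker(name TEXT, skill TEXT, cost INTEGER);
        CREATE TABLE task(name TEXT, skill TEXT);
        INSERT INTO worker VALUES
            ('ann', 'sql', 3), ('bob', 'sql', 2),
            ('cat', 'ml', 4), ('dan', 'ml', 5);
        INSERT INTO task VALUES ('t1', 'sql'), ('t2', 'sql'), ('t3', 'ml');
    """)
    return conn


def candidate_domains(conn):
    """Let the database prune the search space: for each task, list the
    workers whose skill matches. Tasks with no matching worker do not
    appear in the result (inner join)."""
    rows = conn.execute(
        "SELECT t.name, w.name, w.cost FROM task t "
        "JOIN worker w ON w.skill = t.skill ORDER BY t.name, w.name"
    )
    domains = {}
    for task, worker, cost in rows:
        domains.setdefault(task, []).append((worker, cost))
    return domains


def solve(domains):
    """Backtracking search: assign a distinct worker to every task while
    minimizing total cost. Returns (assignment, cost)."""
    tasks = sorted(domains)
    best = {"cost": None, "assignment": None}

    def extend(i, used, assignment, cost):
        # Prune branches that cannot beat the best solution found so far.
        if best["cost"] is not None and cost >= best["cost"]:
            return
        if i == len(tasks):
            best["cost"], best["assignment"] = cost, dict(assignment)
            return
        for worker, worker_cost in domains[tasks[i]]:
            if worker not in used:  # constraint: one task per worker
                used.add(worker)
                assignment[tasks[i]] = worker
                extend(i + 1, used, assignment, cost + worker_cost)
                used.remove(worker)
                del assignment[tasks[i]]

    extend(0, set(), {}, 0)
    return best["assignment"], best["cost"]


if __name__ == "__main__":
    assignment, cost = solve(candidate_domains(build_demo_db()))
    print(assignment, cost)
```

The design point of the sketch is the division of labour: the join does the set-oriented filtering that databases are good at, so the search only explores combinations that already satisfy the skill constraint.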

Publications

2020

  • Organizing Data Lakes for Navigation;Nargesian, Fatemeh and Pu, Ken Q and Zhu, Erkang and Ghadiri Bashardoost, Bahar and Miller, Renee J;Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data;1939--1950;2020;Conference
  • A Stream Algebra for Performance Optimization of Large Scale Computer Vision Pipelines;Helala, Mohamed and Qureshi, Faisal Z and Pu, Ken Qian;IEEE Transactions on Pattern Analysis and Machine Intelligence;2020;IEEE;Journal
  • Data Driven Relational Constraint Programming;Valdron, Michael and Pu, Ken Q;2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI);156--163;2020;IEEE;Conference
  • NLP Relational Queries and Its Application;Stoica, Andrei and Pu, Ken Q and Davoudi, Heidar;2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI);395--398;2020;IEEE;Conference
  • Semantic Data Understanding with Character Level Learning;Mior, Michael J and Pu, Ken Q;2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI);253--258;2020;IEEE;Conference

2019

  • Data lake management: challenges and opportunities;Nargesian, Fatemeh and Zhu, Erkang and Miller, Renee J and Pu, Ken Q and Arocena, Patricia C;Proceedings of the VLDB Endowment;Volume 12;(12);1986--1989;2019;VLDB Endowment;Journal
  • Trusted relational databases with blockchain: design and optimization;Beirami, Amin and Zhu, Ying and Pu, Ken;Procedia Computer Science;Volume 155;137--144;2019;Elsevier;Journal
  • Scalable analysis of open data graphs;Stoica, Andrei and Valdron, Michael and Pu, Ken;2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI);334--341;2019;IEEE;Conference

2018

  • Table union search on open data;Nargesian, Fatemeh and Zhu, Erkang and Pu, Ken Q and Miller, Renee J;Proceedings of the VLDB Endowment;Volume 11;(7);813--825;2018;VLDB Endowment;Journal
  • Making Open Data Transparent: Data Discovery on Open Data.;Miller, Renee J and Nargesian, Fatemeh and Zhu, Erkang and Christodoulakis, Christina and Pu, Ken Q and Andritsos, Periklis;IEEE Data Eng. Bull.;Volume 41;(2);59--70;2018;Journal
  • Towards Optimal Snapshot Materialization to Support Large Query Workload for Append-Only Temporal Databases;Beirami, Amin and Pu, Ken and Zhu, Ying;2018 IEEE International Congress on Big Data (BigData Congress);268--271;2018;IEEE;Conference
  • Optimizing Organizations for Navigating Data Lakes;Nargesian, Fatemeh and Pu, Ken Q and Zhu, Erkang and Bashardoost, Bahar Ghadiri and Miller, Renee J;arXiv preprint arXiv:1812.07024;2018;Journal

2017

  • Modeling Transition and Mobility Patterns;Hedrick, Adele and Zhu, Ying and Pu, Ken;International Conference on Applied Human Factors and Ergonomics;528--537;2017;Springer, Cham;Conference
  • Interactive navigation of open data linkages;Zhu, Erkang and Pu, Ken Q and Nargesian, Fatemeh and Miller, Renee J;Proceedings of the VLDB Endowment;Volume 10;(12);1837--1840;2017;VLDB Endowment;Journal
  • An Index Structure for Fast Range Search in Hamming Space;Reina, EM and Pu, Ken Q and Qureshi, Faisal Z;2017 14th Conference on Computer and Robot Vision (CRV);8--15;2017;IEEE;Conference

2016

  • LSH ensemble: Internet-scale domain search;Zhu, Erkang and Nargesian, Fatemeh and Pu, Ken Q and Miller, Renee J;arXiv preprint arXiv:1603.07410;2016;Journal
  • A formal algebra implementation for distributed image and video stream processing;Helala, Mohamed A and Pu, Ken Q and Qureshi, Faisal Z;Proceedings of the 10th International Conference on Distributed Smart Camera;84--91;2016;Conference
  • ARC: A pipeline approach enabling large-scale graph visualization;Ferron, Michael and Pu, Ken Q and Szlichta, Jaroslaw;2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM);1397--1400;2016;IEEE;Conference
  • Hierarchical temporal mobility analysis with semantic labeling;Hedrick, Adele and Pu, Ken Q and Zhu, Ying;2016 International Conference on Computational Science and Computational Intelligence (CSCI);1321--1326;2016;IEEE;Conference

2015

  • Automatic parsing of lane and road boundaries in challenging traffic scenes;Helala, Mohamed A and Qureshi, Faisal Z and Pu, Ken Q;Journal of electronic imaging;Volume 24;(5);053020;2015;International Society for Optics and Photonics;Journal

2014

  • A stream algebra for computer vision pipelines;Helala, Mohamed A and Pu, Ken Q and Qureshi, Faisal Z;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops;786--793;2014;Conference
  • Towards Efficient Feedback Control in Streaming Computer Vision Pipelines;Helala, Mohamed A and Pu, Ken Q and Qureshi, Faisal Z;Asian Conference on Computer Vision;314--329;2014;Springer, Cham;Conference
  • Scalable distributed processing of K nearest neighbor queries over moving objects;Yu, Ziqiang and Liu, Yang and Yu, Xiaohui and Pu, Ken Q;IEEE Transactions on Knowledge and Data Engineering;Volume 27;(5);1383--1396;2014;IEEE;Journal
  • Using document space for relational search;Drake, Richard and Pu, Ken Q;Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014);841--844;2014;IEEE;Conference
  • An Index Structure for Fast Range Search in Hamming Space;Reina, Ernesto Rodriquez;2014;UOIT;Misc

2013

  • Discovering linkage points over web data;Hassanzadeh, Oktie and Pu, Ken Q and Yeganeh, Soheil Hassas and Miller, Renee J and Popa, Lucian and Hernández, Mauricio A and Ho, Howard;Proceedings of the VLDB Endowment;Volume 6;(6);445--456;2013;VLDB Endowment;Journal

2012

  • Tag Grid: Supporting Multidimensional Queries of Tagged Datasets;Pu, Ken Q and Cheung, Russell;Recent Trends in Information Reuse and Integration;331--342;2012;Springer, Vienna;Misc
  • Road boundary detection in challenging scenarios;Helala, Mohamed A and Pu, Ken Q and Qureshi, Faisal Z;2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance;428--433;2012;IEEE;Conference
  • Authoring relational queries on the mobile devices;Hedrick, Adele and Pu, Ken Q;Procedia Computer Science;Volume 10;752--757;2012;Elsevier;Journal

2011

  • Selection of features for surname classification;Rachevsky, Lev and Pu, Ken Q;2011 IEEE International Conference on Information Reuse & Integration;15--20;2011;IEEE;Conference

2010

  • Online annotation of text streams with structured entities;Pu, Ken Q and Hassanzadeh, Oktie and Drake, Richard and Miller, Renee J;Proceedings of the 19th ACM international conference on Information and knowledge management;29--38;2010;Conference
  • Tag grid: supporting collaborative and fuzzy multidimensional queries of tagged datasets;Pu, Ken Q and Cheung, Russell;2010 IEEE International Conference on Information Reuse & Integration;364--367;2010;IEEE;Conference
  • Recent Patents on Information Retrieval Using Natural Language and Keyword Query;Pu, Ken Q;Recent Patents on Computer Science;Volume 3;(3);186--194;2010;Bentham Science Publishers;Journal

2009

  • Keyword query cleaning using hidden markov models;Pu, Ken Q;Proceedings of the First International Workshop on Keyword Search on Structured Data;27--32;2009;Conference
  • Visual integration tool for heterogeneous data type by unified vectorization;Bourennani, Farid and Pu, Ken Q and Zhu, Ying;2009 IEEE International Conference on Information Reuse & Integration;132--137;2009;IEEE;Conference
  • Visualization and integration of databases using self-organizing map;Bourennani, Farid and Pu, Ken Q and Zhu, Ying;2009 First International Conference on Advances in Databases, Knowledge, and Data Applications;155--160;2009;IEEE;Conference
  • Frisk: Keyword query cleaning and processing in action;Pu, Ken Q and Yu, Xiaohui;2009 IEEE 25th International Conference on Data Engineering;1531--1534;2009;IEEE;Conference
  • Spatial inference using networks of RFID receiver: a Bayesian approach;Zhu, Ying and Howard, William and Pu, Ken Q;GLOBECOM 2009-2009 IEEE Global Telecommunications Conference;1--6;2009;IEEE;Conference
  • Unified vectorization of numerical and textual data using self-organizing map;Bourennani, Farid and Pu, Ken Q and Zhu, Ying;International Journal on Advances in Systems and Measurements Volume 2, Numbers 2&3, 2009;2009;Journal
  • Analysis of Service Compatibility: Complexity and Computation;Pu, Ken Q;Services and Business Computing Solutions with XML: Applications for Quality Management and Best Processes;136--155;2009;IGI Global;Misc

2008

  • Keyword query cleaning;Pu, Ken Q and Yu, Xiaohui;Proceedings of the VLDB Endowment;Volume 1;(1);909--920;2008;VLDB Endowment;Journal
  • Dynamic multicast in overlay networks with linear capacity constraints;Zhu, Ying and Li, Baochun and Pu, Ken Qian;IEEE Transactions on Parallel and Distributed Systems;Volume 20;(7);925--939;2008;IEEE;Journal
  • Adaptive multicast tree construction for elastic data streams;Zhu, Ying and Pu, Ken Q;IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications Conference;1--5;2008;IEEE;Conference
  • Modeling and synthesis of service composition using tree automata;Pu, Ken Q and Zhu, Ying;2008 IEEE International Conference on Information Reuse and Integration;46--51;2008;IEEE;Conference

2007

  • Efficient indexing of heterogeneous data streams with automatic performance configurations;Pu, Ken Q and Zhu, Ying;19th International Conference on Scientific and Statistical Database Management (SSDBM 2007);34--34;2007;IEEE;Conference
  • Fast identification of relational constraint violations;Chandel, Amit and Koudas, Nick and Pu, Ken Q and Srivastava, Divesh;2007 IEEE 23rd International Conference on Data Engineering;776--785;2007;IEEE;Conference
  • Service description and analysis from a type theoretic approach;Pu, Ken Q;2007 IEEE 23rd International Conference on Data Engineering Workshop;379--386;2007;IEEE;Conference
  • Fast archiving and querying of heterogeneous sensor data streams;Pu, Ken Q and Zhu, Ying;2007 Second International Conference on Digital Telecommunications (ICDT'07);28--28;2007;IEEE;Conference

2006

  • Syntactic rule based approach to web service composition;Pu, Ken and Hristidis, Vagelis and Koudas, Nick;22nd International Conference on Data Engineering (ICDE'06);31--31;2006;IEEE;Conference
  • On formal methods of multidimensional databases;Pu, Qian Ken;2006;University of Toronto;Journal

2005

  • Monitoring k-nearest neighbor queries over moving objects;Yu, Xiaohui and Pu, Ken Q and Koudas, Nick;21st International Conference on Data Engineering (ICDE'05);631--642;2005;IEEE;Conference
  • Concise descriptions of subsets of structured sets;Pu, Ken Q and Mendelzon, Alberto O;ACM Transactions on Database Systems (TODS);Volume 30;(1);211--248;2005;ACM;Journal
  • Modeling, querying and reasoning about OLAP databases: a functional approach;Pu, Ken Q;Proceedings of the 8th ACM international workshop on Data warehousing and OLAP;1--8;2005;Conference
  • Typed functional query languages with equational specifications;Pu, Ken Q and Mendelzon, Alberto O;Proceedings of the 14th ACM international conference on Information and knowledge management;233--234;2005;Conference

2004

  • Functional Integration of Relational, OLAP and XML Data;Pu, Ken Q;Proceedings of VLDB Workshop on Information Integration on the Web (IIWeb-2004);97;2004;Conference

2003

  • Concise descriptions of subsets of structured sets;Mendelzon, Alberto O and Pu, Ken Q;Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems;123--133;2003;Conference

2000

  • Modeling and control of discrete-event systems with hierarchical abstraction;Pu, Ken Qian;M.A.Sc. thesis, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, ON, Canada;2000;Journal

1998

  • Theory Of Discrete Wavelet Transform And An Error Analysis Of The Pyramid Algorithm;Pu, Ken Qian;1998;Citeseer;Misc

Others

  • Efficient Indexing of Heterogeneous Data Streams with Automatic Performance Tuning;Pu, Ken Q and Zhu, Ying;Journal
  • Algorithm and Complexity of the Unification Problem of a Polymorphic Attribute-based Type System;Pu, Ken Q;Journal