Talks



Title:
AgenticData: An Agentic Data Analytics System

Abstract:
Current unstructured data analytics systems often depend heavily on experts to code and manage complex workflows, resulting in high costs and significant time delays. To overcome these challenges, this talk introduces AgenticData, an intelligent data analytics system that empowers users to submit natural language (NL) queries while autonomously analyzing data across various domains, including both structured and unstructured data. AgenticData utilizes a multi-agent collaboration framework to automatically convert an NL query into a semantic plan that incorporates relational and semantic operators. This framework features a data profiling agent for identifying relevant data, a semantic validation agent for iterative optimization based on user feedback, and a smart memory agent for managing short-term context and long-term knowledge. Furthermore, AgenticData incorporates advanced semantic optimization techniques to efficiently refine and execute semantic plans. The system has already been deployed in real-world applications, such as data analysis for banks and power grid companies, demonstrating its effectiveness and versatility in practical scenarios.

Bio:
Guoliang Li is a professor in the Department of Computer Science at Tsinghua University. His research interests include database systems, machine learning for databases, and data agents. He is an ACM Fellow and an IEEE Fellow. He has received numerous prestigious awards, including the VLDB 2017 Early Research Contribution Award, TCDE 2014 Early Career Award, ICDE 2025 Best Paper Runner-up, SIGMOD 2024 Research Highlight Award, Best of SIGMOD 2023 Papers, VLDB 2023 Best Industry Paper Runner-up, DASFAA 2023 Best Paper Award, Best of VLDB 2020 Papers, CIKM 2017 Best Paper Award, Best of KDD 2018 Papers, Best of ICDE 2018 Papers, DASFAA 2019 Best Student Paper Award, DASFAA 2014 Best Paper Runner-up, and APWeb 2014 Best Paper Award. He served as General Co-chair for SIGMOD 2021 and will serve as PC Co-chair for ICDE 2027. He regularly contributes as a Senior Program Committee member for leading conferences such as SIGMOD, VLDB, ICDE, and KDD. He was an Associate Editor for IEEE TKDE and the VLDB Journal.



Title:
Data in the Age of AI

Abstract:
The traditional data lifecycle has involved gathering functional requirements, gathering approvals, building a data stack, and maintaining the stack as application needs change. To what extent does this all change in the age of AI? I will point to some challenges in using AI to automate parts of this stack.

Bio:
Kavitha Srinivas is a senior research scientist at IBM. Her research interests have spanned semantic web, knowledge graphs, AI planning, and, more recently, the application of AI to data management problems. In particular, she is interested in using agents to semi-automate data management tasks.



Title:
Agent-first database systems

Abstract:
Agentic harnesses are revolutionizing how we perform knowledge and data work. At Berkeley, we’re kickstarting a new research agenda around rethinking how database systems should evolve to support agents as a new workload that is likely to dominate future database workloads. I’ll describe new challenges and opportunities in this vein. If time permits, I’ll describe some approaches that we’ve been taking to make agents more efficient by recording and reusing what we call “tribal knowledge” as they explore databases, as well as a new benchmark to truly test agentic capabilities on complex database settings.

Bio:
Aditya Parameswaran is an Associate Professor in Computer Science at UC Berkeley, and a co-director of the EPIC Data Lab. Aditya leverages techniques from artificial intelligence, databases, and human-computer interaction to solve hard data challenges. Multiple open-source tools developed in his group (including Modin, Lux, IPyFlow, and DocETL) have received thousands of GitHub stars and have been downloaded tens of millions of times overall across a spectrum of industries. His research was commercialized as a startup, Ponder, in 2021, where he served as Co-founder and President, before its acquisition by Snowflake. Aditya has received the Alfred P. Sloan Research Fellowship, the VLDB Early Career Award, the NSF CAREER Award, and the TCDE Rising Star Award, along with other recognitions. His website is at http://adityagp.net.



Title:
LindormVector: A Distributed Vector Engine on a Cloud-Native Multi-Model NoSQL Database

Abstract:
Vector databases have become a key component of modern AI infrastructure, supporting semantic retrieval and retrieval-augmented generation (RAG) over large-scale unstructured data. In practice, however, existing solutions face a fundamental trade-off: specialized vector databases offer low-latency search but are difficult to scale and integrate, while general-purpose databases provide robustness and consistency at the cost of performance.

In this talk, we present LindormVector, a distributed vector engine built natively on Lindorm. LindormVector adopts a cloud-native shared-storage architecture to achieve elasticity, durability, and high availability, while maintaining efficient vector search. We discuss key design insights, including an optimized IVFPQ-style index, a practical cost model for balancing search efficiency, and techniques for scaling to a large number of clusters using approximate nearest neighbor search and graph-based centroid indexing.

We will also share our experience in supporting real-time updates, filtered queries, and hybrid search over vectors and structured data. Extensive evaluations on billion-scale datasets show that LindormVector achieves high recall with millisecond-level latency, outperforming systems such as Milvus, Elasticsearch, and PostgreSQL. The system has been deployed in production at Alibaba Cloud, managing over 100 billion vectors.
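
As a rough illustration of the inverted-file (IVF) idea behind the IVFPQ-style index mentioned in this abstract, the sketch below assigns vectors to their nearest centroid and, at query time, scans only the lists of the closest centroids. It is a toy under stated assumptions, not LindormVector's implementation: the function names, the fixed centroids, and the `nprobe` parameter are hypothetical, and product quantization, the cost model, and the graph-based centroid index are all omitted.

```python
import math

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(vec, centroids[i]))
        lists[nearest].append(vid)
    return lists

def search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe closest inverted lists and return the top-k ids."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [vid for c in probe for vid in lists[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:k]

# Hypothetical toy data; real systems learn centroids (e.g. with k-means).
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
centroids = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]

lists = build_ivf(vectors, centroids)
result = search((4.9, 5.1), vectors, centroids, lists, k=1, nprobe=1)
```

Raising `nprobe` scans more lists, which trades latency for recall; that is the kind of balance a cost model would tune.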

Bio:
Deng Dong is a Senior Staff Engineer and Director at Alibaba Cloud, where he leads the architecture and development of next-generation AI search infrastructure. His work focuses on building unified retrieval systems that integrate vector search, full-text search, and structured data processing to support large-scale AI applications. His team has designed and deployed several key innovations in production, including unified multi-model architectures over shared storage, high-performance vector indexing frameworks combining graph-based and quantization-based techniques, and efficient query processing for vector + full-text + scalar hybrid search. These systems power large-scale AI workloads across leading foundation model providers and internet companies, enabling applications such as retrieval-augmented generation (RAG), multimodal search, and intelligent data discovery at scale. Dong’s research interests lie in unstructured data management, approximate nearest neighbor search, and large-scale data systems for AI. He has published extensively in top-tier conferences such as SIGMOD and VLDB.



Title:
Semantic Query Processing over Relations

Abstract:
Language models are making it possible to ask richer questions over relational data, but doing so efficiently remains difficult. Join-heavy queries, often over networked data, can produce large intermediate results that must be serialized into prompts and then fed into language models. This talk presents FFX (Fast Factorized eXecution), a query engine that combines factorized and vectorized execution to address this bottleneck.

The talk focuses on how FFX changes semantic query processing by keeping join intermediates compact, enabling semantic operators to serialize factorized intermediates and predict over their implied Cartesian products. Operators then produce predictions as flat output tuples and bypass having to first flatten the input relation. Empirically, and somewhat surprisingly, our evaluation shows that even non-reasoning models can often perform this Cartesian expansion accurately while still carrying out the semantic task. In our evaluation, FFX achieves an order-of-magnitude reduction in input tokens while maintaining the same accuracy and degrades more gracefully as context size increases.
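
To make the token savings concrete, here is a minimal sketch of why a factorized join intermediate serializes into a shorter prompt than its flat form. It is an illustration only and not FFX's actual data structures or serialization format: the data, the function names, and the whitespace-based token count are all assumptions.

```python
from itertools import product

# Hypothetical factorized join intermediate: for each join key, the matching
# values from each side are stored once instead of as a cross product.
factorized = {
    "author_1": {"papers": ["p1", "p2", "p3"], "venues": ["SIGMOD", "VLDB"]},
    "author_2": {"papers": ["p4"], "venues": ["ICDE", "KDD", "CIKM"]},
}

def flatten(fact):
    """Expand the factorized form into the flat tuples it implies."""
    rows = []
    for key, groups in fact.items():
        for paper, venue in product(groups["papers"], groups["venues"]):
            rows.append((key, paper, venue))
    return rows

def serialize_flat(fact):
    """One line per flat output tuple."""
    return "\n".join(" ".join(row) for row in flatten(fact))

def serialize_factorized(fact):
    """One line per join key, listing each side's values once."""
    return "\n".join(
        f"{key} papers: {' '.join(g['papers'])} venues: {' '.join(g['venues'])}"
        for key, g in fact.items()
    )

def count_tokens(text):
    """Crude proxy for prompt size: whitespace-separated words."""
    return len(text.split())

flat_tokens = count_tokens(serialize_flat(factorized))
factorized_tokens = count_tokens(serialize_factorized(factorized))
```

In this toy case the flat form grows with the product of the group sizes per key, while the factorized form grows with their sum, so the gap widens as joins fan out.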

Bio:
Amine Mhedhbi is an assistant professor at École Polytechnique de Montréal. His interests are in building and analyzing analytical and AI-driven data system architectures. His work includes tackling performance considerations, debuggability, interface design, and data applications. Amine received his Ph.D. in 2023 from the University of Waterloo. His research has been awarded a VLDB best paper award, a Microsoft Ph.D. fellowship award, and the University of Waterloo’s Computer Science distinguished dissertation award.



Title:
GaussVector: A Multi-modal Vector Database for the Agentic AI Era

Abstract:
As large language models and agentic AI systems become central to enterprise operations, production deployments demand vector database infrastructure that is simultaneously fast, accurate, scalable, and cost-efficient, with seamless integration across diverse application ecosystems. GaussVector is Huawei’s production-grade distributed vector database, purpose-built for multi-modal, enterprise-scale AI workloads. Its storage-compute separation architecture achieves sub-second retrieval over 100B-scale datasets with 10× storage capacity improvement and 100×+ faster node scaling over traditional deployments. Leveraging Huawei’s Ascend NPU co-design stack, GaussVector further delivers 10× index build speedup and 4× query throughput improvement over CPU baselines. Its unified multi-modal storage engine consolidates relational, vector, and graph workloads in a single system, and its open ecosystem design enables seamless integration across the full AI stack.

Bio:
Huaxin Zhang is a Senior Principal Software Engineer with two decades of industry experience in database systems and data infrastructure. He holds a Ph.D. in Computer Science from the University of Waterloo and spent 10 years at IBM working on DB2, followed by 10 years at Huawei advancing GaussDB. Huaxin holds 10 U.S. patents and is currently focusing on vector database architecture and storage engine design.