
Databricks Data + AI Summit 2025 Recap
The “Databricks Data + AI Summit 2025,” held from June 9 to 12 in San Francisco, California, was one of the world’s largest conferences dedicated to data and AI. With over 22,000 attendees from around the globe, it showcased the cutting edge of the industry. This blog post provides a detailed recap of the Summit, highlighting key themes including data products, AI engineering, and advanced self-service analytics.
The Data + AI Summit 2025, Databricks’ annual global conference, reached record scale this year. Held over four days, it brought together data scientists, engineers, and business leaders from around the world for lively discussions of the latest AI technologies and data strategies.
One key theme across keynotes and sessions was the concept of the “Data Product,” which we’ll explore next.
A “data product” is not just a collection of data—it is a curated, quality-assured data asset that delivers value to specific consumers or use cases.
Traditionally, enterprise data was managed in departmental silos, with each team independently cleansing, transforming, and validating data. This led to poor reusability, data inconsistency, and slow decision-making across the organization.
With the data product approach, business teams become consumers of pre-packaged, reusable data components—complete with defined use cases, quality standards, metadata, and monitoring capabilities. This fosters organization-wide data democratization and scalable usage. Achieving this requires a fully integrated data analytics foundation.
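To make the concept concrete, a data product’s contract can be captured as machine-readable metadata. The sketch below is purely illustrative; the field names and values are hypothetical rather than any Databricks API.

```python
# Minimal sketch of a data product "contract" expressed as metadata.
# Every field name and value here is illustrative, not a Databricks API.
customer_churn_features = {
    "name": "customer_churn_features",
    "owner": "analytics-engineering@example.com",
    "use_cases": ["churn prediction", "retention campaign targeting"],
    "quality_standards": {
        "freshness": "updated daily by 06:00 UTC",
        "completeness": "customer_id is never null",
    },
    "schema": {
        "customer_id": "string",
        "tenure_months": "int",
        "churned": "boolean",
    },
    "monitoring": {"alert_channel": "#data-products", "sla_hours": 24},
}
```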
The most noteworthy data platform announcements at the Summit were the release of “Lakebase” and enhancements to “Unity Catalog.” These initiatives are closely tied to Databricks’ strategic acquisitions of Tabular in 2024 and Neon in 2025, each valued at approximately $1 billion. Together, they represent roughly $2 billion invested in advancing Databricks’ vision of unifying data and AI.
Lakebase introduces a new architecture that removes the boundary between operational (OLTP) and analytical (OLAP) databases. This serverless, PostgreSQL-compatible database is integrated with the lakehouse and separates storage from compute for scalable resource allocation.
For example, application data can be analyzed in real time to inform business operations, without traditional ETL or reverse-ETL processes. It is a significant step toward a fully unified data platform.
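To make the idea concrete, here is a minimal sketch of an application writing through the PostgreSQL-compatible endpoint while the same rows are queried analytically moments later. The connection details, table name, and the Databricks spark session are hypothetical placeholders.

```python
import psycopg2  # standard PostgreSQL driver; Lakebase speaks Postgres

# Hypothetical connection details for a Lakebase instance.
conn = psycopg2.connect(
    host="my-lakebase.example.cloud",
    dbname="appdb",
    user="app_user",
    password="...",
)

# Operational write from the application (the OLTP side).
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO orders (order_id, customer_id, amount) VALUES (%s, %s, %s)",
        ("o-1001", "c-42", 99.50),
    )

# Moments later the same rows can be queried analytically (the OLAP side)
# without any ETL or reverse-ETL pipeline in between.
# `spark` assumes a Databricks notebook session; the table name is a placeholder.
revenue_by_customer = spark.sql(
    "SELECT customer_id, SUM(amount) AS revenue FROM orders GROUP BY customer_id"
)
revenue_by_customer.show()
```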
Previously known as a metadata catalog for managing data and AI models within Databricks, Unity Catalog has undergone significant evolution this year.
Notably, Unity Catalog Metrics and Unity Catalog Discovery were introduced to address a common issue: inconsistent KPI definitions across departments, which led to discrepancies in tools such as Power BI and Tableau. By centrally defining business metrics at the Lakehouse layer, enterprises can ensure consistency across analytics and business intelligence tools. Enhancements to metric definitions, dashboard governance, and approval workflows further improve data integrity and trust.
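The idea can be sketched as follows: define the KPI once at the Lakehouse layer and let every downstream tool read that single definition. A plain SQL view stands in for the actual Unity Catalog Metrics syntax here, and all catalog, schema, and column names are hypothetical.

```python
# One central KPI definition instead of per-tool re-implementations.
# A plain SQL view stands in for Unity Catalog Metrics syntax; all
# names are hypothetical. `spark` assumes a Databricks notebook session.
spark.sql("""
    CREATE OR REPLACE VIEW main.metrics.monthly_close_rate AS
    SELECT
        date_trunc('MONTH', closed_at)          AS month,
        count_if(status = 'closed') / count(*)  AS close_rate
    FROM main.support.cases
    GROUP BY date_trunc('MONTH', closed_at)
""")

# Power BI, Tableau, and notebooks can all read this one definition:
spark.table("main.metrics.monthly_close_rate").show()
```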
One of the key highlights of the Summit was the focus on “AI Engineering”—a discipline that brings software engineering principles to the development and deployment of AI systems, ensuring greater stability, maintainability, and quality.
In 2024, attention centered on “Compound AI Systems,” in which multiple AI agents collaborate to complete tasks. In 2025, the focus shifted to how such systems can be built and operated at production quality.
During the event, several solutions supporting AI engineering were showcased; the highlights below cover MLflow 3.0, DSPy, and AgentBricks. Together, these tools automate and streamline the entire AI lifecycle, raising the bar for enterprise AI implementation.
A central piece of MLflow 3.0 is the “LLM Judge,” a framework in which one AI model evaluates the output quality of another. Unlike traditional evaluation, which relies mainly on comparing outputs to predefined answers, the LLM Judge is built for the generative AI era, where flexible assessments such as “Is this output appropriate?” or “Is this a better response?” are essential.
The system uses judgment prompts to evaluate each input-output pair, determining the validity and relevance of the result. These evaluations are then used to refine and calibrate prompts for better performance. In some cases, human experts are still expected to review output quality and suggest improvements.
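As a rough sketch of the judge pattern, the following uses MLflow’s LLM-judged metrics in the mlflow.evaluate style; MLflow 3.0’s dedicated GenAI APIs differ in detail, and the judge model and data below are placeholder assumptions.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_relevance

# Placeholder evaluation data: each row pairs an input with the
# generative model's recorded output.
eval_df = pd.DataFrame({
    "inputs": ["How do I reset my password?"],
    "outputs": ["Click 'Forgot password' on the sign-in page and follow the emailed link."],
})

# An LLM acting as judge (GPT-4 here, as an assumption) scores how
# relevant each output is to its input, instead of comparing against a
# single predefined answer. Requires OPENAI_API_KEY to be set.
relevance_judge = answer_relevance(model="openai:/gpt-4")

results = mlflow.evaluate(
    data=eval_df,
    predictions="outputs",
    extra_metrics=[relevance_judge],
)
print(results.metrics)  # aggregate judge scores, e.g. mean relevance
```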
DSPy, an open-source framework, enables automatic prompt optimization based on performance evaluations. Using feedback labeled by LLM Judges as successful or failed, DSPy intelligently adjusts prompts by adding few-shot examples, fine-tuning phrasing, or avoiding known failure patterns. It can even suggest new evaluation data to improve prompt performance further, automating much of the work of prompt design and tuning.
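Here is a minimal sketch of that loop in DSPy. The model choice, the training example, and the stub metric standing in for an LLM Judge’s verdict are assumptions for illustration.

```python
import dspy

# Placeholder model choice; any LM supported by DSPy works here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A tiny program to optimize: answer a support question.
qa = dspy.Predict("question -> answer")

# Success/failure signal per example. A simple stub stands in for an
# LLM Judge's verdict; in practice this function would call one.
def judged_successful(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

# Placeholder training examples, labeled as in the judge feedback loop.
trainset = [
    dspy.Example(
        question="What is the capital of France?", answer="Paris"
    ).with_inputs("question"),
]

# BootstrapFewShot rewrites the prompt by adding few-shot
# demonstrations drawn from runs the metric marked successful.
optimizer = dspy.BootstrapFewShot(metric=judged_successful)
optimized_qa = optimizer.compile(qa, trainset=trainset)

print(optimized_qa(question="What is the capital of Japan?").answer)
```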
These capabilities represent a new paradigm in which AI can evaluate and improve itself, reducing dependence on any individual’s expertise. As a result, AI development becomes faster, more scalable, and consistently high in quality.
Databricks also introduced AgentBricks, a unified development environment for AI agents that supports natural language-driven development.
Running entirely on the Databricks platform, AgentBricks integrates generative, evaluation, and orchestration AI components, creating highly sophisticated, collaborative AI systems. While currently geared toward AI engineers, it is expected eventually to let business users build agents with no code: a true democratization of AI.
Databricks highlighted significant progress toward its long-standing goal of “data democratization.” Two demos stood out:
Lakeflow Designer is a no-code ETL tool that lets users build data pipelines using natural language commands. In the demo, the presenter used contact center call logs and simply asked the system to add a new column indicating whether each case was closed, then calculated the close rate by agent, all through natural language instructions.
The system automatically visualized the entire data flow, allowing users to inspect inputs and outputs at each step in the process. The most impressive moment occurred when the speaker uploaded an image showing the desired aggregation layout and said to the platform, “I want to summarize the data like this.” Instantly, the tool generated the necessary SQL and processing steps, prompting audible reactions of surprise from the audience.
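For illustration only, a request like the close-rate example above might compile to SQL along these lines; the table and column names are hypothetical, and Lakeflow Designer generates such steps itself.

```python
# Hypothetical sketch of what a prompt like "add a column that flags
# whether each case was closed, then compute the close rate by agent"
# might generate. Table and column names are placeholders.
# `spark` assumes a Databricks notebook session.
close_rate_by_agent = spark.sql("""
    WITH flagged AS (
        SELECT
            agent_id,
            CASE WHEN status = 'closed' THEN 1 ELSE 0 END AS is_closed
        FROM main.contact_center.call_logs
    )
    SELECT agent_id, AVG(is_closed) AS close_rate
    FROM flagged
    GROUP BY agent_id
""")
close_rate_by_agent.show()
```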
Another demo showcased AI-enhanced business intelligence. The use case involved a line chart displaying hourly marketing impressions, with a visible spike. When the presenter asked in natural language, “Why is there a spike here?” the AI automatically searched relevant datasets and responded with an explanation, such as, “There’s an increase in marketing activity in the APAC region.”
Not only did it generate the insight in plain language, but it also surfaced the supporting tables and data to provide full context. This level of analysis would previously have required assistance from a data engineer, but now even complex queries can be performed interactively with just a natural language prompt in a single session. This marks a significant leap forward in intuitive, self-service data exploration capabilities.
From dotData’s perspective, three major trends stood out: the unification of data and AI on a single platform, the rise of AI engineering as a discipline, and tangible progress toward data democratization.
That said, the path to “data democratization” is not without challenges. Despite advances in models and tools such as AutoML and analytics platforms, many business users still struggle to derive meaningful insights from varied data. Achieving the intuitive querying shown in the demos requires well-prepared data, yet data preparation demands both skill and effort. Even with clean data, asking the right questions, crafting the right prompts, and digging deeper into insights remains a challenge.
To address these challenges, dotData offers two key solutions. Together, they connect seamlessly with existing business intelligence and machine learning platforms, creating a truly user-friendly analytics environment. For more information or to schedule a live demo, please don’t hesitate to contact us.