Yukitaka Kusumura, Ph.D., Author at dotData

Databricks AI + Data Summit 2025 Recap

July 29, 2025

From AI Agents to the Lakehouse: The Future of Data Utilization

Introduction: A New Era Driven by the Evolution of Data and AI

The “Databricks AI + Data Summit 2025,” held from June 15 to 18 in San Francisco, California, was one of the world’s largest conferences dedicated to data and AI. With over 22,000 attendees from around the globe, it showcased the cutting edge of the industry. This blog post provides a detailed recap of the summit, highlighting key themes including data lakes, AI engineering, and advanced self-service analytics.

Overview of Data + AI Summit 2025

The Data + AI Summit 2025, Databricks’ annual global conference, reached record scale this year. Held over four days, it brought together data scientists, engineers, professionals, and business leaders from around the world to engage in lively discussions on the latest AI technologies and data strategies. One key theme across keynotes and sessions…

Unlocking Business Insights with Generative AI Text Analysis

April 23, 2025

By Yukitaka Kusumura, Ph.D.

1. The Role of Text Analysis in Modern Enterprises

Companies deal with large volumes of data every day, including both numerical and text-based information. Numerical data is structured, so its consistent format usually makes it easier to analyze. Unstructured text data from sources such as emails, sales reports, call center logs, and internal documentation is much harder to analyze. However, it is also a goldmine of hidden, meaningful insights that can drive business improvements and strategic decision-making, given the right tools. Natural language processing (NLP) text mining tools have been the go-to for analyzing text data. However, this text analysis software, which primarily relied on statistical properties of words and sentences, had its limitations. It struggled to understand deeper context and meaning, which is where generative AI steps in. Generative AI has revolutionized text analytics, enabling more sophisticated interpretation…

Maintain Model Robustness: Strategies to Combat Feature Drift in Machine Learning

June 7, 2023

By Yukitaka Kusumura, Ph.D.

Introduction

Building robust and reliable machine learning models is essential for dependable decision-making and resilient predictions. While accuracy remains desirable, the stability and durability of a model take precedence in ensuring long-term efficacy. A crucial factor in model reliability is the stability of its features: features that stay consistent over time can drastically improve the model's robustness. In this article, we will explore the concept of feature drift and analyze methods to maintain feature stability, thereby enhancing the model's overall robustness and resistance to fluctuating conditions, even at the cost of marginal reductions in accuracy.

Challenges of Feature Drift

The evolution of features over time, more commonly referred to as feature drift, is typically categorized into two distinct types: data drift and concept drift.

Data Drift

Machine learning models are trained to minimize the overall error across their input data, causing the models to be more inclined toward fitting the majority of…

Preventing Data Leakage in Feature Engineering: Strategies and Solutions

April 26, 2023

By Yukitaka Kusumura, Ph.D.

Data leakage is a widespread and critical issue that can undermine the reliability of features. In this blog post, we will delve into the concept of data leakage, examine how it can occur during feature engineering, and present various strategies to prevent or mitigate its consequences.

Understanding Data Leakage

Data leakage occurs when the feature engineering process unintentionally uses information from the target variable or the validation/test set. This can lead to overly optimistic performance metrics, as the feature appears to perform exceptionally well on the test set. However, when the feature is implemented in real-world applications, its performance is often significantly worse than anticipated.

Data Leakage in Feature Engineering

Feature engineering is the process of creating new features or transforming existing ones, and it can frequently be a source of data leakage if not managed carefully.

Statistical Value Leakage

Statistical value leakage arises when you create or transform features before…
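As a rough illustration of the definition above (feature engineering that uses information from the validation/test set), the sketch below standardizes a numeric feature using statistics computed from the training rows only, and contrasts that with a mean computed over all rows. The data values, the split, and the helper names `fit_standardizer` and `standardize` are hypothetical and chosen only for this example; it is a minimal sketch, not the method described in the original post.

```python
from statistics import mean, pstdev

def fit_standardizer(train_values):
    """Compute the mean and standard deviation from training rows only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for a constant feature
    return mu, sigma

def standardize(values, mu, sigma):
    """Apply previously fitted statistics to any set of rows."""
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values; the last row is held out as the test set.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]

# Leakage-free: statistics come from the training rows alone.
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)

# Leaky variant, for contrast: the test row influences the statistic.
leaky_mu = mean(values)

print("train-only mean:", mu)
print("mean over all rows (leaky):", leaky_mu)
```

The point of the design is that the fitting step never sees the held-out rows, so the test set only ever passes through `standardize` with statistics fixed in advance.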

Feature Engineering from Geo-Spatial Data

April 4, 2023

By Yukitaka Kusumura, Ph.D.

Geospatial data, which combines geographic and spatial information, is becoming increasingly important in various industries, from transportation and logistics to urban planning and environmental monitoring. However, working with geospatial data can be challenging, as it often requires specialized feature engineering and analysis techniques. In this blog post, we will explore some of the key concepts and techniques involved in feature engineering for geospatial tables.

Geospatial Tables

Geospatial tables contain geographic data that can be used to represent geographic features. These tables consist of rows and columns containing information about geographic locations and features like roads, rivers, buildings, and parks. Each row represents a point in space or a specific area. The columns may contain longitude and latitude coordinates along with other characteristics such as environmental conditions, population density, and land use. For example, geospatial data showing land use in a town or city may have columns for longitudes and latitudes alongside other columns with…
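One simple feature that can be derived from a table with latitude and longitude columns is the distance from each row to a reference point. The sketch below uses the haversine great-circle formula for this; the rows, the column names, the reference point, and the `dist_to_ref_km` feature name are assumptions made for illustration and do not come from the original post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Hypothetical geospatial table: one row per location.
rows = [
    {"name": "site_a", "latitude": 0.0, "longitude": 0.0},
    {"name": "site_b", "latitude": 0.0, "longitude": 1.0},
]

# Hypothetical reference point (latitude, longitude).
reference = (0.0, 0.0)

# Add the distance to the reference point as a new feature column.
for row in rows:
    row["dist_to_ref_km"] = haversine_km(
        row["latitude"], row["longitude"], reference[0], reference[1]
    )

for row in rows:
    print(row["name"], round(row["dist_to_ref_km"], 2))
```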
