dotData Feature Factory: Data-Centric & Programmatic Feature Discovery

Empower data scientists with automated feature discovery so they can use all of your data to build better, more accurate ML models.

A Paradigm Shift for Enterprise Data

Feature Factory is a fundamental shift in how enterprise data science teams develop curated data and accumulate data know-how as reusable assets. Feature spaces and the ability to discover features through a data-centric, programmatic approach lead to enhanced collaboration, greater efficiency, higher model quality, and greater reusability, reproducibility, scalability, and transparency. Break down silos and capitalize on the wealth of information at your disposal.

Automated Feature Engineering to Maximize Data Value and Empower AI

Feature engineering is essential for maximizing data value and advancing AI. Traditionally, it has been a manual, time-intensive process that relies on experience and intuition. dotData Feature Factory transforms this with a data-centric approach: it programmatically defines feature spaces and automatically generates a vast array of feature hypotheses, far more than could be produced manually. By archiving user data and business insights in an analysis database, it enables process reuse. Feature Factory also automates the creation of feature pipelines for production deployment, supplying processed data to AI development, business intelligence, and other data applications, empowering all data-driven initiatives.

Product Features

dotData Feature Factory transforms corporate data processing know-how into reusable assets through unique feature engineering that discovers patterns hidden in business data. It turns the large volumes of data that companies accumulate into business insights that enable data-driven decision-making.

Multi-Source, Multi-Table Feature Engineering

Quickly connect to diverse data sources to enrich feature discovery and unlock iterative feature additions from new sources.

No SQL Required

Leverage dotData Feature Factory to interact with data through Dataframes and generate entity relationships and features using familiar Python commands and syntax.

Automated Data Wrangling

Avoid time-consuming and error-prone data wrangling: dotData automatically cleanses, aligns, and prepares data for feature discovery.

Time-Series Features, Without Leakage

Automatically generate and validate multidimensional time-based features, including holidays, lags or delays, periodicity, seasonality, and more.
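The "without leakage" guarantee comes down to one rule: a feature value for time t may only use data observed before t. The following is a minimal, generic sketch in plain Python, not dotData's API or algorithm; the function `lag_features`, the lags (1 and 7 days), and the 3-day window are hypothetical choices for illustration.

```python
from statistics import mean


def lag_features(values, lags=(1, 7), window=3):
    """Build leakage-free features for each position t of a daily series.

    Every feature uses only values[0:t] (strictly earlier days), never
    values[t] or anything after it. Missing history is reported as None.
    """
    rows = []
    for t in range(len(values)):
        history = values[:t]  # observations strictly before day t
        row = {}
        for lag in lags:
            row[f"lag_{lag}"] = history[-lag] if len(history) >= lag else None
        recent = history[-window:]
        row[f"mean_prev_{window}"] = mean(recent) if len(recent) == window else None
        rows.append(row)
    return rows


# Hypothetical daily sales figures.
sales = [10, 12, 11, 15, 14, 13, 16, 18]
for t, row in enumerate(lag_features(sales)):
    print(t, row)
```

A quick way to check for leakage is to change a future value and confirm that no earlier feature row changes.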

Build Reusable Feature Discovery Assets

Record every data and feature transformation step into your Analytic Database and turn your “know-how” into reusable assets for your organization.

Automated Feature Discovery at Enterprise Scale

Generate and score millions of data features from complex tables, relationships, and billions of rows.

Insights and Explainable AI

Produce actionable business insights and explainable features, with natural-language descriptions and blueprints that trace each discovered feature back to its source variables.

Production-Ready Feature Pipeline Generation

Automatically generate feature pipelines with full SQL code from source tables, ready for production use.
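A feature pipeline "with full SQL code" is, at its core, a query that recomputes feature values from the source tables. The sketch below is hypothetical and is not SQL generated by dotData: the `customers` and `transactions` tables, their columns, and the 30-day window are invented for illustration. It uses Python's built-in `sqlite3` module to run a two-table aggregation that only counts transactions dated before each customer's reference date, the same leakage rule that applies to time-series features.

```python
import sqlite3

# Hypothetical source tables: one row per customer, many transactions per customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, reference_date TEXT);
CREATE TABLE transactions (customer_id INTEGER, txn_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, '2024-03-01'), (2, '2024-03-01');
INSERT INTO transactions VALUES
  (1, '2024-02-10', 50.0),
  (1, '2024-02-25', 20.0),
  (1, '2024-03-05', 999.0),  -- after the reference date: must be excluded
  (2, '2023-12-01', 10.0);   -- older than the 30-day window
""")

# Feature query: total spend and transaction count in the 30 days
# strictly before each customer's reference_date.
FEATURE_SQL = """
SELECT c.customer_id,
       COALESCE(SUM(t.amount), 0) AS spend_30d,
       COUNT(t.amount)            AS txn_count_30d
FROM customers AS c
LEFT JOIN transactions AS t
  ON t.customer_id = c.customer_id
 AND t.txn_date <  c.reference_date
 AND t.txn_date >= date(c.reference_date, '-30 days')
GROUP BY c.customer_id
ORDER BY c.customer_id;
"""

for row in conn.execute(FEATURE_SQL):
    print(row)
```

The `LEFT JOIN` keeps customers with no qualifying transactions, so every entity still receives a feature row (with zeros) rather than silently dropping out of the feature table.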

Steps to Use

Prepare Tables as Dataframes

Connect to multiple data sources, data lakes, or data warehouses and ingest the data as Spark Dataframes in Python.

  • Load data from modern cloud data marts (including Amazon Redshift, Google BigQuery, Snowflake, and MS Azure Synapse), traditional data warehouses (Oracle, Teradata, and MS SQL Server), and flat data sources (CSV files, Tableau Hyper files, etc.) via the Spark Dataframe API.
  • Automatic data type detection and data schema inference.
  • Connect multiple data sources together by specifying Dataframe relationships.
  • Define and configure temporal data relationships for automated temporal feature discovery.

Run dotData Feature Factory

Specify your target variable and the source tables (as Dataframes) that you will use to build features. Define your search criteria and run dotData Feature Factory from your favorite Python IDE or notebook.

  • Resolve data quality issues like illegal values, outliers, data canonicalization, missing values, target label mapping, and more.
  • Explore millions of feature hypotheses – including numeric, categorical, time-series, text, and even geospatial data.
  • Resolve feature over-fitting, feature collinearity, feature drifts, and feature redundancy based on dotData’s proprietary algorithms.
  • Add custom feature primitives and search criteria to bring your own domain features into the feature exploration space.
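dotData's handling of redundancy and collinearity is proprietary and not described here. As a generic point of comparison only, a common baseline is a greedy correlation filter: walk the features in relevance order and skip any feature that is too strongly correlated with one already kept. The feature names, values, and the 0.95 threshold below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(features, threshold=0.95):
    """Keep features in the given order (e.g. ranked by relevance), skipping
    any whose |correlation| with an already-kept feature exceeds threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate features over five entities.
features = {
    "spend_30d": [70.0, 0.0, 35.0, 12.0, 90.0],
    "spend_30d_x100": [7000.0, 0.0, 3500.0, 1200.0, 9000.0],  # exact rescaling
    "txn_count_30d": [2, 0, 1, 3, 2],
}
print(drop_redundant(features))
```

Because the filter is order-dependent, the input order matters: the feature listed first wins whenever two features are near-duplicates.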

Discover Features & Insights

Explore and evaluate discovered features interactively from Python.

  • Feature leaderboard (feature list) that surfaces features that are the most relevant and correlated with your target variable.
  • Understand each feature’s business value and construction via an easy-to-understand auto-generated explanation and feature blueprint diagram.
  • Select your preferred features based on various feature metrics like correlation, feature-wise AUC, permutation importance, feature locality, popularity, and more.
  • Extract feature tables as Dataframes and visualize each feature using the built-in visualization tool or any Python visualization library you like.
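Feature-wise AUC, one of the metrics listed above, scores a single feature by how well its raw values rank positive examples above negative ones. The following minimal sketch computes it with a plain pairwise comparison (equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs). It is not dotData's implementation, and the labels and feature names are hypothetical.

```python
def feature_auc(values, labels):
    """Feature-wise AUC: probability that a randomly chosen positive example
    has a higher feature value than a randomly chosen negative one (ties
    count 0.5). 0.5 means no ranking signal; values near 0 or 1 mean strong
    signal (near 0 means the feature ranks in the opposite direction)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical binary target and two candidate features.
labels = [1, 0, 1, 0, 1, 0]
candidates = {
    "spend_30d": [90.0, 10.0, 70.0, 20.0, 35.0, 40.0],
    "txn_count_30d": [2, 2, 1, 3, 2, 1],
}

# Rank by distance from 0.5 so inversely related features still score well.
ranked = sorted(
    candidates,
    key=lambda f: abs(feature_auc(candidates[f], labels) - 0.5),
    reverse=True,
)
for name in ranked:
    print(name, round(feature_auc(candidates[name], labels), 3))
```

The pairwise loop is O(positives × negatives), which is fine for a sketch; at scale a rank-based formula is the usual choice.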

Extract & Iterate Feature Discovery Experiments

Iterate feature discovery experiments to derive better-quality, higher-order features. Explore, optimize, and tune features interactively, and choose which features to extract for further analysis, modeling, or reporting, all from within Python.

  • Edit feature descriptors (definitions) to customize discovered features and leverage your domain expertise.
  • Natural interface to add new datasets and run new experiments. Combine features from multiple experiments with different granularity.
  • All steps and feature space details are reported transparently, with no black boxes.
  • Modularized execution lets you run your experiments from any intermediate step and iterate on them faster.

Deploy Your Features Into Production

Populate feature stores and continuously update features in production applications.

  • Ingest features and metadata (feature explanations, feature statistics, feature schema) into any feature store (Databricks, Snowflake, AWS SageMaker, and more) to enhance your ML models.
  • Automatic feature pipeline generation with fully specified, reusable query statements that eliminate error-prone manual implementation of feature queries.
  • One-command deployment of feature pipelines into dotData Ops. Continuously recalculate feature values with the newest data and monitor feature quality and drift.
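The drift metric used by dotData Ops is not specified here. One widely used, generic measure of feature drift is the Population Stability Index (PSI), which compares how a feature's values are distributed across bins in a baseline sample and in a newer sample. A minimal sketch, assuming fixed, hand-picked bin edges and hypothetical data:

```python
from bisect import bisect_right
from math import log


def psi(baseline, current, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    edges: sorted interior bin boundaries, giving len(edges) + 1 bins.
    eps floors empty-bin proportions to avoid log(0) and division by zero
    (so floored proportions may sum to slightly more than 1).
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    expected, actual = proportions(baseline), proportions(current)
    return sum((a - e) * log(a / e) for e, a in zip(expected, actual))


# Hypothetical feature values at training time vs. in production.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
current = [5, 6, 7, 8, 9, 10, 11, 12]
edges = [2.5, 4.5, 6.5]

print(psi(baseline, baseline, edges))  # identical samples
print(round(psi(baseline, current, edges), 3))  # shifted sample
```

A PSI of 0 means the binned distributions are identical. A frequently cited rule of thumb treats values above roughly 0.25 as a significant shift, though thresholds vary by team and use case.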

Deployment Options

Jupyter Notebook

Install and use dotData Feature Factory on Jupyter Notebook, the standard Python environment for data scientists.

Databricks

Seamlessly integrate dotData Automated Feature Discovery into Databricks Python workflows and ML experiments.

Snowflake

dotData Feature Factory is available on Snowflake via Snowpark Container Services.

Amazon EMR

Install dotData Feature Factory on your Amazon EMR cluster to accelerate feature discovery for your data science team.

Pip Install

Quickly deploy dotData Feature Factory via pip install, even on your own laptop.

What Our Customers Say

sticky.io

I was spending 95% of my time wrangling data…now I can offload most of that work and just focus on delivering viable patterns and insights.

Justin Shoolery, Director of Data Science & Analytics

Exeter Finance

The biggest problem is that, when doing it manually, it’s just a repetitive, trial-and-error process that takes time. dotData solves a problem I’ve been trying to solve for 20 years.

Karthik Chandrasekhar, SVP of Decision Science

JAL Engineering Co., Ltd.

With dotData, it is now possible to create new features that can lead to signs of problems that could not be found through hypothesis-testing analysis based on the knowledge of mechanics and engineers.

Toru Taniuchi, System Technology Office, Technology Department (at the time)

dotData's AI Platform: Maximize Data Utilization through Feature Discovery

dotData uses automated feature engineering to build machine learning models, enriching data by accumulating feature values as assets and extracting valuable insights that help businesses become more data-driven. Our platform meets a wide range of needs, including business transformation, and supports the effective use of data and AI to drive innovation and growth.

dotData Insight: Unlocking Hidden Patterns

dotData Insight is an innovative data analysis platform designed for business teams to identify high-value, hyper-targeted data segments with ease. It surfaces the hidden patterns that dotData discovers through an intuitive, approachable interface. By combining AI-driven data analysis with GenAI, Insight discovers actionable business drivers that impact your most critical key performance indicators (KPIs). This convergence allows business teams to intuitively understand data insights, develop new business ideas, and plan and execute strategies more effectively.

dotData Enterprise: No-Code Automated Feature Engineering and ML

dotData Enterprise is an AI platform that enables data analysis teams to develop predictive AI models without coding. Through automated feature engineering and machine learning (AutoML), dotData Enterprise provides a one-stop solution for AI development—from extracting features from business data to building predictive models using machine learning—without requiring specialized knowledge or coding skills. With dotData Enterprise, predictive analytics projects can be completed in days rather than months, allowing businesses to quickly harness the power of AI and gain valuable future predictions and insights from their data.

dotData Ops: Self-Service Deployment of Data and Prediction Pipelines

dotData Ops offers analytics teams a self-service platform to deploy data, features, and prediction pipelines directly into real business operations. By testing and quickly validating the business value of data analytics within your workflows, you build trust with decision-makers and accelerate investment decisions for production deployment. dotData’s automated feature engineering transforms MLOps by validating business value, diagnosing feature drift, and enhancing prediction accuracy.

dotData Cloud: Eliminate Infrastructure Hassles with Fully Managed SaaS

dotData Cloud delivers each of dotData’s AI platforms as a fully managed SaaS solution, eliminating the need for businesses to build and maintain a large-scale data analysis infrastructure. This minimizes Total Cost of Ownership (TCO) and allows organizations to focus on critical issues while quickly experimenting with AI development. dotData Cloud’s architecture, certified as an AWS "Competency Partner," ensures top-tier technology standards and uses a single-tenant model for enhanced data security.

dotData Stream: Real-Time Predictive Streaming

dotData Stream is a platform that enables real-time predictive streaming. With a single command, you can deploy models developed in dotData Enterprise and Feature Factory into containerized, microservice-based environments for real-time predictions. This platform allows you to utilize AI predictions across various environments, including on-premises, cloud, and even IoT edge servers.

Request a Demo

We offer support tailored to your needs, whether you want to see a demo or learn more about use cases. Please feel free to contact us.