Your Enterprise-Grade Feature Factory

dotData Py is an enterprise-grade feature discovery platform that helps data science and data engineering teams iterate on feature engineering faster and build production-quality feature pipelines automatically.


Feature Discovery & Engineering

Search millions of hypotheses

Search millions of feature hypotheses from enterprise-scale relational data across tens of tables, thousands of columns, and billions of rows.

Find the right features for your models

Leverage supervised learning techniques to address feature over-fitting, collinearity, stability, drift, and more.

Feature discovery and engineering

Explore different featurization techniques, such as categorical encoding, numeric aggregation, temporal recency and periodicity, geo-location grids, and text topic encoding.

Feature Quality & Selection

Automated data prep and cleansing

Built-in data & feature cleansing, including string value canonicalization, duplicate record removal, missing value imputation, and outlier elimination.

Optimize features for your models

Apply stream feature selection and optimization techniques to evaluate millions of features and select the most impactful ones in hours.

Temporal features that work

Prevent temporal data leakage and guarantee point-in-time correctness with advanced temporal relation search.
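To make "point-in-time correctness" concrete, here is a minimal sketch in plain Python. It is an illustration of the general idea, not dotData's implementation or API: the function name `spend_before` and the sample data are hypothetical. The feature for each training row is computed only from events dated strictly before that row's prediction date, so no future information leaks into the feature.

```python
from bisect import bisect_left
from datetime import date

def spend_before(transactions, customer_id, cutoff):
    """Sum a customer's transaction amounts dated strictly before `cutoff`.

    `transactions` maps customer_id -> list of (date, amount) sorted by date.
    (Hypothetical helper for illustration only.)
    """
    history = transactions.get(customer_id, [])
    dates = [d for d, _ in history]
    # bisect_left gives the index of the first event dated on or after the
    # cutoff, so the slice keeps only events strictly earlier than the cutoff.
    end = bisect_left(dates, cutoff)
    return sum(amount for _, amount in history[:end])

transactions = {
    "c1": [
        (date(2023, 1, 5), 20.0),
        (date(2023, 2, 1), 35.0),
        (date(2023, 3, 10), 50.0),
    ],
}

# Each training row carries its own prediction date.
print(spend_before(transactions, "c1", date(2023, 2, 1)))  # 20.0
print(spend_before(transactions, "c1", date(2023, 3, 1)))  # 55.0
```

The transaction made on the prediction date itself is excluded in the first call, which is the conservative choice when event timestamps have only day resolution.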


Transparency & Insight

Score feature importance

Derive supervised feature importance (such as feature-wise AUC, permutation importance, sample-wise SHAP) as well as feature statistics as feature metadata.
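As a rough illustration of one of these metrics, feature-wise AUC can be read as the probability that a randomly chosen positive example has a higher feature value than a randomly chosen negative one. The sketch below computes it by brute-force pair counting in plain Python. The function name `feature_auc` and the sample columns are hypothetical, and this is not how dotData computes or exposes the metric.

```python
def feature_auc(values, labels):
    """AUC of a single feature used directly as a score for binary labels.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the larger feature value; ties count as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
features = {
    "num_orders": [1, 2, 3, 4],        # hypothetical feature columns
    "recency_days": [30, 20, 10, 25],
}

for name, values in features.items():
    print(name, feature_auc(values, labels))
# num_orders 1.0
# recency_days 0.25
```

An AUC far below 0.5, as for `recency_days` here, still indicates a useful feature: it is simply negatively related to the target. Pair counting is quadratic in the number of rows, so a rank-based formulation would be the better choice on large data.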

Feature explanations for full transparency

Produce natural-language feature explanations to understand each feature in context, and feature blueprints to visualize its data lineage.

Feature queries built-in

Generate feature queries that reveal every step of the feature generation process with 100% transparency.

Feature Pipeline & Query

Production-ready pipelines

Build “production-ready” data and feature pipelines that include complete steps from data cleansing through multi-table aggregation to feature transformation.

Feed your Feature Store

Use DataFrames as the standard input and output format, so pipelines can connect to any type of data storage or feature store.

Fine-tune your queries

Customize feature pipelines and queries and tune them to the requirements of your production environment.


What Challenges Does Feature Factory Solve?

Feature stores need features

A feature store is critical to your AI/ML stack but it needs features. dotData Feature Factory discovers, produces, and supplies production-ready features as a “factory” of features and empowers your feature stores.

Artisanal feature engineering is slow

Building customized features by hand often takes months. dotData Feature Factory lets you explore millions of features and surface the most impactful patterns automatically, in just hours.


Great models need great features

Data science teams always look for new features to incrementally improve models or build models from scratch. Feature Factory expands your ability to explore a broader feature space and enhance your feature tables.

Wrangling is a bottleneck

Data scientists spend 80% of their time wrangling and cleansing data for ML. Feature Factory’s built-in data & feature cleansing techniques let data scientists focus on discovering patterns, not cleansing data.

From experiments to production

Turning features from a Jupyter Notebook data science lab into a stable and robust production feature pipeline is a hassle. Build production-ready feature pipelines automatically with dotData.

Discover without domain knowledge

Access to subject matter experts is expensive. Features based on domain knowledge are strong but not scalable. Feature Factory explores features purely based on data and augments your ability to discover high-value features at scale.

How SMBC Accelerated Their Feature Development Process 48X

When SMBC, one of the world’s largest banks, wanted to accelerate its AI/ML development process, it turned to dotData’s Feature Factory platform. Download the case study to see how it made feature development 48 times faster.


Key Features of dotData Feature Factory

Temporal & Time-Series Features

Analyze temporal relationships and extract recency, seasonality, fluctuation, and more by optimizing time resolutions (hours, days, weeks, etc.).

Categorical Feature Discovery

Go beyond common one-hot encoding with regularized target encoding, multi-category pattern extraction, numeric featurization such as histogram encoding, and more!
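For readers unfamiliar with the term, the sketch below shows one common form of regularized target encoding: each category is replaced by its mean target value, shrunk toward the global mean so that rare categories do not get extreme encodings. This is a generic textbook variant in plain Python, not dotData's method. The function name `smoothed_target_encoding` and the `smoothing` parameter are assumptions made for illustration.

```python
from collections import defaultdict

def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a target mean shrunk toward the global mean.

    encoding = (sum of targets in category + smoothing * global_mean)
               / (count in category + smoothing)
    A larger `smoothing` pulls rare categories harder toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }

categories = ["a", "a", "a", "b"]
targets = [1, 1, 0, 1]
encoding = smoothed_target_encoding(categories, targets, smoothing=2.0)
print(encoding)
```

Category "b" appears once with target 1, so its raw mean would be 1.0; with smoothing it lands near 0.83 instead. In practice the encoding is usually fitted on training folds only and then applied to held-out rows, since encoding a row with its own target leaks label information.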

Supervised Feature Search, Selection & Optimization

Apply patented supervised-learning-based feature search, selection, and optimization techniques to discover the most relevant features to your target variables.

Prevent Overfit, Collinearity, Drift, and Leakage

Produce a high-quality feature set by applying multiple techniques to prevent feature over-fitting, collinearity, drift, and leakage.
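As one simple example of a collinearity check, the sketch below greedily keeps a feature only if its absolute Pearson correlation with every already-kept feature stays under a threshold. It is a generic plain-Python illustration, not the technique dotData applies. The names `pearson` and `drop_collinear`, the 0.95 threshold, and the sample columns are all assumptions, and the code assumes no column is constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def drop_collinear(features, threshold=0.95):
    """Return feature names kept after a greedy pairwise correlation filter.

    Features are visited in dict order; a feature is kept only if it is not
    too strongly correlated with any feature kept before it.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

features = {
    "total_spend": [10.0, 20.0, 30.0, 40.0],
    "total_spend_cents": [1000.0, 2000.0, 3000.0, 4000.0],  # rescaled duplicate
    "days_since_signup": [5.0, 3.0, 8.0, 1.0],
}
print(drop_collinear(features))  # ['total_spend', 'days_since_signup']
```

Because the filter is order-dependent, sorting features by importance first means the more useful member of each correlated pair is the one that survives.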

Explore Millions of Feature Hypotheses

Run on distributed computation to handle tens of tables, thousands of columns, and billions of rows, and explore millions of feature hypotheses.

Full Integration With Your Python Workflow

dotData lives as a library in your Python environment to create ML-ready feature tables and is seamlessly integrated with your existing ML workflow.