Why dotData Feature Factory?
dotData Py is an enterprise-grade feature discovery platform that helps data science and data engineering teams iterate on feature engineering faster and build production-quality feature pipelines automatically.
Key Features
Feature Discovery & Engineering
Search millions of hypotheses
Search millions of feature hypotheses from enterprise-scale relational data across tens of tables, thousands of columns, and billions of rows.
Find the right features for your models
Leverage supervised learning techniques to address feature over-fitting, collinearity, stability, drift, and more.
Feature discovery and engineering
Explore featurization techniques such as categorical encoding, numeric aggregation, temporal recency and periodicity, geo-location grids, and text topic encoding.
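To make two of these techniques concrete, here is a minimal sketch of numeric aggregation and temporal recency written by hand in plain Python. It is an illustration only and does not use or represent the dotData API; the row layout, the `build_features` helper, and the feature names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (customer_id, transaction_date, amount)
transactions = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 2, 1), 40.0),
    ("c2", date(2024, 1, 20), 15.0),
]

def build_features(rows, as_of):
    """Aggregate amounts per customer and compute days since the last transaction."""
    grouped = defaultdict(list)
    for customer_id, tx_date, amount in rows:
        grouped[customer_id].append((tx_date, amount))

    features = {}
    for customer_id, items in grouped.items():
        amounts = [amount for _, amount in items]
        last_date = max(tx_date for tx_date, _ in items)
        features[customer_id] = {
            "amount_sum": sum(amounts),                   # numeric aggregation
            "amount_mean": sum(amounts) / len(amounts),   # numeric aggregation
            "days_since_last": (as_of - last_date).days,  # temporal recency
        }
    return features

features = build_features(transactions, as_of=date(2024, 3, 1))
```

Each of these hand-written features is one hypothesis; an automated search would enumerate many such combinations of table, column, aggregation, and time window.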
Feature Quality & Selection
Automated data prep and cleansing
Built-in data and feature cleansing, including string value canonicalization, duplicate record removal, missing value imputation, and outlier elimination.
Optimize features for your models
Apply stream feature selection and optimization techniques to evaluate millions of features and select the most impactful ones in hours.
Temporal features that work
Prevent temporal data leakage and guarantee point-in-time correctness with advanced temporal relation search.
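As a rough illustration of what point-in-time correctness means, the sketch below aggregates only those events that occurred strictly before each prediction time. It is a simplified example under assumed data shapes, not dotData's temporal relation search; `point_in_time_sum` and the tuple layouts are hypothetical.

```python
from datetime import datetime

def point_in_time_sum(target_rows, events):
    """For each (entity_id, prediction_time), sum the amounts of events
    for that entity that happened strictly before prediction_time."""
    result = []
    for entity_id, prediction_time in target_rows:
        total = sum(
            amount
            for ev_entity, ev_time, amount in events
            # Events at or after the prediction time would leak future data.
            if ev_entity == entity_id and ev_time < prediction_time
        )
        result.append((entity_id, prediction_time, total))
    return result

# Hypothetical example data
targets = [("a", datetime(2024, 1, 10)), ("b", datetime(2024, 1, 10))]
events = [
    ("a", datetime(2024, 1, 5), 10.0),   # before the prediction time: used
    ("a", datetime(2024, 1, 10), 99.0),  # at the prediction time: excluded
    ("a", datetime(2024, 1, 15), 50.0),  # after the prediction time: excluded
    ("b", datetime(2024, 1, 1), 7.0),
]
pit_features = point_in_time_sum(targets, events)
```

The strict `<` comparison is the design choice that matters here: a feature computed for a prediction made at a given moment may only see data that existed before that moment.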
Transparency & Insights
Score feature importance
Derive supervised feature importance (such as feature-wise AUC, permutation importance, and sample-wise SHAP) as well as feature statistics as feature metadata.
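Feature-wise AUC, for instance, scores each feature on its own by how well its values rank positive examples above negative ones. The sketch below computes it with the pairwise (Mann-Whitney) definition in plain Python. It is a generic illustration, not the product's implementation; the function names and sample columns are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. Labels must be 0 or 1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def feature_wise_auc(columns, labels):
    """Score every feature column independently against the labels."""
    return {name: auc(values, labels) for name, values in columns.items()}

# Hypothetical feature columns and binary target
labels = [0, 0, 1, 1]
columns = {
    "separating": [0.1, 0.2, 0.8, 0.9],  # ranks every positive above every negative
    "constant": [0.5, 0.5, 0.5, 0.5],    # carries no ranking information
}
importance = feature_wise_auc(columns, labels)
```

The pairwise form is quadratic in the number of rows, so it suits small samples; a rank-based formulation would be the usual choice at scale.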
Feature explanations for full transparency
Produce natural-language feature explanations to understand each feature in context, and feature blueprints to visualize its data lineage.
Feature queries built-in
Generate feature queries that reveal every single step of the feature generation process with 100% transparency.
Feature Pipeline & Query
Production-ready pipelines
Build “production-ready” data and feature pipelines that include complete steps from data cleansing through multi-table aggregation to feature transformation.
Feed your Feature Store
Supports DataFrames as the standard input and output format, so pipelines can connect to any type of data storage or feature store.
Fine-tune your queries
Customize feature pipelines and queries and tune them to the requirements of your production environment.
What Challenges Does Feature Factory Solve?

How SMBC Accelerated Their Feature Development Process 48X
When SMBC – one of the world’s largest banks – wanted to accelerate its AI/ML development process, it turned to dotData’s Feature Factory platform. Download the case study to see how SMBC accelerated development by 4,800%.