Feature Discovery & Engineering
Search millions of hypotheses
Search millions of feature hypotheses from enterprise-scale relational data across tens of tables, thousands of columns, and billions of rows.
Find the right features for your models
Leverage supervised learning techniques to address feature over-fitting, collinearity, stability, drift, and more.
Feature discovery and engineering
Explore different featurization techniques such as categorical encoding, numeric aggregation, temporal recency and periodicity, geo-location grids, text topic encoding, and more.
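As a rough illustration of what numeric aggregation, categorical counting, and temporal recency features look like, the sketch below joins a hypothetical `transactions` table to a `customers` table using only the Python standard library. The table layout, column names, and the `build_features` helper are assumptions made for this example; they are not part of the dotData API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy tables: one row per customer, many transactions per customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 20.0, "category": "grocery", "day": date(2024, 1, 5)},
    {"customer_id": 1, "amount": 35.0, "category": "travel", "day": date(2024, 1, 20)},
    {"customer_id": 2, "amount": 10.0, "category": "grocery", "day": date(2024, 1, 2)},
]

def build_features(customers, transactions, as_of):
    """Aggregate the many-side table into one feature row per customer."""
    by_customer = defaultdict(list)
    for txn in transactions:
        by_customer[txn["customer_id"]].append(txn)

    rows = []
    for customer in customers:
        txns = by_customer[customer["customer_id"]]
        amounts = [t["amount"] for t in txns]
        rows.append({
            "customer_id": customer["customer_id"],
            # Numeric aggregation over the related table.
            "txn_amount_sum": sum(amounts),
            "txn_amount_mean": sum(amounts) / len(amounts) if amounts else 0.0,
            # Temporal recency: days since the most recent transaction.
            "days_since_last_txn": (
                min((as_of - t["day"]).days for t in txns) if txns else None
            ),
            # Categorical pattern: count of transactions in one category.
            "n_grocery_txn": sum(1 for t in txns if t["category"] == "grocery"),
        })
    return rows

features = build_features(customers, transactions, as_of=date(2024, 2, 1))
```

Each key in the resulting rows corresponds to one feature hypothesis; an automated search would enumerate many such combinations of table, column, aggregation, and time window.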
Feature Quality & Selection
Automated data prep and cleansing
Built-in data and feature cleansing, such as string value canonicalization, duplicate record removal, missing value imputation, and outlier elimination.
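The cleansing steps named above can be sketched in plain Python as follows. This is a minimal illustration under assumed rules (lowercasing and whitespace collapsing for canonicalization, fixed bounds for outliers, median imputation); the `cleanse` function and the record fields are hypothetical and do not reflect how dotData implements these steps.

```python
from statistics import median

def canonicalize(value):
    """Trim, collapse internal whitespace, and lowercase a string value."""
    return " ".join(value.split()).lower()

def cleanse(records, lower=0.0, upper=1000.0):
    """Canonicalize strings, drop duplicates and outliers, then impute gaps."""
    # 1. String value canonicalization.
    cleaned = [
        {"city": canonicalize(r["city"]), "income": r["income"]} for r in records
    ]

    # 2. Duplicate record removal, preserving first-seen order.
    seen, unique = set(), []
    for r in cleaned:
        key = (r["city"], r["income"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Outlier elimination with assumed fixed bounds (missing values pass through).
    in_range = [
        r for r in unique if r["income"] is None or lower <= r["income"] <= upper
    ]

    # 4. Missing value imputation with the median of the remaining known values.
    known = [r["income"] for r in in_range if r["income"] is not None]
    fill = median(known) if known else None
    return [
        {**r, "income": fill if r["income"] is None else r["income"]}
        for r in in_range
    ]

records = [
    {"city": " New  York", "income": 50.0},
    {"city": "new york", "income": 50.0},
    {"city": "Boston", "income": None},
    {"city": "Chicago", "income": 70.0},
    {"city": "Austin", "income": 99999.0},
]
clean = cleanse(records)
```

Outliers are removed before imputation here so that extreme values do not distort the fill value; other orderings are possible depending on the data.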
Optimize features for your models
Apply stream feature selection and optimization techniques to evaluate millions of features and select the most impactful ones in hours.
Temporal features that work
Prevent temporal data leakage and guarantee point-in-time correctness with advanced temporal relation search.
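Point-in-time correctness means that a feature computed for a prediction made at time t may only use records observed before t. The sketch below shows the idea with a hypothetical `point_in_time_sum` helper over toy event rows; the names and the strict "before the cutoff" rule are assumptions for illustration, not dotData's implementation.

```python
from datetime import date

def point_in_time_sum(events, entity_id, cutoff, window_days):
    """Sum amounts for one entity using only events strictly before `cutoff`
    and no older than `window_days` days."""
    total = 0.0
    for event in events:
        age = (cutoff - event["day"]).days
        # age <= 0 means the event happened on or after the cutoff: using it
        # would leak future information into the feature.
        if event["entity_id"] == entity_id and 0 < age <= window_days:
            total += event["amount"]
    return total

events = [
    {"entity_id": 1, "day": date(2024, 1, 10), "amount": 5.0},
    {"entity_id": 1, "day": date(2024, 1, 25), "amount": 7.0},
    {"entity_id": 1, "day": date(2024, 2, 3), "amount": 100.0},  # after the cutoff
]

feature_feb = point_in_time_sum(events, entity_id=1, cutoff=date(2024, 2, 1), window_days=30)
feature_jan = point_in_time_sum(events, entity_id=1, cutoff=date(2024, 1, 20), window_days=30)
```

The same entity gets different feature values at different cutoffs, which is why each training row needs its own reference timestamp rather than one global snapshot.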
Transparency & Insight
Score feature importance
Derive supervised feature importance (such as feature-wise AUC, permutation importance, sample-wise SHAP) as well as feature statistics as feature metadata.
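Of the importance measures listed, feature-wise AUC is the simplest to show: each feature is scored on its own as if it were a classifier output. The sketch below computes it by pairwise comparison in plain Python; the `auc` and `featurewise_auc` helpers and the toy columns are hypothetical, and the quadratic pairwise approach is chosen for readability rather than speed.

```python
def auc(scores, labels):
    """Probability that a random positive scores above a random negative.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def featurewise_auc(columns, labels):
    """Score every feature column independently against the target."""
    return {name: auc(values, labels) for name, values in columns.items()}

labels = [0, 0, 1, 1]
columns = {
    "tenure": [1, 2, 3, 4],  # perfectly separates the two classes
    "noise": [2, 3, 1, 4],   # carries no ranking signal on this sample
}
importance = featurewise_auc(columns, labels)
```

A value near 0.5 suggests little standalone signal, while values near 0 or 1 indicate a strong (negative or positive) relationship with the target. Such scores can be stored alongside each feature as metadata.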
Feature explanations for full transparency
Produce natural language feature explanations to understand each feature in context, and feature blueprints to visualize its data lineage.
Feature queries built-in
Generate feature queries to reveal every single step of the feature generation processes with 100% transparency.
Feature Pipeline & Query
Build “production-ready” data and feature pipelines that include complete steps from data cleansing through multi-table aggregation to feature transformation.
Feed your Feature Store
Supports DataFrames as the standard input and output format, so it can connect to any type of data storage or feature store.
Fine-tune your queries
Customize feature pipelines and queries and tune them to the requirements of your production environment.
What Challenges Does Feature Factory Solve?
Feature stores need features
A feature store is critical to your AI/ML stack, but it needs features. dotData Feature Factory discovers, produces, and supplies production-ready features as a “factory” of features and empowers your feature stores.
Artisanal feature engineering is slow
Manual feature engineering often takes months to build customized features. dotData Feature Factory lets you explore millions of features and surface the most impactful patterns automatically, in just hours.
need great features
Data science teams always look for new features to incrementally improve models or build models from scratch. Feature Factory expands your ability to explore a broader feature space and enhance your feature tables.
Wrangling is a bottleneck
Data scientists spend 80% of their time wrangling and cleansing data for ML. Feature Factory’s built-in data & feature cleansing techniques let data scientists focus on discovering patterns, not cleansing data.
From experiments to production
Moving features from a Jupyter Notebook in a data science lab into a stable, robust feature pipeline for production is a hassle. Build production-ready feature pipelines automatically with dotData.
Discover without domain knowledge
Access to subject matter experts is expensive. Features based on domain knowledge are strong but not scalable. Feature Factory explores features purely based on data and augments your ability to discover high-value features at scale.
Key Features of dotData Feature Factory
Temporal & Time-Series Features
Analyze temporal relationships and extract recency, seasonality, fluctuation, and more by optimizing time resolutions (hours, days, weeks, etc.).
Apply regularized target encoding beyond common one-hot encoding, extract multi-category patterns, use numeric featurization such as histogram encoding, and more.
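One common form of regularized target encoding is additive smoothing, where each category's mean target is shrunk toward the global mean in proportion to how few rows the category has. The sketch below assumes that form; the `smoothed_target_encoding` helper and its `smoothing` parameter are illustrative and may differ from the regularization dotData applies.

```python
from collections import defaultdict

def smoothed_target_encoding(categories, targets, smoothing=2.0):
    """Map each category to (sum_of_targets + m * prior) / (count + m),
    where prior is the global target mean and m is the smoothing strength."""
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: (sums[category] + smoothing * prior) / (counts[category] + smoothing)
        for category in counts
    }

categories = ["a", "a", "a", "a", "b"]
targets = [1, 1, 0, 0, 1]
encoding = smoothed_target_encoding(categories, targets, smoothing=2.0)
```

Category `b` appears only once with target 1, so its raw mean of 1.0 is pulled toward the global mean of 0.6. In practice the mapping should be fitted on training data only, since it uses the target.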
Supervised Feature Search, Selection & Optimization
Apply patented supervised-learning-based feature search, selection, and optimization techniques to discover the most relevant features to your target variables.
Prevent Overfit, Collinearity, Drift, and Leakage
Produce a high-quality feature set by applying multiple techniques to prevent feature over-fitting, collinearity, drift, and leakage.
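As one example of a collinearity control, the sketch below greedily keeps a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The `drop_collinear` helper, the 0.95 threshold, and the toy columns are assumptions for illustration; the source does not specify which techniques dotData uses.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)

def drop_collinear(columns, threshold=0.95):
    """Keep features in insertion order, skipping any that are too strongly
    correlated with a feature that has already been kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "spend": [1, 2, 3, 4, 5],
    "spend_x2": [2, 4, 6, 8, 10],  # exact multiple of "spend"
    "visits": [5, 1, 4, 2, 3],
}
selected = drop_collinear(columns)
```

Because the filter is greedy, the order of `columns` decides which member of a correlated pair survives, so features are usually sorted by importance first.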
Explore Millions of Feature Hypotheses
Run on distributed computation to handle tens of tables, thousands of columns, and billions of rows, and explore millions of feature hypotheses.
Full Integration With Your Python Workflow
dotData lives as a library in your Python environment to create ML-ready feature tables and is seamlessly integrated with your existing ML workflow.
Want to Learn More?
Experience the power of dotData