Discover 100X More Features and Build Better Models
dotData Py is an enterprise-grade feature discovery platform that helps data science and data engineering teams perform feature engineering faster and build production-quality feature pipelines automatically. dotData Py uses an AI algorithm to hypothesize, explore, build, and validate features automatically, and AI features augment your feature space to build explainable models.
Step 1
Prepare Tables as Dataframes
Connect to multiple data sources, data lakes, or data warehouses and ingest the data as Spark Dataframes in Python.
- Load data from modern cloud data marts (including Amazon Redshift, Google BigQuery, Snowflake, and MS Azure Synapse), traditional data warehouses (Oracle, Teradata, and MS SQL Server), and flat data sources (CSV files, Tableau Hyper files, etc.) via the Spark Dataframe API
- Automatic data type detection and schema inference when importing tables as Dataframes
- Connect multiple data sources together by specifying Dataframe relationships
- Define and configure temporal data relationships for automated temporal feature discovery
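To make the idea of a temporal data relationship concrete, here is a minimal sketch in plain Python. It is not dotData Py's API; the function name `windowed_sum`, the sample rows, and the 30-day window are all hypothetical and chosen only for illustration. It shows the core rule a temporal relationship has to enforce: a feature for a target row may only use source rows that fall inside a lookback window and strictly before that row's prediction time.

```python
from datetime import datetime, timedelta

def windowed_sum(targets, events, window_days):
    """For each (key, as_of) target row, sum event amounts in the lookback window.

    Hypothetical helper for illustration only; not part of dotData Py.
    """
    window = timedelta(days=window_days)
    features = []
    for key, as_of in targets:
        total = 0.0
        for ev_key, ev_time, amount in events:
            # Same entity, no older than the window, and strictly before the
            # as-of time, so no future information leaks into the feature.
            if ev_key == key and as_of - window <= ev_time < as_of:
                total += amount
        features.append(total)
    return features

# Target table: one row per customer with a prediction time.
targets = [("c1", datetime(2024, 3, 1)), ("c2", datetime(2024, 3, 1))]

# Source table: transactions with timestamps.
events = [
    ("c1", datetime(2024, 2, 20), 50.0),  # inside the 30-day window
    ("c1", datetime(2024, 1, 5), 20.0),   # too old, excluded
    ("c1", datetime(2024, 3, 2), 99.0),   # after the as-of time, excluded
    ("c2", datetime(2024, 2, 28), 10.0),  # inside the 30-day window
]

print(windowed_sum(targets, events, window_days=30))  # [50.0, 10.0]
```

Automated temporal feature discovery can be thought of as searching over many such window lengths and aggregation functions rather than hand-coding each one.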
Step 2
Run dotData Py
Specify your target variable and the source tables, as Dataframes, that you will use to build features. Define your search criteria and run dotData Py from your favorite Python IDE or notebook.
- Resolve data quality issues such as illegal values, outliers, and missing values, and handle data canonicalization, target label mapping, and more.
- Explore millions of feature hypotheses – including numeric, categorical, time-series, text, and even geospatial data.
- Resolve feature over-fitting, feature collinearity, feature drift, and feature redundancy based on dotData’s proprietary algorithms.
- Custom feature primitives and search criteria to add your own domain features into the feature exploration space.
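dotData's algorithms for handling collinearity and redundancy are proprietary and are not described here. As a generic illustration of the underlying idea only, the sketch below uses a simple, well-known technique: greedily keep a feature only if its absolute Pearson correlation with every already-kept feature is below a threshold. The function names, the feature names, and the 0.95 threshold are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)

def prune_collinear(features, threshold=0.95):
    """Greedy filter: drop any feature too correlated with one already kept.

    Generic illustration; not dotData's proprietary method.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical candidate features; the second is a rescaled copy of the first.
features = {
    "spend_30d": [10.0, 20.0, 30.0, 40.0],
    "spend_30d_cents": [1000.0, 2000.0, 3000.0, 4000.0],
    "visits_7d": [3.0, 1.0, 4.0, 1.0],
}

print(prune_collinear(features))  # ['spend_30d', 'visits_7d']
```

A greedy filter like this depends on the order in which features are visited, so in practice candidates are usually sorted by relevance first.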
Step 3
Discover Features & Insights
Explore and evaluate discovered features interactively from Python.
- Feature leaderboard (feature list) that surfaces features that are the most relevant and correlated with your target variable
- Understand each feature’s business value and construction via an easy-to-understand auto-generated explanation and feature blueprint diagram
- Select your preferred features based on various feature metrics like correlation, feature-wise AUC, permutation importance, feature locality, popularity, and more.
- Extract feature tables as Dataframe and visualize each feature using the built-in visualization tool or any Python visualization library you like.
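One of the metrics listed above, feature-wise AUC, can be illustrated with a short, generic sketch. This is not dotData Py code; the function name `feature_auc` and the sample data are hypothetical. The AUC of a single feature is the probability that a randomly chosen positive example has a higher feature value than a randomly chosen negative one, with ties counted as one half.

```python
def feature_auc(values, labels):
    """AUC of one feature against binary labels, by pairwise comparison.

    Generic illustration; O(n_pos * n_neg), fine for small samples.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical target and candidate features.
labels = [0, 0, 1, 1, 0, 1]
candidates = {
    "spend_30d": [1.0, 2.0, 3.0, 4.0, 2.5, 5.0],
    "visits_7d": [3.0, 1.0, 2.0, 4.0, 2.0, 1.0],
}

# A tiny "leaderboard": features sorted by their individual AUC, best first.
leaderboard = sorted(
    ((name, feature_auc(vals, labels)) for name, vals in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, auc in leaderboard:
    print(f"{name}: {auc:.3f}")
```

An AUC near 0.5 means the feature alone barely separates the classes, while values near 1.0 (or near 0.0, for an inverse relationship) indicate strong separation.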
Step 4
Extract & Iterate Feature Discovery Experiments
Iterate on feature discovery experiments to derive higher-quality, higher-order features. Explore, optimize, and tune features interactively. Choose which features to extract for further analysis, modeling, or reporting from within Python.
- Edit feature descriptors (definitions) to customize discovered features and leverage your domain expertise.
- Natural interface to add new datasets and run new experiments. Combine features from multiple experiments with different granularity.
- All steps and feature space details are reported transparently, with no black box
- Modularized execution allows you to rerun your experiments from any intermediate step and iterate faster
Step 5
Deploy Your Features Into Production
Populate feature stores and continuously update features in production applications
- Ingest features and metadata (feature explanation, feature statistics, feature schema) into any feature store (Databricks, Snowflake, AWS SageMaker, and more) to enhance your ML models
- Automatic feature pipeline generation with fully specified, reusable query statements eliminates error-prone manual feature query implementation.
- One-command deployment of feature pipelines into dotData Ops. Continuously recalculate feature values with the newest data and monitor feature quality and drift.
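How dotData Ops monitors feature drift is not specified here. As a generic illustration of what drift monitoring can involve, the sketch below computes the Population Stability Index (PSI), a common drift measure, between a baseline sample of a feature and a newer sample. The function name, bin edges, and sample values are assumptions made for this example only.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    `edges` are ascending bin boundaries; values are bucketed into
    len(edges) + 1 bins. Generic illustration; not dotData Ops code.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value meets or exceeds.
            counts[sum(v >= e for e in edges)] += 1
        # Floor each share at eps so empty bins do not cause log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values at training time and in production.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
current = [5, 6, 7, 8, 9, 10, 11, 12]
edges = [3, 6]

print(psi(baseline, baseline, edges))  # 0.0: identical distributions
print(psi(baseline, current, edges) > 0.25)  # True: a large shift
```

A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but suitable thresholds depend on the feature and the application.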
Amazon EMR
Install dotData Py in your AWS EMR instance to accelerate feature discovery for your data science team.
Pip Install
Quickly deploy dotData Py via pip install, even on your own laptop.
How SMBC Discovered 2,000,000 new features
When SMBC, one of the world’s largest banks, wanted to get the maximum value from its feature engineering investment, it turned to dotData. Download the case study and read how they went from 2,000 features a year to over 2,000,000.
Are You Ready for dotData Py?
Take our five-minute self-assessment to see if your data and organization could benefit from dotData’s Feature Factory revolution.
