What IS Feature Engineering?

Walter Paliska

October 9, 2019

What Is Feature Engineering?
(And Why Do We Need to Automate It?)

The past few years have seen a rapid rise in the adoption of Artificial Intelligence (AI) and Machine Learning (ML) for a multitude of commercial use cases. Beyond the “cute” factor of AI that can pick a cat out of a photo array, AI and ML are being deployed to model and predict lending risk, understand and manage customer churn, provide product recommendations, support programmatic advertising and much more. The challenge for the business community is that data science, the practice at the heart of AI and ML, is rooted in a complex world of statistical analysis, data manipulation, programming and more. Most businesses don’t have enough data scientists, a fact illustrated by 2018 research from LinkedIn that showed a shortfall of over 150,000 people with data science skills in the US alone.

The data science process is complex and involves multiple distinct phases, as illustrated below. A typical data science project can take months to complete, with feature engineering being the most complex part.

What IS Feature Engineering?

Surprisingly, even in our daily conversations with clients, we find that there is often some confusion about what the term “feature engineering” actually means. What exactly is feature engineering? What are the steps of the process, and why does it take so long? What can we do to accelerate this process? At the most basic level, feature engineering consists of three distinct steps:

Feature ideation
Feature selection
Feature creation

The first two steps in the process, feature ideation and feature selection, often require a high degree of “domain knowledge.” Domain knowledge refers to knowledge of the underlying business requirements that must be addressed. For example, a bank might employ a team of business analysts and data analysts to work with the data science team to consider “features” that might be useful in predicting whether a client is likely to convert on a “zero balance” transfer offer for a new credit card. During this phase, substantial data analysis is required to understand which data sources, tables and columns might be used to create the “features” that will then be tested in the next phase.

Feature creation and testing make up the next part of the process. During this phase, data scientists collaborate with business analysts and data engineers to create flat tables that combine data from multiple related tables into a single “feature table.” For example, the bank from our previous example might take data from its web tracking system, its customer records, and other data sources to create a single table with one row per prospective client, which a machine learning model can then use to predict the likelihood of that consumer accepting an offer. Each feature that is created must then be evaluated against machine learning models to identify which feature/model combinations provide the best possible outcome.
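To make the idea of a “feature table” more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the three source tables, their column names, the 30-day window and the chosen features are hypothetical and are not taken from any real bank or from dotData's products. In practice this step is usually done with SQL or a dataframe library, but the shape of the work is the same: aggregate each related table down to one value per customer, then join the results into a single flat row.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source tables, each represented as a list of rows.
customers = [
    {"customer_id": 1, "age": 34, "has_credit_card": True},
    {"customer_id": 2, "age": 51, "has_credit_card": False},
]
web_visits = [
    {"customer_id": 1, "page": "balance-transfer", "visited_on": date(2019, 9, 5)},
    {"customer_id": 1, "page": "home", "visited_on": date(2019, 9, 20)},
    {"customer_id": 2, "page": "home", "visited_on": date(2019, 8, 15)},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 40.0},
]


def build_feature_table(customers, web_visits, transactions, as_of):
    """Flatten the three tables into one row of features per customer."""
    visits_30d = defaultdict(int)
    offer_page_views = defaultdict(int)
    for visit in web_visits:
        cid = visit["customer_id"]
        # Count visits in the 30 days up to and including the reference date.
        if 0 <= (as_of - visit["visited_on"]).days <= 30:
            visits_30d[cid] += 1
        # Count views of the (hypothetical) offer page over all time.
        if visit["page"] == "balance-transfer":
            offer_page_views[cid] += 1

    amounts = defaultdict(list)
    for txn in transactions:
        amounts[txn["customer_id"]].append(txn["amount"])

    table = []
    for customer in customers:
        cid = customer["customer_id"]
        spent = amounts[cid]
        table.append({
            "customer_id": cid,
            "age": customer["age"],
            "has_credit_card": int(customer["has_credit_card"]),
            "visits_last_30d": visits_30d[cid],
            "offer_page_views": offer_page_views[cid],
            "avg_transaction_amount": sum(spent) / len(spent) if spent else 0.0,
        })
    return table


feature_table = build_feature_table(
    customers, web_visits, transactions, as_of=date(2019, 10, 1)
)
for row in feature_table:
    print(row)
```

Each column other than `customer_id` is one candidate feature. Even in this toy version, every feature embeds several decisions (which table, which column, which aggregation, which time window), which is why the manual process scales poorly as the number of tables grows.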

Why Automate Feature Engineering?

Clearly, the process of feature engineering can be lengthy, time-consuming and resource-intensive. Most organizations simply don’t have enough talent or time to effectively evaluate all possible use cases, let alone all possible permutations and combinations of tables and columns of data. Automated Feature Engineering can provide a huge benefit to businesses that aim to leverage AI and ML models.

The term “automated feature engineering,” however, can mean different things depending on which vendor you are evaluating. For most providers of Automated Machine Learning (AutoML) software, “automated feature engineering” describes the process of evaluating which features, built manually using the process described above, will be most beneficial for any given machine learning model. True Automated Feature Engineering, however, leverages AI to create and evaluate features automatically. This is why at dotData we talk about discovering the “unknown unknowns” using Automated Feature Engineering. By automating the entire feature building process, you can build and evaluate hundreds of thousands, potentially even millions, of features automatically, exposing only the ones that pass a specific threshold and providing data scientists with a wealth of additional features that they may never have considered.
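The generate-then-filter idea described above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not a description of how dotData or any other product works: the data, the candidate space (five aggregations combined with three minimum-amount filters) and the scoring rule (absolute Pearson correlation with the target) are all hypothetical stand-ins. A real system would search a far larger space and would typically score features against actual models.

```python
from itertools import product
from math import sqrt

# Hypothetical raw data: transaction amounts per customer, plus a binary
# target recording whether each customer accepted an offer.
transactions = {
    "c1": [120.0, 80.0, 95.0],
    "c2": [40.0],
    "c3": [300.0, 10.0],
    "c4": [55.0, 60.0, 58.0, 61.0],
}
accepted_offer = {"c1": 1, "c2": 0, "c3": 1, "c4": 0}

# Candidate space: every aggregation combined with every amount filter.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}
MIN_AMOUNTS = [0.0, 50.0, 100.0]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_and_select(transactions, target, threshold):
    """Build every candidate feature and keep those scoring >= threshold."""
    ids = sorted(transactions)
    y = [target[i] for i in ids]
    kept = {}
    for (agg_name, agg), min_amount in product(AGGREGATIONS.items(), MIN_AMOUNTS):
        name = f"{agg_name}_amount_ge_{min_amount:g}"
        values = []
        for i in ids:
            subset = [a for a in transactions[i] if a >= min_amount]
            # Customers with no matching transactions get a default of 0.0.
            values.append(float(agg(subset)) if subset else 0.0)
        score = abs(pearson(values, y))
        if score >= threshold:
            kept[name] = score
    return kept


selected = generate_and_select(transactions, accepted_offer, threshold=0.5)
for name, score in sorted(selected.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Only 15 candidates are generated here, but the count grows multiplicatively with every extra table, column, aggregation and filter, which is how realistic schemas reach the very large feature counts mentioned above. Note also that a simple correlation score is a weak proxy on four rows of data; it is used here only to keep the sketch self-contained.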

To be specific, Automated Feature Engineering is not a replacement for manual feature creation and evaluation. Instead, it provides two significant benefits: rapid prototyping and feature augmentation. For rapid prototyping, data scientists can use Automated Feature Engineering to accelerate the trial and error that is often associated with feature engineering. Feature augmentation, on the other hand, is the process of using Automated Feature Engineering to create additional features that the data scientists, business analysts and data engineers might never have considered.

From Months to Days

What are the benefits of Automated Feature Engineering? By far the most valuable is speed. Many dotData clients have used the Automated Feature Engineering capabilities of our dotData Enterprise or dotData Py platforms to accelerate their data science processes, often delivering in days what traditionally took five months or longer. With the exponential growth in demand for AI and ML use cases and the low availability of data science resources, Automated Feature Engineering, as part of an effective AutoML platform, can help businesses greatly increase the number of AI and ML projects that are executed and successfully brought into production.
Learn more about our platform and about Automated Feature Engineering by visiting our website.

Walter Paliska

Walter brings 25+ years of experience in enterprise marketing to dotData. Walter oversees the Marketing organization and is responsible for product marketing and demand generation for dotData. Walter’s background includes experience with both software and hardware companies, and he has worked in seven different startups, including three successful bootstrap startups.
