AutoML 2.0: Is The Data Scientist Obsolete?
As originally seen on Forbes Cognitive World. Our CEO, Ryohei Fujimaki, PhD, was a primary contributor to this article. In case you missed it, we have reposted the original article below.
It’s an AutoML World
AutoML platforms have proliferated over the past few years – and with a recession looming, the notion of automating the development of AI and Machine Learning is bound to become even more appealing. New platforms are available with increased capabilities and more automation. The advent of AI-powered Feature Engineering – which allows users to discover and create features for data science processing automatically – is enabling a whole new approach to data science that, seemingly, threatens the role of the data scientist. Should data scientists be concerned about these developments? What is the role of the data scientist in an automated process? How do organizations evolve because of this newfound automation?
AutoML 2.0, More Automation for Data Science
First-generation AutoML platforms have focused on automating the machine learning part of the data science process. In a traditional data science workflow, however, the longest and most challenging part is the highly manual step known as feature engineering. Feature engineering involves connecting data sources and building a flat “feature table” with a rich, diverse set of “features” that is evaluated against multiple Machine Learning algorithms. The challenge of feature engineering is that it requires an elevated level of domain expertise to “ideate” new features and is very iterative as features are evaluated and rejected or chosen. New platforms, however, have recently emerged that provide additional capabilities and automation aimed at solving this challenge.
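To make the manual step concrete, here is a minimal sketch of building a flat feature table by hand. It is an illustration only: the two tables, their column names, and the `build_feature_table` helper are hypothetical and do not come from any particular AutoML platform. Each aggregate (count, total, mean) is a feature that someone had to think of and code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source tables (illustrative data, not from the article).
customers = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, transactions):
    """Join transactions onto customers and aggregate to one flat row per customer."""
    # Group transaction amounts by the key shared with the customer table.
    amounts = defaultdict(list)
    for txn in transactions:
        amounts[txn["customer_id"]].append(txn["amount"])

    table = []
    for cust in customers:
        cust_amounts = amounts.get(cust["customer_id"], [])
        table.append({
            "customer_id": cust["customer_id"],
            "region": cust["region"],
            # Hand-written features: each one is a manual design decision.
            "txn_count": len(cust_amounts),
            "txn_total": sum(cust_amounts),
            "txn_mean": mean(cust_amounts) if cust_amounts else 0.0,
        })
    return table


feature_table = build_feature_table(customers, transactions)
for row in feature_table:
    print(row)
```

Even in this toy case, adding a new idea (say, spend in the last 30 days) means going back and editing the function, which is the iterative loop described above.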
Platforms with “Automated Feature Engineering” capabilities now allow for the automated creation of feature tables from relational data sources as well as flat files. This ability to “auto-generate” features in the data science process is a game-changing capability. “Citizen” data scientists – Business Intelligence (BI) analysts, data engineers, and other technically savvy members of the organization with deep domain knowledge – can now become valuable contributors to an organization’s development of ML and AI models. Through Automated Feature Engineering, BI teams can develop sophisticated predictive analytics algorithms in days, significantly accelerating their productivity with minimal help from data scientists.
Automating Data Science: Democratization
One of the chief benefits of AutoML 2.0 platforms is true data science democratization. When data science automation accelerates the process of discovering and creating features, a larger and more diverse group of users can contribute to the data science process. Automated feature creation allows the “citizen” data scientist to deliver useful, highly optimized solutions for their use cases. Because citizen data scientists typically have a high degree of “domain expertise,” they can focus on use cases that are of high value to the organization with minimal, if any, assistance from the data science team.
The added benefit of enabling citizen data scientists is that it allows the business to expand its use of data science without having to worry about hiring armies of data scientists. The ability to empower new data science contributors is especially significant given the difficulty organizations in the US have had in hiring data scientists, as examined in a 2018 LinkedIn study. With economic uncertainty facing the global community, enabling a new class of AI/ML developers with minimal investment becomes a game-changing value proposition for maintaining or increasing competitive advantage.
Automating Data Science: Productivity, Not Replacement
Any conversation about AutoML 2.0 platforms, however, is misplaced if the focus is on replacing or displacing the data scientist. Most data scientists see feature engineering as one of the most significant obstacles to their work. Automation helps by accelerating that step, providing productivity boosts that would not otherwise be possible. By leveraging AutoML 2.0, data scientists can often accelerate their work dramatically – from months to days. In addition, the use of AI-based feature engineering in AutoML 2.0 platforms allows data scientists to discover features that they would never have considered.
AI-based feature engineering automatically builds, evaluates, and exposes features by combining data from multiple columns, often across different tables and sources. The ability of AutoML 2.0 to self-discover features allows data scientists to explore the so-called “unknown unknowns,” the features the data scientists would have never even considered because of either lack of time or lack of domain expertise.
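The build, evaluate, and expose loop can be sketched in a few lines. This is a simplified illustration, not how any specific AutoML 2.0 product works: it assumes candidates are limited to pairwise products and differences of numeric columns, and it scores them by absolute Pearson correlation with the target. The names `generate_candidates` and `rank_features` and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_candidates(columns):
    """Build step: keep the raw columns and add pairwise products and differences."""
    candidates = dict(columns)
    for a, b in combinations(sorted(columns), 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
        candidates[f"{a}-{b}"] = [x - y for x, y in zip(columns[a], columns[b])]
    return candidates


def rank_features(columns, target, top_k=3):
    """Evaluate step: score each candidate, then expose the top_k by |correlation|."""
    scored = [
        (abs(pearson(values, target)), name)
        for name, values in generate_candidates(columns).items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_k]]


# Hypothetical data: the target is driven by an interaction that neither raw column shows alone.
columns = {
    "price": [1.0, 2.0, 3.0, 4.0],
    "quantity": [4.0, 1.0, 3.0, 2.0],
}
target = [4.0, 2.0, 9.0, 8.0]

ranked = rank_features(columns, target, top_k=2)
print(ranked)
```

Here the top-ranked feature is the product of the two columns, an interaction an analyst might not have thought to try. Real platforms search a far larger space of transformations across tables and evaluate features against models rather than a single correlation.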
AutoML 2.0: Creating A More Productive, More Inclusive AI/ML Program
Rather than being a threat to the livelihood of data scientists, AutoML 2.0 platforms are, in fact, an enabling technology that helps accelerate and democratize the data science process. AutoML 2.0 provides the acceleration and automation necessary for data scientists to be more productive, giving them the ability to scale their work and providing an even more significant benefit to the business. Together, democratization and acceleration of the data science process are the most significant selling points of AutoML 2.0 platforms and the key to scaling the data science process in the modern organization.