
Does the Popularity of AutoML Mean the End of the Data Scientist Era?

  • Thought Leadership

McKinsey Analytics published an article on the evolution of automated machine learning (AutoML) titled “Rethinking AI talent strategy as AutoML comes of Age.” McKinsey argues that the growing popularity of AutoML tools is driving a radically new way of thinking about data science talent. By automating the data science process, AutoML platforms expand the user base to include business experts with extensive domain knowledge, non-data scientists, and operational experts. The key takeaway is that companies should not put all their resources into the fight for scarce technical data science talent. Instead, they should focus at least part of their attention on building up their ranks of AutoML practitioners, who will become a substantial proportion of the talent pool over the next decade. CIOs and data science and analytics leaders will have to fundamentally rethink their AI talent strategy. The Covid-19 pandemic, budget cuts, and the pressure to do more with fewer resources are adding further fuel to this fire: the AutoML revolution is here to stay.

Why is the interest in AutoML technologies exploding? What are the implications for organizations, especially small and midsized businesses? What’s in store for data science professionals?

The AI-driven intelligence revolution has been gathering momentum worldwide, with large organizations experimenting with AI and ML for quite some time. There has been a strong push in the industry to build AI-driven predictive applications, from detecting financial fraud and reducing unplanned downtime to forecasting demand. That implied hiring an army of data scientists. A LinkedIn survey from July 2018 reported a shortage of about 150,000 data scientists in the US. As organizations moved forward with digital transformations powered by data science and machine learning, implementing data science across their businesses proved difficult. In addition, the vast majority of data science opportunities were with large organizations. Small and midsized companies lacked the data infrastructure and did not have the financial muscle to hire a legion of data scientists. Another critical challenge was the sheer amount of time needed to complete AI and ML projects, usually measured in months, combined with the severe shortage of qualified talent able to handle such projects, manage data pipelines, and put models into production.

Given the variety of data science projects and the substantial amount of data manipulation required, building and retaining such a team is an arduous task. AutoML addresses some of the biggest challenges of data science: it automates the laborious, iterative steps needed to build machine learning models, reduces human error, and shortens the time it takes to construct production-ready models.

Data science projects require an interdisciplinary team of data scientists, ML engineers, software architects, BI analysts, and subject matter experts. As the following illustration of a data science workflow shows, data scientists spend most of their time on data preparation, modeling, and parameter tuning. The advent of AutoML tools has changed the mix of talent needed to execute data science projects.

[Illustration: data science workflow. Source: McKinsey Analytics]


The first-generation AutoML platforms, aka AutoML 1.0, were designed to build and validate models automatically. While still in use today, these traditional platforms automate only the machine learning component of the process and do not address data preparation and feature engineering, the holy grail of data science. The next-generation platforms, aka AutoML 2.0, provide end-to-end automation, from data preparation and feature engineering to building and deploying models in production. These new platforms help development teams reduce the time required to develop and deploy ML models from months to days. AutoML 2.0 platforms address hundreds of use cases and dramatically accelerate enterprise AI initiatives by making AI/ML development accessible to BI developers and data engineers while also accelerating data scientists’ work.
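To make the distinction concrete, here is a minimal sketch in Python of what the two generations automate. It is an illustration under stated assumptions, not how dotData or any other AutoML product works: the candidate models, the candidate features, the toy data, and every function name (`fit_mean`, `fit_line`, `cv_mse`, `auto_search`) are hypothetical. Searching over models alone corresponds roughly to the AutoML 1.0 idea; also searching over generated features corresponds roughly to the AutoML 2.0 idea.

```python
import statistics

# Candidate models (hypothetical): each takes training data and returns a predictor.
def fit_mean(xs, ys):
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    # Ordinary least squares for a single input feature.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

MODELS = {"mean_baseline": fit_mean, "linear": fit_line}

# Candidate features (hypothetical): simple transformations of the raw column.
FEATURES = {"raw": lambda x: x, "squared": lambda x: x * x}

def cv_mse(fit, xs, ys, folds=4):
    """Mean squared error estimated with simple k-fold cross-validation."""
    n = len(xs)
    errors = []
    for k in range(folds):
        test_idx = set(range(k, n, folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return statistics.fmean(errors)

def auto_search(xs, ys):
    """Score every (feature, model) pair and return the best pair with all scores."""
    scores = {}
    for feat_name, transform in FEATURES.items():
        feat_xs = [transform(x) for x in xs]
        for model_name, fit in MODELS.items():
            scores[(feat_name, model_name)] = cv_mse(fit, feat_xs, ys)
    best = min(scores, key=scores.get)
    return best, scores

# Toy data (assumed for illustration): the target depends on the square of x.
xs = [float(x) for x in range(-10, 11)]
ys = [3.0 * x * x + 1.0 for x in xs]

best, scores = auto_search(xs, ys)
print("best (feature, model):", best)
```

On this toy data, no model in the search does well on the raw column alone; the fit only becomes good once the search is also allowed to try a derived feature. That is the gap the paragraph above describes between automating modeling and automating feature engineering.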

By using Full Cycle Data Science Automation, enterprises don’t have to invest in as many skilled data scientists or teams of engineers for each project. AutoML 2.0 also empowers so-called “citizen” data scientists, bringing AI to the masses. Interpretable features help organizations stay accountable for their data-driven decisions and meet regulatory compliance requirements. They also allow domain experts to interpret models more quickly, especially when outcomes are transparent, which makes the process more effective and efficient. This “democratization” of AI provides a unique opportunity for enterprises of all sizes to integrate machine learning into business applications with the shortest time-to-market. Empowering existing BI and data professionals enables businesses to eliminate data science dependency and address the digital transformation roadblocks of cost, ROI, and scalability. By augmenting BI with AI, even smaller businesses can reduce operating costs, improve quality, reduce churn, and generate new revenue streams.

So what does that mean for the future of data scientists? The idea that AutoML will bring the end of data science as a profession is exaggerated. Data scientists will always be in demand to handle very complex, unique use cases that are mission-critical or require very high accuracy at the expense of interpretability. The McKinsey team is spot on in its analysis that purely technical data scientists will still be needed over the long term, but in far smaller numbers than most currently predict. Analytics experts estimate that demand for AutoML practitioners is likely to be twice as high as demand for data scientists as companies build out their talent strategies with both levels of expertise over the next five years.

Sachin Andhare

Sachin is an enterprise product marketing leader with global experience in advanced analytics, digital transformation, and the IoT. He serves as Head of Product Marketing at dotData, evangelizing predictive analytics applications. Sachin has a diverse background across a variety of industries spanning software, hardware and service products including several startups as well as Fortune 500 companies.
