Shattering 5 Misconceptions about Automated Machine Learning

Ask data engineers about the most frustrating part of their job and the answer will most likely include “data preparation.” Ask data scientists what bogs down the AI/ML workflow and the answer will almost invariably be feature engineering.
Analytics and data science leaders are well aware of the limitations of current AI/ML development platforms. They often lament that their teams can manage only a few projects per year. BI leaders, on the other hand, have been trying to embed predictive analytics in their dashboards but face the daunting task of learning how to build AI/ML models. Automated machine learning (AutoML) was built specifically to address some of the challenges of data science, the underlying practice at the heart of both problems.
As with every new technology, there is a lot of confusion surrounding AutoML. Here are the top five misconceptions about AutoML:

1. AutoML means selecting the algorithms and building ML models automatically: In the early days of AutoML, the focus was on building and validating models. Next-generation AutoML 2.0 platforms, however, offer end-to-end automation and can do much more, from data preparation and feature engineering to building and deploying models in production. These new platforms are helping development teams reduce the time required to build and deploy ML models from months to days. AutoML 2.0 platforms address hundreds of use cases and accelerate enterprise AI initiatives by making AI/ML development accessible to BI developers and data engineers, while also speeding up the work of data scientists.

2. Feature Engineering (FE) implies selecting features once they are manually built: FE involves exploring, generating and selecting the best features from relational, transactional, temporal, geo-locational or text data spread across multiple tables. Many traditional AutoML platforms require data science teams to generate features manually, a very time-consuming process that demands a lot of domain knowledge. AutoML 2.0 platforms provide AI-powered FE that enables any user to automatically build the right features, test hypotheses and iterate rapidly. FE automation addresses what is widely regarded as the biggest pain point in data science.
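
To make "generating and selecting features across multiple tables" more concrete, here is a minimal sketch in pandas. It is not how any particular AutoML 2.0 platform works internally. The tables, the column names, the look-back windows and the correlation-based ranking are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per customer, plus a transactions table.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "churned": [0, 1, 0, 1],  # target label (assumed for illustration)
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 3, 3, 4],
    "amount": [50.0, 70.0, 60.0, 10.0, 80.0, 90.0, 5.0],
    "date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01", "2024-01-20",
        "2024-02-15", "2024-03-10", "2024-01-02",
    ]),
})


def generate_features(customers, transactions, as_of, windows_days=(30, 90)):
    """Aggregate transactions per customer over several look-back windows."""
    features = customers[["customer_id"]].copy()
    for days in windows_days:
        start = as_of - pd.Timedelta(days=days)
        in_window = (transactions["date"] > start) & (transactions["date"] <= as_of)
        agg = (
            transactions[in_window]
            .groupby("customer_id")["amount"]
            .agg(["sum", "count", "mean"])
        )
        agg.columns = [f"amount_{name}_{days}d" for name in agg.columns]
        features = features.merge(agg.reset_index(), on="customer_id", how="left")
    # Customers with no transactions in a window get 0 for every aggregate.
    return features.fillna(0.0)


def rank_features(features, target):
    """Rank candidate features by absolute correlation with the target."""
    candidates = features.drop(columns=["customer_id"])
    return candidates.corrwith(target).abs().sort_values(ascending=False)


features = generate_features(customers, transactions, as_of=pd.Timestamp("2024-03-31"))
ranking = rank_features(features, customers["churned"])
print(ranking)
```

Done by hand, every new window, aggregate or source table means another round of code like this. That repetitive search over candidate features is the part an automated FE step is meant to take over.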

3. Traditional AutoML platforms can ingest raw data from enterprise data sources to build ML pipelines: A typical enterprise data architecture includes master data preparation tools designed for data cleansing, formatting and standardization before the data is stored in data lakes and data marts for further analysis. This processed data still requires manipulation that is specific to AI/ML pipelines, including additional table joins and further data prep and cleansing. Traditional AutoML platforms require data engineers to write SQL code and perform manual joins to complete these remaining tasks. AutoML 2.0 platforms, on the other hand, perform automatic data pre-processing to help with profiling, cleansing, missing value imputation and outlier filtering, and help discover complex relationships between tables, producing a single flat table ready for ML consumption.
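
The remaining ML-specific prep described above (missing value imputation, outlier filtering and joining tables into one flat table) can be sketched by hand as follows. This is an illustrative assumption, not a platform's actual pipeline: the tables and columns are hypothetical, the median/mode imputation and the 1.5 × IQR outlier rule are just common defaults, and the join key is specified manually rather than discovered.

```python
import pandas as pd

# Hypothetical tables standing in for data already landed in a data mart.
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4, 5],
    "age": [34.0, None, 45.0, 29.0, 51.0],
    "region": ["east", "west", None, "east", "west"],
})
balances = pd.DataFrame({
    "account_id": [1, 2, 3, 4, 5],
    "balance": [1200.0, 1500.0, 1300.0, 1400.0, 250000.0],
})


def impute_missing(df):
    """Fill numeric gaps with the median and categorical gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def filter_outliers(df, column, k=1.5):
    """Drop rows outside [Q1 - k*IQR, Q3 + k*IQR] for one column."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Join the cleaned tables on the (manually chosen) key into one flat table.
flat = impute_missing(accounts).merge(
    filter_outliers(balances, "balance"), on="account_id", how="inner"
)
print(flat)
```

In this toy data the account with the extreme balance is filtered out, and the remaining rows have no missing values. Each of these choices (fill strategy, outlier threshold, join key) is a manual decision here, which is the work the paragraph above says AutoML 2.0 platforms aim to automate.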

4. Model Accuracy is more important than feature transparency and explanation: This depends on the use case, and accuracy needs to be balanced against interpretability. Many ML platforms and data scientists create complex features based on non-linear mathematical transformations. Such features are often difficult to explain in business terms, and incorporating them can lead to a lack of trust, resistance from business stakeholders and, ultimately, project failure. In heavily regulated industries such as financial services, insurance and healthcare, feature explainability is critical.

5. AutoML is not for BI teams and requires a data science background: First-generation AutoML platforms were cumbersome, lacked a user experience suited to BI developers and provided challenging workflows. Even today, many AutoML platforms are geared towards data scientists and require a strong ML background. AutoML 2.0 has unleashed a revolution by empowering citizen data scientists (BI analysts, data engineers and business users) to embark on data science projects without requiring data scientists. AutoML 2.0 is the secret weapon the BI community can leverage to build powerful predictive analytics solutions in days instead of the months typically associated with Augmented Analytics.

Sachin Andhare

Sachin is an enterprise product marketing leader with global experience in advanced analytics, digital transformation, and the IoT. He serves as Head of Product Marketing at dotData, evangelizing predictive analytics applications. Sachin has a diverse background across a variety of industries spanning software, hardware and service products including several startups as well as Fortune 500 companies.