How to Evaluate and Select the Right AutoML Platform
If you are in the market for automated machine learning (AutoML) tools, there are plenty of choices. Forrester Research recently published a report highlighting nine automation-focused machine learning solutions and named dotData a leader. The report underscores feature engineering and explainability as key factors that differentiate the leaders in the AutoML space. But if you are new to machine learning, or part of a BI and analytics team with a mandate to incorporate predictive analytics, how do you decide which AutoML tool is right for you? What factors should you consider as you make your decision?
The end user and skill set
Any data science project starts by identifying business use cases and requirements. The process also depends heavily on the resources available to the business and on the skill set of the intended users. To make the best possible choice, organizations should begin their evaluation with a few fundamental questions:
- Who will be the primary users of the AutoML platform: the data science team or the BI team?
- What is the skill level and data science expertise of the primary users?
- Is Python the primary programming environment of the intended users?
The motivation for using an AutoML platform may differ completely depending on the user persona. If the intended users are data scientists whose primary environment is Python or R, you need a platform that offers extensive customization. Advanced analytics developers and data scientists may want to use an AutoML platform to generate new features but prefer to tune models manually. BI and analytics teams, on the other hand, may struggle with long lead times for data preparation, need help with algorithm selection, and want a tool that automates the data science workflow.
The data science workflow
How much of this process do you need to automate?
Here is a quick rundown of the major attributes to consider when evaluating an AutoML platform:
- Data Ingestion and Preparation:
How much data manipulation is required before the data is ready for ingestion by the AutoML platform? Can you upload data to the platform without writing additional SQL code?
- Feature Engineering Automation:
How much manual work does feature engineering involve? Can the system automatically explore all available relationships among database entities, then discover and evaluate features based on the available columns and relationships?
- Machine Learning:
Does the system support state-of-the-art ML libraries and frameworks such as scikit-learn, XGBoost, LightGBM, TensorFlow, and PyTorch? Can users run an automated hyperparameter search over the ML algorithms?
- Production & Operationalization:
How easy is it to deploy ML models in a production environment? Can you monitor models, detect data drift, and quickly retrain models when production data changes over time?
- Platform Accessibility, Ease of Use, and Deployment Flexibility:
Can all steps of the data science process be executed seamlessly within a single platform, without moving between systems and applications?
Last but not least, can non-data scientists easily understand the application's workflow, its concepts, and the steps required to proceed?
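As one concrete illustration of what "detect data drift" can mean, the sketch below computes the Population Stability Index (PSI), a widely used measure of how far a feature's production distribution has moved from its training distribution. It is a minimal sketch under stated assumptions, not how dotData or any particular AutoML platform implements monitoring: the function name `population_stability_index`, the equal-width binning over the training range, and the 1e-6 floor for empty bins are all choices made here for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """Hypothetical helper: PSI between a training sample and a production sample.

    Assumption: bins are equal-width over the range of `expected`
    (quantile-based bins are another common choice).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clip values outside the training range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        n = len(values)
        # Floor empty bins at a tiny share so the log term stays finite.
        return [max(c / n, 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual share - expected share) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    train = [i / 100 for i in range(1000)]   # training values 0.00 .. 9.99
    production = [v + 5 for v in train]      # same shape, shifted by +5
    print(round(population_stability_index(train, train), 4))    # 0.0
    print(population_stability_index(train, production) > 0.25)  # True
```

A commonly cited rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, though thresholds should be tuned to your own data. When evaluating a platform, check whether it tracks a measure like this per feature automatically and can trigger retraining when a threshold is crossed.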
The Forrester Wave report is a great resource for learning more about automation-focused machine learning solutions. For guidance on the top factors to consider when selecting an AutoML platform, check out our latest AutoML Evaluation Guide.