The 10 Commandments of AI & ML (P1)

Building an enterprise AI application is hard. Scaling ML and driving user adoption are even harder. Here are the rules to follow.

The big day is finally here! After months of squabbles, discussions, and fistfights, you have managed to get the budget allocated for evaluating machine learning. You have identified the right AutoML tool and are eager to get started on your first proof of concept. Where do you begin? This project’s success will shape the course of the digital transformation journey, determine production roll-out at other business units, and perhaps earn you a fancy new title! If you want to do ML the right way, you need to follow a set of commandments. In this two-part blog series, we’ll cover the ten rules. Here are the first five:

  • Start with critical objectives for the project, and think about alignment with business goals. Identify the top use cases, the key performance metrics that you want to improve, and how the project can add value to the business using AI and ML. For example, banks are using AI to process large volumes of data, automate decision making to prevent financial crime, and reduce customer churn. A typical machine learning use case predicts which customers are best suited for first-time mortgage loans and automatically comes up with ideal product offerings based on historical data. If you can improve KPIs for the business or functional unit, you will always find sponsors for the project.
  • It’s all about the data, so start working on data preparation in advance. Access to clean, meaningful data that represents the problem at hand is critical to the success of AI initiatives. Ensure you have the right data infrastructure and compute services to wrangle and store data, and to make thoroughly cleansed data accessible for analytics and ML. A typical enterprise data architecture should include master data preparation tools designed for data cleansing, formatting, and standardization before storing the semi-cleansed data in data lakes and analytics-ready data in data marts. Do not overlook data quality, data management, and governance issues since they can derail any AI and ML initiative.
  • Feature engineering (FE) is often the most demanding and time-consuming part of the ML workflow. FE is the process of applying domain knowledge to extract analytical representations from raw data, making it ready for machine learning. It is the first step in developing a machine learning model for prediction. It starts from many tables spread across disparate databases that are then joined, aggregated, and combined into a single flat table using statistical transformations and/or relational operations. For organizations without a big data science team, a data science platform that automates FE can be the most significant difference between success and failure. AI-powered FE can enable any user to build the right features, test hypotheses automatically, and iterate rapidly. FE automation solves the most significant pain point in data science.
  • Make sure you understand AutoML tools and their capabilities. Educate the team about AutoML and set the right expectations. Traditional AutoML works by selecting the algorithms and building ML models automatically. In the early days of AutoML, the focus was on building and validating models. Next-generation AutoML 2.0 platforms include end-to-end automation and can do much more, from data preparation and feature engineering to building and deploying models in production. These new platforms help development teams reduce the time required to develop and deploy ML models from months to days. AutoML 2.0 platforms address hundreds of use cases and dramatically accelerate enterprise AI initiatives by making AI/ML development accessible to BI developers and data engineers while also accelerating data scientists’ work.
  • Strike the right balance between prediction accuracy and model interpretability by selecting the right modeling approach. Generally speaking, higher accuracy tends to require complex models that are hard to interpret. Easy interpretability means using simpler models, but that comes at the cost of a little accuracy. Traditional data science projects tend to adopt black-box modeling, which generates minimal actionable insights, lacks accountability, and creates a transparency paradox. The solution to the transparency paradox is a white-box approach. White-box modeling means developing transparent features and models that empower your ML team to execute complex projects with confidence and certainty. White-box models (WBMs) provide clear explanations of how they behave, how they produce predictions, and which variables influenced the result. Explainability is very important in enterprise ML projects. By giving insight into how the prediction models work and the reasoning behind predictions, organizations can build trust and increase transparency.
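
To make the feature engineering rule above more concrete, here is a minimal Python sketch of what "joined, aggregated, and combined into a single flat table" can look like. Everything in it is an assumption made for illustration: the `customers` and `transactions` tables, their column names, and the `build_flat_table` helper are hypothetical and are not taken from any particular platform. It uses only the standard library and is nowhere near what an automated FE tool does at scale.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw tables, invented for this illustration only.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 300.0},
]


def build_flat_table(customers, transactions):
    """Join transactions to customers and aggregate to one row per customer."""
    # Relational step: group transaction amounts by the join key.
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["customer_id"]].append(tx["amount"])

    # Statistical step: turn each group into simple aggregate features.
    flat = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        flat.append({
            "customer_id": customer["customer_id"],
            "age": customer["age"],
            "tx_count": len(spent),
            "tx_total": sum(spent),
            # Customers with no transactions get 0.0 rather than an error.
            "tx_mean": mean(spent) if spent else 0.0,
        })
    return flat


flat_table = build_flat_table(customers, transactions)
for row in flat_table:
    print(row)
```

Each row of `flat_table` describes one customer with a fixed set of columns, which is the shape most ML algorithms expect. In a real project the hard part is deciding which joins and aggregations are worth computing, and that is the step FE automation aims to take over.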

In the second part, we’ll cover the remaining five rules. Stay tuned!

In the meantime, learn more about the entire ML process, including data collection, last-mile ETL, and feature engineering, in our white paper, which explains why machine learning automation alone is not enough and why it is nonetheless vital to the entire process.

Sachin Andhare

Sachin is an enterprise product marketing leader with global experience in advanced analytics, digital transformation, and the IoT. He serves as Head of Product Marketing at dotData, evangelizing predictive analytics applications. Sachin has a diverse background across a variety of industries spanning software, hardware and service products including several startups as well as Fortune 500 companies.
