Data Science Operationalization: What the heck is it?
Data science operationalization is simple enough in concept: take Machine Learning (ML) or Artificial Intelligence (AI) models and move them into production (or operational) environments. In the words of Gartner Sr. Analyst Peter Krensky, data science operationalization is the “…application and maintenance of predictive and prescriptive models…” In practice, however, operationalizing ML and AI models can be a complicated and often overwhelming challenge. More broadly, one of the biggest difficulties is that AI and ML models must be integrated with systems whose live data changes quickly. For example, if your model is designed to predict customer churn, your operationalization process needs to be integrated with your CRM system so that it keeps predicting churn effectively as your data changes and your data volumes grow.
There are four critical aspects of data science operationalization that make it challenging to implement. The first is code quality. Data scientists typically develop models with tools like Python and R, and the resulting code is often not of “production quality.” Moving it to production can mean a fair amount of rework, such as re-coding the models in the SQL dialect that is native to the production database.
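To make the re-coding step concrete, here is a minimal sketch. It assumes the model is a logistic regression whose coefficients have been exported from Python; the coefficient values, the column names, and the `customers` table are all hypothetical. The idea is to generate the SQL from the model rather than retype it by hand, and to keep a Python reference implementation around to check the two against each other.

```python
import math

# Hypothetical coefficients exported from a trained logistic regression churn model.
INTERCEPT = -1.5
COEFFICIENTS = {
    "months_since_last_order": 0.30,
    "support_tickets_90d": 0.45,
    "is_annual_plan": -0.80,
}


def logistic_sql(intercept, coefficients, table):
    """Build a SQL query that scores every row of `table` with the logistic function.

    Column and table names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f"({weight!r} * {column})")
    linear = " + ".join(terms)
    return (
        f"SELECT customer_id, 1.0 / (1.0 + EXP(-({linear}))) AS churn_probability "
        f"FROM {table}"
    )


def logistic_python(intercept, coefficients, row):
    """Reference scorer used to verify that the SQL matches the Python model."""
    z = intercept + sum(weight * row[column] for column, weight in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Running the generated query and `logistic_python` over the same sample rows is a cheap way to catch translation errors before the SQL reaches production. More complex models will not reduce to a single expression like this, which is a large part of why the rework is expensive.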
The second challenge is integration. Connecting data and scoring pipelines to the multitude of systems that data science projects often touch is time-consuming and highly technical work.
The third challenge is maintenance. Even when models are appropriately integrated, they must be maintained. Model metrics and prediction accuracy must be continuously monitored, and models need to be adjusted over time as data changes. This involves retraining models regularly, which is time-consuming and expensive.
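As a rough illustration of what continuous monitoring can look like, the sketch below tracks accuracy over a sliding window of recent predictions and flags the model for retraining when accuracy falls below a threshold. The class name, the window size, and the threshold are illustrative assumptions, not recommended settings, and real deployments usually track more than one metric.

```python
from collections import deque


class AccuracyMonitor:
    """Track prediction accuracy over a sliding window of labeled outcomes."""

    def __init__(self, window_size=500, min_accuracy=0.80):
        # Both defaults are illustrative assumptions, not recommended settings.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag the model only once the window is full and accuracy has dropped."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy
```

In a churn setting, `record` would be called once the real outcome is known, for example when a customer does or does not renew, and `needs_retraining` would be checked on a schedule.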
The fourth challenge is scalability. Data science models are often developed on only a subset of the available data. In a churn model, for example, the models might be developed on less than 40% of the available data, but in production they need to scale to process 100% of available customer data to predict churn. Another aspect of scalability is the ability of the infrastructure to scale up and down with the computing power required. Many customers underestimate how much computing power they need and run into problems when ML models break or fail.
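One common way to bridge the gap between a development sample and the full production data set is to score in fixed-size batches, so memory use stays flat no matter how many customers there are. The sketch below is a minimal version of that idea; `score_batch` stands in for whatever function applies your model to a list of rows, and the default batch size is an arbitrary assumption.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most `batch_size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def score_all(rows, score_batch, batch_size=10_000):
    """Score every row while holding only one batch in memory at a time.

    `score_batch` is a hypothetical callable that takes a list of rows and
    returns one score per row.
    """
    for batch in batched(rows, batch_size):
        yield from score_batch(batch)
```

Because `rows` can be any iterable, the same code works with a database cursor or a file reader, which lets the data stream through instead of being loaded all at once.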
A related problem is that, in most organizations, the data science team uses software tools and configurations that are markedly different from those in production. Taking models developed by data scientists and operationalizing them therefore entails porting the code to platforms and systems that were not taken into account during model development.
The answer to the many challenges of operationalizing AI and ML models is automation. Through API-based integration, AutoML platforms can accelerate AI and ML model development and alleviate the headaches associated with moving models into production. A standard approach to deployment based on container technology (such as Docker) can address the compatibility and porting challenges.
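To give a feel for what API-based integration means at the code level, here is a minimal sketch of a scoring function that accepts a JSON request and returns a JSON response. It is not how any particular AutoML platform exposes its models; the weights, feature names, and response fields are hypothetical. A function like this can be wired into any HTTP framework and packaged in a container image, so the same code runs in development and in production.

```python
import json
import math

# Hypothetical model parameters; a real service would load a trained model artifact.
WEIGHTS = {"support_tickets_90d": 0.45, "is_annual_plan": -0.80}
INTERCEPT = -1.5


def handle_score_request(body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)
        z = INTERCEPT + sum(
            weight * float(features[name]) for name, weight in WEIGHTS.items()
        )
        probability = 1.0 / (1.0 + math.exp(-z))
    except (ValueError, KeyError, TypeError, OverflowError):
        # Malformed JSON, missing features, or non-numeric values all end up here.
        return 400, json.dumps({"error": "expected a JSON object with numeric features"})
    return 200, json.dumps({"churn_probability": probability})
```

Keeping the request handling separate from the web server makes the scoring logic easy to test on its own, and systems such as a CRM only need to know the request and response format to call it.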