Updated for 2022
According to a recent Gartner blog about analytics and BI solutions, only 20% of analytical insights will deliver business outcomes through 2022. An article by VentureBeat AI reported that 87% of data science projects never make it into production, and a global survey by Dimensional Research concluded that 78% of AI/ML projects stall at some stage before deployment. Even in 2022, as many as 68% of data scientists admit to abandoning between 40% and 80% of their data science projects. These figures point to an exceptionally high failure rate across analytics, data science, and machine learning projects, and there are many reasons why so many projects fail to meet their business objectives. In this blog, we look at the top practical challenges that enterprise AI projects face and how you can mitigate them:
- Start with business problems you need to solve
While AI is an incredibly powerful technology, it is not a panacea for every business problem. Building AI because everyone else is doing it, and throwing problems at it without concrete objectives, is a path to failure. AI excels at sifting through massive amounts of data, discovering patterns, and surfacing insights that are otherwise not obvious. To get started, prioritize hard-to-solve, complex business problems with clear objectives and backing from the line-of-business leaders who need them solved. Assemble a cross-functional team of technical and functional experts, and ensure buy-in from domain experts. Finally, define success criteria and measure progress against relevant key metrics.
- Access to high-quality data
AI and machine learning tools rely on data to train their underlying algorithms, so access to clean, meaningful data that is representative of the problem at hand is critical to the success of AI initiatives. But enterprise data tends to be biased, noisy, outdated, unstructured, and full of errors. Many companies lack the data infrastructure or simply do not have enough high-quality data. Others use antiquated, error-prone manual methods for data preparation, resulting in inaccurate data and, ultimately, wrong business decisions. A typical enterprise data architecture should include data preparation tools for cleansing, formatting, and standardization before the data lands in data lakes and marts. Data quality, data management, and governance are of paramount importance given the high reliance on good data and, if overlooked, can derail any AI and ML project.
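As a minimal sketch of what basic cleansing and standardization looks like in practice (the table and column names here are hypothetical, and pandas stands in for whatever preparation tooling you use):

```python
import pandas as pd

# Hypothetical raw customer data with typical enterprise quality issues:
# duplicate records, inconsistent casing/whitespace, and missing values.
raw = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "country": [" us", "US", "US", None],
    "signup_date": ["2021-03-01", "2021-04-15", "2021-04-15", "2021-05-20"],
})

clean = (
    raw.drop_duplicates(subset="customer_id")      # de-duplicate records
       .dropna(subset=["country"])                 # drop rows missing key fields
       .assign(
           country=lambda df: df["country"].str.strip().str.upper(),  # standardize text
           signup_date=lambda df: pd.to_datetime(df["signup_date"]),  # enforce types
       )
       .reset_index(drop=True)
)

print(len(clean))  # 2 rows survive cleansing
```

Real pipelines add validation rules, outlier handling, and lineage tracking on top of steps like these, but the principle is the same: data should be cleansed and standardized before it reaches the lake or mart.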
- Data pipeline complexity
Data is spread across disparate databases in different formats, and you must blend and consolidate it from disconnected systems. The challenge is extracting, cleaning, and reformatting the data to make it ready for predictive analytics. This processed data then requires further manipulation specific to AI/ML pipelines, including additional table joins and another round of data prep and cleansing. Typically, data engineers write SQL code and perform manual joins to complete these tasks. This complex process of data ingestion, storage, cleansing, and transformation takes time and is a major bottleneck in scaling data science operations. New automation tools such as feature engineering platforms remove much of this pipeline complexity by allowing data practitioners and data scientists to discover and evaluate features at scale. Through automation, these platforms transform raw data into machine learning inputs (a.k.a. feature engineering) and produce predictions by combining hundreds of features or more.
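A common slice of this work is joining transactional data to reference data and deriving aggregate features. A minimal sketch, with hypothetical tables and column names, of the kind of transformation feature engineering platforms automate:

```python
import pandas as pd

# Hypothetical source tables from two disconnected systems.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "retail", "wholesale"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 15.0, 200.0, 150.0, 250.0],
})

# Derive per-customer aggregate features, then join back to the
# customer reference table to form machine learning inputs.
features = (
    orders.groupby("customer_id")
          .agg(order_count=("amount", "size"),
               total_spend=("amount", "sum"),
               avg_order_value=("amount", "mean"))
          .reset_index()
          .merge(customers, on="customer_id", how="left")
)

print(features)
```

Each engineered column here is one feature; at enterprise scale, the same join-and-aggregate pattern is repeated across hundreds of candidate features, which is exactly the manual work that automation targets.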
- Balancing model accuracy and interpretability
There is a trade-off between prediction accuracy and model interpretability, and data scientists must strike the balance by selecting an appropriate modeling approach. Generally speaking, higher accuracy means complex models that are hard to interpret; easy interpretation means simpler models, at the cost of some accuracy. Traditional data science projects tend to adopt what are known as black-box approaches, which generate minimal actionable insight and result in a lack of accountability in the decision-making process. One solution to this transparency paradox is white-box modeling: generating transparent features and models that let your AI team execute complex projects with confidence. White-box models (WBMs) explain how they behave, how they produce predictions, and which variables influenced the model. WBMs are preferred in many enterprise data science use cases because of their transparent 'inner-working' modeling process and easily interpretable behavior. Explainability matters in enterprise data science: organizations build trust and increase transparency by giving insight into how prediction models work and the reasoning behind their predictions. Machine learning platforms automate the accuracy-interpretability trade-off and let users select the right approach for each use case.
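To make the trade-off concrete, here is an illustrative sketch using scikit-learn on synthetic data (the dataset and model choices are assumptions, not a prescription): a logistic regression is a classic white-box model whose per-feature coefficients can be read directly, while a gradient-boosted ensemble is a typical black-box alternative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for an enterprise classification problem.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# White-box model: each prediction is a weighted sum you can inspect,
# so the coefficients show which variables influence the outcome.
wbm = LogisticRegression().fit(X_train, y_train)
print("coefficients:", wbm.coef_[0])

# Black-box model: an ensemble of hundreds of trees whose combined
# decision logic is much harder to explain to a business stakeholder.
bbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("white-box accuracy:", wbm.score(X_test, y_test))
print("black-box accuracy:", bbm.score(X_test, y_test))
```

Which side of the trade-off wins depends on the use case: in regulated decisions (credit, hiring), the inspectable coefficients are often worth a few points of accuracy.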
- Model operationalization and deployment
ML delivers value only when the final model leaves the data scientist's Jupyter notebook and is deployed in production. Operationalization means the model runs in a production environment (not a sandbox), is connected to business applications, and makes predictions on live data. This last-mile deployment has historically been a slow, manual, drawn-out process that renders models and insights obsolete; deploying a single model to production can take anywhere from 8 to 90 days. Irrespective of the AI/ML platform, it should provide endpoints to run and control the developed pipeline and integrate easily with other business systems through standard APIs. There are several approaches to moving models into production: think through batch versus real-time prediction, and consider whether a real-time prediction service is feasible in terms of cost, infrastructure, and complexity. Deployment also includes monitoring model performance, capturing performance degradation, and updating models as necessary. Automation makes enterprise-level, end-to-end data science operationalization possible with minimum effort and maximum impact, enabling data science and software/IT teams to operationalize complex projects. Every enterprise data science project should start with a plan to deploy models in production to capture the value and realize AI's potential.
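As a minimal sketch of the batch-scoring-plus-monitoring loop described above (scikit-learn on synthetic data; the 0.1 degradation threshold and the differently seeded "live" dataset are arbitrary illustrations, not recommended values):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train a model offline (the "notebook" stage) and record a baseline.
X_train, y_train = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

# Batch scoring in "production": on a schedule, score fresh labeled data
# (here a differently seeded dataset stands in for drifted live data).
X_live, y_live = make_classification(n_samples=200, n_features=4, random_state=1)
live_accuracy = accuracy_score(y_live, model.predict(X_live))

# Degradation check: flag the model for retraining if live performance
# falls more than an (arbitrary) 0.1 below the training baseline.
needs_retraining = live_accuracy < baseline - 0.1
print(f"baseline={baseline:.2f} live={live_accuracy:.2f} retrain={needs_retraining}")
```

A real deployment would wrap the scoring step behind an API endpoint (for real-time prediction) or a scheduled job (for batch), and route the retraining flag into an alerting or automated-retraining workflow.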