How to Operationalize Data Science in the Enterprise: The Five Challenges to Address

The end-to-end process for launching a data science project is daunting, and many enterprise projects never make it to production. The process is similar in most organizations and consists of data collection, last-mile ETL, feature engineering, and machine learning. However, while most teams understand the process, the actual execution is complex and carries a high level of operational risk.
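To make the four stages concrete, here is a minimal Python sketch that chains them as plain functions. Everything in it is an illustrative assumption rather than anything from the guide: the function names, the hard-coded records, and the fixed threshold rule that stands in for a trained model.

```python
from statistics import mean


def collect():
    """Data collection: return raw records (hard-coded here for illustration)."""
    return [
        {"customer": "a", "amount": "10.0"},
        {"customer": "a", "amount": "30.0"},
        {"customer": "b", "amount": "5.0"},
        {"customer": "b", "amount": None},  # incomplete row, dropped in ETL
    ]


def last_mile_etl(rows):
    """Last-mile ETL: drop incomplete rows and cast amounts to float."""
    return [
        {"customer": r["customer"], "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]


def engineer_features(rows):
    """Feature engineering: aggregate transactions into per-customer features."""
    by_customer = {}
    for r in rows:
        by_customer.setdefault(r["customer"], []).append(r["amount"])
    return {
        customer: {"n_tx": len(amounts), "avg_amount": mean(amounts)}
        for customer, amounts in by_customer.items()
    }


def score(features, threshold=15.0):
    """Machine learning stand-in: a fixed threshold rule, not a trained model."""
    return {c: f["avg_amount"] > threshold for c, f in features.items()}


def run_pipeline():
    """Run all four stages in order and return one score per customer."""
    return score(engineer_features(last_mile_etl(collect())))
```

Calling `run_pipeline()` returns one boolean per customer. In a real project each stage is typically owned by a different team and tool, which is where the operational risk comes from.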
We recently published a complete guide to operationalizing data science. The guide identifies five complex issues that a business must address to derive value from operationalizing data science.

Highlights from the paper:

Issue 1: Quality

Two groups in the data science process are not aligned operationally: 1) data engineers, who build data pipelines with SQL or GUI-based tools, and 2) data scientists, who build machine-learning scoring pipelines using Python or R. Software engineers must often reimplement much of the work from both groups before production can start.

Issue 2: Integrability

Data and scoring pipelines may have been developed and implemented on different technology platforms and are difficult – or impossible – to integrate.

Issue 3: Maintainability

Data science pipelines must be maintained. The traditional approach is to manually re-create the entire data science process, which increases the maintenance effort.

Issue 4: Scalability

Limited computational resources force data scientists to work with small sample data sets that do not represent the larger data sets needed for scoring, so the resulting process may not scale.
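One common way to narrow this gap, sketched below under stated assumptions, is to write the scoring step so that it processes records in fixed-size batches. The same code then runs on a small sample and on a much larger stream without loading everything into memory. The `score_record` rule and the record fields are hypothetical placeholders, not part of the white paper.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_record(record):
    """Hypothetical scoring rule standing in for a trained model."""
    return 1 if record["amount"] > 100 else 0


def score_stream(records, batch_size=1000):
    """Score batch by batch; memory use depends on batch_size, not data size."""
    for chunk in batches(records, batch_size):
        for record in chunk:
            yield record["id"], score_record(record)


# Small in-memory sample, as used during development.
sample = [{"id": i, "amount": i * 60} for i in range(4)]
sample_scores = list(score_stream(sample, batch_size=2))

# Larger lazily generated stream, standing in for production-scale data.
full = ({"id": i, "amount": i % 200} for i in range(100_000))
flagged = sum(s for _, s in score_stream(full, batch_size=5_000))
```

Batching addresses memory limits only. Whether a sample is statistically representative of the full data is a separate question that this sketch does not answer.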

Issue 5: Portability

Developing one data science process that works well for two different environments – development vs. production – is a nontrivial task.
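As a rough illustration of one way to keep a single process portable, the Python sketch below separates environment-specific settings from the pipeline logic. All names and values here are assumptions made for the example, including `EnvConfig`, the file name `sample.csv`, and the placeholder location `warehouse://transactions`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Settings that differ between environments; the pipeline code does not."""
    name: str
    source: str             # where input data is read from (placeholder values)
    batch_size: int         # records processed per batch
    sample_fraction: float  # share of the data to read


# Hypothetical settings for the two environments named in the text.
CONFIGS = {
    "development": EnvConfig("development", "sample.csv", 100, 0.01),
    "production": EnvConfig("production", "warehouse://transactions", 10_000, 1.0),
}


def load_config(env):
    """Look up the settings for an environment, failing loudly on a typo."""
    try:
        return CONFIGS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def describe_run(env):
    """Same code path for every environment; only the configuration changes."""
    cfg = load_config(env)
    return (
        f"[{cfg.name}] read {cfg.source} at {cfg.sample_fraction:.0%} "
        f"in batches of {cfg.batch_size}"
    )
```

For example, `describe_run("development")` and `describe_run("production")` follow the same code path and differ only in the values they read from `CONFIGS`.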

Download the Paper

This white paper describes a holistic, platform-level approach to the problem of data science automation. To learn more, please check out the complete white paper here.

Walter Paliska

Walter brings 25+ years of experience in enterprise marketing to dotData. Walter oversees the Marketing organization and is responsible for product marketing and demand generation for dotData. Walter’s background includes experience with both software and hardware companies, and he has worked in seven different startups, including three successful bootstrap startups.


Published by

Walter Paliska

Tags: enterprise data science, data science process, data science project

7 years ago
