While most of the attention in the world of AI and Machine Learning is on the algorithms themselves, data scientists often worry less about the outcome than about the steps involved in arriving at it. The reason is simple: building AI and ML models is tedious, complicated, and highly manual, and it requires a multitude of subject matter experts. In our blogs, we have often highlighted the multiple steps necessary to build useful AI and ML models through data science. Today's article focuses on what data science teams can do to accelerate model building while still delivering valuable AI/ML models. As a refresher, below is an illustration of the complexity and multi-step nature of the data science process. To understand the benefits of automation in data science, we first have to know where the most manual work is…
Continued from last week's post... dotData, Data Science Without The Headaches. dotData is a new breed of AutoML product that provides what we call Full Cycle Data Science Automation. At the heart of our vision is the idea that the data science process should be fast, easy to perform, and easy to analyze and deploy, from raw business data to business value. Our vision has led us to develop dotData Enterprise and dotData Py, two related platforms that leverage the same automation engine in different ways. dotData Enterprise is ideal for the citizen data scientist: fully automated, point-and-click driven, and ready to automate 100% of the data science process without requiring in-depth knowledge of how data science works. dotData Py, on the other hand, is ideal for data scientists. It provides a Python library for Jupyter notebooks, one of the most popular data science platforms available.…
The end-to-end process for launching a data science project is daunting, and many enterprise projects never make it to production. The process is similar in most organizations and consists of data collection, last-mile ETL, feature engineering, and machine learning. However, while most teams understand the process, the actual execution is very complex and involves a high level of operational risk. We recently published a complete guide to operationalizing data science. In this guide, we identified five complex issues that a business must address to derive value from operationalizing data science. Highlights from the paper: Issue 1: Quality. Two groups in the data science process are not aligned operationally: 1) data engineers, who build data pipelines with SQL or GUI-based tools, and 2) data scientists, who build machine-learning scoring pipelines using Python or R. Software engineers must often reimplement much of the work from these two groups before production…