Reflections from ODSC East 2021
The conference shines a spotlight on Text Analytics, MLOps, and Automated Feature Engineering.
This was the second year in a row that the premier data science conference went virtual due to the COVID-19 pandemic. Overall, the experience was much better this year, with a breadth of research topics and industry coverage ranging from machine learning for time series data and Transformers in natural language processing (NLP) to deep neural networks for visual quality inspection in manufacturing. The sessions covered the latest advances in data analysis and feature engineering, explainable AI, and the state of the art in automated machine learning pipeline construction, all the way to deploying models in production using MLOps tools. In case you missed ODSC, here are the key highlights:
- Text Analytics:
The exponential growth of unstructured data, and the difficulty of managing it efficiently, has been a big headache for enterprise customers. Text analytics seemed to be top of mind, as was evident across several sessions and discussions. Given the sheer volume of data, sifting through it manually is almost impossible, so automation is key in enterprise text analytics. Text analytics is the process of transforming unstructured text into structured data in order to extract valuable trends and insights. Examples include customer review analysis, social media analysis, and other use cases that involve large amounts of text data. Text analytics (and sentiment analysis, which uses NLP and text analysis to gauge customer sentiment) is helping businesses across diverse industry verticals such as banking and financial services, healthcare and life sciences, eCommerce and retail, manufacturing and human resources, telecom and media, and many more.
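To make the idea of "transforming unstructured text into data" concrete, here is a minimal sketch in Python. It is not taken from any ODSC session: the tiny word lists and the sample reviews are made up for illustration, and real sentiment analysis would rely on a trained NLP model rather than a hand-written lexicon.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicon (illustration only).
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "poor", "refund"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(reviews):
    """Turn raw review strings into structured rows plus overall term counts."""
    rows = []
    term_counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        term_counts.update(tokens)
        # Score = number of positive words minus number of negative words.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        rows.append(
            {"text": review, "n_tokens": len(tokens), "score": score, "sentiment": label}
        )
    return rows, term_counts


# Made-up sample reviews.
reviews = [
    "Great product, fast shipping, love it",
    "Arrived broken and support was slow",
    "It is a phone",
]
rows, term_counts = analyze(reviews)
for row in rows:
    print(row["sentiment"], row["score"], row["n_tokens"])
```

Each free-text review becomes a row with a token count, a score, and a label, which is the kind of structured output that can then be aggregated into trends across thousands of reviews.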
- NLP and Image Recognition:
Dr. Sebastian Raschka, Professor at the University of Wisconsin-Madison, covered the advances in computing hardware, especially the use of GPUs for deep neural network training, that make it feasible to develop predictive models that achieve human-level performance in various NLP and image recognition challenges. Dr. Raschka highlighted recent research and technology trends in GPU-accelerated machine learning and deep learning, focusing on the most profound hardware and software paradigms that have enabled it.
- Data Mastering and Analysis:
Dr. Michael Stonebraker, Professor at MIT CSAIL and Co-Founder of Tamr, explained why the data accessibility gap exists and how decades of technologies have failed to address the challenges of large data volume, data variety, and unintended data silos. According to Dr. Stonebraker, traditional rules-based mastering for silo integration does not scale. He explained why traditional master data management (MDM) fails, argued that traditional solutions based on extract, transform, and load (ETL) and rules-based systems will never scale to the size of problems encountered in the enterprise, and contended that integration systems based on machine learning (ML) are the only viable alternative.
- ML Operations & Production:
A recurring theme on all three days of ODSC was how to operationalize ML models and deploy them in a production environment, also known as MLOps, and several vendors showcased MLOps platforms and solutions. Dr. Manasi Vartak, founder and CEO of Verta, discussed the tools and processes that support key needs of the ML operations lifecycle such as versioning, packaging, testing, deployment, and monitoring. How do you package models into formats that are ready for production and integrated with CI/CD platforms? How do you enable ML practitioners to start, run, and manage experiments and orchestration, and how do you provide a hassle-free path to deployment? The team at Algorithmia spoke about ML governance and what is required to manage it effectively. The focus on MLOps suggests that more enterprise customers are moving from ML experiments towards production deployment.
- Feature Engineering:
Feature engineering is the process of extracting relevant features (the inputs to algorithms) from raw data. Traditionally, it has been done manually (handcrafted), which is why it takes so much time. Developing good features is time-consuming, difficult, and above all requires domain knowledge. Dr. Lulu Liu, Solution Architect at dotData, demonstrated how automation helps you extract the full potential of data, and how enterprise customers can leverage AI-generated features to augment and reinforce their AI/ML workflow. dotData’s CEO and Founder, Dr. Ryohei Fujimaki, gave a powerful keynote on accelerating machine learning feature engineering through automation. In his talk, Dr. Fujimaki discussed the future of feature engineering and how automation can provide a rapid means of testing use cases, building feature pipelines, and creating a framework that helps organizations build ML/AI models in days instead of months.
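For readers new to the term, the snippet below is a minimal sketch of what handcrafted feature engineering looks like in practice. It does not represent dotData's product or any method shown at ODSC: the transaction records, the `build_features` helper, and the chosen features are all hypothetical. It turns raw purchase records into one row of model-ready features per customer.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transaction records: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2021, 3, 1), 20.0),
    ("c1", date(2021, 3, 15), 35.0),
    ("c1", date(2021, 3, 29), 5.0),
    ("c2", date(2021, 2, 10), 100.0),
]


def build_features(records, as_of):
    """Aggregate raw transactions into one feature row per customer."""
    grouped = defaultdict(list)
    for customer_id, day, amount in records:
        grouped[customer_id].append((day, amount))

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [amount for _, amount in rows]
        last_purchase = max(day for day, _ in rows)
        features[customer_id] = {
            "n_purchases": len(rows),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            # Recency: days between the last purchase and the reference date.
            "days_since_last": (as_of - last_purchase).days,
        }
    return features


features = build_features(transactions, as_of=date(2021, 4, 1))
for customer_id, row in features.items():
    print(customer_id, row)
```

Every feature here (purchase count, total and average spend, recency) had to be thought up and coded by hand, and a real project may need hundreds of them across many tables. That manual effort is what automated feature engineering aims to reduce.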
One takeaway from ODSC is that, as a data science champion, you need to assess the AI maturity of your organization, invest in areas where you are weak, and augment the capabilities that are critical to scaling your data science practice. For some businesses that may be data engineering, and for others it may be MLOps. Good machine learning algorithms don’t guarantee good models. Great models need great features, and organizations with a strong data science practice can scale AI/ML applications by leveraging automated feature engineering. If you are interested in learning more about automated feature engineering, here is the link to the video from ODSC: VIDEO | Automated Feature Engineering