Video: Ai4 Finance- AutoML & Beyond – Part 2

AI Automation requires Feature Engineering Automation

From the 2019 Ai4 Finance New York conference, Ryohei Fujimaki, PhD, CEO of dotData, discusses AutoML in the financial industry.

Video Transcript: AI4 Finance, AutoML & Beyond – Part 2

This is the concept of AutoML and what we can do beyond what we know in an automated fashion. So there are 20 minutes, and I want to take a little bit of time to explain our view of how automation can help you at a very high level. We believe these are the four aspects, or pillars, where automation can help you.

One is accelerate, and the others are democratize, augment, and operationalize. Accelerate is kind of easy to understand because this is automation technology: the turnaround time of a data science or machine learning project becomes significantly shorter. Basically, this is very important even for very experienced data scientists.

Because even if you have an automation tool, you eventually need to do a lot of trial and error. In many cases there are a lot of ways to solve one problem, but you don’t know which one works best, or you have a lot of problems you could solve. For example, say you are building a customer scoring model and a domain expert provides a customer segmentation: maybe a simple segmentation by gender or age, or a more complex segmentation.

Do you want to build a model for each segment, or do you want to build one global model across all segments? Basically, you don’t know. You have to try. Or one problem can be formulated [as a] classification problem or a regression problem. Which is the better way? You have to try. So there are a lot of ideas you can try, and automation can help you accelerate your trial-and-error process to find the best way to solve the problem, or allow you to solve more problems.
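Editor's sketch: the "you have to try" loop described above can be illustrated with a minimal comparison of a global model against per-segment models. This is not dotData's method. The toy data, the mean-value "models", and every function name here are assumptions made for illustration, using only the Python standard library.

```python
from collections import defaultdict
from statistics import mean


def fit_global(rows):
    """One model for everyone: always predict the overall mean target."""
    overall = mean(r["target"] for r in rows)
    return lambda row: overall


def fit_per_segment(rows):
    """One model per segment: predict that segment's mean target."""
    by_segment = defaultdict(list)
    for r in rows:
        by_segment[r["segment"]].append(r["target"])
    segment_means = {s: mean(v) for s, v in by_segment.items()}
    fallback = mean(r["target"] for r in rows)  # used for unseen segments
    return lambda row: segment_means.get(row["segment"], fallback)


def mean_squared_error(model, rows):
    return mean((model(r) - r["target"]) ** 2 for r in rows)


def try_formulations(train, holdout, candidates):
    """Fit every candidate and rank by holdout error (lower is better)."""
    results = {
        name: mean_squared_error(fit(train), holdout)
        for name, fit in candidates.items()
    }
    return sorted(results.items(), key=lambda kv: kv[1])


# Hypothetical toy data: a customer score per age segment.
train = [
    {"segment": "young", "target": 0.9},
    {"segment": "young", "target": 0.7},
    {"segment": "senior", "target": 0.2},
    {"segment": "senior", "target": 0.4},
]
holdout = [
    {"segment": "young", "target": 0.8},
    {"segment": "senior", "target": 0.3},
]

ranking = try_formulations(
    train, holdout, {"global": fit_global, "per_segment": fit_per_segment}
)
print(ranking)  # best formulation first
```

On this toy data the per-segment formulation ranks first. On real data either one can win, which is why the comparison has to be run rather than assumed.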

So, this is one way that we believe automation is going to help you. Another part is democratizing. Today, in the poll, not many people answered that they have a challenge hiring data scientists, but many organizations are still struggling to build a strong data science team. Basically, the concept of democratizing data science is to allow more people to execute data science and machine learning projects. About one-third of this room are data scientists, so democratized data science is not only for BI or data analytics types of people. If we can enable them to execute basic machine learning projects, the experienced data science team can really focus on something important, something more technical and challenging, because a lot of data science teams are currently handling too many [types of] work: data engineering work, or a lot of templated projects.

But even a non-data scientist is going to be able to solve these projects using automation, so the data science team can really focus on very challenging problems. So democratization is not only for someone who is not familiar with machine learning. Eventually, our view is that there is no separate “citizen data science team” and data science team; it’s the overall data science or machine learning practice of an organization.

And this is something we are really proud of, so automation is not something to replace domain expertise. It cannot. That domain expertise is your core business insight. That is extremely important, but what automation can do is to expand this insight because in one project, maybe you can explore only hundreds of hypotheses, but automation can explore millions of features.

And of course, there are, you know, many features that are only statistically relevant and not so meaningful for the business, but there are a lot of features our customers found to be very interesting discoveries. So just imagine if you could use more of the data that you have not yet fully used, or what kind of insight you could discover if you could use tons of tables in one project.
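Editor's sketch: one simple way to picture automated feature exploration is to aggregate a related table in every (column, aggregation) combination and rank the resulting candidates by their correlation with the target. This is only an illustration of the idea, not dotData's algorithm. The table layout, the column names, and the choice of Pearson correlation are all assumptions.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


AGGREGATIONS = {
    "sum": sum,
    "max": max,
    "count": len,
    "mean": lambda values: sum(values) / len(values),
}


def generate_features(customer_ids, transactions, columns):
    """Build one candidate feature per (column, aggregation) pair."""
    features = {}
    for col, (agg_name, agg) in product(columns, AGGREGATIONS.items()):
        values = []
        for cid in customer_ids:
            vals = [t[col] for t in transactions if t["customer_id"] == cid]
            values.append(agg(vals) if vals else 0.0)
        features[f"{agg_name}({col})"] = values
    return features


def rank_features(features, target):
    """Rank candidates by absolute correlation with the target."""
    scored = [(name, abs(pearson(vals, target))) for name, vals in features.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


# Hypothetical toy tables: four customers and their transactions.
customer_ids = [1, 2, 3, 4]
churned = [1, 0, 1, 0]
transactions = [
    {"customer_id": 1, "amount": 20.0, "days_late": 5},
    {"customer_id": 1, "amount": 15.0, "days_late": 7},
    {"customer_id": 2, "amount": 200.0, "days_late": 0},
    {"customer_id": 2, "amount": 180.0, "days_late": 1},
    {"customer_id": 2, "amount": 220.0, "days_late": 0},
    {"customer_id": 3, "amount": 30.0, "days_late": 6},
    {"customer_id": 4, "amount": 150.0, "days_late": 0},
    {"customer_id": 4, "amount": 170.0, "days_late": 2},
]

features = generate_features(customer_ids, transactions, ["amount", "days_late"])
ranking = rank_features(features, churned)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A high rank here means only that a candidate is statistically related to the target. As the talk notes, whether it is meaningful for the business still needs a domain expert's judgment.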

The last part I’m not going to dive into in detail in this presentation, but it is an increasingly important area in any AI or machine learning effort, because a lot of machine learning projects are still in an experimental environment, in a lab. They have to be integrated into a business environment, into the business system operated by the business.

So operationalization is one of the key areas, and it’s not very easy because a data science lab and a production environment typically have fairly different types of requirements. Even data science teams do not recognize this difference when they are building a model, and when it comes to production, scalability or maintainability issues, different types of problems, appear.

This is a typo. Does AutoML replace your data scientist? I have kind of already answered this question. Basically, automation is not something to replace expertise. Expertise and automation: we have to distinguish them.

So this is a way to see automation and manual work, mixing automation with your expertise. If we are looking at features, there is an interpretability dimension and a predictability dimension: a more predictable feature vs. a more interpretable feature. These two directions are not necessarily aligned. Yellow triangles are features based on domain expertise and blue dots are features that are automatically generated. Basically, the ideal situation is where we have a lot of features in this area, highly predictable and highly interpretable, but it’s not always the case.

Some features are very, very strongly correlated with the target variable, but it’s just, for example, a pseudo-correlation and it’s not necessarily real. Or there is some highly interpretable feature that you really like, it matches intuition, but it’s not as predictable as the other features.

Yeah. Predictive power, predictive power.

So there are different features, and you have to choose and combine these features and build a machine learning model on them. Once it comes to the model, again, the same dimensions can be applied: interpretability or predictive power. There is a certain area we call automated decision-making. In this area, we usually need less interpretability because decisions are happening automatically anyway, and interpretability is less important. It doesn’t mean that we don’t need interpretability, because if the prediction leads to some wrong decisions, you need some retrospective analysis. So even in automated decision-making, interpretability is very important in the enterprise.

On the other hand, there is another area called machine-guided decision-making. For example, you have field salespeople and you build a scoring model: “Oh, your customer is going to churn, so why don’t you follow up? The score is 80%.” Field sales cannot trust that. You cannot say “because deep learning says so”; we need to clearly show: okay, these are the key factors that the machine used to produce this score. That’s why the machine believes that your customer is going to churn.
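Editor's sketch: showing the key factors behind a score can be as simple as reporting per-feature contributions of a linear (logistic) model alongside the probability. The weights, feature names, and customer record below are hypothetical and are not taken from the talk; other model types would need a different attribution technique.

```python
from math import exp

# Hypothetical weights of an already-trained logistic churn model.
WEIGHTS = {
    "support_tickets_last_30d": 0.8,
    "months_since_last_purchase": 0.5,
    "has_annual_contract": -1.2,
}
BIAS = -1.0


def score_with_reasons(customer, top_k=2):
    """Return the churn probability and the top_k largest contributions."""
    contributions = {name: w * customer[name] for name, w in WEIGHTS.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + exp(-logit))
    # The largest absolute contributions are the "key factors" for this score.
    reasons = sorted(
        contributions.items(), key=lambda kv: abs(kv[1]), reverse=True
    )[:top_k]
    return probability, reasons


customer = {
    "support_tickets_last_30d": 3,
    "months_since_last_purchase": 2,
    "has_annual_contract": 0,
}
probability, reasons = score_with_reasons(customer)
print(f"churn probability: {probability:.0%}")
for name, contribution in reasons:
    print(f"  {name}: {contribution:+.1f}")
```

With output like this, a salesperson sees not only that the score is high but that recent support tickets and a long gap since the last purchase are what drive it.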

So this type of case. Basically, in these two areas, the difference is who is going to make the final decision. Automated decision-making is more like a policy with a decision being made automatically based on policy. On the other hand, machine-guided decision-making is more like information to the expert, and the final decision is made by experts. In this area, we need more interpretability.

The last kind of area is the extreme: the data science project does not require any prediction. They are trying to find insights and discoveries through the prediction model. So in this knowledge discovery area, interpretability is usually much more important than prediction itself.

So eventually you have to identify what your use cases are and how you can combine automation and your expertise. Basically, it’s a best practice to include the expert and automation in the same loop.

There are projects that are templated and easy to solve; maybe automation can solve them automatically. Or, as I said, your domain experts are one of your most important assets, and that’s what we call domain features. So combine domain features with automatically generated features, and [mix the] best feature set by leveraging both capabilities, your expertise and automation. Once you combine these two, automation is going to do a very good job. In our past experience, AutoML is going to do better on this type of project because it can search more of the hyperparameter space and do more fine-tuning.

On the other hand, in many regulated industries, you cannot use AutoML. You have to carefully, manually choose the features and control the model. In that case, you can still take advantage of automatically generated features, but you have to manually customize the model. So this is not a question of which is good and which is bad; these are very different cases. Basically, the best practice is to include both capabilities, your expertise and automation.

One more thing, and this is important: dotData is named a leader in a report Forrester just published in May. This new report is very focused on AutoML, on automation-focused machine learning solutions, which means that the attention to and importance of AutoML are rapidly increasing, and this technology area is going to be very mature in the next two to five years. A lot of people have already started to adopt AutoML, so if you are interested in this area, come to us and we can discuss [this more].