SD Times podcast - "What the Dev?"

March 2, 2021

Aaron Cheng, our VP of Data Science, joins the SD Times “What the Dev” podcast to discuss how adopting or evaluating AI and actually benefiting from it can be very different challenges, and the key things that make AI adoption at organizations successful.

Podcast Transcript

You’re listening to What the Dev, the weekly podcast of SD Times. And now here’s Jacob Lewkowicz, the online and social media editor at SD Times. In today’s episode of What the Dev, we’ll be talking about how adopting or evaluating AI versus actually benefiting from it can be far different challenges, and what key things can be done to help make AI adoption at an organization successful.

With me today to talk about the matter, I have Dr. Aaron Cheng, the Vice President of Data Science at dotData, a company that offers end-to-end data science automation for the enterprise. Welcome to the show, Aaron, and thanks for coming on.

Thank you. 

Great. To start off, can you tell me a bit about what you’re seeing when it comes to the rise of AI adoption at companies? Are many currently looking into it moving forward, or have you seen that many have already adopted it and are looking to expand?

I think a lot of companies nowadays are really talking about AI and machine learning. A lot of companies, as a matter of fact, have probably started recruiting data scientists and have started to put some kind of data science organization in place. But if we are talking about how many companies have really successfully implemented AI and machine learning models into their day-to-day business, I think that number is actually very small. A few years ago I saw a report from Gartner or another market research company saying that 85% of data science and artificial intelligence projects failed.

And that 15%, as we all know, are the Googles and Amazons of the world, not every company. So what are some of the main things factoring into why they’re not getting the insights that they expect? Is it that the companies need a mindset change when they’re going into it? Do they expect everything to just work right off the bat? Or is there something the AI solutions themselves could be doing to better the business outcomes?

The Challenge of Value 

I think there are a few things people could have done better. The first one is to define the right problem to begin with. I’ve seen a lot of companies that are interested in machine learning because everybody else is talking about it, everybody else is trying to do something about it. When they started their machine learning and AI journey, they did not start with something that has a true impact on the organization; they just wanted to perform some experiments. From that perspective, I’m sure they can generate some kind of models, some kind of predictions, numbers, equations. But when it comes to the applicability to their business, to getting the business team interested in those models and using them in their day-to-day business, it just becomes very, very challenging.

So from that perspective of defining a relevant problem, clearly defining a problem that has a business impact. What I’ve seen so far is some of the customers who have been very successful in using our models, one thing that we are doing with them is to clearly articulate and quantify the business impact of the AI machine learning models from the get-go. When we are doing a project, rather than building a lot of predictions already, then we say, you know, what is the impact that we are talking about? People love numbers. If we put a number in front of the business team and tell them, this is the impact. Now we are talking about a dollar amount.

Think About Deployments

I do believe that we call ourselves scientists, right? So we have this mindset of experimenting, not the mindset of deploying. But in order to generate value, in order to have the business agree to the value of our project, it’s very important that we put our model into production, and putting a data science model into production is a different exercise. The reason is very simple: scientists like myself can develop a lot of code, but we are not software developers. I don’t think it’s a good idea to put my code directly into a production environment. So after a model is developed, there is another process to productionalize that data science model, and that can be very, very complex, because it has a lot to do with the current way of implementing software. What I’ve seen with some of our customers is that they developed a model, and then when they started thinking about how to deploy the model, all of a sudden it was something entirely new to them.

So one piece of advice I can give is that when we are considering a solution, when we are planning the project, we should always treat deployment as a very important part of that plan. Without this, it’s going to be very, very difficult to get the model into production and to generate the business impact.

Avoiding the AI Bandwagon

I see, great. And going back a little bit to what we were talking about before, are you seeing that some companies are just jumping onto the AI bandwagon because that’s the thing that everyone’s doing? And is it really for everyone to try to adopt?

Yeah. So the first question: am I seeing a lot of companies just trying to be part of it?

Yes, absolutely. Sometimes when we talk to our customers, some of our potential customers, we ask them what problem they’re interested in, and they have no problem. A lot of times they just say: you know what, Aaron, I’ve got 20 years’ worth of data collected from 20 different sources. I don’t even know what’s in the data. Can you guys just tell us what is in the data? Frankly speaking, for someone like us, we are data scientists; we don’t know their data. How can I tell you what’s in your data? You should have a business interest to begin with, so that you can ask the right question. Then data scientists and data science software can help you analyze the data from the perspective that you defined. We cannot just create magic from the data as it is. So to answer the first question you mentioned: yes, I’ve seen a lot of companies that just want to be part of “AI” without clearly understanding the real business needs.

The second question is: is AI or machine learning for everyone? I think in the long run, yes, it is, because this is where the world is headed. If you look at retail, even Walmart today wants to replicate the success of Amazon, so right now Walmart is trying to do the same. The point is, this is the direction everybody needs to go, following that direction so that they can be more successful, not only for today but also for tomorrow. But the challenge is: what is the meaningful project, what is the right thing they need to be doing?

AI vs Machine Learning

To get started with AI, to get started with some machine learning, that’s a different conversation. So again, like I said, it takes someone who really understands the business and the pain points, and also someone who really understands the organization’s readiness for machine learning and data science projects.

So just to be clear for our listeners, I’ve often heard that the terms AI and ML are often conflated within a conversation. Is there a clear differentiating factor between the two that’s important to know when talking about this?

AI is the bigger category. AI is mainly about anything that involves using data to drive some business outcome; machine learning is one type of solution within the artificial intelligence community. So there is machine learning, there’s deep learning, and then we have traditional statistical modeling. If you use those techniques correctly, they can deliver AI solutions as well.

Accuracy vs. Explainability in Models

Great. And can you tell me a bit about the kinds of solutions that are coming out now, and about dotData’s solution? How does it help companies with the explainability factor? As you said before, AI algorithms are relatively hard to grasp; how does this kind of solution help them?

So there are a few things I can think of. For data scientists, a lot of times when they are developing a model, their head is always about getting the highest, most accurate model possible. But on the other hand, we all know that the most accurate model is typically the black-box type of model, which is accurate, but you have no idea why; you don’t know the internal transformation logic. It’s not like linear regression, for example, where we understand what the intercept is and what the coefficient for each input parameter is, and where you can really take that model to the business and validate each parameter: what’s this, does this make sense or not? With a linear model you might have five input parameters, so you can see what those parameters are and try to reason about them. But for more complex, more accurate AI models, there may be 50,000 input parameters, and that’s just overwhelming. There’s no way you can possibly validate that amount of information with the user.
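The kind of coefficient-level validation Aaron describes for a linear model can be sketched in a few lines. The feature names and data below are hypothetical, purely for illustration:

```python
import numpy as np

# Hypothetical data: predict monthly spend from two customer attributes.
feature_names = ["visits_per_month", "account_age_years"]
X = np.array([[4, 1.0], [10, 3.0], [7, 2.0], [12, 5.0], [5, 1.5]], dtype=float)
y = np.array([120.0, 310.0, 205.0, 400.0, 150.0])

# Fit y ≈ intercept + X @ coefs via ordinary least squares.
A = np.hstack([np.ones((X.shape[0], 1)), X])  # column of ones for the intercept
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coefs = params[0], params[1:]

# Each coefficient is something a business team can sanity-check:
# "each extra visit per month is associated with roughly $c more spend."
for name, c in zip(feature_names, coefs):
    print(f"{name}: {c:+.2f}")
print(f"intercept: {intercept:+.2f}")
```

With five coefficients, a reviewer can check each one against intuition; with 50,000 engineered features, that conversation is no longer possible, which is the trade-off the interview describes.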

So from that perspective, a lot of businesses that are interested in implementing AI or machine learning have got to be very practical to begin with. When it comes down to business adoption, it is never about the most accurate model. Never. Explainability is the key. We should not always say, oh, this model is not that accurate, let’s keep tuning it and try to get something better. Making perfect the enemy of good is not the right thing to do in this space. What we’ve got to do is start with something simple. Simple offers good explainability, so you can start validating some of that information, validating those assumptions. That is the right way to get started with machine learning practice. And speaking of the dotData solution, within our product we actually offer two different kinds of models. One is the so-called white-box model, which is more explainable and more transparent, but make no mistake, most of the time white-box models are not as accurate as black-box models. On the other hand, for a lot of large banks and insurance companies, the kind of practice that requires a lot of regulatory approvals, what we’ve seen is that adoption of white-box models comes a lot easier compared to black-box models. So I just want the audience to really understand: when you are selecting a model, do not always just go with the most accurate one, because that is not a very practical consideration downstream.

What are you seeing that customers are leaning towards more, with regards to white-box or black-box models?

Yeah, I would say it depends on the application. For example, let’s say customer 360 or revenue prediction, those kinds of projects that we do a lot: there, the white-box type of model is definitely adopted a lot more, because for those models, as I said, the developer wants the business users to understand them. The business users want to ask a lot of whys; they want to know what the input parameters are and what transformation logic goes into the modeling process. If you have to deal with those kinds of situations, white-box models are definitely a lot easier. It just depends on the application. For fraud detection, those kinds of use cases, if I were to look at state-of-the-art fraud detection models, I think most of them are black-box models, but that’s okay, because in that particular area, for those kinds of use cases, accuracy is probably the dominant evaluation criterion. So again, it really depends.

Are you finding that there’s still a considerable learning curve when it comes to the business side trying to understand how AI would be implemented, or is this gap being closed?

The learning curve is still very steep. And I think that there are two kinds of learning curves. One is to understand the basic statistics concepts, the basic machine learning concepts. For that one, frankly speaking, there are so many books, so many online resources, and so many universities that even offer online classes. It doesn’t take a lot of time to get over that learning curve. But the bigger learning curve that I’m seeing right now is trying to connect these statistical concepts to a real business problem. This is where a lot of users struggle. For example, in machine learning, in a classification use case, we have the concept of precision and recall. Users can define the concept of recall from a statistical point of view, but relating that to a specific business problem and articulating which metric is more important from the business application point of view, outside of a classroom context, in some kind of real application, is actually a much steeper learning curve. To get that part right, you need to understand more than just a statistical definition; you need to understand what business problem we are trying to solve, and try to understand, from the business point of view, what the true interest is. So that kind of learning curve is actually a lot steeper.
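The precision-versus-recall distinction mentioned above can be made concrete with plain counts. The fraud-detection framing and the numbers below are illustrative assumptions, not from the episode:

```python
# Hypothetical confusion-matrix counts from a fraud-detection classifier.
true_positives = 80   # fraudulent transactions correctly flagged
false_positives = 40  # legitimate transactions wrongly flagged
false_negatives = 20  # fraudulent transactions the model missed

# Precision: of everything we flagged, how much was really fraud?
precision = true_positives / (true_positives + false_positives)
# Recall: of all the real fraud, how much did we catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.2f}")  # 0.67: a third of alerts are false alarms
print(f"recall    = {recall:.2f}")     # 0.80: we still miss 20% of fraud
```

Which metric matters more is exactly the business question Aaron points at: missed fraud costs money directly, while false alarms cost analyst time and customer goodwill, and only someone who understands the business can weigh the two.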

The Role of AI in Business

I see. And you said it really comes down to understanding the data that’s being fed to this AI model. Are you seeing that companies now are really implementing that on their business side? Is that becoming a more common phenomenon?

Yeah, I would think so. Typically speaking, data is gold, right? There’s a lot of information in the data. Data can provide something extremely powerful, and that’s why, generally speaking, more data will produce something better. And here, “more data” has two meanings. One is more volume of data. More volume of data, generally speaking, will present a bigger amount of information for the model to learn from, validate against, and correct itself with, in order to produce a better prediction. That part is very intuitive to understand: a lot of companies, instead of using one year of data, use three years of data; instead of using a hundred customers’ worth of data, they try to use all customers’ data.

The other kind of “more data” is actually more dimensions of data. Let me give you an example. Traditionally, if you want to predict which customer is going to buy a particular product, you would collect a lot of information about all different kinds of customers and all different kinds of products to build a great machine learning model. But nowadays there’s a new approach to solving this problem: adding very different dimensions of information. For example, we probably want to add weather information. When the weather is severe, people may not go to a store to shop at all; it’s not about this product versus that product, they just do not shop. Or you probably want to add some kind of GDP information, because when the country is rich, everybody has more money and they shop; when GDP is low, nobody goes shopping. So the point is, nowadays when it comes to machine learning modeling, it’s not just about more historical data, the more the merrier, but rather trying to use different dimensions of information.

So that’s something I think many companies should start to pay attention to.
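Adding new dimensions of data, as opposed to just more rows, often amounts to joining external tables onto the training data. A minimal sketch with pandas, where the tables and column names are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales history (adding rows here would be "more volume").
sales = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "units_sold": [120, 45, 150],
})

# External dimensions: severe weather keeps shoppers home; GDP tracks spending power.
weather = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "severe_weather": [False, True, False],
})
gdp = pd.DataFrame({"quarter": ["2021Q1"], "gdp_growth_pct": [1.8]})

# Join the new dimensions onto the training table.
enriched = sales.merge(weather, on="date")
enriched["quarter"] = "2021Q1"  # simplification: all dates fall in one quarter
enriched = enriched.merge(gdp, on="quarter")

print(enriched.columns.tolist())
```

A model trained on `enriched` can now attribute the low sales on the stormy day to the weather rather than to the product, which is the point of widening the feature set.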

Wrapping It Up: Don’t Make Perfect the Enemy of Good

Yeah. And my last question is: are there any other things you’d like to add on how organizations can more successfully implement AI?

Yeah, I would like to go back to one point I made earlier: do not let perfect be the enemy of good. A lot of times, AI is about more than just how accurate your model is; a lot of times, the adoption part is key. For the many companies out there who want to get started on their journey: if you have some kind of model developed, put it into production, start using it, start presenting this information to the business team. Let them get used to this new way of doing things, collect their feedback, and improve, rather than just waiting until you’ve got the best model in the world. In today’s world, it’s a different approach. We can always improve the modeling part, we can always improve the accuracy, but getting the business to understand the new way of doing things, getting their feedback, getting their buy-in on what you’re saying, is going to take time no matter what.

So let’s get their input a lot faster, before we invest too much time perfecting the models. In the enterprise data science space, frankly speaking, it’s all about using data science, using data, to drive business outcomes and business value, rather than creating the highest score to make yourself feel happy.

Well, thank you, Aaron, for coming to the show. It was great speaking to you.

Thank you. Great. 

And thank you also to our listeners of today’s show. Be sure to check out all of our episodes on your favorite podcast listening platform. Until next time, this has been What the Dev.


dotData Inc.

dotData Automated Feature Engineering powers our full-cycle data science automation platform to help enterprise organizations accelerate ML and AI projects and deliver more business value by automating the hardest part of the data science and AI process – feature engineering and operationalization. Learn more at, and join us on Twitter and LinkedIn.