Maintain Model Robustness: Strategies to Combat Feature Drift in Machine Learning

June 7, 2023


Building robust and reliable machine learning models is essential for confident decision-making and resilient predictions. While accuracy remains desirable, the stability and durability of a model take precedence in ensuring long-term efficacy. A crucial contributor to model reliability is the stability of its features: features that behave consistently over time can make the model considerably more robust.

In this article, we will explore the concept of feature drift and analyze methods to maintain feature stability, thereby enhancing the model’s overall robustness and resistance to fluctuating conditions, even at the cost of marginal reductions in accuracy.

Challenges of Feature Drift

Figure: The Challenges of Concept & Data Drift

The evolution of features, more commonly referred to as feature drift, is typically categorized into two distinct types: data drift and concept drift.

Data Drift

Machine learning models are trained to minimize the overall prediction error on the training data, which inclines them toward fitting the regions where most of that data lies. If the input feature distribution P(x) shifts, so that the majority of the data now falls in a different range, the model is asked to predict in regions it has fit poorly, and its performance can degrade. Extrapolation beyond the range seen in training is one example of this problem.
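
The extrapolation problem can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration: the quadratic relationship, the input ranges, and the choice of a straight-line model. The relationship between x and y never changes; only the range of x does.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_relationship(x):
    # Hypothetical fixed relationship between feature and target.
    return x ** 2


# Training inputs are concentrated in [0, 1].
x_train = rng.uniform(0.0, 1.0, size=500)
y_train = true_relationship(x_train) + rng.normal(0.0, 0.05, size=500)

# Fit a straight line; it approximates the curve well inside [0, 1] only.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def mse(x):
    """Mean squared error of the fitted line against the true relationship."""
    predictions = slope * x + intercept
    return float(np.mean((predictions - true_relationship(x)) ** 2))


mse_in_range = mse(rng.uniform(0.0, 1.0, size=500))  # same P(x) as training
mse_shifted = mse(rng.uniform(2.0, 3.0, size=500))   # P(x) has drifted

print(f"in-range MSE: {mse_in_range:.4f}, shifted MSE: {mse_shifted:.4f}")
```

The error on the shifted inputs is far larger even though the underlying relationship is untouched, which is what distinguishes data drift from concept drift.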

Concept Drift

On the other hand, concept drift is concerned with changes in the relationships between input features and the target variable (P(y|x)) over time. Any alteration in these relationships can directly impact the model. The essence of machine learning algorithms lies in modeling patterns between features and the target variable. Therefore, if these relationships change, the performance of machine learning models can be adversely affected, potentially causing them to underperform.

Techniques to Mitigate Feature Drift

To manage feature drift and sustain the performance of machine learning (ML) models, attention must be paid to two stages: the model development phase and the model serving phase. Each requires its own strategies, and the sections below describe techniques tailored to each stage.

Model Development Phase

Assess Feature Stability through Time-based Partitioning

Time-based partitioning can be an effective strategy to ensure the stability of features and the robust performance of your models in real-world operations. This involves evaluating drifts over time using historical data and making adjustments accordingly.

Assume your prediction system retrains the model monthly using a feature set based on the preceding three months’ data. By splitting the historical data chronologically into train and test sets that mirror this schedule, you can gauge how the model performs on data it has not yet seen and spot potential drift.
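
As a rough sketch of this kind of partitioning, the helper below produces rolling windows in which each test month is preceded by three training months. The function name, the month labels, and the `YYYY-MM` string format are assumptions made for the example, not part of any particular system.

```python
def rolling_month_splits(months, train_months=3):
    """Yield (train_window, test_month) pairs in chronological order.

    `months` are labels such as "2023-04"; in this format, sorting the
    strings also sorts them by time.
    """
    ordered = sorted(months)
    for i in range(train_months, len(ordered)):
        yield ordered[i - train_months:i], ordered[i]


months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06"]
splits = list(rolling_month_splits(months))

for train_window, test_month in splits:
    print(f"train on {train_window} -> evaluate on {test_month}")
```

Each test month comes strictly after its training window, so no future information leaks into training and the evaluation resembles what the deployed system would experience.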

Figure: Model Development Phase Timelines

To detect concept drift, monitor the model’s accuracy over time and look for sudden performance declines. At the same time, examine how feature importance changes from month to month. Notable fluctuations in model accuracy or feature importance could signal concept drift. If certain features are causing the drift, consider excluding them from the feature set.
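
One simple way to look for sudden declines is to compare each month's accuracy with the average of the months before it. The sketch below does this; the three-month window, the 0.05 drop threshold, and the accuracy figures are all hypothetical values chosen for illustration.

```python
from statistics import mean


def flag_accuracy_drops(monthly_accuracy, window=3, max_drop=0.05):
    """Return months whose accuracy is more than `max_drop` below the
    mean accuracy of the preceding `window` months."""
    months = sorted(monthly_accuracy)
    flagged = []
    for i in range(window, len(months)):
        baseline = mean(monthly_accuracy[m] for m in months[i - window:i])
        if baseline - monthly_accuracy[months[i]] > max_drop:
            flagged.append(months[i])
    return flagged


# Hypothetical back-test results, one accuracy value per test month.
accuracy_by_month = {
    "2023-01": 0.91, "2023-02": 0.90, "2023-03": 0.92,
    "2023-04": 0.91, "2023-05": 0.82, "2023-06": 0.81,
}
flagged_months = flag_accuracy_drops(accuracy_by_month)
print(flagged_months)
```

The same comparison could be applied to each feature's importance score per month. A flagged month is a prompt to investigate, not proof of concept drift.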

Identifying data drift requires comparing feature distributions across different time intervals. Many techniques support this comparison; the Kolmogorov-Smirnov test (KSTest) and the Population Stability Index (PSI) are two that are commonly used.

The KSTest is a statistical test of whether two samples come from the same distribution. Its p-value is the probability of observing a difference between the samples at least as large as the one measured if both were in fact drawn from the same distribution. Consequently, a lower p-value is stronger evidence of data drift. PSI, by contrast, quantifies the discrepancy between expected and observed frequencies within defined bins or categories. A higher PSI value implies a larger divergence between the distributions.
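
Both measures can be sketched in a few lines. The example below uses `scipy.stats.ks_2samp` for the KSTest and a small hand-written PSI helper. The helper name, the ten quantile-based bins, the `eps` guard against empty bins, and the synthetic normal samples are assumptions made for this example; other binning schemes are equally valid.

```python
import numpy as np
from scipy.stats import ks_2samp


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a newer sample of one feature."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the reference sample only.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))
    # Using only the interior edges sends values outside the reference
    # range into the first or last bin.
    inner = edges[1:-1]
    exp_counts = np.bincount(
        np.searchsorted(inner, expected, side="right"), minlength=n_bins)
    act_counts = np.bincount(
        np.searchsorted(inner, actual, side="right"), minlength=n_bins)
    # Clip fractions away from zero so the logarithm stays finite.
    exp_frac = np.clip(exp_counts / exp_counts.sum(), eps, None)
    act_frac = np.clip(act_counts / act_counts.sum(), eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)  # e.g. an earlier period
same = rng.normal(0.0, 1.0, size=5000)       # later period, no drift
shifted = rng.normal(1.0, 1.0, size=5000)    # later period, mean has moved

ks_same = ks_2samp(reference, same)
ks_shifted = ks_2samp(reference, shifted)
psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)

print(f"no drift: KS p-value={ks_same.pvalue:.3f}, PSI={psi_same:.4f}")
print(f"shifted:  KS p-value={ks_shifted.pvalue:.3g}, PSI={psi_shifted:.4f}")
```

With large samples the KSTest can return very small p-values for shifts that are practically negligible, so it is often useful to read the p-value alongside an effect-size measure such as PSI.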

It’s critical to understand that while the KSTest and PSI serve as valuable tools in identifying features of concern and detecting data drift, they don’t necessarily reflect the quality or relevance of the features. For example, a feature indicative of seasonal sales might show data drift during specific periods but still hold predictive value for forecasting demand. Consequently, in collaboration with domain experts, a thorough review of features demonstrating data drift is essential to comprehend the context and ascertain their significance.

Incorporate Shadow Feature Comparison in Feature Selection

The shadow feature comparison technique is a robust tool for assessing the stability of individual features, especially when the dataset is noisy or the sample size is small. In such situations, a feature can receive a high importance score purely by chance. The underlying principle of shadow feature comparison is to benchmark original features against randomly generated counterparts, known as shadow features. Any feature that is less significant than its shadow counterparts is eliminated. While the technique addresses feature instability in general, it is also useful for mitigating feature drift. Boruta is one algorithm that implements shadow feature comparison alongside statistical verification; scikit-learn-compatible implementations, such as the BorutaPy package, are available for Python.

The implementation of this concept can be distilled into the following steps:

  1. Prepare a Candidate Feature Set: Commence with a set of features to evaluate.
  2. Generate Shadow Features: Randomly permute the values of the original features to create a set of features with no inherent relationship with the target variable.
  3. Use an ML Algorithm for Feature Selection: Apply feature selection to both the original and shadow features using your selected machine learning algorithm. Repeat this multiple times with different random seeds to yield numerous selection results.
  4. Count Feature Selection: Record each feature’s selection frequency across iterations. This count represents each feature’s significance compared to the shadow features.
  5. Compare and Remove: Lastly, remove features selected fewer times than the maximum selection count of the shadow features. These are considered statistically insignificant or less important.
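
The five steps above can be sketched as follows. This is a simplified illustration, not Boruta itself and not any product's implementation. It assumes a stand-in importance score (absolute correlation with the target), bootstrap resampling in place of re-seeding a model, and a top-k rule as the "selection" step. The function names and synthetic data are hypothetical; a tree-ensemble importance could be supplied through `importance_fn` instead.

```python
import numpy as np


def abs_correlation_importance(X, y):
    """Stand-in importance: |Pearson correlation| of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)


def shadow_feature_selection(X, y, n_rounds=50, top_k=None,
                             importance_fn=abs_correlation_importance,
                             seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    top_k = n_features if top_k is None else top_k
    real_counts = np.zeros(n_features, dtype=int)
    shadow_counts = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        # Step 2: permute each column independently to break its link to y.
        shadow = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)])
        # Resample rows so every round scores a different sample.
        rows = rng.integers(0, n_samples, size=n_samples)
        combined = np.hstack([X, shadow])[rows]
        # Step 3: score original and shadow columns together, keep top_k.
        scores = importance_fn(combined, y[rows])
        selected = np.argsort(scores)[::-1][:top_k]
        # Step 4: count how often each column was selected.
        for idx in selected:
            if idx < n_features:
                real_counts[idx] += 1
            else:
                shadow_counts[idx - n_features] += 1
    # Step 5: drop features selected fewer times than the best shadow.
    keep = real_counts >= shadow_counts.max()
    return keep, real_counts, shadow_counts


# Step 1: a candidate set with three informative and five noise features.
rng = np.random.default_rng(42)
n = 300
informative = rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 5))
X = np.hstack([informative, noise])
y = informative @ np.array([3.0, 2.0, 2.0]) + rng.normal(scale=0.5, size=n)

keep, real_counts, shadow_counts = shadow_feature_selection(X, y)
print("selection counts:", real_counts)
print("max shadow count:", shadow_counts.max())
print("kept features:   ", np.flatnonzero(keep))
```

Because the comparison is against the best-performing shadow feature, a pure-noise feature may occasionally survive a run. Boruta adds a statistical test on the counts to control for this.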

Figure: Incorporating Shadow Feature Comparison in Feature Selection

Inference and Prediction Phase

At the inference phase of the machine learning model lifecycle, where a feature set is fixed for each model, monitoring predictive model performance and feature stability is critical for maintaining continued efficacy. To this end, the implementation of the following strategies is recommended:

Build Automated Alert Systems:

Implement automated monitoring systems that continuously track model performance and feature stability. Set up thresholds for various metrics such as accuracy, KSTest, and PSI. When these metrics deviate beyond predefined limits, generate alerts to notify data scientists and practitioners. These alerts act as early warning signals, prompting an immediate investigation and appropriate actions to address any identified issues.
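
A minimal sketch of such a threshold check is shown below. The threshold values, the feature names, and the structure of the metrics dictionary are hypothetical and would need to be tuned to the application. In practice the returned messages would be routed to whatever notification channel the team uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits chosen for illustration only.
    min_accuracy: float = 0.85
    min_ks_pvalue: float = 0.01
    max_psi: float = 0.2


def collect_alerts(accuracy, feature_stats, thresholds=Thresholds()):
    """Return alert messages for metrics outside their limits.

    `feature_stats` maps a feature name to a dict with the keys
    "ks_pvalue" and "psi".
    """
    alerts = []
    if accuracy < thresholds.min_accuracy:
        alerts.append(
            f"accuracy {accuracy:.3f} below {thresholds.min_accuracy}")
    for name, stats in sorted(feature_stats.items()):
        if stats["ks_pvalue"] < thresholds.min_ks_pvalue:
            alerts.append(
                f"{name}: KS p-value {stats['ks_pvalue']:.4f} "
                f"below {thresholds.min_ks_pvalue}")
        if stats["psi"] > thresholds.max_psi:
            alerts.append(
                f"{name}: PSI {stats['psi']:.2f} above {thresholds.max_psi}")
    return alerts


alerts = collect_alerts(
    accuracy=0.88,
    feature_stats={
        "avg_basket_value": {"ks_pvalue": 0.40, "psi": 0.03},
        "days_since_last_order": {"ks_pvalue": 0.002, "psi": 0.31},
    },
)
for message in alerts:
    print("ALERT:", message)
```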

Schedule Regular Model Retraining:

Retrain models at regular intervals to keep them up to date. By incorporating new data, models can adapt to changing patterns or trends and maintain their relevance and accuracy. Retraining only adjusts the model’s parameters, however; if the characteristics of the features change, or some features prove unstable, consider redesigning the feature set as well. This keeps the models aligned with the evolving data they are trained on.

Collaborate with Domain Experts:

Engage domain experts throughout the monitoring and maintenance process. Domain experts possess valuable contextual insights and domain knowledge that can aid in interpreting monitoring results and identifying the root causes of any issues. By collaborating with them, data scientists can better understand the domain-specific factors that may impact model performance. This collaboration helps guide necessary adjustments, recalibrations, or feature engineering to address issues effectively.

By implementing these strategies, data scientists and practitioners can proactively monitor the performance of predictive models and features, enabling them to promptly detect and address any degradation, concept drift, or feature instability. This proactive approach helps the models remain reliable, accurate, and aligned with the evolving dynamics of the operational environment.


In this article, we emphasized the significance of evaluating feature stability and monitoring predictive models and features for maintaining the robustness of predictions. We highlighted the key strategies required at each phase of the process.

dotData’s Platform incorporates techniques introduced in this article and helps data scientists and analysts maintain stable features and models in model development and inference phases. Learn more about how your organization could benefit from the powerful features of dotData by signing up for a demo.

Yukitaka Kusumura

Yukitaka is the principal research engineer and a co-founder of dotData, where he leads the R&D of AI-powered feature engineering technology. He has over ten years of experience in research related to data science, including machine learning, natural language processing, and big data engineering. Prior to joining dotData, Yukitaka was a principal researcher at NEC Corporation. He led the invention of cutting-edge technologies related to automated feature engineering from various data sources and worked with clients as a data science practitioner. Yukitaka received his Ph.D. degree in Engineering from Osaka University.