
Feature Engineering for Temporal Data – Part 1


An overview of common time-series modeling techniques

Time-series (or temporal) data are among the most common and essential data types across enterprise AI applications, such as demand forecasting, sales forecasting, price prediction, etc. Analyzing time-series data helps enterprises understand the underlying patterns in their business over time and allows them to forecast what is likely to happen in the future (a.k.a. time-series forecasting). Developing good time-series models is an important and challenging process for enterprise data science and analytics teams. 

This blog series will provide an overview of different approaches to developing AI and ML models from time-series and temporal datasets. This first post in the three-part series reviews common time-series modeling techniques and discusses their characteristics, advantages, and limitations.

Standard Time-series Modeling Techniques

Autoregressive Model

The autoregressive (AR) model is one of the simplest and most traditional time-series modeling techniques. The model is called autoregressive because it applies linear regression to predict future values from previous observations of the target time series. AR models use the most recent values in the series as input to predict the next values, relying on the assumption that recent fluctuations explain the behavior of the series. AR models are simple and useful for understanding the basic concepts of time-series modeling. However, they are very limited in capturing complex time-series behaviors and, in practice, are rarely used in modern time-series forecasting projects because they tend to yield poor prediction results.

Autoregressive Integrated Moving Average (ARIMA Model)

Like the AR model, the Autoregressive Integrated Moving Average (ARIMA) model predicts future values based on previous values, but it also applies differencing (the "integrated" component) and moving averages to smooth the time-series data. The integrated moving-average components help condense the data and surface its significant features. ARIMA is valuable for various applications, such as predicting weather patterns or fluctuations in sales over upcoming months. The downsides of ARIMA are that it does not handle nonlinear dependencies well, requires the errors to be independent and identically normally distributed, and cannot take advantage of exogenous variables.

Long Short-Term Memory (LSTM Model)

Long short-term memory (LSTM) is a recurrent neural network architecture designed for sequential prediction problems. LSTMs include a so-called memory cell that can retain information over extended periods. Each cell also has three gates: the input, output, and forget gates, which regulate how much information (if any) passes on to the next step. LSTMs learn long-term dependencies while simultaneously capturing the short-term sequential behavior of the data. Although LSTMs often achieve higher prediction accuracy than traditional time-series models, a critical drawback is their lack of transparency and interpretability, as all key factors (a.k.a. features) are implicitly encoded into complex, nonlinear hidden layers. LSTMs are available as open-source implementations in common deep learning frameworks such as PyTorch and TensorFlow.
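Since the post mentions PyTorch, here is a minimal sketch of a one-step-ahead LSTM forecaster; the architecture, hidden size, and window length are all hypothetical choices for illustration:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Predict the next value of a univariate series from a past window."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):             # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)         # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])  # forecast from the last time step

model = LSTMForecaster()
window = torch.randn(8, 24, 1)  # batch of 8 windows, 24 past steps each
pred = model(window)            # (8, 1) next-step predictions
```

Training (loss, optimizer, sliding-window dataset construction) is omitted; note that the learned gate weights are exactly the part that resists interpretation.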

A representation of neural networks by dotData

Prophet Model

Prophet is an open-source library released by Facebook. Prophet is an additive model that fits nonlinear trends with seasonality (daily, weekly, yearly) and holiday effects, which makes it powerful and practical with real-world data, particularly data with strong seasonality. A significant advantage of Prophet is that it doesn't require considerable prior knowledge to create forecasts, and its parameters are simple to understand and easy to use.

On the other hand, some studies have shown that it performs significantly worse than ARIMA and other models. Instead of looking for causal relationships between past and future data, it looks for the best curve to fit the data using a linear or logistic trend curve.

Chart representing holidays in a Prophet model
Chart representing seasonality in a Prophet model

Limitations of Time-series Modeling

Homogenous time resolution

Time-series modeling techniques require time-series data collected at regular intervals. In addition, when handling multi-dimensional time series, all series must have the same time resolution (e.g., hourly, daily, monthly). These assumptions rarely hold in real-world time-series data (e.g., sensors with inconsistent sampling rates). Applying time-series modeling techniques therefore requires intelligent, and often cumbersome, preprocessing.
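One common preprocessing step is resampling irregular observations onto a regular grid. A minimal pandas sketch, with hypothetical sensor timestamps:

```python
import pandas as pd

# Hypothetical sensor readings arriving at irregular timestamps
ts = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2024-01-01 00:05",
                          "2024-01-01 00:50",
                          "2024-01-01 02:10"]),
)

# Bucket onto a regular hourly grid (mean per bucket), then linearly
# interpolate the empty 01:00 bucket
hourly = ts.resample("1h").mean().interpolate()
```

Averaging and linear interpolation are just two of many choices here (first/last value, forward-fill, model-based imputation); which is appropriate depends on what the series measures.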

Homogenous data characteristics

Time-series modeling techniques also implicitly assume that the data have homogeneous characteristics. For example, they are often incapable of handling multi-dimensional time series that mix numeric and categorical sequences. Suppose we have two sensors: one collects numerical information, like the number of people who entered a store in a given time frame, while the other collects binary data, like whether the store's air conditioner was on or off in that time frame. It is possible to develop time-series models that handle such heterogeneous information, but every case requires a costly, customized model and algorithm.

Temporal Transactions and Events

Time-series data is a particular type of temporal sequence data, and time-series modeling techniques are specialized for it. Enterprises, however, also hold other types of temporal data, such as transactions and events, that standard time-series techniques cannot directly consume. Incorporating these additional temporal data can make your models more accurate and advanced.

Time series data used in a prediction model

Transparency and Explainability

One major issue with time-series modeling is the lack of transparency, because the explanatory variables (a.k.a. features) are integrated and encoded implicitly into the model. Transparency is vital in enterprise AI for interpretability, accountability, reproducibility, ethics, and more. This limitation becomes a significant drawback, especially for applications that need to understand influencing factors and control time-series behaviors, beyond producing accurate predictions.

Summary

This blog summarized standard time-series modeling techniques used to build time-series forecasting models and discussed their characteristics and limitations. In Part 2, we will look at an alternative approach: feature engineering for temporal datasets.

Learn more about how your organization could benefit from the powerful features of dotData by signing up for a demo.

dotData

dotData Automated Feature Engineering powers our full-cycle data science automation platform to help enterprise organizations accelerate ML and AI projects and deliver more business value by automating the hardest part of the data science and AI process - feature engineering and operationalization. Learn more at dotdata.com, and join us on Twitter and LinkedIn.

dotData's AI Platform

dotData Feature Factory Boosting ML Accuracy through Feature Discovery

dotData Feature Factory enables data scientists to develop curated features by turning data-processing know-how into reusable assets. It enables the discovery of hidden patterns in data through algorithms within a feature space built around the data, improving the speed and efficiency of feature discovery while enhancing reusability, reproducibility, collaboration among experts, and the quality and transparency of the process. dotData Feature Factory strengthens all data applications, including machine learning model predictions, data visualization through business intelligence (BI), and marketing automation.

dotData Insight Unlocking Hidden Patterns

dotData Insight is an innovative data analysis platform designed for business teams to identify high-value, hyper-targeted data segments with ease. It surfaces dotData's hidden-pattern discoveries through an intuitive, approachable interface. Through the powerful combination of AI-driven data analysis and GenAI, Insight discovers actionable business drivers that impact your most critical key performance indicators (KPIs). This convergence allows business teams to intuitively understand data insights, develop new business ideas, and more effectively plan and execute strategies.

dotData Ops Self-Service Deployment of Data and Prediction Pipelines

dotData Ops offers analytics teams a self-service platform to deploy data, features, and prediction pipelines directly into real business operations. By testing and quickly validating the business value of data analytics within your workflows, you build trust with decision-makers and accelerate investment decisions for production deployment. dotData’s automated feature engineering transforms MLOps by validating business value, diagnosing feature drift, and enhancing prediction accuracy.

dotData Cloud Eliminate Infrastructure Hassles with Fully Managed SaaS

dotData Cloud delivers each of dotData’s AI platforms as a fully managed SaaS solution, eliminating the need for businesses to build and maintain a large-scale data analysis infrastructure. This minimizes Total Cost of Ownership (TCO) and allows organizations to focus on critical issues while quickly experimenting with AI development. dotData Cloud’s architecture, certified as an AWS "Competency Partner," ensures top-tier technology standards and uses a single-tenant model for enhanced data security.