Time Series Forecasting Resource Page

Time Series Forecasting

Time series forecasting predicts future values of time-stamped data using statistical, econometric, signal processing, or machine learning techniques.  A time series data set consists of historical data points collected over time, often at regular intervals.  Each observation carries a time stamp, and the spacing between observations can range from well under a second, such as the nanosecond scale used to measure how fast computers process information, to millions of years, as in records of geological processes.

The process for creating a time series forecasting model should answer the following questions:

  1. What is the target we are looking to predict?
  2. Where is the data?
  3. What do we need to do with the data to prepare it?
  4. What data is relevant to the target variable?
  5. What patterns and trends are important?
  6. What algorithms work best for this type of data?
  7. How can we operationalize (deploy) the model for business use? 
  8. When do we need to re-evaluate or retire the model?

This leads to the following steps to go from problem definition to an operational model:

Problem Identification

Define what we want to predict and the metrics to be used to evaluate the model’s predictive accuracy.

Data Collection

Gathering and storing time series data in raw form for a new project, or identifying the data sources (database tables, spreadsheets, files) that already exist.

Data Profiling

Identifying entity relationships and data schemas, and identifying and canonicalizing data types.

Data Preparation

Cleaning the data: deduplication, elimination of outliers and illegal values, imputation of missing data, and canonicalization of string and categorical values.
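As a rough illustration of the preparation step, the sketch below deduplicates by time stamp, drops illegal values, and imputes gaps by carrying the last valid value forward. The function name `prepare`, the "later duplicate wins" policy, the non-negativity rule, and forward-fill imputation are all assumptions chosen for the example; a real project would pick rules that fit its own data.

```python
from datetime import datetime, timedelta


def prepare(records, start, step, n_steps, is_legal=lambda v: v >= 0):
    """Clean (timestamp, value) pairs onto a regular grid.

    Assumed policies (illustrative only):
    - duplicates: the last record seen for a time stamp wins
    - illegal values: anything failing `is_legal` is discarded
    - missing values: forward-filled from the last valid observation
    """
    latest = {}
    for ts, value in records:
        if value is not None and is_legal(value):
            latest[ts] = value  # a later duplicate overwrites an earlier one

    cleaned = []
    last_seen = None  # stays None until the first valid observation
    for i in range(n_steps):
        ts = start + i * step
        if ts in latest:
            last_seen = latest[ts]
        cleaned.append((ts, last_seen))
    return cleaned


t0 = datetime(2024, 1, 1)
hour = timedelta(hours=1)
raw = [
    (t0, 1.0),
    (t0, 1.5),             # duplicate time stamp
    (t0 + 2 * hour, -3.0),  # illegal under the assumed rule
    (t0 + 3 * hour, 4.0),   # the observation at t0 + 1h is missing
]
cleaned = prepare(raw, start=t0, step=hour, n_steps=4)
```

Forward fill is only one imputation choice; interpolation or a model-based fill may suit other series better.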

Exploratory Data Analysis

Identifying patterns and characteristics of the data in order to understand it and to form hypotheses about the types of features that may be important.
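One common way to look for repeating patterns during exploration is the sample autocorrelation at a candidate lag. The sketch below is a minimal standard-library version; the helper name `autocorrelation` and the synthetic 12-step sine wave are assumptions made for illustration, not part of any specific workflow.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no structure to correlate
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Synthetic series with a period of 12 steps (for example, monthly seasonality).
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(120)]
at_full_period = autocorrelation(seasonal, 12)  # strongly positive
at_half_period = autocorrelation(seasonal, 6)   # strongly negative
```

A large positive value at a lag suggests that values one period apart move together, which hints that lagged or seasonal features at that spacing may be worth engineering.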

Feature Engineering

Applying filters, aggregation, normalization, and transformation to the raw source data to create new variables that improve predictive performance relative to the target variable.
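To make the idea concrete, the sketch below derives two typical time series features, lagged values and a trailing rolling mean, using only observations that precede the target time step. The function name `make_features` and the default lags and window are hypothetical choices for the example.

```python
def make_features(series, lags=(1, 2), window=3):
    """Build (rows, targets) where each row uses only past values.

    Each row holds the lagged values followed by the mean of the
    `window` observations immediately before the target.
    """
    start = max(max(lags), window)  # first index with enough history
    rows, targets = [], []
    for t in range(start, len(series)):
        lag_values = [series[t - k] for k in lags]
        rolling_mean = sum(series[t - window:t]) / window
        rows.append(lag_values + [rolling_mean])
        targets.append(series[t])
    return rows, targets


rows, targets = make_features([1, 2, 3, 4, 5, 6])
# rows    -> [[3, 2, 2.0], [4, 3, 3.0], [5, 4, 4.0]]
# targets -> [4, 5, 6]
```

Restricting every feature to values before the target avoids leaking future information into training, which would otherwise inflate measured accuracy.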

Model Selection

Choosing the appropriate forecasting model based on the characteristics of the time series data and the problem to be solved.  For time series data, common approaches include autoregressive integrated moving average (ARIMA), exponential smoothing, seasonal decomposition, long short-term memory (LSTM) neural networks, decision trees, and dynamic system models.
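Of the approaches listed, exponential smoothing is simple enough to sketch in a few lines. The version below is simple (single) exponential smoothing with no trend or seasonal component, initialized with the first observation; the function name and that initialization are assumptions for illustration, and library implementations may differ in both.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    `alpha` in (0, 1] controls how quickly older observations are
    discounted; alpha = 1 reduces to the naive "last value" forecast.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]  # assumed initialization: first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


forecast = simple_exponential_smoothing([10, 20, 30], alpha=0.5)  # 22.5
```

A simple baseline like this is often useful as a reference point when judging whether a more complex model is worth its cost.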

Model Training

Estimating model parameters from a historical training partition of the full time series range (typically around the first 80% of the full time range).

Model Evaluation

Assessing the accuracy of the fitted model on a validation partition of the full time series range (typically the 10% of the range immediately following the training set) using appropriate metrics, such as mean squared error, mean absolute error, root mean squared error, or percent error.
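The metrics named above can be computed directly from paired actual and predicted values. In this sketch, "percent error" is interpreted as mean absolute percentage error (MAPE), which is an assumption, and time steps with an actual value of zero are skipped for that metric to avoid division by zero.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, MAE, RMSE, and MAPE (in percent) for paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumed convention: ignore steps where the actual value is zero.
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "mape": mape}


metrics = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 400])
```

MSE and RMSE penalize large misses more heavily than MAE, while a percentage metric is easier to compare across series with different scales.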

Model Testing

Generating predictions from the fitted model on a test partition of the full time series range (the final 10% of the historical range) and evaluating their accuracy.
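The training, validation, and test partitions described in the steps above are cut in time order rather than by random sampling, so that the model is always evaluated on data later than what it was fitted on. A minimal sketch, assuming the series is already sorted by time and using the hypothetical helper name `chronological_split`:

```python
def chronological_split(series, train_frac=0.8, val_frac=0.1):
    """Split a time-ordered sequence into train, validation, and test parts.

    The remainder after the training and validation fractions
    becomes the test partition (10% with the default fractions).
    """
    n = len(series)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


observations = list(range(100))  # stand-in for 100 time-ordered values
train, validation, test = chronological_split(observations)
# len(train), len(validation), len(test) -> 80, 10, 10
```

Shuffling before splitting would let future observations leak into the training set, so the order of the input is preserved throughout.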

Model Deployment

Using the model to generate predictions for the business use case on new data unseen during model development.

Model Maintenance

Periodically evaluating model performance on new data and updating the model when needed, which is often the case because new patterns or trends emerge in the data.
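One simple way to decide when an update is due is to compare the error on recent data with the error measured at deployment time. The sketch below uses mean absolute error and a fixed tolerance multiplier; the function name `needs_retraining`, the choice of metric, and the 1.5x threshold are all assumptions for illustration.

```python
def needs_retraining(baseline_mae, recent_actual, recent_predicted, tolerance=1.5):
    """Flag the model when recent MAE exceeds `tolerance` times the baseline."""
    if len(recent_actual) != len(recent_predicted) or not recent_actual:
        raise ValueError("inputs must be non-empty and of equal length")
    recent_mae = sum(
        abs(a - p) for a, p in zip(recent_actual, recent_predicted)
    ) / len(recent_actual)
    return recent_mae > tolerance * baseline_mae


still_ok = needs_retraining(2.0, [10, 12, 14], [10, 11, 13])  # False
drifted = needs_retraining(2.0, [10, 12, 14], [5, 6, 7])      # True
```

In practice the threshold and the length of the recent window would be tuned to how noisy the series is and how costly a stale model would be.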

Time series forecasting is subject to various challenges, such as seasonality, trend shifts, irregular patterns, and noisy data.  Handling these complexities and selecting appropriate features and models is critical for building a reliable, robust model that generates accurate forecasts.  As seen in the image above, manual feature engineering can take months, limiting the ability to deliver models on time.  Thus, an automated, data-centric feature discovery process is needed to remain competitive in today’s AI-driven world.

