Time Series Forecasting
Time series forecasting is a method of predicting future values of time-stamped data using statistical, econometric, signal processing, or machine learning techniques. Time series data sets consist of historical data points that are collected across time periods, often at regular intervals. Each observation carries a time stamp, and the interval between observations can range from well under a second (for example, nanoseconds when measuring how fast computers process information) to millions of years when describing geological processes.
The process for creating a time series forecasting model should answer the following questions:
- What is the target we are looking to predict?
- Where is the data?
- What do we need to do with the data to prepare it?
- What data is relevant to the target variable?
- What patterns and trends are important?
- What algorithms work best for this type of data?
- How can we operationalize (deploy) the model for business use?
- When do we need to re-evaluate or retire the model?
This leads to the following steps to go from problem definition to an operational model:
Defining what we want to predict and the metrics to be used to evaluate the model’s predictive accuracy.
Gathering and storing time series data in raw form for a new project, or identifying the data sources (database tables, spreadsheets, files) that already exist.
Identifying entity relationships and data schemas, and identifying and canonicalizing data types.
Cleaning the data: deduplication, elimination of outliers and illegal values, imputation of missing data, and canonicalization of strings and categorical values.
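As a rough illustration of the cleaning step, the sketch below deduplicates by time stamp, treats out-of-range values as missing, and fills gaps by carrying the last valid value forward. The function name `clean_series`, the legal range, and the forward-fill imputation rule are all assumptions chosen for the example; a real project would pick rules that suit its data.

```python
def clean_series(rows, lower, upper):
    """Clean a list of (timestamp, value) pairs; value may be None if missing.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    # Deduplicate: keep the first value seen for each timestamp.
    seen = {}
    for ts, value in rows:
        if ts not in seen:
            seen[ts] = value

    cleaned = []
    last_valid = None
    for ts in sorted(seen):
        value = seen[ts]
        # Treat values outside the legal range as missing.
        if value is not None and not (lower <= value <= upper):
            value = None
        # Impute missing values by carrying the last valid value forward.
        if value is None:
            value = last_valid
        else:
            last_valid = value
        # A gap at the very start has nothing to carry forward, so drop it.
        if value is not None:
            cleaned.append((ts, value))
    return cleaned


raw = [(1, 10.0), (2, None), (3, -999.0), (4, 12.0), (4, 12.0), (5, 13.0)]
cleaned = clean_series(raw, lower=0.0, upper=100.0)
print(cleaned)
```

Forward fill is only one imputation choice; interpolation or a model-based estimate may be more appropriate when gaps are long.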
Exploratory data analysis: identifying patterns and characteristics of the data, which is useful for understanding the data and hypothesizing about the types of features that may be important.
Applying filters, aggregation, normalization, and transformation to the raw source data to create new variables that improve predictive performance relative to the target variable.
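One common form of this step is deriving lagged values and rolling aggregates from the raw series. The following is a minimal sketch, assuming a plain list of numeric observations at regular intervals; the name `make_features`, the chosen lags, and the window size are illustrative, not prescribed by any particular tool.

```python
def make_features(values, lags=(1, 2), window=3):
    """Build one feature row per time step using only past observations."""
    # Skip the first steps, which do not have enough history.
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(values)):
        # Lag features: the value observed k steps before time t.
        row = {f"lag_{k}": values[t - k] for k in lags}
        # Rolling mean over the `window` observations before time t.
        past = values[t - window:t]
        row[f"rolling_mean_{window}"] = sum(past) / window
        # The value at time t is the prediction target.
        row["target"] = values[t]
        rows.append(row)
    return rows


sales = [10, 12, 14, 13, 15, 17]
features = make_features(sales)
for row in features:
    print(row)
```

Each row uses only values from before time t, which avoids leaking the target into its own features.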
Choosing the appropriate forecasting model based on the characteristics of the time series data and the problem to solve. For time series data, common approaches include autoregressive integrated moving average (ARIMA), exponential smoothing, seasonal decomposition, long short-term memory (LSTM) neural networks, decision trees, and dynamic system models.
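To make one of these approaches concrete, here is a minimal sketch of simple exponential smoothing written from scratch. It assumes a series with no trend or seasonality, and the smoothing factor `alpha` is a hypothetical value that would normally be tuned on the training data.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the one-step-ahead forecast for a series without trend or seasonality."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Initialize the smoothed level with the first observation.
    level = values[0]
    for y in values[1:]:
        # Blend the new observation with the previous level.
        level = alpha * y + (1 - alpha) * level
    # The final level is the forecast for the next time step.
    return level


forecast = simple_exponential_smoothing([10, 12, 14], alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent observations; a smaller one smooths out more noise.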
Estimating model parameters from a historical training partition of the full time series range (typically around the first 80% of the time range).
Assessing the accuracy of the fitted model on a validation partition of the full time series range (typically the 10% of the time range immediately following the training set) using appropriate metrics, such as mean squared error, mean absolute error, root mean squared error, or percent error.
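The metrics named above can be computed in a few lines. This sketch assumes equal-length lists of actual and predicted values, and it interprets "percent error" as mean absolute percentage error (MAPE), which is one common choice rather than the only one.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 200, 400]
predicted = [110, 190, 380]
print(mae(actual, predicted), mse(actual, predicted))
print(rmse(actual, predicted), mape(actual, predicted))
```

Squared-error metrics penalize large misses more heavily, while MAE and MAPE weight all errors proportionally.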
Generating predictions from the fitted model on a test partition of the full time series range (the final 10% of the historical range) and evaluating the prediction accuracy.
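The training, validation, and test partitions described in these steps are taken in time order, never by random shuffling. A minimal sketch follows, assuming the observations are already sorted chronologically; the function name `chronological_split` and its default fractions are illustrative.

```python
def chronological_split(values, train_frac=0.8, val_frac=0.1):
    """Split a time-ordered sequence into train, validation, and test partitions."""
    n = len(values)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    train = values[:train_end]           # earliest observations
    validation = values[train_end:val_end]  # immediately after training
    test = values[val_end:]              # most recent observations
    return train, validation, test


series = list(range(20))
train, validation, test = chronological_split(series)
print(len(train), len(validation), len(test))
```

Keeping the partitions in time order means the model is always evaluated on data that comes after what it was trained on, which mirrors how it will be used in production.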
Using the model to generate predictions for the business use case on new data unseen during model development.
Periodically evaluating the model performance on new data and updating the model if required, which often is required due to new patterns or trends emerging in the data.
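One simple way to decide when an update is needed is to compare the error on recent data with the error measured during validation. The sketch below is an assumption-laden example: the name `needs_retraining`, the use of mean absolute error, and the `tolerance` multiplier of 1.5 are hypothetical choices, not a standard rule.

```python
def needs_retraining(actual, predicted, baseline_mae, tolerance=1.5):
    """Flag the model when recent error exceeds the validation-time error by a margin."""
    recent_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return recent_mae > tolerance * baseline_mae


# Baseline error of 2.0 is an illustrative value from an earlier validation run.
drifted = needs_retraining([50, 52, 55], [49, 50, 48], baseline_mae=2.0)
stable = needs_retraining([50, 52, 55], [49, 51, 54], baseline_mae=2.0)
print(drifted, stable)
```

In practice the check would run on a schedule over a rolling window of recent observations, so that a single unusual data point does not trigger retraining.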
Time series forecasting is subject to various challenges, such as seasonality, trend shifts, irregular patterns, and noisy data. Handling these complexities and selecting the appropriate features and model is critical for creating a reliable and robust model capable of generating accurate forecasts. Manual feature engineering can take months, limiting the ability to create models on time. Thus, establishing an automated data-centric feature discovery process is required to remain competitive in today’s AI-driven world.
Learn more about using time series in Machine Learning
- Temporal feature engineering [video]
- Common time series modeling techniques [blog]
- Types of temporal data [blog]
- Effective temporal feature engineering [blog]
- The Ultimate Guide to Temporal Feature Engineering [eBook]
- Solving the challenge of feature engineering with automation [blog]
Examples and How To
- Introduction to time series analysis [ODSC on-demand webinar]
- A practical guide for Time Series feature engineering [blog]
- Using machine learning to predict grocery sales [on-demand webinar]
- Sales forecasting with ARIMA, Prophet, and Feature Factory [on-demand webinar]
- 80% reduction in banking client default exposure [blog]
- 20% increase in healthcare forecast accuracy [User Story]
- 30% reduced ad costs through improved TV audience prediction [User Story]
- 50% decrease in automotive demand forecast error [User Story]