Predicting Demand
In this Python notebook, I show how to predict demand using Bayesian hierarchical models built with the PyMC probabilistic programming library. Demand prediction is one of the most common data science applications in industry. You can explore the notebook here.
Demand prediction is a critical application in data science that helps businesses anticipate future needs based on historical data and various influencing factors. Common methods include time series techniques such as ARIMA and exponential smoothing, which capture trends and seasonality but typically assume the series is stationary (or can be made so). Machine learning models, such as ensemble methods and neural networks, handle complex patterns and non-linear relationships well. Additionally, Media Mix Models (MMM) evaluate the impact of marketing activities on demand, helping optimize resource allocation.
Case study: Eco-Friendly Online Retail Business
In this case study, we explore a hypothetical eco-friendly online retail business specializing in health and wellness products, home office supplies, and essential goods. The business began with its first warehouse in Los Angeles, CA, in 2019. During the pandemic, sales surged, prompting the opening of two additional locations in 2020—one in San Francisco, CA, and another in Seattle, WA. As the business continued to expand, a fourth warehouse was established in Denver, CO, in 2023.
This tutorial shows the application of Bayesian hierarchical models for predicting demand across four distinct regions:
Region 1 (LA) - demand is modest at first but increases significantly starting in 2020.
Region 2 (SF) - demand has grown consistently since the beginning.
Region 3 (SEA) - demand also shows steady growth from the beginning.
Region 4 (DEN) - demand grows more slowly than in Regions 2 and 3, and the region has the fewest data points.
We will work with time series data that exhibit seasonality and trends. Despite the differences among regions, we will explore and leverage the similarities in the data in our analysis.
Data generation
Since this is a hypothetical online business, we first generate the data. We then perform exploratory data analysis and choose our Key Performance Indicators (KPIs). In this case, the KPIs to consider are the cumulative weight of delivered products and the monthly number of orders. In the model, we will use order count because it is conceptually simpler, easier to track, and a direct indicator of business activity.
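A minimal sketch of one way to simulate such data is below; the opening dates, growth rates, and magnitudes are my illustrative choices, not the notebook's exact values. Each region's monthly order count is drawn from a Negative Binomial distribution whose mean is a seasonal factor times a second-order polynomial trend, mirroring the model assumptions described later.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Illustrative opening months and growth rates per region (assumptions)
openings = {"LA": "2019-01", "SF": "2020-06", "SEA": "2020-09", "DEN": "2023-01"}
growth = {"LA": 1.5, "SF": 2.0, "SEA": 1.8, "DEN": 0.8}

frames = []
for region, start in openings.items():
    months = pd.period_range(start, "2024-12", freq="M")
    t = np.arange(len(months)) / 12.0                                 # years since opening
    season = 1.0 + 0.3 * np.sin(2 * np.pi * (months.month - 4) / 12)  # peaks in summer
    trend = 50.0 + 40.0 * growth[region] * t + 5.0 * t**2             # second-order polynomial
    if region == "LA":  # pandemic-era surge described in the case study
        trend = trend * np.where(months >= pd.Period("2020-03", freq="M"), 1.8, 1.0)
    mu = season * trend                                               # expected monthly orders
    alpha = 10.0                                                      # overdispersion
    orders = rng.negative_binomial(alpha, alpha / (alpha + mu))       # counts with mean mu
    frames.append(pd.DataFrame({
        "region": region,
        "date": months.to_timestamp(),
        "month_of_year": months.month,
        "t": t,
        "orders": orders,
    }))

data = pd.concat(frames, ignore_index=True)
```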
Figure 1: Time series plots showing cumulative weight (blue) and order count (green) for each region. Region 1 exhibits a moderate demand trend for the first two years, with a steady increase thereafter. Regions 2 and 3 show consistent growth from the beginning. Although Region 4 also experiences an increase in demand, it is less steady compared to Regions 2 and 3.
Bayesian hierarchical models
Bayesian hierarchical models are a class of statistical models that involve multiple levels of parameters, allowing for the modeling of complex, structured data. They are particularly useful when data is organized in groups or layers, such as nested or repeated measurements, or when modeling different regions that share common characteristics. In these models, parameters at one level of the hierarchy depend on parameters at a higher level, facilitating the sharing of information across different groups.
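Schematically (the notation here is mine), a minimal two-level hierarchy for groups $r$ looks like this:

$$\mu \sim \mathcal{N}(0, 1), \qquad \sigma \sim \mathrm{HalfNormal}(1), \qquad \theta_r \sim \mathcal{N}(\mu, \sigma^2), \qquad y_{r,i} \sim p(y \mid \theta_r).$$

Each group's parameter $\theta_r$ is informed by that group's own data and, through the shared hyperparameters $\mu$ and $\sigma$, by every other group's data. This partial pooling lets data-rich groups stabilize the estimates for data-poor ones.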
Bayesian hierarchical models are highly effective for retail demand forecasting because they integrate data across multiple hierarchical levels, such as regions, stores, and product categories. Since they can also incorporate prior information and uncertainties, they provide a robust framework for handling sparse or incomplete data. By capturing both global and local factors, Bayesian hierarchical models account for regional variations and trends while utilizing insights from related areas. This approach enhances forecasting accuracy, helping retailers optimize inventory, plan promotions, and improve decision-making.
In our case, this means capturing similar seasonality patterns or trends across four regions. This approach leads to more accurate and robust inferences, especially when some groups, like Region 4, have limited data.
Model Assumptions and Overview
In this analysis, I make the following key assumptions about the data and the model:
Distribution of Monthly Order Count (or Demand): I assume that the monthly order count follows a Negative Binomial distribution. This is appropriate because the data represent nonnegative counts, and the Negative Binomial distribution is well suited for modeling count data with overdispersion (i.e., when the variance exceeds the mean).
Parameterization of the Negative Binomial Distribution: The Negative Binomial distribution can be parameterized using an overdispersion parameter, α. This parameter directly controls the variance of the distribution and lets us model the extra variability observed in the data (the variance formula is written out just after this list).
Data Characteristics: Our dataset is limited in size. Although each region has its own unique characteristics, there are commonalities between regions that we can use for modeling.
Bayesian Hierarchical Models:
Combining Data: Bayesian hierarchical models are used to combine data from different regions. This approach allows us to capture both shared patterns and individual variations among regions.
Group-Specific Variations: The hierarchical structure of the model enables us to account for group-specific variations while still pooling information across regions.
Expected Value Components: The expected value of the monthly demand is modeled as a product of seasonality and trend:
Seasonality: We assume that the seasonal patterns are similar across regions (e.g., peaks occurring in summer). Therefore, the Bayesian hierarchical model will learn common seasonal parameters for all regions.
Trend: We expect the trend to differ among regions, so the model estimates distinct trend parameters for each region. Here, the trend is modeled as a second-order polynomial to capture more complex behavior over time.
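Putting these assumptions together, the model can be summarized as follows. The notation is mine, and the exponential trend is one convenient choice that keeps the expected count positive, not necessarily the notebook's exact specification. For region $r$ at time $t$ (in years), falling in calendar month $m(t)$:

$$y_{r,t} \sim \mathrm{NegativeBinomial}(\mu_{r,t}, \alpha), \qquad \mu_{r,t} = s_{m(t)} \cdot \exp\left(c_{r,0} + c_{r,1}\, t + c_{r,2}\, t^2\right),$$

where the seasonal weights $s_1, \dots, s_{12}$ are shared by all regions and the trend coefficients $c_{r,0}, c_{r,1}, c_{r,2}$ are region-specific but drawn from common hyperpriors. Under this $(\mu, \alpha)$ parameterization, the variance is

$$\mathrm{Var}(y) = \mu + \frac{\mu^2}{\alpha},$$

so a smaller $\alpha$ means more overdispersion.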
Markov Chain Monte Carlo sampling
Markov Chain Monte Carlo (MCMC) sampling is a family of algorithms used to approximate the posterior distribution of a model's parameters when direct computation is intractable, which is often the case with Bayesian hierarchical models. MCMC generates samples from this distribution, which can then be used to estimate summary statistics or make predictions. I use PyMC's No-U-Turn Sampler (NUTS) to generate the MCMC samples.
Figure 2: PyMC code for a Bayesian hierarchical model.
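Since Figure 2 is an image, here is a minimal PyMC sketch of the model just described. The index arrays are built from the simulated `data` frame above; variable names and priors are my assumptions for illustration and may differ from the notebook's exact code.

```python
import pandas as pd
import pymc as pm

# Index arrays derived from the generated data (names are mine)
region_idx = pd.Categorical(data["region"]).codes   # 0..3 per observation
month_idx = data["month_of_year"].values - 1        # 0..11 per observation
t = data["t"].values                                # years since each region's opening
y = data["orders"].values                           # observed monthly order counts

coords = {"month": range(12), "region": range(4), "coef": range(3)}

with pm.Model(coords=coords) as model:
    # Shared seasonality: one positive multiplicative weight per calendar
    # month, pooled across all four regions
    season = pm.Gamma("season", alpha=2.0, beta=2.0, dims="month")

    # Hyperpriors shared by all regions: the hierarchical part
    mu_c = pm.Normal("mu_c", mu=0.0, sigma=1.0, dims="coef")
    sigma_c = pm.HalfNormal("sigma_c", sigma=1.0, dims="coef")

    # Region-specific second-order polynomial trend coefficients,
    # partially pooled through the hyperpriors above
    c = pm.Normal("c", mu=mu_c, sigma=sigma_c, dims=("region", "coef"))

    # The exponential keeps the trend, and hence the expected count, positive
    log_trend = c[region_idx, 0] + c[region_idx, 1] * t + c[region_idx, 2] * t**2
    mu = season[month_idx] * pm.math.exp(log_trend)

    # Negative Binomial likelihood with overdispersion parameter alpha
    alpha = pm.HalfNormal("alpha", sigma=10.0)
    pm.NegativeBinomial("orders", mu=mu, alpha=alpha, observed=y)

    idata = pm.sample()  # NUTS is the default sampler
```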
Convergence checks
The R-hat statistic, also known as the Gelman-Rubin diagnostic, is used to assess the convergence of MCMC chains. It compares the variance within each chain to the variance between chains. If the chains have converged, the within-chain variance should be similar to the between-chain variance, and R-hat should be close to 1. An R-hat value significantly greater than 1 indicates that the chains have not yet converged, suggesting that more iterations may be needed to obtain reliable estimates from the MCMC sampling.
Figure 3: Convergence diagnostics for all variables and chains. In PyMC, convergence can be assessed using the trace summary. All R-hat values are 1, indicating convergence.
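In code, this check amounts to inspecting the R-hat column of the ArviZ summary for the `idata` object returned by the sampling sketch above:

```python
import arviz as az

# The summary table reports an R-hat value for every model parameter
summary = az.summary(idata)
print(summary[["mean", "sd", "ess_bulk", "r_hat"]])

# Flag parameters whose chains may not have converged
suspect = summary[summary["r_hat"] > 1.01]
print(f"{len(suspect)} parameters with R-hat above 1.01")
```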
Computing predictions
We will compute the mean of the posterior predictive samples to obtain predictions and calculate percentiles from these samples to represent prediction uncertainty. Although the Bayesian approach also allows for determining credible intervals for model parameters to assess their uncertainty, I will focus on using percentiles of the predictions for easier data visualization. I then calculate R² scores for each region and create an 'observed vs. predicted' plot along with separate prediction plots for each region. Afterward, I run a prediction model on unseen data.
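A sketch of these steps, continuing from the model and `idata` above (scikit-learn's `r2_score` is one convenient choice for the per-region scores):

```python
import pymc as pm
from sklearn.metrics import r2_score

# Draw posterior predictive samples for the observed data
with model:
    ppc = pm.sample_posterior_predictive(idata)

draws = ppc.posterior_predictive["orders"]           # dims: (chain, draw, obs)
y_pred = draws.mean(dim=("chain", "draw")).values    # point predictions
lo, hi = draws.quantile([0.05, 0.95], dim=("chain", "draw")).values  # 5%-95% band

# R² score per region against the observed counts
for r in range(4):
    mask = region_idx == r
    print(f"Region {r + 1}: R² = {r2_score(y[mask], y_pred[mask]):.2f}")
```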
Figure 4: This plot compares the observed data (x-axis) with the predicted values (y-axis) generated by the model for the four regions. The diagonal red dashed line represents perfect prediction. Points close to this line indicate better predictions (as in Region 1 and some points in Regions 2 and 3), while scattered points (as in Region 4 and some points in Regions 2 and 3) indicate larger discrepancies between the model's predictions and the observed data.
Figure 5: Observed data (black dots) and predictions (blue lines) are shown for four regions. The shaded areas represent the 5%-95% percentiles of the predictions, indicating the uncertainty range. The model fits Region 1 the best. For Regions 2 and 3, overestimation of data points tends to occur as the effects of seasonality decrease. The model also performs well for Region 4. Although it occasionally overestimates data points, the discrepancies are relatively minor.
Conclusion
Consistent with the results above, I believe that predictions for Region 1 are likely the most accurate, while those for Region 4 are the least accurate.
I’m using a simple Bayesian hierarchical model that incorporates seasonality, trend, and shared characteristics among regions. In this model, seasonality weights are consistent across all regions, meaning that seasonal patterns (e.g., peak periods in a year) are assumed to be the same for each region. However, trend weights vary by region, allowing for unique growth or decline patterns in each area.
By pooling information across regions, the model estimates common parameters (such as seasonal effects) more robustly, which is crucial given the sparse data for individual regions. This approach stabilizes estimates and enhances predictions.
Understanding these seasonal and trend effects is valuable for resource allocation and planning. It can inform adjustments to delivery routes and schedules and reveal important regional differences that might impact future strategies.
To improve the model's accuracy further, I recommend compiling additional data or incorporating other relevant information for these regions, such as customer demographics, income levels, or the effects of marketing strategies.