Long-term forecasting of the daily average gold price with NHITS. The notebook in this repository only forecasts the gold futures price for the third quarter of 2024 (2024 Q3). Watch the video linked below to see how the model predicts the price for 2024 Q4.
All datasets were downloaded from NASDAQ.
To learn more about NHITS, check out this video.
The plot below shows the ground truth gold price (in yellow) and the forecasts generated by NHITS using different interpolations.
The forecasts are most accurate from roughly the end of October to the first week of November, which is when the real gold price reached its highest level of the last quarter of 2024. After that, although the models do predict a downward trend, they fail to anticipate how fast the price would fall: it dropped by more than 100 USD in just one week. This happened shortly after the US presidential election, so if we had included a static exogenous variable (for example, an indicator of whether an election took place during that quarter), the model might have generated a better forecast.
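As a rough illustration of the election indicator mentioned above, here is a minimal sketch using only the Python standard library. The function names `us_presidential_election_day` and `election_in_quarter` are hypothetical and are not part of the notebook. The sketch relies on the rule that US presidential elections fall on the Tuesday after the first Monday of November in years divisible by four. How the resulting 0/1 flag would be passed to NHITS is left open here.

```python
from datetime import date, timedelta


def us_presidential_election_day(year):
    """Return the election date for `year`, or None if no presidential election is held."""
    if year % 4 != 0:
        return None
    first_of_november = date(year, 11, 1)
    # Monday is weekday 0; step forward to the first Monday of November.
    days_to_monday = (7 - first_of_november.weekday()) % 7
    first_monday = first_of_november + timedelta(days=days_to_monday)
    # Election day is the Tuesday right after that Monday.
    return first_monday + timedelta(days=1)


def election_in_quarter(year, quarter):
    """Return 1 if a US presidential election falls in the given quarter, else 0."""
    day = us_presidential_election_day(year)
    if day is None:
        return 0
    election_quarter = (day.month - 1) // 3 + 1
    return int(election_quarter == quarter)
```

With this helper, 2024 Q4 would get the flag 1 and 2024 Q3 the flag 0, so the indicator only separates the two quarters discussed here if both are part of the data.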
For the tail of the prediction series, the forecast from the nearest neighbor interpolation (blue line) is the best. The NHITS model using nearest neighbor interpolation predicts the flat price level towards the end of 2024, whereas the forecasts from the linear (pink line) and quadratic (green line) interpolations expect the price to go up. The price did eventually rise to the forecasted level in January, so those forecasts might not actually be that bad, since we could still use the direction they indicate. In my opinion, the NHITS model using nearest neighbor interpolation is the best model for this dataset. Perhaps this is due to the nature of financial time series data, which are highly volatile and non-stationary.
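The comparison above is made by eye from the plot. One way to put a number on "best at the tail" is to compute the mean absolute error over only the last few forecast steps. The sketch below is an assumption about how that could be done, not something the notebook does; the helper names `mean_absolute_error` and `tail_mae` are hypothetical, and the values in the usage lines are made-up numbers, not gold prices.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def tail_mae(actual, forecast, tail):
    """Mean absolute error over the last `tail` steps only."""
    if tail <= 0:
        raise ValueError("tail must be a positive number of steps")
    return mean_absolute_error(actual[-tail:], forecast[-tail:])


# Made-up illustration: a flat forecast versus a rising one.
actual = [10.0, 9.0, 8.0, 8.0]
flat_forecast = [10.0, 9.5, 8.5, 8.0]
rising_forecast = [10.0, 9.5, 10.0, 11.0]
flat_error = tail_mae(actual, flat_forecast, tail=2)
rising_error = tail_mae(actual, rising_forecast, tail=2)
```

A lower tail error for one interpolation mode would support the visual impression, though a single quarter is a small sample to judge a model on.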
The forecasts seem to have different frequencies than the ground truth. The frequency of every forecast series appears to be 1, which is lower than that of the ground truth series. Recall that we built the NHITS models with all default values except for the interpolation type (nearest, linear, or quadratic). For the frequency downsampling rates, we used 4, 2, and 1. These rates control how coarse each stack's output is: the first stack predicts one value for every four forecast steps, the second stack one value for every two steps, and the third stack one value for every step, and the chosen interpolation fills in the remaining steps. Perhaps this is the reason for the models' mediocre performance. If we had used 25, 5, and 1 as our rates, we might have gotten more accurate forecasts.
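To make the effect of the downsampling rates more concrete, the sketch below mimics what happens per stack: each stack predicts only horizon // rate values (at least one), and interpolation stretches them back to the full horizon. This is a simplified pure-Python illustration, not the library's implementation; the function names `knots_per_stack` and `interpolate` and the toy horizon of 8 steps are assumptions made for the example, and quadratic interpolation is left out.

```python
def knots_per_stack(horizon, rates):
    """Number of values each stack predicts before interpolation (at least one)."""
    return [max(horizon // rate, 1) for rate in rates]


def interpolate(knots, horizon, mode="nearest"):
    """Stretch a short list of knots to `horizon` points (simplified sketch)."""
    n = len(knots)
    out = []
    for step in range(horizon):
        # Position of this forecast step on the (shorter) knot axis.
        position = step * n / horizon
        left = int(position)
        if mode == "nearest" or left == n - 1:
            # Nearest: repeat the knot; also used past the last knot in linear mode.
            out.append(knots[left])
        else:
            # Linear: blend the two neighbouring knots.
            fraction = position - left
            out.append(knots[left] * (1 - fraction) + knots[left + 1] * fraction)
    return out


default_knots = knots_per_stack(8, [4, 2, 1])    # rates used in this notebook
coarser_knots = knots_per_stack(8, [25, 5, 1])   # alternative rates mentioned above
stepwise = interpolate([1.0, 3.0], 8, mode="nearest")
ramp = interpolate([1.0, 3.0], 8, mode="linear")
```

In this toy setting, nearest neighbor interpolation turns two knots into two flat segments, while linear interpolation produces a ramp between them. That is consistent with the step-like, lower-frequency look of the nearest neighbor forecast. A larger rate means fewer knots for that stack and therefore a smoother, slower-moving component.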
One important note: the gold price data used for training and the ground truth come from different sources. I used the GC:CMX gold price data from NASDAQ for training but took the true price data from Yahoo Finance. Unfortunately, this was unavoidable because the gold price data is no longer publicly available on NASDAQ.