Advanced Time Series Forecasting Architecture
Three-stage hybrid model for electricity load forecasting - VMD decomposition, LSTM-Transformer architecture, and Bayesian hyperparameter optimization (MAE 544.12, R² 0.9828).
Research Article
Published: 9 February 2026
https://doi.org/10.20935/AcadEnergy8123

Hybrid VMD–LSTM–transformer model with Bayesian optimization for electricity load forecasting

Zeynep Altiparmak Guler1,*, İnayet Özge Aksu1, Sina Ghaemi2

1Department of Artificial Intelligence Engineering, Faculty of Computer and Informatics, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey.
2Department of Energy, Faculty of Engineering and Science, Aalborg University, Aalborg, Denmark.
*email: zaltiparmak@atu.edu.tr

Academic Editor: Marcos Tostado-Véliz

Abstract
Electricity load forecasting constitutes a critical component in optimizing energy resource allocation and grid management. However, the proliferation of flexible load integration has increased temporal volatility, seasonal variations, and non-linear dynamics within electricity consumption patterns, substantially limiting the predictive capabilities of contemporary deep learning models. To address this challenge, this study proposes a hybrid model integrating Variational Mode Decomposition (VMD), the Transformer mechanism, and Bayesian Optimization (BO) for enhanced electricity load forecasting. In the proposed model, electricity load data are first decomposed into intrinsic mode functions through VMD. Then, these decomposed components are processed using a Long Short-Term Memory (LSTM) network, with the Transformer architecture providing the attention mechanism for enhanced temporal feature extraction. In addition, the parameters of the prediction model are optimized using the BO algorithm. Finally, the proposed model's performance is evaluated using established statistical metrics, including Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and R². Moreover, comprehensive comparative analyses are carried out against baseline models as well as versions integrated with VMD, Transformer, and BO.
The results show that the proposed hybrid model achieved the lowest error rates among all models, with MAE 544.12, RMSE 788.80, and R² 0.9828. These findings demonstrate the efficacy of the proposed model in managing the inherent complexities of electricity load time series, thereby validating the strategic integration of decomposition techniques, recurrent networks, and attention mechanisms for robust forecasting performance.

Keywords: load forecast, decomposition, hybrid, deep learning, transformer, hyperparameter tuning

Citation: Altiparmak Guler Z, Aksu İÖ, Ghaemi S. Hybrid VMD–LSTM–transformer model with Bayesian optimization for electricity load forecasting. Academia Green Energy 2026;3. https://doi.org/10.20935/AcadEnergy8123
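For readers who want to reproduce the evaluation, the four metrics named in the abstract can be computed as in the minimal sketch below. It uses the standard textbook definitions, which are assumed here rather than taken from the article; the function name `forecast_metrics` and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MAPE (in percent), RMSE and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; load values are
    # assumed to be strictly positive here.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}


# Hypothetical load values, for illustration only.
metrics = forecast_metrics([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0])
```

MAE and RMSE are expressed in the units of the load series, whereas MAPE and R² are scale-free, which is why the reported MAE and RMSE can only be judged relative to the magnitude of the load.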
to model long-term dependencies and spatiotemporal interactions in electricity load forecast data [9, 10].

Decomposition techniques, commonly used to preprocess linear and non-stationary time series data, help machine learning models capture hidden patterns more effectively by breaking down complex signals into sub-components [11]. In this regard, Fang et al. [12] developed a prediction model that combines an optimized Variational Mode Decomposition (VMD) method with Long Short-Term Memory (LSTM) to improve wind energy prediction. The decomposition step split the signal into sub-components, so that the structural patterns of each component could be learned separately. As a result, the model was more robust to overfitting and its overall learning capacity increased. Xu et al. [13] utilized VMD to improve wind energy forecast accuracy. In their work, VMD decomposed the input data into more meaningful sub-components, which enabled more efficient feature extraction. Combined with deep learning methods, this approach provided more reliable and accurate forecasts than traditional deep learning methods.

Hyperparameter optimization is another critical and challenging component of deep learning models because it directly impacts their prediction accuracy and generalization ability. The selection of parameters such as the number of layers, learning rate, batch size, and optimizer can significantly affect the model's generalization ability, runtime, resistance to overfitting, and convergence. However, the interdependence and high-dimensional nature of these parameters make the search space extremely complex, so finding the optimal parameter combination is both time-consuming and computationally expensive [14, 15]. Li et al.
[16] enhanced water quality prediction accuracy by integrating VMD with a Gated Recurrent Unit (GRU) model, where the GRU architecture was optimized using the Grey Wolf Optimizer (GWO) algorithm. The VMD technique effectively decomposed complex frequency components within the water quality data, thereby augmenting the model's learning capability. Comparative analysis demonstrated that this hybrid approach achieved substantial reductions in prediction errors and computational delays relative to conventional forecasting models. Wang et al. [17] proposed a short-term prediction model based on the integration of VMD and LSTM to reduce the uncertainties arising from the stochastic nature of wind energy. The VMD and LSTM parameters in the model were optimized using the Butterfly Optimization Algorithm (BOA). The proposed approach showed lower error rates across seasonal time intervals than traditional LSTM methods.

The hybrid integration of signal decomposition methods and deep learning models can yield more accurate and reliable results in prediction models. In this regard, Liu et al. [18] proposed a model that combines VMD with a Bidirectional Gated Recurrent Unit (BiGRU) to predict short-term buoy motion. In this model, VMD decomposes the motion signal into multiple sub-components, while BiGRU captures the temporal dependencies in each component. The study found that the average absolute error of the model trained with the VMD-integrated dataset decreased by more than 62%. Likewise, Xu et al. [19] proposed a model for electricity price prediction consisting of VMD, GWO, an attention mechanism, and an LSTM architecture. The proposed model performs data decomposition, feature weighting, and hyperparameter optimization simultaneously. VMD reduced noise in the data, the attention mechanism focused on important information, and model performance was improved with GWO.
The prediction accuracy of the resulting hybrid model improved significantly. Ouyang et al. [20] established a GRU-based model for wind power prediction, using a data decomposition process based on Singular Spectrum Analysis (SSA) and VMD. The basic components are separated by SSA, while the remaining components are divided into sub-components by VMD. This method effectively mitigated noise interference and minimized inter-component interactions. The proposed model showed significant improvements in prediction accuracy and contributed to the planning and operation processes of wind farms.

Beyond conventional signal decomposition and deep learning-based prediction methodologies, Transformer-based hybrid architectures have emerged as a prominent research direction in the recent literature. This growing interest is attributed to the Transformer's ability to model long-term dependencies through self-attention mechanisms, providing improved efficiency in learning complex temporal patterns and in parallel computation [21, 22]. Feng et al. [23] proposed an LSTM and Transformer-based model that can predict the energy consumption of electric vehicles. The model successfully captured the long-term dependencies in time series data, considering not only environmental and vehicle parameters but also the individual driving habits of drivers. It improved prediction performance compared to classical LSTM methods. Yu et al. [24] developed a model called WOA-VMD-FE–Transformer for the prediction of crude oil prices. In that model, the signal was decomposed into sub-components by VMD in conjunction with the Whale Optimization Algorithm (WOA), and each sub-component was trained using the Transformer mechanism. WOA was used for hyperparameter optimization. The results showed that the model achieved high accuracy and improved energy market predictions. Unlike other studies, Yao et al.
[25] utilized Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Fuzzy Entropy as signal decomposition methods. The proposed model is a CF-LT hybrid model consisting of LSTM and Transformer structures, developed to predict total phosphorus (TP) concentrations. Thanks to its integrated attention mechanism, the model achieves higher accuracy than traditional methods and effectively captures long-term dependencies. Similarly, Agbehadji et al. [26] established a model called EEMD-CEEMDAN-BiLSTM-AMT, which focuses on predicting ozone concentration in South Africa. This model combines signal decomposition, Transformers, and attention mechanisms to accommodate non-linear and variable components. The model achieves low error values, illustrating the positive effect of hybrid designs on predictive performance.

Building on the studies reviewed above and on the importance of accurate load forecasting, this study presents a three-stage method for electricity load forecasting. Its novelty lies in integrating signal decomposition, deep learning-based temporal modeling, and data-driven hyperparameter optimization within a unified forecasting framework, using a real-world national electricity load dataset from Türkiye. First, VMD decomposes the complex load signal into intrinsic frequency components, effectively separating meaningful patterns from noise interference. Second, a synergistic LSTM–Transformer architecture
processes these components, where LSTM networks capture local temporal dynamics while Transformer attention mechanisms model long-range dependencies. Finally, BO systematically identifies optimal hyperparameters across both decomposition and neural network modules, maximizing predictive performance. This comprehensive methodology addresses critical limitations in current forecasting approaches, demonstrating superior generalization and accuracy for practical energy management applications.

The organization of the paper is as follows: Section 2 introduces the electricity load dataset and deep learning methods, along with the proposed hybrid model, the hyperparameter optimization process, and the performance metrics used for evaluation. Section 3 discusses the experimental results and analysis. Finally, Section 4 provides a conclusion of the study.
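The decompose, model, and recombine flow described above can be illustrated with a deliberately simplified skeleton. This is a sketch under loud assumptions and not the authors' implementation: a trailing moving average stands in for VMD, a persistence forecaster stands in for the LSTM–Transformer network, per-component forecasts are assumed to be summed (one common way to recombine modes, not confirmed by this section), and the BO stage is omitted. All function names are hypothetical.

```python
from typing import Callable, List, Sequence


def trailing_mean(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; shorter windows are used at the start."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series: Sequence[float], window: int = 24) -> List[List[float]]:
    """ASSUMPTION: stand-in for VMD, not VMD itself.

    Splits the series into a smooth component and a residual; the two
    components add back up to the original series.
    """
    smooth = trailing_mean(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    return [smooth, residual]


def persistence(component: Sequence[float]) -> float:
    """ASSUMPTION: stand-in for the LSTM-Transformer forecaster.

    Predicts that the next value equals the last observed value.
    """
    return component[-1]


def hybrid_forecast(
    series: Sequence[float],
    window: int = 24,
    forecaster: Callable[[Sequence[float]], float] = persistence,
) -> float:
    """One-step-ahead forecast: decompose, forecast each part, sum the parts."""
    components = decompose(series, window)
    return sum(forecaster(c) for c in components)


# Hypothetical values, for illustration only.
next_load = hybrid_forecast([1.0, 2.0, 3.0, 4.0], window=2)
```

Passing the forecaster as a parameter keeps the structure independent of the model choice, so a trained network could replace `persistence` without changing the decomposition or recombination steps.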