Hybrid VMD-LSTM-Transformer with Bayesian Optimization

Advanced Time Series Forecasting Architecture


Three-stage hybrid model for electricity load forecasting - VMD decomposition, LSTM-Transformer architecture, and Bayesian hyperparameter optimization (MAE 544.12, R² 0.9828).




Research Article
Published: 9 February 2026
https://doi.org/10.20935/AcadEnergy8123

Hybrid VMD–LSTM–transformer model with Bayesian optimization for electricity load forecasting

Zeynep Altiparmak Guler1,*, İnayet Özge Aksu1, Sina Ghaemi2

1Department of Artificial Intelligence Engineering, Faculty of Computer and Informatics, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey.
2Department of Energy, Faculty of Engineering and Science, Aalborg University, Aalborg, Denmark.
*email: zaltiparmak@atu.edu.tr

Academic Editor: Marcos Tostado-Véliz

Abstract

Electricity load forecasting is a critical component in optimizing energy resource allocation and grid management. However, the growing integration of flexible loads has increased temporal volatility, seasonal variations, and non-linear dynamics within electricity consumption patterns, substantially limiting the predictive capabilities of contemporary deep learning models. To address this challenge, this study proposes a hybrid model integrating Variational Mode Decomposition (VMD), the Transformer mechanism, and Bayesian Optimization (BO) for enhanced electricity load forecasting. In the proposed model, electricity load data are first decomposed into intrinsic mode functions through VMD. These decomposed components are then processed using a Long Short-Term Memory (LSTM) network, with the Transformer architecture providing the attention mechanism for enhanced temporal feature extraction. In addition, the parameters of the prediction model are optimized using the BO algorithm. Finally, the proposed model's performance is evaluated using established statistical metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and R². Comprehensive comparative analyses are also carried out against baseline models as well as versions integrated with VMD, Transformer, and BO. The results show that the proposed hybrid model achieved the lowest error rates among all models, with MAE 544.12, RMSE 788.80, and R² 0.9828. These findings demonstrate the efficacy of the proposed model in managing the inherent complexities of electricity load time series, thereby validating the strategic integration of decomposition techniques, recurrent networks, and attention mechanisms for robust forecasting performance.

Keywords: load forecast, decomposition, hybrid, deep learning, transformer, hyperparameter tuning

Citation: Altiparmak Guler Z, Aksu İÖ, Ghaemi S. Hybrid VMD–LSTM–transformer model with Bayesian optimization for electricity load forecasting. Academia Green Energy 2026;3. https://doi.org/10.20935/AcadEnergy8123
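The four evaluation metrics named in the abstract can be illustrated with a short sketch. It assumes the standard textbook definitions of MAE, MAPE, RMSE, and R²; the excerpt above does not spell out the formulas the authors used, and the function names and sample values below are illustrative only, not taken from the paper.

```python
import math

# Standard textbook definitions (an assumption; the abstract does not
# give the exact formulas used by the authors).

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (undefined if any true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up load values, purely for illustration.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
print(mae(actual, predicted), rmse(actual, predicted),
      mape(actual, predicted), r2(actual, predicted))
```

MAE and RMSE are expressed in the units of the load series, so they depend on the scale of the dataset, whereas MAPE and R² are scale-free and easier to compare across datasets.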

Introduction

Electric load prediction plays a significant role in ensuring efficient energy management, grid reliability, and optimal resource allocation within modern power systems. The economic growth and operational sustainability of nations are intrinsically linked to accurate demand predictions, as electricity infrastructure has become central to contemporary society [1]. Therefore, developing an accurate model for load forecasting is essential. Numerous studies have attempted to improve load forecasting accuracy in recent years.

Load forecasting models are used to predict future electricity demand using historical load data. These models are crucial for the planning and efficient operation of energy systems. Accurate load forecasts help reduce operating costs by supporting important decisions such as supply–demand balancing, production planning, and resource allocation. In this context, load forecasting provides a foundation for informed decision-making in modern electricity systems.

In this regard, Muzaffar and Afshari [2] proposed an LSTM-based load forecasting model that effectively captures seasonal patterns and temporal dependencies. The model achieved significantly lower forecasting errors compared to traditional statistical approaches such as Autoregressive Moving Average (ARMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA). Abumohsen et al. [3] proposed deep learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), for load forecasting. Among them, the GRU model achieved the best performance, demonstrating higher accuracy and reliability compared to the other approaches. Elsaraiti [4] proposed a neural network-based model that demonstrated reliable short-term load forecasting performance, outperforming traditional statistical methods. Jain et al.
[5] investigated several machine learning algorithms, including Support Vector Machine (SVM), RNN, and LSTM, for short-term load forecasting. Their results indicated that the LSTM model effectively captured temporal dependencies and non-linear patterns in electrical load data, resulting in the most accurate prediction performance.

However, conventional deep learning models face significant limitations in addressing the inherent complexities of load data, including temporal volatility, seasonal patterns, and non-linear dependencies [6]. These challenges have necessitated a paradigm shift toward hybrid modeling approaches that combine deep learning architectures with statistical methods to capture both linear and non-linear dynamics effectively [7, 8]. Among them, structures such as temporal convolutional networks (TCNs), attention mechanisms, and graph-based neural networks stand out for their ability
to model long-term dependencies and spatiotemporal interactions in electricity load data [9, 10].

Decomposition techniques, commonly used to preprocess non-linear and non-stationary time series data, help machine learning models capture hidden patterns more effectively by breaking down complex signals into sub-components [11]. In this regard, Fang et al. [12] developed a prediction model that combines an optimized Variational Mode Decomposition (VMD) method with Long Short-Term Memory (LSTM) to improve wind energy prediction. The decomposition step split the signal into sub-components, allowing the model to focus on each component's own structural patterns. As a result, the model was more robust to overfitting and its overall learning capacity increased. Xu et al. [13] utilized VMD to improve wind energy forecast accuracy. In their work, VMD decomposed the input data into more meaningful sub-components, enabling more efficient feature extraction. This approach, combined with deep learning methods, provided more reliable and accurate forecasts than traditional deep learning methods.

Hyperparameter optimization is another critical and challenging component of deep learning models because it directly impacts their prediction accuracy and generalization ability. The selection of parameters such as the number of layers, learning rate, batch size, and optimizer can significantly affect the model's generalization ability, runtime, protection against overfitting, and convergence. However, the interdependence and high-dimensional nature of these parameters make the search space extremely complex, so finding the optimal parameter combination is both time-consuming and computationally expensive [14, 15]. Li et al.
[16] enhanced water quality prediction accuracy by integrating VMD with a Gated Recurrent Unit (GRU) model, where the GRU architecture was optimized using the Grey Wolf Optimizer (GWO) algorithm. The VMD technique effectively decomposed complex frequency components within the water quality data, thereby augmenting the model's learning capability. Comparative analysis demonstrated that this hybrid approach achieved substantial reductions in prediction errors and computational delays relative to conventional forecasting models. Wang et al. [17] suggested a short-term prediction model based on the integration of VMD and LSTM to reduce the uncertainties arising from the stochastic nature of wind energy. The VMD and LSTM parameters in the model were optimized using the Butterfly Optimization Algorithm (BOA). The proposed approach showed lower error rates in seasonal time intervals than traditional LSTM methods.

The hybrid integration of signal decomposition methods and deep learning models can yield more accurate and reliable results in prediction models. In this regard, Liu et al. [18] proposed a model that combines VMD with a Bidirectional Gated Recurrent Unit (BiGRU) to predict short-term buoy motion. In this model, VMD decomposes the motion signal into multiple sub-components, while BiGRU captures the temporal dependencies in each component. In the study, the mean absolute error of the model trained with the VMD-integrated dataset decreased by more than 62%. Likewise, Xu et al. [19] proposed a model for electricity price prediction consisting of VMD, GWO, an attention mechanism, and an LSTM architecture. The proposed model performs data decomposition, feature weighting, and hyperparameter optimization simultaneously. VMD reduced noise in the data, the attention mechanism focused on important information, and model performance was improved with GWO.
The prediction accuracy of the resulting hybrid model improved significantly. Ouyang et al. [20] established a model integrated with a GRU architecture for wind power prediction, using a data decomposition process based on Singular Spectrum Analysis (SSA) and VMD. The basic components are separated by SSA, while the remaining components are divided into sub-components by VMD. This method effectively mitigated noise interference and minimized inter-component interactions. The proposed model showed significant improvements in prediction accuracy and contributed to the planning and operation processes of wind farms.

Beyond conventional signal decomposition and deep learning-based prediction methodologies, Transformer-based hybrid architectures have emerged as a prominent research direction in the recent literature. This growing interest is attributed to the Transformer's ability to model long-term dependencies through self-attention mechanisms, providing improved efficiency in learning complex temporal patterns and in parallel computation [21, 22]. Feng et al. [23] proposed an LSTM and Transformer-based model to predict the energy consumption of electric vehicles. The model successfully captured the long-term dependencies in time series data, considering not only environmental and vehicle parameters but also the individual driving habits of drivers. It improved prediction performance compared to classical LSTM methods. Yu et al. [24] developed a model called WOA-VMD-FE–Transformer for crude oil price prediction. In that model, VMD, guided by the Whale Optimization Algorithm (WOA), decomposed the signal into sub-components, and each sub-component was trained using the Transformer mechanism. WOA was used for hyperparameter optimization. The results showed that the model achieved high accuracy and improved energy market predictions. Unlike other studies, Yao et al.
[25] utilized Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Fuzzy Entropy as signal decomposition methods. The proposed model is a CF-LT hybrid model consisting of LSTM and Transformer structures, developed to predict total phosphorus (TP) concentrations. Thanks to its integrated attention mechanism, the model achieves higher accuracy than traditional methods and effectively captures long-term dependencies. Similarly, Agbehadji et al. [26] established a model called EEMD-CEEMDAN-BiLSTM-AMT, which focuses on predicting ozone concentration in South Africa. This model combines signal decomposition, Transformers, and attention mechanisms to accommodate non-linear and variable components. The model achieves low error values, indicating that such hybrid designs have a positive impact on predictive performance.

This study presents a three-stage method for electricity load forecasting, building upon the studies reviewed above and emphasizing the importance of accurate load forecasting. Its novelty lies in integrating signal decomposition, deep learning-based temporal modeling, and data-driven hyperparameter optimization within a unified forecasting framework, using a real-world national electricity load dataset from Türkiye. First, VMD decomposes the complex load signal into intrinsic frequency components, effectively separating meaningful patterns from noise interference. Second, a synergistic LSTM–Transformer architecture
processes these components, where LSTM networks capture local temporal dynamics while Transformer attention mechanisms model long-range dependencies. Finally, BO systematically identifies optimal hyperparameters across both decomposition and neural network modules, maximizing predictive performance. This comprehensive methodology addresses critical limitations in current forecasting approaches, demonstrating superior generalization and accuracy for practical energy management applications.

The paper is organized as follows: Section 2 introduces the electricity load dataset and deep learning methods, along with the proposed hybrid model, the hyperparameter optimization process, and the performance metrics used for evaluation. Section 3 discusses the experimental results and analysis. Finally, Section 4 concludes the study.
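The control flow of the three stages described above (decompose, forecast each component, tune hyperparameters of both modules) can be sketched in a few lines. This is only a structural illustration under loud assumptions: a moving-average split stands in for VMD, a mean-of-recent-values predictor stands in for the LSTM–Transformer network, an exhaustive grid search stands in for Bayesian optimization, and summing the component forecasts is assumed as the recombination step, which the text above does not confirm. All function and parameter names are hypothetical.

```python
from statistics import mean

def decompose(signal, window):
    """STAND-IN for VMD (not VMD itself): split the signal into a trailing
    moving-average trend and a residual. Like VMD modes, the two parts
    are built so that they sum back to the original signal."""
    trend = [mean(signal[max(0, i - window + 1):i + 1]) for i in range(len(signal))]
    residual = [s - t for s, t in zip(signal, trend)]
    return [trend, residual]

def forecast_next(component, lags):
    """STAND-IN for the LSTM-Transformer: average of the last `lags` values."""
    return mean(component[-lags:])

def pipeline_forecast(history, window, lags):
    """Forecast each component separately, then sum the forecasts
    (summation is an assumption made for this sketch)."""
    return sum(forecast_next(c, lags) for c in decompose(history, window))

def validation_mae(series, window, lags, n_val):
    """One-step-ahead MAE over the last `n_val` points of the series."""
    errors = []
    for t in range(len(series) - n_val, len(series)):
        errors.append(abs(series[t] - pipeline_forecast(series[:t], window, lags)))
    return mean(errors)

def tune(series, windows, lags_options, n_val):
    """STAND-IN for Bayesian optimization: try every combination of a
    decomposition hyperparameter (window) and a forecaster hyperparameter
    (lags), and keep the pair with the lowest validation MAE."""
    best = None
    for w in windows:
        for k in lags_options:
            score = validation_mae(series, w, k, n_val)
            if best is None or score < best[0]:
                best = (score, w, k)
    return best

# Made-up toy series, purely for illustration.
load = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
print(tune(load, windows=[2, 3, 4], lags_options=[1, 2, 3], n_val=4))
```

The point of the sketch is that one search loop scores hyperparameters from both the decomposition and the forecasting stage against the same validation error, which mirrors the joint tuning described in the text. A Bayesian optimizer would replace the exhaustive loop with a surrogate-guided choice of which combinations to evaluate.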

Materials and methods

In recent years, deep learning models have made significant progress in machine learning and artificial intelligence. They have