Hydrogen-electricity coupling energy storage systems: Models, applications, and deep reinforcement learning algorithms

Abstract: With the maturity of hydrogen storage technologies, hydrogen-electricity coupling energy storage in green electricity and green hydrogen modes is an ideal energy system. The construction of hydrogen-electricity coupling energy storage systems (HECESSs) is one of the important technological pathways for energy supply and deep decarbonization. In a HECESS, hydrogen storage can maintain the energy balance between supply and demand and increase the utilization efficiency of energy. However, its scenario models in power system establishment and the corresponding solution methods still need to be studied in depth. To accelerate the construction of HECESSs, this paper first describes the current applications of hydrogen storage technologies from three aspects: hydrogen production, hydrogen power generation, and hydrogen storage. Secondly, based on the complementary synergistic mechanism of hydrogen energy and electric energy, the structure of the HECESS and its operation mode are described. To study the engineering applications of HECESSs more deeply, the recent progress of HECESS application at the source, grid, and load sides is reviewed. For the application of the models of hydrogen storage at the source/grid/load side, the selection of the solution method will affect the optimal solution of the model and solution efficiency. As solving complex multi-energy coupling models using traditional optimization methods is difficult, the paper explores the advantages of deep reinforcement learning (DRL) algorithms and their applications in HECESSs. Finally, prospects for technical applications in the construction of new power systems supported by HECESSs are discussed. The study aims to provide a reference for the research on hydrogen storage in power systems.


Introduction
To actively respond to global climate change, countries have been engaged in a low-carbon transformation of their energy mix. Substantial changes are needed in the global energy system to achieve sustainable development goals [1], with a focus on green and clean energy sources, such as hydrogen, solar, wind, and nuclear. With the introduction of the dual-carbon strategy, the long-term abundance of electricity brought about by the increased penetration of renewable energy and the withdrawal of traditional fossil energy units need to be urgently addressed. Traditional power systems are mainly based on centralized power generation, relying on large thermal power plants, nuclear power plants, and other power generation facilities to supply large-scale power. New power systems, on the other hand, focus more on distributed power generation, such as solar photovoltaic, wind power, and small hydroelectric power plants. The power supply side of new power systems is gradually transformed into a high-uncertainty energy supply system dominated by new energy sources [2].
Compared with traditional power systems, new power systems undergo profound changes in primary energy characteristics, load structure characteristics, and grid balance mode [3]. New power systems face the following challenges: 1) In extreme climates, the minimum output of new energy is at a low level. There may be several consecutive days of small output, and hence power is difficult to balance and the capacity for power supply security support is insufficient.
2) The seasonal mismatch between new energy generation and electricity consumption leads to difficulties in seasonal balance.
Wind and solar curtailment can be widespread, and accommodating a high proportion of new energy remains a problem. Therefore, energy storage, as a flexible regulating device, can provide power support during the peak period of electricity consumption load. When the electricity consumption load is low, energy storage can consume excess renewable energy. It is designed to help balance the difference between energy supply and demand and improve energy utilization efficiency. Rational allocation of energy storage in power systems is expected to solve system balance problems caused by the randomness, volatility, and seasonality of renewable energy generation. Therefore, new power systems are also gradually transformed from the three elements of source, grid, and load into the four elements of source, grid, load, and storage. There are abundant types of energy storage resources, among which is hydrogen energy, a green, clean, and low-carbon energy source with high energy density and ease of storage and transportation [4]. The most important feature of hydrogen energy is its ability to be stored for long periods, at high capacity, and across seasons, which other storage options cannot offer. The integration and control of a power storage system and a hydrogen storage system required for effective interaction with the grid further strengthen the need for hydrogen-electricity coupling.
A hydrogen-electricity coupling energy storage system (HECESS) is a new low-carbon and sustainable energy system that uses electric energy and hydrogen energy as energy carriers, aiming at a high percentage of renewable energy consumption and meeting multiple energy demands on the electricity consumption side [5]. The synergistic development of hydrogen-electricity coupling refers to an energy network in which hydrogen energy and electric energy are transformed into each other with high efficiency and synergy. This is the development direction and an important feature of new power systems. Building HECESSs can promote the synergistic development of hydrogen power and electricity and realize the mutual transformation and efficient synergy of hydrogen power and electricity.
HECESSs provide peak and frequency regulation capability for grids and power systems and can guarantee a stable operation of electricity production. Therefore, energy storage technologies based on electricity-to-hydrogen conversion, hydrogen storage, and hydrogen power generation can serve as a flexible means of scheduling: they provide power support for power systems through the electricity-hydrogen-electricity conversion mode and promote the consumption of renewable energy sources, such as wind and solar. Hydrogen storage meets energy storage needs on a large time scale, ranging from short-term system frequency control to seasonal balance of energy supply and demand. Hydrogen can achieve emission reductions in all areas and can be widely used at the source side, the grid side, and the load side for deep decarbonization policy scenarios [6]. Currently, scholars in this field have been exploring hydrogen storage technologies, HECESS operation, and HECESS control, and some progress has been made [7,8].
Many studies have discussed hydrogen storage from aspects of mathematical models, technical characteristics, and development status. Razzhivin et al. [9] considered the applications of energy storage devices in power systems. For the proposed hydrogen storage system, the principle of implementation of the detailed mathematical model and the principle of the control system were described. The production, storage, delivery, and utilization of hydrogen were found to be the key aspects of hydrogen storage. Moradi et al. [10] provided an in-depth discussion on hydrogen storage and delivery schemes and looked into future research issues of related technical risks and reliability analysis. Pei et al. [11] classified and analyzed the operation and control strategies of an energy storage system in terms of static and dynamic characteristics of a hydrogen storage system, the power distribution of the HECESS, and the optimization of the efficiency of hydrogen storage. Eriksen et al. [12] reviewed the latest developments and current state-of-the-art technologies of hydrogen-based systems and analyzed the advantages and challenges of hydrogen storage technologies. On this basis, some scholars comprehensively reviewed the characteristics and research trends of hydrogen production and storage and looked forward to future research trends in these fields in light of the latest research results of hydrogen storage technologies [13]. The above works mainly studied the technological development of hydrogen storage and discussed the functional characteristics and progress of energy storage. However, the analysis of the application of hydrogen storage at the source/grid/load side in power systems is still scarce, and there is no relatively efficient solution method.
In the current research context, this paper analyzes the role of coupling hydrogen storage and a power system to comprehensively introduce the HECESS. In a HECESS, hydrogen storage flexibly regulates resources through three stages, which are production, storage, and power generation, and realizes the electricity-hydrogen-electricity conversion mode. It absorbs a large share of surplus renewable energy to maintain the power supply for the system and can solve the problem of seasonal power imbalance. In this paper, the applications of the HECESS at the source/grid/load side are comprehensively summarized. The paper also explores in depth the effectiveness of deep reinforcement learning (DRL) algorithms as the solution method for HECESS models. One of the core problems that the paper aims to address is how to aggregate, coordinate, and optimize the allocation of hydrogen storage resources and construct a model with balanced power and with the integrated participation of source/grid/load/storage in a larger time scale and spatial scope. The related research on HECESSs needs to be further developed.
The rest of the paper is organized as follows. In Section 2, current applications of hydrogen storage technologies, open research issues, and the prospects in HECESSs are outlined. In Section 3, HECESS models are summarized. In Section 4, the different applications of HECESSs at the source/grid/load side are summarized and analyzed. In Section 5, models using DRL solution algorithms in different scenarios are discussed. In Section 6, challenges and open research issues on the future technological development of hydrogen storage are provided. In Section 7, the study is summarized.

Open issues and prospects of hydrogen storage technologies
In a HECESS, hydrogen storage realizes the flexible regulation of resources through electricity-hydrogen-electricity energy conversion. It converts electrical energy into hydrogen energy through the interconvertibility between electricity and hydrogen energy, realizing the long-term storage of energy. When the demand for electricity increases, it converts hydrogen energy into electricity, realizing the efficient use of energy. Hydrogen storage technologies can mitigate energy volatility and uncertainty, especially by absorbing excess renewable energy generation. They can address the gap between electricity supply and demand and provide a reliable energy supply. The structure of hydrogen storage technology and the applications of HECESSs are shown in Figure 2. Specifically, hydrogen storage technologies use electric power to produce hydrogen by electrolysis, and the hydrogen is stored in a hydrogen storage device. Electrolytic hydrogen production equipment and fuel cells realize the conversion from electricity to hydrogen power and from hydrogen power to electricity, respectively. When the demand for electricity increases or the supply of electricity is insufficient, the stored hydrogen is used to feed electricity back into the grid through fuel cells or other reaction equipment. To realize the full applications of hydrogen storage, a complete conversion chain of hydrogen energy needs to be established. This includes the three links of hydrogen production, hydrogen storage, and power generation, and key technological breakthroughs in these three links.

Hydrogen production technologies
There are many ways to produce hydrogen, such as electrolyzing water to produce hydrogen, gasifying coal to produce hydrogen, and so on. Among them, electrolyzing water to produce hydrogen is a completely clean method of hydrogen production, with low technology costs and high product purity, and is the basis for hydrogen storage. Currently, electrolytic hydrogen production technologies can be divided into alkaline electrolysis cells (AECs), proton exchange membrane electrolysis cells (PEMECs), solid oxide electrolysis cells (SOECs), and anion exchange membrane electrolysis cells (AEMECs). A schematic diagram of these four technologies is shown in Figure 3. AECs are the most mature and widely used electrolysis technology, with the advantages of a low cost and easy operation. However, there are some problems with AEC hydrogen production, such as its low current density and contamination of the electrolyte. Also, during the electrolysis process, if hydrogen and oxygen cross the diaphragm, an explosion can easily occur. Moreover, if high-purity hydrogen separation is required, other equipment is needed, which increases the cost and complexity of the equipment.
A PEMEC electrolyzer adopts a proton exchange membrane to transport protons and to separate the hydrogen and oxygen evolved at the electrodes, and it has a compact structure. It has the advantages of high current density, high hydrogen purity, and high conversion efficiency. In particular, its high flexibility and excellent power regulation function are well suited to the randomness of renewable energy sources, such as wind, solar, and hydro. The dynamic response time of AECs and PEMECs is on the order of milliseconds, so their output can be adjusted quickly and flexibly according to the uncertainty of the renewable energy sources to support a stable operation of the power system. Compared with AECs, PEMEC technology adapts better to fluctuating input. It is more suitable for flexible-regulation scenarios oriented toward the applications of HECESSs. However, it also has shortcomings, such as being relatively expensive and less durable.
AEMEC technology utilizes an anion-exchange diaphragm to prevent gas from traveling through the diaphragm. It offers the advantages of a low cost, fast start-up, and flexibility. AEMEC technology combines the advantages of AECs and PEMECs. However, it is still in the research and development stage due to the problems of chemical and mechanical stability and the low maturity of the technology.
SOEC technology is a high-temperature water-electrolyzing technology developed in recent years, with high energy conversion efficiency. It is suitable for applications of hydrogen production from nuclear power waste heat and hydrogen production from ammonia waste heat. However, its complex structure, faster performance decay, and gas cross-contamination caused by the high-temperature operation are still to be solved, and it is still in the research and development stage [14].
The current main problems of electrolytic hydrogen production are high energy consumption and low efficiency. Breakthroughs in key technologies should focus on reducing the equipment cost, improving the energy efficiency of electrolyzers, and building a centralized large-scale production system.

Hydrogen power generation technologies
Fuel cells convert the chemical energy of hydrogen directly into electrical energy, which avoids the losses of intermediate energy conversion. Therefore, hydrogen power generation can achieve higher efficiency and is environmentally friendly, making it more practical. Fuel cell technologies can be categorized into two types according to the operating temperature: high temperature and low temperature. Low-temperature fuel cell technologies include alkaline fuel cells (AFCs), proton exchange membrane fuel cells (PEMFCs), and phosphoric acid fuel cells (PAFCs). High-temperature fuel cell power generation technologies include solid oxide fuel cells (SOFCs) and molten carbonate fuel cells (MCFCs).
In applications of hydrogen storage for renewable energy, the focus is on solid-polymer-type PEMFC technology using pure hydrogen as fuel, which has the advantages of high power density, high energy conversion efficiency, low-temperature startup, and environmental protection. It is suitable for distributed power generation, mobile power, and emergency power scenarios. A comparison of the characteristics of different types of fuel cells is shown in Table 1 [15]. According to the Carnot cycle, the maximum efficiency of a fuel cell is related to its operating temperature. However, the efficiency of an actual fuel cell is affected by a variety of factors, such as the design and materials of the cell, operating conditions, the load demand, and the purity of fuel and oxygen. Different types of fuel cells have different operating temperatures and reaction characteristics, which affect the range of their maximum efficiency. Reducing the current density of a fuel cell below its maximum power density value helps to reduce cell voltage loss, thus increasing its efficiency. The net efficiency equation for a fuel cell system is given below: where Wout is stack output energy, ηout is power output efficiency, Wcon is ancillary consumption energy, HHV is the higher heating value of the fuel, and mcon is the mass of hydrogen consumed. Among them, PEMFCs and AFCs have a fast start/stop speed and are suitable for hydrogen fuel cell vehicles and backup power generators in electric power systems. PAFCs, MCFCs, and SOFCs have high operating temperatures and are suitable for distributed power generation and cogeneration. In terms of conversion efficiency, PEMFCs, AFCs, MCFCs, and SOFCs have an electrical conversion efficiency of about 60%. The efficiency of PAFCs, MCFCs, and SOFCs can be up to 85% when they operate as cogeneration.
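The net efficiency relation itself is not reproduced in the paragraph above. As a hedged reconstruction, one form consistent with the listed symbols is sketched below; the name \eta_{net} and the exact arrangement of terms are assumptions, not confirmed by the source.

```latex
\eta_{net} = \frac{W_{out}\,\eta_{out} - W_{con}}{HHV \cdot m_{con}}
```

For illustration only, with hypothetical values W_out = 20 kWh, η_out = 0.95, W_con = 2 kWh, and m_con = 1 kg, and taking the higher heating value of hydrogen as roughly 39.4 kWh/kg, this form gives η_net ≈ (19 − 2)/39.4 ≈ 0.43.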

Hydrogen storage technologies
Economic, efficient, and safe hydrogen storage technologies are the key to promoting the large-scale application of hydrogen storage in power systems. The development of hydrogen storage technologies is the basic premise of hydrogen energy systems. Compared with other fuels, hydrogen has a high gravimetric energy density but a low volumetric energy density. Therefore, a major prerequisite for building a hydrogen storage system is to store and transport hydrogen at a higher volumetric energy density. Hydrogen storage technologies are categorized according to the physical state of hydrogen, as shown in Figure 4, and mainly comprise high-pressure gaseous hydrogen storage, liquid hydrogen storage, and solid hydrogen storage. Currently, high-pressure gaseous hydrogen storage is the most common and direct hydrogen storage method [16]. It has the advantages of low cost and fast charging and discharging speeds and is therefore widely used in HECESSs. However, it has the disadvantages of low density and poor safety. Cryogenic liquid hydrogen storage liquefies hydrogen and stores it in a cryogenic vacuum adiabatic device. It has high volumetric hydrogen storage density and high purity of liquefied hydrogen, but the liquefaction process consumes a lot of energy. Organic liquid hydrogen storage makes up for the low density of high-pressure gaseous hydrogen storage, and it can be recycled many times. However, the storage process is costly and the operating conditions are harsh. Liquid ammonia hydrogen storage is an emerging chemical hydrogen storage method with high hydrogen mass capacity, which gives it good potential for hydrogen storage. However, there are some challenges and limitations of liquid ammonia hydrogen storage technology, such as evaporation loss of liquid ammonia and hydrogen release and recovery. In addition, the preparation and handling processes of liquid ammonia need to be considered in terms of environmental safety and energy consumption.
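The contrast between energy per unit mass and energy per unit volume can be made concrete with a short calculation. The sketch below uses rounded, approximate textbook property values chosen for illustration (they are assumptions, not figures from this paper) and compares hydrogen in several storage states with gasoline.

```python
# Rounded, approximate property values; illustrative assumptions only.
LHV_H2_MJ_PER_KG = 120.0        # lower heating value of hydrogen
LHV_GASOLINE_MJ_PER_KG = 44.0   # typical lower heating value of gasoline

# Approximate densities in kg per cubic metre.
DENSITY_KG_PER_M3 = {
    "H2 gas, 1 atm, 0 C": 0.09,
    "H2 gas, 700 bar": 40.0,
    "H2 liquid, 20 K": 71.0,
    "gasoline": 745.0,
}


def volumetric_energy_density(lhv_mj_per_kg, density_kg_per_m3):
    """Return the volumetric energy density in MJ per litre."""
    return lhv_mj_per_kg * density_kg_per_m3 / 1000.0


def table():
    """Map each storage state to its volumetric energy density (MJ/L)."""
    rows = {}
    for name, rho in DENSITY_KG_PER_M3.items():
        lhv = LHV_GASOLINE_MJ_PER_KG if name == "gasoline" else LHV_H2_MJ_PER_KG
        rows[name] = volumetric_energy_density(lhv, rho)
    return rows


if __name__ == "__main__":
    for name, value in table().items():
        print(f"{name:20s} {value:8.3f} MJ/L")
```

Under these assumed values, hydrogen carries almost three times the energy of gasoline per kilogram, yet even liquid hydrogen holds far less energy per litre, which is why compression, liquefaction, or chemical carriers are needed.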
In terms of physical adsorption, hydrogen is adsorbed on the solid surface of carbon-based materials. Hydrogen can be adsorbed and desorbed at high rates. However, the technologies of carbon-based materials are not yet mature [14]. In chemical hydride hydrogen storage, hydrogen is chemically combined with a metal or metal alloy to form a metal hydride. Its high volumetric hydrogen storage density and high safety give it greater potential for development.
The future lies in the field of HECESSs, which can maintain the flexibility and economy of hydrogen storage. HECESSs can strengthen the scale applications of pure hydrogen, pure ammonia combustion engines, and other technologies [17]. At the same time, it is also necessary to establish the mechanism of electricity-hydrogen synergy to promote better complementarity and synergy between the two systems.

Models of hydrogen-electricity coupling energy storage systems
Hydrogen has dual attributes of energy and resource, bridging and linking all types of energy sources. Among common secondary energy sources, green power and green hydrogen are the best choices. Hydrogen-electricity coupling can reduce or balance the impact of randomness and volatility on power systems and can promote the consumption of new energy. This is the development direction and an important feature of new power systems. At present, experts and scholars have carried out a series of prospective research works on HECESSs.

Coupling characteristics of HECESS
A HECESS converts fossil and renewable energy sources into two types of secondary energy, which are electricity and hydrogen, based on technologies related to hydrogen storage. As shown in Figure 5, in the analysis of the coupling mechanism of the HECESS, both the power system and the hydrogen system have complementary potentials. At the system operation level, the power system can realize flexible hydrogen production based on the surplus renewable energy in the system, supplying power to key equipment in the hydrogen supply chain, such as compressors, and providing diversified hydrogen applications. Considering that power systems need to meet real-time supply and demand balance, hydrogen energy systems have a certain buffering capacity in all aspects of production, storage, and generation. Therefore, hydrogen energy systems can flexibly and efficiently provide auxiliary services, such as backup power generation and long-term energy storage for the power system. The establishment of HECESSs is conducive to easing the transmission and transportation pressure of power grids at lower voltage levels. At the same time, power systems with a high percentage of renewable energy consumption can produce high-purity and low-carbon hydrogen energy through electricity-to-hydrogen technologies. The hydrogen system then provides load management services to the power system based on hydrogen power generation technologies, thereby improving power quality and reliability.

Models of HECESS
Considering the fast dynamic response of electrolyzers, the electric hydrogen production technologies are generally described by a simple linear model in optimization studies of HECESSs [18]. A linear efficiency constant is usually used to describe the relationship between input power and output power [19]. The electrolytic cell can decompose water into hydrogen and oxygen, and its output power can be expressed as [20]: where Pel and ηel are the input power and the conversion coefficient of an AEC, respectively.
A fuel cell uses hydrogen and oxygen as reactants and converts the stored chemical energy into electrical energy, and its output power can be expressed as: where Mfc and ηfc are the power input from the hydrogen storage tank to the fuel cell and the conversion coefficient of a PEMFC, respectively.
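The two linear relations described above can be written compactly. The sketch below follows directly from the stated definitions of input power and conversion coefficient; the output symbols P_{H_2} (hydrogen power leaving the electrolyzer) and P_{fc} (electric power leaving the fuel cell) are names assumed here for illustration, since the original expressions are not reproduced in the text.

```latex
P_{H_2}(t) = \eta_{el}\,P_{el}(t), \qquad P_{fc}(t) = \eta_{fc}\,M_{fc}(t)
```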
In terms of thermodynamic characteristics, the energy conservation equations of the simple linear model are shown as: (5) where Hel and Hfc are the heating absorption power of the circulating water in the heat supply network produced by the AEC and PEMFC, respectively, and ηheat is the heating conversion coefficient.
Hydrogen storage tanks are used to store hydrogen produced by AEC electrolysis and also to provide hydrogen for chemical reactions in PEMFCs. The mathematical model of the hydrogen storage tank energy storage at time t can be expressed as [21]:

$$E_{tank}(t) = E_{tank}(t-1) + \eta'_{el}\,P_{el}(t)\,\Delta t - \frac{P_{fc}(t)}{\eta_{tank}\,\eta'_{fc}}\,\Delta t \qquad (6)$$

where Etank(t) is the energy stored in the hydrogen storage tank at time t, Pel(t) and Pfc(t) are the electrolyzer input power and the fuel cell output power at time t, η'el and η'fc are the efficiency values of the electrolyzer and fuel cell, respectively, ηtank is the working efficiency of the hydrogen storage tank, and ∆t is the time step.
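The linear component models and the tank balance can be combined into a small simulation. The sketch below is a minimal illustration, assuming a tank balance in which charging adds the electrolyzer input scaled by its efficiency and discharging removes the fuel cell output divided by the tank and fuel cell efficiencies. The class name, the default parameter values, and the heat recovery relation (a fixed share of conversion losses) are all assumptions introduced here, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class LinearHecess:
    """Linear electrolyzer / tank / fuel cell model (illustrative values)."""
    eta_el: float = 0.70    # electrolyzer conversion coefficient (assumed)
    eta_fc: float = 0.55    # fuel cell conversion coefficient (assumed)
    eta_tank: float = 0.95  # tank working efficiency (assumed)
    eta_heat: float = 0.80  # heat recovery coefficient (assumed)
    e_max: float = 100.0    # tank capacity in kWh of hydrogen (assumed)

    def step(self, e_tank, p_el, p_fc, dt=1.0):
        """Return the tank energy after one time step of length dt (hours).

        p_el: electric power drawn by the electrolyzer (kW)
        p_fc: electric power delivered by the fuel cell (kW)
        """
        charge = self.eta_el * p_el * dt
        discharge = p_fc / (self.eta_tank * self.eta_fc) * dt
        e_next = e_tank + charge - discharge
        if not 0.0 <= e_next <= self.e_max:
            raise ValueError("tank energy limits violated")
        return e_next

    def recovered_heat(self, p_el, p_fc):
        """Assumed relation: a share eta_heat of conversion losses is recovered."""
        h_el = self.eta_heat * (1.0 - self.eta_el) * p_el
        m_fc = p_fc / self.eta_fc  # hydrogen power fed to the fuel cell
        h_fc = self.eta_heat * (1.0 - self.eta_fc) * m_fc
        return h_el, h_fc


if __name__ == "__main__":
    model = LinearHecess()
    energy = 10.0
    # Two hours of charging from surplus power, then two hours of discharging.
    for p_el, p_fc in [(20.0, 0.0), (20.0, 0.0), (0.0, 5.0), (0.0, 5.0)]:
        energy = model.step(energy, p_el, p_fc)
        print(f"tank energy: {energy:.2f} kWh")
```

Keeping the model linear means the same relations can be placed directly as constraints in a linear or mixed-integer program, which is the usual reason for preferring a constant efficiency over a detailed electrochemical model in scheduling studies.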
Based on this, in hydrogen production, Li et al. [22] developed a model of an electric hydrogen plant that accounted for the cogeneration of heat and hydrogen. The model scheduled hydrogen and thermoelectricity according to temperature and was integrated with an active distribution network and district heating network systems. Pan et al. [23] further developed a cogeneration model, taking into account the start-stop characteristics of electric hydrogen production and a seasonal hydrogen storage model. Lin et al. [24] modeled a more complex nonlinear electric hydrogen generation system by considering various factors, such as stack lifetime degradation rate, auxiliary equipment energy consumption, etc., and processed it into a linear model by using a piecewise linearization method. The models mentioned above contributed to the optimization of scheduling and hydrogen storage schemes for electrolytic hydrogen production facilities, thereby enhancing system efficiency and sustainability.
In terms of hydrogen power generation, a study constructed a day-ahead optimal scheduling model of a regional HECESS that converted electricity into gas [25]. The fuel cell power generation and heat production efficiencies were characterized as a quintic function of the output electric power. Tao et al. [26] further established a hydrogen fuel cell model that took into account the load-fuel consumption variation to minimize the energy consumption of fuel cell vehicles. The establishment of these models provides reference and guidance for the optimal scheduling of HECESSs and hydrogen fuel cell vehicles. By considering various factors, such as energy consumption and efficiency, a more dynamic and effective energy scheduling strategy is realized.
In terms of hydrogen storage, Li et al. [27] proposed the concept of a hydrogen supply chain integrated into an electric network for seasonal storage. A daytime hydrogen storage model, taking into account charging and discharging constraints and hydrogen storage constraints, was developed to solve regional and seasonal imbalance problems and guarantee the hydrogen energy supply. Considering model charging and discharging constraints and hydrogen storage constraints, Taweel et al. [28] established an hourly hydrogen storage model based on the demand response, which further considered the minimum hydrogen storage constraints to participate in optimal scheduling. Hydrogen storage is required to realize large-scale, long-term energy storage. To solve the difficult problem of inter-seasonal hydrogen storage, Pan et al. [29] proposed a two-layer mixed-integer programming model for an energy system integrating electricity and hydrogen. The operational state of seasonal hydrogen storage was considered in the two-layer model to highlight the role of hydrogen in renewable energy penetration and seasonal complementarity. These studies provide important theoretical and methodological insights into the field of hydrogen storage and help to address the challenges of balancing supply and demand for long-duration storage and inter-seasonal storage.
The above analysis shows that researchers have adopted different approaches and models to solve the equilibrium problem of hydrogen supply. On the one hand, they focused on the construction of hydrogen production facilities to meet the local demand and to reduce the cost and energy consumption of inter-regional delivery of hydrogen. On the other hand, they considered large-scale long-term hydrogen storage technologies to enable the release of hydrogen supply during peak demand. In addition, they modeled inter-seasonal hydrogen storage, which contributes to eliminating regional and seasonal imbalances and guaranteeing the balance and reliability of hydrogen supply.

Optimal planning and operation of HECESS
Compared with electric-to-gas conversion models of electro-gas coupled integrated energy systems, HECESSs omit the hydrogen-to-gas conversion and deliver hydrogen directly to the hydrogen fuel cell after the electrolytic hydrogen production process. This prevents the loss of energy during the conversion process, and hence the energy conversion efficiency is higher. Therefore, many scholars have further carried out research on coupled systems. Considering the advantages of multi-energy complementarity between different energy sources, Zheng et al. [30] developed hydrogen storage devices for multi-level energy development. The corresponding optimization design problem was established, and the optimal capacity configuration of the system and the corresponding operation strategy were determined. Cheng et al. [31] proposed a two-tier decentralized planning approach for multi-energy coupled systems and a two-tier extended planning model for multi-energy coupled systems that considered decentralized emission constraints. The upper-layer planning took into account the optimal solution of the multi-area HECESS with electrical networks. The lower-layer planning investigated the optimal energy-supply allocation method for the regional electrical HECESS, taking into account carbon emission constraints. Jiang et al. [32] proposed a planning method for a coupled wind-hydrogen-electric network by considering traffic flow capture and solving the problem of siting and sizing of hydrogen refueling stations and wind farms under this coupled network.
The basic process of system optimization is as follows. First, the planning model of the scenario is established. Then the renewable energy output is predicted by combining equipment selection and site conditions. The optimization variables of the model are selected by comprehensively considering the relationship between load and power. An optimization model with a system constraint and a fixed capacity is constructed based on the power balance of the grid and the actual physical limitations of each power generation subject. It is usually a set of equations containing component characteristics and system operation characteristics. The optimization objective generally needs to consider technical, economic, and environmental indicators [33] to achieve high efficiency, feasibility, and sustainability of optimization. Economic indicators include energy cost, whole-life-cycle cost, and cost of lost power. Technical indicators include equipment performance degradation and response time. Environmental indicators include energy storage efficiency, environmental friendliness, etc. Finally, the optimization model is solved to determine the hydrogen storage configuration. The reasonable selection of the optimization method will also affect the optimal solution and solving efficiency of the model.
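The steps above can be sketched with a deliberately small example. The code below is not a method from the reviewed literature: it assumes a hypothetical hourly net-surplus profile, a rule-based dispatch in place of a full operational model, round-number efficiencies, and a simple enumeration over candidate tank capacities with a cost made of investment plus a penalty on unserved energy. All names and values are illustrative assumptions.

```python
def simulate(net_surplus, capacity, eta_el=0.5, eta_fc=0.5):
    """Rule-based dispatch over 1-hour steps.

    net_surplus: renewable output minus load per hour (kW); positive = surplus.
    capacity:    tank capacity in kWh of hydrogen.
    Returns (unserved_energy, curtailed_energy) in kWh.
    """
    stored = 0.0
    unserved = 0.0
    curtailed = 0.0
    for p in net_surplus:
        if p >= 0:
            room = capacity - stored
            absorbed = min(p, room / eta_el)   # electric energy accepted
            stored += absorbed * eta_el
            curtailed += p - absorbed
        else:
            need = -p
            delivered = min(need, stored * eta_fc)
            stored -= delivered / eta_fc
            unserved += need - delivered
    return unserved, curtailed


def size_storage(net_surplus, candidates, capex_per_kwh, penalty_per_kwh):
    """Pick the candidate capacity with the lowest total cost."""
    best = None
    for cap in candidates:
        unserved, _ = simulate(net_surplus, cap)
        cost = capex_per_kwh * cap + penalty_per_kwh * unserved
        if best is None or cost < best[1]:
            best = (cap, cost)
    return best


if __name__ == "__main__":
    profile = [10.0, 10.0, -4.0, -4.0]   # hypothetical surplus/deficit (kW)
    capacity, cost = size_storage(profile, [0.0, 5.0, 10.0, 20.0], 1.0, 10.0)
    print(f"selected capacity: {capacity} kWh, total cost: {cost}")
```

Enumeration is used only because the example has a single decision variable. With several coupled capacities and time-linked constraints, the same objective and balance equations would normally be handed to a mathematical programming solver or one of the methods discussed next.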
At present, the commonly used optimization methods mainly include classical optimization methods, such as the probabilistic method and linear or nonlinear programming method [34]. Such methods are suitable for solving unimodal functions, but they easily become trapped in local optima in optimization problems with multiple local optima. Stochastic optimization methods include genetic algorithms [35], particle swarm algorithms [36], and many other algorithms. Compared with classical optimization methods, stochastic optimization methods can reach the global optimal solution, depending on the initial value. However, their search efficiency is low and it is difficult to obtain the same optimized solution in multiple optimizations. In recent years, machine learning techniques have been integrated into metaheuristics to solve combinatorial optimization problems [37]. The aim is to improve the performance of the algorithms in terms of solution quality, convergence rate, and robustness. Among them, DRL has received much attention due to its excellent performance in high-uncertainty operational problems. To this end, the paper presents a comprehensive literature review of DRL and its application to HECESSs, as will be shown in Section 5.
It is worth noting that experts and scholars currently pay close attention to the planning and operation of HECESSs. The existing research and practice mainly focus on optimization planning and the benefit analysis of a single or partial electric-hydrogen coupling link or technology under an established optimization goal. The potential multidimensional value of HECESSs has not yet been comprehensively analyzed from the perspectives of system modeling, benefit assessment, investment planning, and optimized operation. For example, there is still a lack of work analyzing the detailed costs of hydrogen production, power generation, storage, and transportation. In the synergistic planning studies of HECESSs, relatively few studies have analyzed the architecture of future energy forms of electric power systems and hydrogen systems.

Applications of HECESS in power systems
The positioning of a HECESS in a power system is different from those of other energy storage modes, mainly in terms of long periods of action, trans-seasonal storage, and large-scale storage. An overview of hydrogen storage and its applications in power systems is shown in Figure 6. In this section, the applications of HECESSs at the source, grid, and load sides of power systems are summarized and analyzed.

Applications of HECESS at source side
At the power supply side, direct grid integration of a high percentage of wind power has a higher impact on a power system, and the output exhibits more stochasticity and volatility. This is mainly due to the uncertainty of weather conditions and energy supply from renewable energy sources, leading to instability in power system operation. This situation can lead to wind and solar curtailment. Hence, hydrogen is used as inter-seasonal energy storage, and the hydrogen produced by consuming excess energy is kept in hydrogen storage devices that can also supply hydrogen loads. During peak electricity demand or when renewable energy supply is insufficient, hydrogen storage can cut the peaks and fill the valleys to smooth the power output curve of wind, solar, and other power sources and enhance the deep consumption of renewable energy [38]. Hydrogen storage technologies can realize the smooth scheduling of renewable energy and extend the power supply time. Based on the rapid-response capability of hydrogen storage to fluctuating renewable energy, output is smoothed to help solve the volatility and intermittency of renewable energy [39]. This improves the stability and reliability of power systems, which in turn realizes the friendly grid connection of clean energy sources, such as wind and solar. In addition, the construction of hydrogen fuel power stations based on hydrogen-rich or pure-hydrogen gas turbine technologies can provide inertia support for new power systems. A part of the flexible load is curtailed to enhance the frequency stability of systems under load fluctuation and tie-line interruption [40].

Applications of HECESS at grid side
At the grid side, new power systems present low inertia characteristics, which makes the stability of grid operation deteriorate and easily triggers oscillations [41]. Generally, hydrogen storage power stations are reasonably deployed at key nodes in a system, such as points of large-scale new energy aggregation and intensive load access. The hydrogen gas turbine in hydrogen storage power stations can provide part of the inertia support on the grid side, slowing down the fluctuation of grid frequency. In addition, hydrogen storage power stations can regulate power in both directions and can switch within a short period to rapid power output or energy storage. This capability allows hydrogen storage power plants to provide peak and frequency regulation services for the power grid. A hydrogen storage power station adopts the conversion method of electricity-hydrogen-electricity: when power supply and demand are imbalanced, the hydrogen storage power station will be connected to the end of the congested line of the transmission and distribution system. At this time, the high-capacity hydrogen storage can act as a virtual transmission line. It charges during the low valley load time and discharges during the peak load time. This reduces transmission and distribution system capacity requirements and alleviates the impact of congested transmission and distribution line capacity on grid power [42].
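The virtual transmission line behavior described above (charging in load valleys, discharging at load peaks behind a constrained line) can be sketched with a simple rule-based dispatch. This is only an illustrative sketch: the `HydrogenStore` class, the efficiency values, and the power limit are hypothetical assumptions, not parameters from the reviewed literature.

```python
from dataclasses import dataclass


@dataclass
class HydrogenStore:
    capacity_mwh: float          # usable storage capacity (assumed)
    level_mwh: float = 0.0       # energy currently stored
    eta_charge: float = 0.7      # electrolyzer efficiency (assumed)
    eta_discharge: float = 0.5   # fuel cell / turbine efficiency (assumed)
    p_max_mw: float = 10.0       # power limit in both directions (assumed)


def dispatch(load_mw, line_limit_mw, store, dt_h=1.0):
    """Return the line flow per time step after the storage action."""
    flows = []
    for load in load_mw:
        if load > line_limit_mw:
            # Peak: discharge to cover demand above the line limit.
            need = min(load - line_limit_mw, store.p_max_mw)
            deliverable = store.level_mwh * store.eta_discharge / dt_h
            out = min(need, deliverable)
            store.level_mwh -= out * dt_h / store.eta_discharge
            flows.append(load - out)
        else:
            # Valley: charge with the spare capacity of the line.
            spare = min(line_limit_mw - load, store.p_max_mw)
            room = (store.capacity_mwh - store.level_mwh) / (
                store.eta_charge * dt_h)
            inp = min(spare, room)
            store.level_mwh += inp * dt_h * store.eta_charge
            flows.append(load + inp)
    return flows
```

With a 50 MW line limit and a load profile of `[40, 40, 60, 60]` MW, the store fills during the first two steps and then reduces the first peak flow from 60 MW to about 53 MW before running empty, which shows how round-trip efficiency bounds the congestion relief that a given valley period can provide.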
Due to climatic factors, renewable energy output shows seasonal characteristics, as well as uncertainty [43]. This leads to an imbalance between the supply and demand of electricity in power systems on a long-term scale. The intervention of long-term hydrogen storage can realize the seasonal regulation of electricity. Therefore, the use of hydrogen production technologies in hydrogen storage power plants combined with hydrogen storage technologies can store hydrogen for a long period to realize the inter-seasonal and inter-regional supply of energy. At the same time, hydrogen power generation technologies can be utilized for off-site transmission of power to ensure a balanced power supply. In addition, using the grid as a bridge for energy transfer and adopting seasonal hydrogen storage technologies can solve the inherent spatial and temporal imbalance between renewable energy hydrogen production and hydrogen loads.

Applications of HECESS at load side
At the load side, hydrogen storage can provide various types of auxiliary services for the grid to meet the diversified needs of peak shifting, frequency regulation, grid rotation, and power standby. Hydrogen storage buildings/parks can be broadly categorized into centralized energy storage and distributed energy storage. The centralized storage structure is mainly for multiple buildings and energy centers to share power. Distributed energy storage structures, on the other hand, share power among buildings. Fan et al. [44] proposed a distributed real-time scheduling method for multiple buildings in a smart park based on multi-agent DRL by utilizing the complementary characteristics of electricity and hydrogen and the energy interaction between buildings. Utilizing HECESSs to build hydrogen-energy buildings or parks can ensure the continuous supply of electric power to the buildings and, at the same time, can assist in peak shifting and frequency regulation by sending the excess energy back to the grid.
The new power system construction concept evolves from the traditional source-follows-load to load-follows-source. In this context, it is very important to tap the flexibility resources at the load side. Hydrogen generation and refueling stations can be an important new type of flexible regulation resource at the load side to participate in the load demand response. Hydrogen refueling stations connect upstream hydrogen production and transportation to downstream applications and are an important hub for hydrogen energy transportation. In addition, connecting renewable energy sources to the power grid, hydrogen grid, gas grid, and heat grid can accelerate the energy transition process. In the future, building multi-energy HECESSs, such as electricity-hydrogen systems, will become one of the typical scenarios at the load side.

Deep reinforcement learning algorithms for HECESS
Due to the complexity of planning models of HECESSs, the performance of traditional optimization methods in terms of solution quality, convergence rate, and robustness is not outstanding. HECESSs that consider multiple energies, such as electricity-hydrogen-cooling-heating, have higher requirements on an algorithm's solution efficiency and accuracy. With traditional model-based methods, it is difficult to choose appropriate models for an actual energy system [45,46], and many of the assumptions made to simplify the model make it inapplicable to real-world situations [47]. To improve the efficiency as well as the accuracy of solving the models of HECESSs, this paper explores model-free algorithms, such as DRL. The algorithms show great potential for online optimization by learning strategies through the interaction of agents with their environment [48,49]. In this section, the fundamental knowledge of the Markov decision process (MDP), reinforcement learning (RL), and deep learning (DL) is introduced first. Then, the combination of RL and DL, which results in the formation of DRL, is described. Finally, the details and motivation of the applications of DRL in the relevant literature are reviewed.

Markov decision process
RL and DRL are types of learning that map environment states to actions. Through feedback from the environment, agents perceive the strengths and weaknesses of their behavior and continuously modify it to ultimately obtain the maximum cumulative reward. This kind of learning problem is generally described by the MDP [50].
In a stochastic dynamic system, if the next state, $s_{t+1}$, of the agent is only related to the current state, $s_t$, but not related to the earlier historical states, the system is said to have the Markov property:

$$p(s_{t+1} \mid s_t) = p(s_{t+1} \mid s_t, s_{t-1}, \ldots, s_1, s_0) \tag{7}$$

When the environment state of the agent has the Markov property, the agent selects a behavior in the current state and transfers to the next state. Such a sequential process is the MDP. The MDP can generally be represented by a five-tuple:

$$\mathrm{MDP} \sim \{S, A, P, r, \gamma\} \tag{8}$$

where $S$ is the state space of the agent; $A$ represents the behavior strategy space of the agent; $P: S \times A \times S \to [0, 1]$ is the transition probability; $r: S \times A \to \mathbb{R}$ is the immediate return that the agent gets from the environment; $\gamma \in [0, 1]$ is the discount factor, which reflects the value proportion of future rewards at the current moment; and the cumulative return is the discounted sum of immediate returns, $R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$.

The above MDP assumes that the system state is completely observed by the agent. However, in most cases, the agent can only observe a part of the system state. Therefore, considering the uncertainty introduced by partial observation, a partially observable Markov decision process (POMDP) is proposed to establish the decision model. The POMDP is a mathematical framework for modeling the situation in which the decision-maker only has part of the information of the system state. The POMDP is an extension of the MDP, which considers the situation that some status data are missing or considered uncertain. It can be described as a six-tuple $(S, A, P, r, \Omega, O)$, where $(S, A, P, r)$ are denoted in the same way as in the MDP, while $\Omega$ and $O$ represent the set of observations and their corresponding observation probabilities, respectively.
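To make the role of the discount factor concrete, the short sketch below computes the discounted cumulative return of a finite reward sequence. The function name and the finite-horizon truncation are assumptions for illustration; the sum follows the standard definition in which the reward at step $k$ is weighted by $\gamma^{k}$.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence.

    Computed backwards so that each step applies G = r + gamma * G_next.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For three unit rewards and a discount factor of 0.5 the result is 1 + 0.5 + 0.25 = 1.75. A discount factor close to 1 weights distant rewards almost as much as immediate ones, which matters for long-horizon problems such as seasonal hydrogen storage.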
Under the MDP, Li et al. [51] defined the state value function $v_\pi(s)$ and the behavior value function $q_\pi(s, a)$ based on policy $\pi$.
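The definitions from [51] are not reproduced in this text. Under the usual textbook convention, which is assumed here and may differ in notation from [51], the two value functions and the relation between them can be written as:

```latex
\begin{aligned}
v_\pi(s)   &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty}\gamma^{k} r_{t+k}
              \,\middle|\, s_t = s\right],\\
q_\pi(s,a) &= \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty}\gamma^{k} r_{t+k}
              \,\middle|\, s_t = s,\ a_t = a\right],\\
v_\pi(s)   &= \sum_{a \in A} \pi(a \mid s)\, q_\pi(s,a).
\end{aligned}
```

Here $\pi(a \mid s)$ denotes the probability that policy $\pi$ selects behavior $a$ in state $s$, and $r_{t+k}$ is the immediate return received $k$ steps after time $t$.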

Reinforcement learning
Reinforcement learning mainly focuses on how agents make decisions on environmental stimuli to maximize long-term cumulative rewards, thus forming a mapping relationship between the state and the behavior [52]. The interaction between an intelligent agent and an unknown environment in reinforcement learning is shown in Figure 7, which mainly consists of the following steps:
1) In the current state, st, the agent selects behavior at according to behavior value function Q and behavior strategy πt; in response to behavior at, the unknown environment transfers to the next state, st+1, and feeds a reward signal, rt, back to the agent.
2) The agent updates its behavior value function Q through environmental feedback rt.
3) The updated value function guides the subsequent behavior strategy πt.
4) Return to Step 1 and repeat the above process.
Within this learning mechanism, the agent's behavior in a certain state will be selected many times through the role of value function Q and behavior strategy πt. Therefore, in the case of a time-varying environment, the agent can seek the maximum cumulative return through continuous interaction with the environment to continuously update the strategy so that the algorithm can achieve the purpose of tracking the changes in the environment [53,54].
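The four steps above can be sketched with tabular Q-learning, one common instance of this interaction loop. The environment here is a hypothetical five-state chain (move left or right, reward 1 on reaching the last state) chosen only for illustration; the learning rate, discount factor, and exploration rate are likewise assumed values.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain; returns the learned Q table."""
    rng = random.Random(seed)
    actions = (0, 1)  # 0 = move left, 1 = move right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:  # last state is terminal
            # Step 1: epsilon-greedy behavior strategy based on Q,
            # with random tie-breaking among equally valued actions.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                best = max(q[s])
                a = rng.choice([x for x in actions if q[s][x] == best])
            # Environment response: next state and reward signal.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Step 2: update Q from the environmental feedback.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            # Steps 3 and 4: the updated Q guides the next choice.
            s = s_next
    return q
```

After training, the greedy behavior in every non-terminal state is to move right, and the value of the final move approaches the reward of 1. The table grows with the number of states and actions, which is the limitation that motivates the deep learning extensions discussed next.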

Deep learning
In RL, the agent typically uses a table or a simple function to represent its policy or value function, which limits the agent's ability to handle complicated problems with large-scale environments and high uncertainty, thus further limiting its applications in HECESS operations. Therefore, DL has been introduced to assist RL in dealing with these challenges. DL is a subset of machine learning based on deep neural networks (DNNs). It attempts to simulate human brain behavior and extract important features from massive raw data. There are two classical DNN models, which are the convolutional neural network (CNN) [55] and recurrent neural network (RNN) [56].
As a strong feature extraction structure, the CNN has drawn great attention from numerous researchers in recent years. In a CNN-BiLSTM model, the CNN layer accepts the selected variables at its input layer (Step 1 in Section 5.2) and extracts the features of these variables for the input layer of the BiLSTM layer (Step 2 in Section 5.2). The structure of the CNN is shown in Figure 8. The core of the CNN is the convolutional layer, which reduces network complexity and the number of parameters. In this layer, the characteristics of input data are revealed, which can be expressed as:

$$h^{m} = f\left(W^{m} * x + b_{m}\right)$$

where $h^{m}$ is the $m$th feature map, $x$ is the input to the layer, $*$ denotes the convolution operation, $f$ is the activation function, and $W^{m}$ and $b_{m}$ denote the weight and bias of the kernel for the $m$th feature map, respectively. The pooling layer, which can use maximum or average pooling, reduces volume size and improves computational performance, thus making computation easier.

Unlike the CNN, the RNN uses information from prior inputs to determine the current output. As a typical RNN, a long short-term memory (LSTM) network stands out for its excellent ability to capture and retain long-term dependencies through integrated memory units and different gating mechanisms [57,58]. The main architecture of LSTM can be seen in Figure 9. The memory unit in LSTM allows the network to store and access information for a long time. In addition, the gating mechanism (including the input gate, forget gate, and output gate) controls the information flow and enables the network to selectively retain or forget information according to the relevance of information.
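The convolution and pooling operations can be illustrated for a one-dimensional input, such as a short load or price series. The sketch below is written in plain Python for readability; the ReLU activation, the kernel values, and the function names are assumptions, and a practical model would use a deep learning library instead.

```python
def conv1d(x, w, b):
    """One feature map: slide kernel w over x, add bias b, apply ReLU.

    As in most CNN implementations, the kernel is not flipped
    (cross-correlation form).
    """
    k = len(w)
    return [max(0.0, sum(w[j] * x[i + j] for j in range(k)) + b)
            for i in range(len(x) - k + 1)]


def max_pool1d(h, size):
    """Non-overlapping maximum pooling with the given window size."""
    return [max(h[i:i + size]) for i in range(0, len(h) - size + 1, size)]
```

For the input `[0, 1, 3, 2, 5]` with kernel `[-1, 1]` and zero bias, the feature map is `[1.0, 2.0, 0.0, 3.0]` (the positive step-to-step increases), and maximum pooling with window 2 reduces it to `[2.0, 3.0]`. The same two kernel weights are reused at every position, which is how the convolutional layer keeps the number of parameters small.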

Deep reinforcement learning applied for HECESS
The simple tabular structure limits RL's ability to describe system characteristics. Therefore, DRL combines deep learning and reinforcement learning to deal with more complex tasks, such as decision optimization under high-dimensional continuous state spaces and high-dimensional action spaces [59,60]. The general framework of DRL can be seen in Figure 10. Value-based DRL, such as the deep Q-network (DQN), optimizes the action value function Q(s, a) to obtain the preference of action selection [61,62]. Value-based DRL has higher sampling efficiency and smaller value function estimation variance and does not easily fall into local optima. However, value-based DRL methods cannot deal with continuous action spaces, which limits the use of this method in HECESSs. Different from value-based DRL, policy-based DRL (e.g., proximal policy optimization) uses gradient-based updates to optimize a parameterized policy directly with respect to the expected reward rather than optimizing the action value function, and can therefore handle high-dimensional continuous action spaces [63,64].
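The difference between the two families can be seen in the quantities they optimize. The sketch below shows, for a single transition, the bootstrapped target used in DQN-style value learning and the clipped surrogate term used in proximal policy optimization. Both functions are simplified scalar versions with hypothetical names; real implementations operate on batches and neural network outputs.

```python
def dqn_target(reward, gamma, q_next, done):
    """Value-based: regression target for Q(s, a).

    The max over q_next requires a finite (discrete) set of actions,
    which is why continuous actions are awkward for this family.
    """
    return reward if done else reward + gamma * max(q_next)


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Policy-based: PPO clipped surrogate for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); clipping to [1 - eps, 1 + eps]
    limits how far one update can move the policy.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, `dqn_target(1.0, 0.9, [0.5, 2.0], False)` evaluates to about 2.8, while `ppo_clip_term(1.5, 2.0)` is capped at about 2.4 instead of 3.0. Because the policy term only needs the probability ratio of the sampled action, it applies equally to continuous quantities such as electrolyzer or fuel cell power set-points.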
In recent years, HECESSs have attracted extensive attention all over the world due to their sustainable development and environment-friendly characteristics [65]. However, as more and more renewable energy and flexible loads are incorporated into power grids, HECESSs have become complex dynamic systems with strong uncertainty, which brings huge challenges to the safe and economic operation of HECESSs. Moreover, as mentioned above, traditional model-based methods are not suitable for HECESS optimization problems when considering strong randomness. Therefore, as model-free optimization algorithms, DRL algorithms have been introduced to solve the optimal scheduling problem of HECESSs and have achieved a series of successful applications. In the rest of this section, DRL-based optimization and operation of HECESSs are comprehensively reviewed.
DRL has been used to deal with the optimal dispatch problems of the source side in HECESSs. For example, Yi et al. [66] developed a scalable computational framework to facilitate the research of DRL algorithms for the optimization of HECESSs that consider nuclear resources. By analyzing the benchmark performances of various DRL algorithms, the superiority of DRL algorithms over the traditional particle swarm optimization (PSO) algorithm was demonstrated. Yang et al. [67] proposed an improved algorithm based on the deep deterministic policy gradient (DDPG) to cope with the dispatch problems for a HECESS while considering the uncertainty of distributed generation, flexible load, and heat load. Numerical simulation results showed that this method can adapt to the uncertainty of system energy demand and photovoltaic power generation, dynamically optimize the output of each energy unit, and reduce the operation cost of the system. Zhang et al. [68] applied proximal policy optimization (PPO) to obtain the energy management policy, which allowed for several optimization targets, including operation cost, battery storage system cost, and pollution cost. The simulation results showed that the total daily cost of the system can be reduced by approximately 2.6% compared with those of other methods. In addition, Zhang et al. [69] proposed a dynamic energy dispatch strategy solved by PPO for a HECESS combining renewable energy, while considering the uncertainty of the load side, intermittency of renewable energy, and flexibility of upper-level electricity prices. Xu et al. [70] investigated a DRL-based optimization model for the unit commitment problem that can hedge against wind power uncertainty. The results showed that the proposed method can effectively address the increasingly critical need for solving the unit commitment problem in a computationally efficient manner under high penetrations of renewable energy. Alabi et al. [71] proposed DRL with an automated hyperparameter selection feature to dispatch a real-time multi-energy system, which achieved great success compared with rule-based scheduling. These works in the literature mainly focus on solving power scheduling problems in HECESSs. By introducing the DRL algorithm, these methods can effectively handle the requirements of uncertainty and multi-objective optimization. They achieve good performance in terms of reducing system operating costs, improving efficiency, and realizing sustainable energy management.
Moreover, multi-agent DRL (MADRL) has also shown great potential in the optimization of HECESSs. For example, Monfaredi et al. [72] proposed a MADRL-based method for optimal energy dispatch, which integrated gas and power systems that considered distributed energy resources, energy storage systems, and thermal and electric loads. The simulation results showed that the operating profit was significantly improved and a reasonable operating cost was achieved, while the safety of the power system was ensured. In [73], the energy management problem of a multi-energy hub was transformed into a multi-agent coordination optimization problem based on MADRL, which minimized the system operation cost and carbon dioxide emissions under the premise of meeting the constraints. Guo et al. [74] proposed a real-time decentralized control strategy based on MADRL, which made full use of the residual capacity of the photovoltaic inverter to minimize power loss under the premise of ensuring voltage safety. In summary, these studies proposed methods for using MADRL in HECESSs. By considering the coordinated optimization of multiple participants, these methods were able to maximize the operational benefits and reduce the operational costs of power systems. At the same time, these methods were also able to minimize carbon dioxide emissions and power losses, provided that the constraints were satisfied.
On the other hand, the applications of DRL on the demand side of HECESSs have also been investigated [75]. Zhong et al. [76] proposed a deep reinforcement learning framework based on DQN, which realized the dynamic generation of user subsidy prices to maximize the profit of the load aggregator while promoting demand response. Numerical studies showed that users saved up to 8.7% of heating costs and power grid companies saved 56.6% of investment. Li et al. [77] constructed a coordinated power dispatching framework based on a multi-agent deep deterministic policy gradient, combining imitation learning with curriculum learning. The scheduling performance of the algorithm under renewable power fluctuations and random loads was verified by case studies. Ye et al. [78] proposed a model-free data-driven method based on the prioritized deep deterministic policy gradient (PDDPG) method, which can determine the real-time autonomous control strategy of multi-energy systems and can also achieve significantly lower daily energy costs. Zhou et al. [79] formulated the constrained scheduling problem of a combined heat and power system as an MDP. On this basis, an improved policy gradient DRL algorithm was proposed. The simulation results showed that the algorithm can handle different running scenarios and obtain better optimization performance than other methods. Given the uncertainty caused by renewable energy and demand response (DR), Dong et al. [80] proposed an optimal scheduling framework based on the combination of a soft actor-critic DRL algorithm and the interval optimization theory, which led to significant improvements in the system economy. Yun et al. [81] proposed a new interpretable multi-agent DRL method to realize the automatic production control of a manufacturing system under dynamic DR, while maintaining the constraints of production objectives. The simulation results showed that this method can save 13.6% and 30.7% of the energy cost in one-day and three-day production cycles, respectively. Xie et al. [82] proposed a MADRL approach that employed an actor-critic algorithm, including a shared attention mechanism, to achieve effective and scalable real-time coordinated demand response in a grid response architecture. MADRL reduced the net load demand by more than 6% compared with traditional and state-of-the-art reinforcement learning methods. These studies show that the application of DRL on the demand side of HECESSs can maximize profit, minimize cost, and optimize energy efficiency compared with traditional optimization algorithms. Introducing the DRL algorithm to solve the optimal scheduling problem of HECESSs and promoting the innovation iteration of optimization algorithms and scheduling strategies make the application of HECESSs more promising.
Overall, the above DRL-based approaches can efficiently handle high-dimensional optimal scheduling problems with uncertainty. Both on the source side and the demand side of HECESSs, DRL can achieve better performance than traditional methods. The utilization of this method not only improves the scheduling performance but also enhances the robustness and adaptability of a system, making optimal scheduling more accurate and reliable.

Challenges and open research issues of HECESS
HECESSs promote the efficient coupling of hydrogen storage with the multi-energy integration of a power system to enhance its resiliency and flexibility, resulting in a more efficient and higher-quality power supply. With the continuous improvement of hydrogen production, hydrogen power generation, hydrogen storage, and other technologies, HECESSs with multiple energy sources, such as cold, heat, electricity, and hydrogen, promote the realization of the goal of deep decarbonization.
However, obstacles to the application of HECESSs still need to be addressed and overcome. Hydrogen is categorized into gray hydrogen, blue hydrogen, blue-green hydrogen, and green hydrogen according to its production source. Green hydrogen is hydrogen produced through renewable electricity by electrolyzing water; the production process does not produce carbon dioxide, and hence green hydrogen is most suitable for realizing a sustainable energy transition. The application of solar hydrogen production, biomass hydrogen production, and other methods should be actively promoted to facilitate the development of diversified hydrogen production methods. At present, however, the production cost of green hydrogen is high, and cost reduction is the main goal in developing green hydrogen utilization. The economic advantages of green hydrogen can be demonstrated through the synergistic reduction of electricity costs and equipment costs in hydrogen production. In the future, hydrogen production systems via water electrolysis of 100 MW and above will become the mainstream scale.
As the cost of renewable energy power generation continues to decline, renewable energy power generation will become the mainstream form of power in the future. Given the limited capacity of grids to accept this power, off-grid renewable energy hydrogen production will become an important green hydrogen production scenario in the future. Off-grid wind/photovoltaic hydrogen production will continue to develop in terms of system planning and operation, optimal capacity allocation, and economic stability control. Grid requirements for flexibility are increasing day by day, and hydrogen storage systems will be deeply involved in demand-side response services. With the emergence of hydrogen storage in grid peak and frequency regulation scenarios, frequency regulation strategies that take into account the start-stop and dynamic response characteristics of hydrogen or fuel cells will gradually emerge. Different types of hydrogen storage multi-temporal optimization configuration technologies will be gradually improved, and hydrogen storage will be used as multi-timescale storage to support the grid inter-temporal power balance.
Considering HECESSs, a zero-carbon energy supply for parks or buildings will be realized. Efficient solution techniques for system planning modeling will be the key to developing hydrogen storage for scaled access to power systems. Establishing effective models of HECESSs can provide decision support for power grids and planning layout for the operation of systems. Various algorithms, such as heuristics and artificial intelligence, have been applied to solve the models and realize the self-learning optimal configuration of HECESSs. Among them, solving HECESS models with DRL algorithms has higher solving efficiency and accuracy than traditional optimization algorithms. In addition, DRL algorithms perform better for complex systems requiring large-scale data processing and complex model training.

Conclusion
Oriented towards the utilization of new energy development and the realization of the goal of deep decarbonization, hydrogen energy is a green energy source that can simultaneously solve the energy crisis and environmental pollution problems in the future. HECESSs, which couple hydrogen power and electricity, can promote the development of a higher proportion of new energy sources and realize the mutual transformation and efficient synergy between hydrogen power and electricity. This paper took HECESSs as the research object and provided an in-depth summary and analysis of the current status of the application technology of electric-hydrogen-electric conversion from the aspects of hydrogen production, hydrogen power generation, and hydrogen storage. The synergistic mechanism of hydrogen storage and electric energy has been studied, and the structural model, planning method, and optimal dispatch of HECESSs were discussed. The application scenarios of hydrogen storage were explored with the links of the source, grid, and load sides as the main line. In terms of model solving, considering the shortcomings of traditional optimization methods, such as slow solving speed and easily falling into the local optimum, the applications of DRL algorithms in multi-energy HECESSs were explored. Finally, challenges and salient research questions for the future development of HECESSs were presented to inform researchers.

Conflict of interest:
The authors declare no conflict of interest.

Figure 1 compares the applicable scale and storage duration of various energy storage technologies. Distinguished from other energy storage methods, hydrogen storage shows a better long-term energy storage performance in terms of storage time and storage capacity. It has the advantages of large energy-storage capacity, long storage period, and good flexibility.

Figure 1. Applicable scale and storage duration of different energy storage technologies.

Figure 2. Structure of hydrogen storage technology and applications.

Figure 3. Diagrams of hydrogen production from different processes of electrolyzing water.

Figure 6. Hydrogen storage and applications in power systems.

Figure 10. According to policy optimization, DRL algorithms can be divided into value-based and policy-based algorithms.

Table 1. Comparison of characteristics of different fuel cells.