Abstract
Advancements in computing power have recently made it possible to utilize machine learning and deep learning to push scientific computing forward in a range of disciplines, such as fluid mechanics, solid mechanics, and materials science. The incorporation of neural networks is particularly crucial in this hybridization process. Due to their intrinsic architecture, conventional neural networks cannot be successfully trained and scoped when data are sparse, which is the case in many scientific and engineering domains. Nonetheless, neural networks provide a solid foundation to respect physics-driven or knowledge-based constraints during training. Generally speaking, there are three distinct neural network frameworks to enforce the underlying physics: (i) physics-guided neural networks (PgNNs), (ii) physics-informed neural networks (PiNNs), and (iii) physics-encoded neural networks (PeNNs). These methods provide distinct advantages for accelerating the numerical modeling of complex multiscale multiphysics phenomena. In addition, recent developments in neural operators (NOs) add another dimension to these new simulation paradigms, especially when the real-time prediction of complex multiphysics systems is required. All these models also come with their own unique drawbacks and limitations that call for further fundamental research. This study aims to present a review of the four neural network frameworks (i.e., PgNNs, PiNNs, PeNNs, and NOs) used in scientific computing research. The state-of-the-art architectures and their applications are reviewed, limitations are discussed, and future research opportunities are presented in terms of improving algorithms, considering causalities, expanding applications, and coupling scientific and deep learning solvers.
1 Introduction
Machine learning (ML) and deep learning (DL) are emerging as pivotal technologies to propel scientific research and computing across a range of fields, including but not limited to fluid mechanics [1], solid mechanics [2], and materials science [3,4]. The advent of multi-teraflop machines equipped with thousands of processors designed for scientific computing, in conjunction with advanced sensory-based experimentation, has led to an exponential growth of structured and unstructured heterogeneous data in science and engineering disciplines. ML and DL methodologies were initially introduced to address the challenge of developing efficient data modeling procedures, which impeded scientists’ ability to interact with heterogeneous and complex data swiftly [5]. These approaches have demonstrated the potential to transform scientific computing by enabling the exploration of vast design spaces, identifying multidimensional connections, and resolving ill-posed problems [6–8].
However, conventional ML and DL methods are unable to extract interpretative information and expertise from complex multidimensional data. They may be effective in mapping observational or computational data, but their predictions may be physically irrational or dubious, resulting in poor generalization [9–11]. For this reason, scientists initially regarded these methodologies as a magic black box devoid of a solid mathematical foundation and incapable of interpretation. Nevertheless, learning techniques constitute a new paradigm for accurately solving scientific and practical problems orders of magnitude faster than conventional solvers.
Deep learning (i.e., neural networks (NNs) mimicking the human brain) and scientific computing share common historical and intellectual links that often go unrecognized, e.g., differentiability [9]. Figure 1 shows a schematic representation of the history of development for a plethora of scientific computing and DL approaches (only seminal works are included). In the last decade, breakthroughs in DL and computing power have enabled the use of DL in a broad variety of scientific computing applications, especially in fluid mechanics [1,11–13], solid mechanics [2,14–16], and materials science [17–20], albeit at the cost of accuracy and loss of generality [21]. These data-driven methods are routinely applied to fulfill one of the following goals: (i) accelerate direct numerical simulations using surrogate modeling [22], (ii) accelerate adjoint sensitivity analysis [9] and uncertainty quantification [23], (iii) accelerate probabilistic programming [24], and (iv) accelerate inverse problems [25,26]. For example, in the first goal, the physical parameters of the system (e.g., dimensions, mass, momentum, temperature) are used as inputs to predict the next state of the system or its effects (i.e., outputs), and in the last goal, the outputs of a system (e.g., a material with targeted properties) are used as inputs to infer the intrinsic physical attributes that meet the requirements (i.e., the model’s outputs). To accomplish these goals, lightweight DL models can be constructed to partially or fully replace a bottleneck step in the scientific computing processes [21,27,28].
Due to the intrinsic architecture of conventional DL methods, their learning is limited to the scope of the datasets with which the training is conducted (e.g., specific boundary conditions, material types, spatiotemporal discretization), and they cannot successfully infer under unseen conditions (e.g., new geometries, new material types, new boundary conditions). Because the majority of scientific fields are not (big) data-oriented domains and cannot provide comprehensive datasets that cover all possible conditions, these models trained on sparse datasets are accelerated but not predictive [28]. Thus, it is logical to leverage the wealth of prior knowledge, underlying physics, and domain expertise to further constrain these models while training on available and sparse data points. NNs are well suited to digest physics-driven or knowledge-based constraints during training. Based on how the underlying physics is incorporated, the authors classified neural network applications in scientific computing into three separate types: (i) physics-guided neural networks (PgNNs), (ii) physics-informed neural networks (PiNNs), and (iii) physics-encoded neural networks (PeNNs).
In PgNN-based models, standard supervised DL techniques are used to construct surrogate mappings between formatted inputs and outputs that are generated using experiments and computations in a controlled setting and curated through extensive processes to ensure compliance with physics principles and fundamental rules [28,47]. Such models require a large and sufficient dataset to be trained and used reliably. A PgNN-based model maps a set of inputs x to a related set of outputs y using an appropriate function F with unknown parameters w such that y = F(x; w). By specifying a particular structure for F, a data-driven approach generally attempts to fine-tune the parameters w so that the overall error between the true values, ŷ, and the model predictions, y, is minimized [8]. For complex physical systems, the data are likely sparse due to the high cost of data acquisition [48]. The majority of state-of-the-art PgNNs lack robustness and fail to fulfill any guarantees of generalization (i.e., interpolation [44,49] and extrapolation [50]). To remedy this issue, PiNNs have been introduced to perform supervised learning tasks while obeying given laws of physics in the form of general nonlinear differential equations [7,11,51–53].
The PiNN-based models respect the physical laws by incorporating a weakly imposed loss function consisting of the residuals of physics equations and boundary constraints. To differentiate the neural network outputs with respect to their inputs (i.e., spatiotemporal coordinates and model parameters), these methods leverage automatic differentiation (AD) [54]. By minimizing the loss function, the neural network can closely approximate the solution [55,56]. As a result, PiNNs lay the groundwork for a new modeling and computation paradigm that enriches DL with long-standing achievements in mathematical physics [44,51]. The PiNN models face a number of limitations relating to theoretical considerations (e.g., convergence and stability [7,57,58]) and implementation considerations (e.g., neural network design, boundary condition management, and optimization aspects) [11,46]. In addition, in cases where the explicit form of differential equations governing the complex dynamics is not fully known a priori, PiNNs encounter serious limitations [59]. For such cases, another family of DL approaches known as PeNNs has been proposed [46].
The PeNN-based models leverage advanced architectures to address issues with data sparsity and the lack of generalization encountered by both PgNN and PiNN models. PeNN-based models forcibly encode known physics into their core architecture (e.g., NeuralODE [60]). By construction, PeNN-based models extend the learning capability of a neural network from instance learning (imposed by PgNN and PiNN architectures) to continuous learning [60]. The encoding mechanisms of the underlying physics in PeNNs are fundamentally different from those in PiNNs [61,62], although they can be integrated to achieve the desired nonlinearity of the model. Compared to PgNNs and PiNNs, the neural networks generated by the PeNN paradigm offer greater robustness to data sparsity and better model generalizability [46].
There is another family of supervised learning methods that do not fit well in the PgNN, PiNN, and PeNN categories, as defined earlier. These models, dubbed neural operators (NOs), learn the underlying linear and nonlinear continuous operators, such as integrals and fractional Laplacians, using advanced architectures (e.g., deep operator networks (DeepONets) [45,63]). The data-intensive learning procedure of a neural operator may resemble the PgNN-based model learning, as both enforce the physics of the problem using labeled input–output dataset pairs. However, a neural operator is very different from a PgNN-based model that lacks generalization properties due to under-parameterization. A neural operator can be combined with the PiNN and PeNN methods to train a model that can learn complex nonlinearity in physical systems with extremely high generalization accuracy [50]. The robustness of this combination (e.g., PiNN and neural operators [64,65]) for applications requiring real-time inference is a distinguishing characteristic. This review article is primarily intended for the scientific computing community interested in the application of neural networks in computational fluid and solid mechanics. It discusses the general architectures, advantages, and limitations of PgNNs, PiNNs, PeNNs, and neural operators and reviews the most prominent applications of these methods in fluid and solid mechanics. The remainder of this work is structured as follows: In Sec. 2, the potential of PgNNs to accelerate scientific computing is discussed. Section 3 provides an overview of PiNNs and discusses their potential to advance PgNNs. In Sec. 4, several leading PeNN architectures are discussed to address critical limitations in PgNN and PiNNs. Section 5 reviews recent developments in neural operators. Finally, in Sec. 6, an outlook for future research directions is provided.
2 Physics-Guided Neural Networks
PgNNs use standard supervised DL models to statistically learn the known physics of a desired phenomenon by extracting features or attributes from training datasets obtained through well-controlled experiments from complex dynamical systems and well-calibrated computations from multiscale multiphysics systems [66]. PgNNs consist of one or a combination of multilayer perceptron (MLP; alternatively called artificial neural networks (ANNs) or deep neural networks (DNNs) in different studies relevant to this review) [66], convolutional neural network (CNN) [66], recurrent neural network (RNN) [66], generative adversarial network (GAN) [67], and graph neural networks (GRNNs) [68]. Although GAN models are classified as unsupervised learning, they can be classified as PgNNs, in the context of this review article, because their underlying training is framed as a supervised learning problem [67,69]. A schematic representation of a sample PgNN architecture is illustrated in Fig. 2. Any physical problem includes a set of independent features or input features as x = [X1, X2, X3, …, Xn] and a set of dependent variables or desired outputs as y = [Y1, Y2, Y3, …, Yn]. Data describing this physical phenomenon can be generated by experimentation (e.g., sensor-based observation), closure laws (e.g., Fourier’s law, Darcy’s law, drag force), or the solution of governing ordinary differential equations (ODEs) and/or partial differential equations (PDEs), e.g., Burgers’ equation, Navier–Stokes equations, and so on. The dependent variables and independent features thus comply with physics principles, and the trained neural network is guided inherently by physics throughout training.
In PgNNs, the neurons of each layer are linked to those of the subsequent layer through a set of weights. An activation function (such as ReLU, Tanh, Sigmoid, and Linear) is then applied to the weighted sum of the outputs of the neurons in the previous layer, along with an extra bias, to obtain the output of each node [70]. This procedure sequentially obtains the output of the neurons in each layer, starting with the input. This process is typically called forward propagation. A loss function (or, alternatively, a cost function) is subsequently defined and calculated in order to evaluate the accuracy of the prediction. Commonly used loss functions for regression are L1 [71] and mean-squared error (MSE) [71]. The next step in training involves error backpropagation, which calculates the partial derivatives/gradients of the cost function with respect to weights and biases (i.e., θ as shown in Fig. 2). Finally, an optimization technique, such as gradient descent, stochastic gradient descent, adaptive moment estimation (Adam), or mini-batch gradient descent [72,73], is used to minimize the loss function and simultaneously compute and update θ parameters using the calculated gradients from the backpropagation procedure. The process is iterated until the desired level of accuracy is obtained for a PgNN.
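To make these training mechanics concrete, the following is a minimal PgNN training sketch in PyTorch; the architecture, learning rate, and synthetic sine-based labels are illustrative assumptions, not details of any study cited here.

```python
# Minimal PgNN training sketch (illustrative; synthetic data stands in for
# physics-constrained labels produced by a solver or experiment).
import torch
import torch.nn as nn

# Surrogate y = F(x; w): three input features mapped to one output.
model = nn.Sequential(
    nn.Linear(3, 64), nn.Tanh(),    # weighted sum + bias, then activation
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),               # linear output layer for regression
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()              # mean-squared error cost function

# Hypothetical training pairs (x, y).
x = torch.rand(1024, 3)
y_true = torch.sin(x.sum(dim=1, keepdim=True))

for epoch in range(2000):
    optimizer.zero_grad()
    y_pred = model(x)               # forward propagation
    loss = loss_fn(y_pred, y_true)  # evaluate prediction accuracy
    loss.backward()                 # backpropagation: gradients w.r.t. theta
    optimizer.step()                # gradient-based parameter update
```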
In recent years, PgNN has been widely adopted to accelerate computational fluid dynamics (CFD) [74], computational solid mechanics [75], and multifunctional material designs [76]. PgNN has been utilized in all computationally intensive and time-consuming aspects of scientific computing, including (i) preprocessing [74,77,78], such as mesh generation; (ii) discretization and modeling [79–81], such as finite difference method (FDM), finite volume method (FVM), finite element method (FEM), discrete element method, and molecular dynamics (MD); and (iii) postprocessing, such as output assimilation and visualization [82–84]. These studies are conducted to (i) train shallow networks on small datasets to replace a bottleneck (i.e., a computationally expensive step) in conventional forward numerical modeling, for example, the calculation of the drag coefficient in complex fluid flow modeling [28,85–88]; or (ii) train relatively deep networks on larger datasets generated for a particular problem, such as targeted sequence design within the coarse-grained polymer genome [89]. These networks take into account the physical principles underlying the training data and accelerate the simulation process [28,84].
Although PgNN training appears straightforward, generating the data by tackling the underlying physics for complex physical problems can require substantial computational cost [7,15]. Once trained, a PgNN can significantly accelerate the computation speed for the phenomena of interest. It is worth noting that, while a PgNN model may achieve good accuracy on the training set based on numerous attempts, it is more likely to memorize the trends, noise, and details in the training set rather than intuitively comprehend the pattern in the dataset. This is one of the reasons why PgNNs lose their prediction ability when inferred/tested outside the scope of the training datasets. The overfitting of PgNNs can be mitigated in different ways [90–92] to enhance the predictability of the model within the scope of the training data. In the following subsections, we review the existing literature and highlight some of the most recent studies that applied PgNNs to accelerate different steps in scientific computing for applications in fluid and solid mechanics.
2.1 Preprocessing.
Preprocessing is often the most work-intensive component in scientific computing, regardless of the numerical model type (e.g., FEM, FDM, FVM). The main steps in this component are the disassembly of the domain into small, but finite, parts (i.e., mesh generation, evaluation, and optimization), and the upscaling and/or downscaling of the mesh properties to use a spatiotemporally coarse mesh while implicitly solving for unresolved fine-scale physics. These two steps are time consuming and require expert-level knowledge; hence, they are potential candidates to be replaced by accelerated PgNN-based models.
2.1.1 Mesh Generation.
Mesh generation is a critical step for numerical simulations. Zhang et al. [77] proposed the automatic generation of an unstructured mesh based on the prediction of the required local mesh density throughout the domain. For this purpose, an ANN was trained to guide a standard mesh generation algorithm. They also proposed extending the study to other architectures, such as CNN or GRNN, for future studies including larger datasets and/or higher-dimensional problems. Huang et al. [74] adopted a DL approach to identify optimal mesh densities. They generated optimized meshes using classical CFD tools (e.g., Simcenter STAR-CCM+ [93]) and proposed training a CNN to predict optimal mesh densities for arbitrary geometries. The addition of an adaptive mesh refinement version accelerated the overall process without compromising accuracy and resolution. The authors proposed learning optimal meshes (generated by the corresponding solvers with adjoint functionality) using ANN, which may be used as a starting point in other simulation tools irrespective of the specific numerical approach [74]. Wu et al. [78] also proposed a mesh optimization method that integrates the moving mesh method with DL to solve the mesh optimization problem. Their experiments demonstrated that a high-accuracy neural network can optimize the mesh while preserving the specified number of nodes and the topology of the initially given mesh. Using this technique, they also demonstrated that the moving mesh algorithm is independent of the CFD computation [78].
In mesh generation, a critical issue has been the evaluation of mesh quality due to a lack of general and effective criteria. Chen et al. [94] presented a benchmark dataset (i.e., the NACA-Market reference dataset) to facilitate the evaluation of the quality of a mesh. They presented GridNet, a technique that uses a deep CNN to perform an automatic evaluation of the mesh quality. This method receives the mesh as input and performs the evaluation. The evaluation of mesh quality using a deep CNN model trained on the NACA-Market dataset proved to be viable with an accuracy of 92.5% [94].
2.1.2 Cross-Scaling Techniques.
It is always desirable to numerically solve a multiphysics problem on a spatiotemporally coarser mesh to minimize computational cost. For this reason, different upscaling [95,96], downscaling [97], and cross-scaling [98] methods have been developed to determine accurate numerical solutions to nonlinear problems across a broad range of length- and time-scales. One viable choice is to use a coarse mesh that reliably depicts long-wavelength dynamics and accounts for unresolved small-scale physics. Deriving the mathematical model (e.g., boundary conditions) for coarse representations, on the other hand, is relatively difficult. Bar-Sinai et al. [96] proposed a PgNN model to learn optimal PDE approximations based on actual solutions to known underlying equations. The ANN outputs spatial derivatives, which are then optimized to best satisfy the equations on a low-resolution grid. Compared to typical discretization methods (e.g., finite difference), the recommended ANN method was considerably more accurate while integrating the set of nonlinear equations at a resolution that was four to eight times coarser [96]. The main challenge in this approach, however, is to systematically derive these kinds of solution-adaptive discrete operators. Maddu et al. [95] developed a PgNN, dubbed STENCIL-NET, for learning resolution-specific local discretization of nonlinear PDEs. By combining spatially and temporally adaptive parametric pooling on regular Cartesian grids with knowledge about discrete-time integration, STENCIL-NET can accomplish numerically stable discretization of the operators for any arbitrary nonlinear PDE. The STENCIL-NET model can also be used to determine PDE solutions over a spatiotemporal scale wider than that of the training dataset. The authors employed STENCIL-NET for long-term forecasting of chaotic PDE solutions on coarse spatiotemporal grids to test their hypothesis. Compared with baseline numerical techniques (e.g., fully vectorized WENO [99]), the STENCIL-NET predictions on coarser grids were 25 to 150 times faster on GPUs and 2 to 14 times faster on CPUs, while maintaining the same accuracy [95].
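The core idea behind such learned discretizations can be sketched as follows: a small network maps local solution patches to solution-adaptive stencil coefficients that replace fixed finite-difference weights. The periodic boundary treatment, stencil width, and network size below are illustrative assumptions, not details of Refs. [95,96].

```python
# Schematic data-driven discretization sketch: a network predicts stencil
# coefficients that approximate du/dx on a coarse, periodic 1D grid.
import torch
import torch.nn as nn

stencil = 5  # 5-point stencil on the coarse grid

# Maps a local solution patch to its solution-adaptive coefficients.
coeff_net = nn.Sequential(
    nn.Linear(stencil, 32), nn.ReLU(),
    nn.Linear(32, stencil),
)

def learned_derivative(u, dx):
    # Sliding 5-point patches [u[i-2], ..., u[i+2]] with periodic wrap-around.
    patches = torch.stack([torch.roll(u, s) for s in range(2, -3, -1)], dim=-1)
    coeffs = coeff_net(patches)             # coefficients adapted to the local solution
    return (coeffs * patches).sum(-1) / dx  # weighted stencil = derivative estimate

# Training would minimize the mismatch against derivatives computed on a
# fine reference grid, e.g.:
#   loss = ((learned_derivative(u_coarse, dx) - du_ref) ** 2).mean()
```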
Table 1 reports a nonexhaustive list of recent works that leveraged PgNNs to accelerate the preprocessing part of scientific computing. These studies collectively concluded that PgNN can be successfully integrated to achieve a considerable speedup factor in mesh generation, mesh evaluation, and cross-scaling, which are vital for many complex problems explored using scientific computing techniques. Section 2.2 discusses the potential of PgNN to be incorporated into the modeling components, hence yielding a higher speedup factor or greater accuracy.
| Area of application | NN type | Objective | Reference |
|---|---|---|---|
| Mesh generation | ANN | Generating unstructured mesh | [77] |
| | CNN | Predicting meshes with optimal density and accelerating the meshing process without compromising performance or resolution | [74] |
| | ANN | Generating high-quality tetrahedral meshes | [100] |
| | ANN | Developing a mesh generator tool to produce high-quality FEM meshes | [101] |
| | ANN | Generating finite element meshes with less complexity | [102] |
| Mesh evaluation | CNN | Conducting automatic mesh evaluation and quality assessment | [94] |
| Mesh optimization | ANN | Optimizing mesh while retaining the same number of nodes and topology as the initially given mesh | [78] |
| | ANN | Enhancing mesh density for stress fields in plane-strain problems | [103] |
| Cross-scaling | ANN | Utilizing data-driven discretization to estimate spatial derivatives tuned to best fulfill the equations on a low-resolution grid | [96] |
| | ANN (STENCIL-NET) | Providing solution-adaptive discrete operators to predict PDE solutions on larger spatial domains and longer time frames than those used in training | [95] |
2.2 Modeling and Postprocessing.
2.2.1 PgNNs for Fluid Mechanics.
PgNN has gained considerable attention from the fluid mechanics community. The study by Lee and Chen [104] on estimating fluid properties using ANN was among the first studies to apply PgNN to fluid mechanics. Since then, the application of PgNNs in fluid mechanics has been extended to a wide range of applications, e.g., laminar and turbulent flows, non-Newtonian fluid flows, and aerodynamics, especially to speed up traditional CFD solvers.
For incompressible laminar flow simulations, the numerical procedure to solve the Navier–Stokes equations is considered the main bottleneck. To alleviate this issue, PgNNs have been used as part of the resolution process. For example, Yang et al. [105] proposed a novel data-driven projection method using an ANN to avoid iterative computation of the projection step in grid-based fluid simulations. The efficiency of the proposed data-driven projection method was shown to be significant, especially in large-scale fluid flow simulations. Tompson et al. [106] used a CNN to predict numerical solutions to the inviscid Euler equations for fluid flows. Unsupervised training that incorporates multiframe information was proposed to improve long-term stability. The CNN model produced very stable divergence-free velocity fields with improved accuracy compared to those obtained by the commonly used Jacobi method [107]. Chen et al. [108] later developed a U-net-based architecture, a particular case of a CNN model, for the prediction of velocity and pressure field maps around arbitrary 2D shapes in laminar flows. The CNN model was trained with a dataset composed of random shapes constructed using Bézier curves, with labels obtained by solving the Navier–Stokes equations using a CFD solver. The predictive efficiency of the CNN model was also assessed on unseen shapes using ad hoc error functions; specifically, the MSE levels for these predictions were found to be of the same order of magnitude as those obtained on the test subset, i.e., between 1.0 × 10⁻⁵ and 5.0 × 10⁻⁵ for pressure and velocity, respectively.
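Architecturally, such field predictors are often encoder–decoder CNNs that map a geometry representation to full-field outputs. The sketch below is a minimal illustration of this pattern only; the single-channel 64 × 64 input mask, channel counts, and depth are assumptions rather than the architecture of Ref. [108].

```python
# Minimal encoder-decoder CNN sketch: geometry mask in, (u, v, p) field maps out.
import torch
import torch.nn as nn

class FlowNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),              # u, v, p channels
        )

    def forward(self, mask):
        return self.decoder(self.encoder(mask))

fields = FlowNet()(torch.rand(8, 1, 64, 64))  # batch of geometry masks
assert fields.shape == (8, 3, 64, 64)
```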
Moving from laminar to turbulent flow regimes, PgNNs have also been widely used for the formulation of turbulence closure models [109]. For example, Ling et al. [110] utilized a feed-forward MLP and a specialized neural network to predict Reynolds-averaged Navier–Stokes (RANS) and large eddy simulation (LES) turbulence problems. Their specialized neural network embeds Galilean invariance [111] using a higher-order multiplicative layer. The performance of this model was compared with that of MLP and ground truth simulations. They concluded that the specialized neural network can predict the anisotropy tensor on an invariant tensor basis, resulting in significantly more accurate predictions than MLP. Maulik et al. [112] presented a closure framework for subgrid modeling of Kraichnan turbulence [113]. To determine the dynamic closure strength, the proposed framework used an implicit map with inputs such as grid-resolved variables and eddy viscosities. Training an ANN with extremely subsampled data obtained from high-fidelity direct numerical simulations (DNSs) yields the optimal map. The ANN model was found to be successful in imbuing the decaying turbulence problem with dynamic kinetic energy dissipation, allowing accurate capture of coherent structures and inertial range fidelity. Later, Kim and Lee [114] used simple linear regression (SLinear), multiple linear regression (MLinear), and a CNN to predict turbulent heat transfer (i.e., the wall-normal heat flux, qw) using other wall information, including the streamwise wall-shear stress, spanwise wall-shear stress or streamwise vorticity, and pressure fluctuations, obtained by DNSs of a channel flow (see Fig. 3(a)). The constructed network was trained using adaptive moment estimation (Adam) [115,116], and a grid search [117,118] was performed to optimize the depth and width of the CNN. Their findings showed that the PgNN model is less sensitive to the input resolution, indicating its potential as a good heat flux model in turbulent flow simulation. Yousif et al. [119] also proposed an efficient method for generating turbulent inflow conditions based on a PgNN formed by a combination of a multiscale convolutional autoencoder with a subpixel convolution layer (MSCSP-AE) [120,121] and a long short-term memory (LSTM) [122,123] model. The proposed model was found to be capable of dealing with the spatial mapping of turbulent flow fields.
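The tensor-basis construction in Ling et al. [110] can be illustrated with a brief sketch: an MLP maps scalar flow invariants to coefficients that weight a set of basis tensors, so the predicted anisotropy tensor is Galilean invariant by construction. The five invariants and ten basis tensors follow the general effective-viscosity form used in that line of work, but the random placeholder inputs and network size below are assumptions.

```python
# Schematic tensor-basis network: b = sum_i g_i(invariants) * T_i.
import torch
import torch.nn as nn

n_inv, n_basis = 5, 10
g_net = nn.Sequential(nn.Linear(n_inv, 64), nn.ReLU(), nn.Linear(64, n_basis))

invariants = torch.rand(32, n_inv)            # scalar invariants per sample
basis = torch.rand(32, n_basis, 3, 3)         # precomputed basis tensors T_i
g = g_net(invariants)                         # coefficients g_i
b = (g[:, :, None, None] * basis).sum(dim=1)  # anisotropy tensor, shape (32, 3, 3)
```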
PgNNs have also been applied in the field of aerodynamics. Kou and Zhang [124] presented a review paper on typical data-driven methods, including system identification, feature extraction, and data fusion, that have been used to model unsteady aerodynamics. The efficacy of these data-driven methods is demonstrated through several benchmark cases in aeroelasticity. Wang et al. [125] described the application of ANN to the modeling of the swirling flow field in a combustor (see Fig. 3(b)). Swirling flow field data from particle image velocimetry (PIV) were used to train an ANN model. The trained PgNN model was successfully tested to predict the swirling flow field under unknown inlet conditions. Chowdhary et al. [126] studied the efficacy of combining ANN models with projection-based model reduction techniques [127,128] to develop an ANN-surrogate model for computationally expensive, high-fidelity physics models, specifically for complex hypersonic turbulent flows. The surrogate model was used to perform a Bayesian estimation of the freestream conditions and parameters of the shear stress transport turbulence model. The surrogate model was then embedded in the high-fidelity (Reynolds-averaged Navier–Stokes) flow simulator, using shock-tunnel data. Siddiqui et al. [129] developed a nonlinear data-driven model, encompassing time delay neural networks, for a pitching wing. The pitch angle was considered as the input to the model, while the lift coefficient was considered as the output. The results showed that the trained models were able to capture the nonlinear aerodynamic forces more accurately than linear and semi-empirical models, especially at higher offset angles. Wang et al. [130] also proposed a multifidelity reduced-order model based on multitask-learning ANNs to efficiently predict the unsteady aerodynamic performance of an iced airfoil. The results indicated that the proposed model achieves higher accuracy and better generalization capability compared to single-fidelity and single-task modeling approaches.
The simulation of complex fluid flows, specifically using fluids that exhibit viscoelastic nature and nonlinear rheological behaviors, is another topic where PgNNs have been applied [132–134]. The dynamics of these fluids are generally governed by nonlinear constitutive equations that lead to stiff numerical problems [135,136]. Faroughi et al. [28] developed a PgNN model to predict the drag coefficient of a spherical particle translating in viscoelastic fluids (see Fig. 3(c)). The PgNN considered a stacking technique (i.e., ensembling random forest [137], extreme gradient boosting [138], and ANN models) to digest inputs (Reynolds number, Weissenberg number, viscosity ratio, and mobility factor considering both Oldroyd-B and Giesekus fluids) and output drag predictions based on the individual learners’ predictions and an ANN meta-regressor. The accuracy of the model was successfully checked against blind datasets generated by DNSs. Lennon et al. [139] also developed a tensor basis neural network (TBNN) allowing rheologists to construct learnable constitutive models that incorporate essential physical information while remaining agnostic to details regarding particular experimental protocols or flow kinematics. The TBNN model incorporates a universal approximator within a materially objective tensorial constitutive framework that, by construction, respects physical constraints, such as frame-invariance and tensor symmetry, required by continuum mechanics. Due to the embedded TBNN, the developed rheological universal differential equation quickly learns simple yet accurate and highly general models to describe the provided training data, allowing for rapid discovery of constitutive equations.
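The stacking strategy in Ref. [28] can be outlined as follows: several base learners are trained on the same inputs, and an ANN meta-regressor combines their predictions. The sketch below uses scikit-learn; GradientBoostingRegressor stands in for extreme gradient boosting to avoid an extra dependency, and the random arrays are placeholders for the dimensionless-group features and DNS drag data.

```python
# Schematic stacking ensemble: base learners feeding an ANN meta-regressor.
import numpy as np
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.neural_network import MLPRegressor

X = np.random.rand(500, 4)  # e.g., Re, Wi, viscosity ratio, mobility factor
y = np.random.rand(500)     # e.g., drag coefficient from DNS

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200)),
        ("gb", GradientBoostingRegressor()),
        ("ann", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)),
    ],
    final_estimator=MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),  # meta-regressor
)
stack.fit(X, y)
pred = stack.predict(np.random.rand(10, 4))
```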
Lastly, PgNNs have also been extensively used to improve both the accuracy and speed of CFD solvers. Stevens and Colonius [131] developed a DL model (the weighted essentially nonoscillatory neural network (WENO-NN)) to enhance a finite volume method used to discretize PDEs with discontinuous solutions, such as turbulence–shock wave interactions (see Fig. 3(d)). Kochkov et al. [22] used hybrid discretizations, combining CNNs and subcomponents of a numerical solver, to interpolate differential operators onto a coarse mesh with high accuracy. The training of the model was performed within a standard numerical method for solving the underlying PDEs as a differentiable program, and the method allows for end-to-end gradient-based optimization of the entire algorithm. The method learns accurate local operators for convective fluxes and residual terms and matches the accuracy of an advanced numerical solver running at 8–10 times finer resolution while performing the computation 40–80 times faster. Cai et al. [140] implemented a least-squares ReLU neural network (LSNN) to solve the linear advection–reaction problem with a discontinuous solution. They showed that the proposed method outperformed mesh-based numerical methods in terms of the number of degrees-of-freedom. Haber et al. [141] suggested an autoencoder CNN to reduce the resolution cost of a scalar transport equation coupled to the Navier–Stokes equations. Lara and Ferrer [142] proposed to accelerate high-order discontinuous Galerkin methods using neural networks. The methodology and bounds were examined for a variety of meshes, polynomial orders, and viscosity values for the 1D Burgers’ equation. List et al. [143] employed CNNs to train turbulence models that improve under-resolved, low-resolution solutions to the incompressible Navier–Stokes equations at simulation time. The developed method consistently outperforms simulations with a twofold higher resolution in both spatial and temporal dimensions. For mixing layer cases, the hybrid model on average matches the performance of threefold reference simulations, which corresponds to a speedup of 7.0 times for the temporal mixing layer and 3.7 times for the spatial mixing layer.
Table 2 reports a nonexhaustive list of recent studies that leveraged PgNN to model fluid flow problems. These studies collectively concluded that PgNNs can be successfully integrated with CFD solvers or used as standalone surrogate models to develop accurate and yet faster modeling components for scientific computing in fluid mechanics. In the next section, the potential application of PgNNs in computational solid mechanics is discussed.
| Area of application | NN type | Objective | Reference |
|---|---|---|---|
| Laminar flows | CNN | Calculating numerical solutions to the inviscid Euler equations | [106] |
| | CNN | Predicting the velocity and pressure fields around arbitrary 2D shapes | [108] |
| Turbulent flows | ANN | Developing a model for the Reynolds stress anisotropy tensor using high-fidelity simulation data | [110] |
| | CNN | Designing and training artificial neural networks based on local convolution filters for LES | [144] |
| | ANN | Developing subgrid modeling of Kraichnan turbulence | [112] |
| | CNN | Estimating turbulent heat transfer based on other wall information acquired from channel flow DNSs | [114] |
| | CNN-LSTM | Generating turbulent inflow conditions with accurate statistics and spectra | [119] |
| Aerodynamics | CNN-MLP | Predicting incompressible laminar steady flow field over airfoils | [145] |
| | ANN | Modeling the swirling flow field in a combustor | [125] |
| | PCA-ANN | Creating surrogate models of computationally expensive, high-fidelity physics models for complex hypersonic turbulent flows | [126] |
| | ANN | Predicting unsteady aerodynamic performance of an iced airfoil | [130] |
| Viscoelastic flows | ANN | Predicting the drag coefficient of a spherical particle translating in viscoelastic fluids | [28] |
| | ANN | Constructing learnable constitutive models using a universal approximator within a materially objective tensorial constitutive framework | [139] |
| Enhancing CFD solvers | ANN | Developing an improved finite volume method for simulating PDEs with discontinuous solutions | [131] |
| | CNN | Interpolating differential operators onto a coarse mesh with high accuracy | [22] |
| | LSNN | Solving the linear advection–reaction problem with a discontinuous solution | [140] |
| | CNN | Accelerating high-order discontinuous Galerkin methods | [142] |
| | CNN | Developing turbulence models to improve under-resolved, low-resolution solutions to the incompressible Navier–Stokes equations at simulation time | [143] |
| | ANN | Estimating the fluid field for a steady-state hydraulic valve | [146] |
2.2.2 PgNNs for Solid Mechanics.
PgNNs have also been extensively adopted by the computational solid mechanics community [147]. The study by Andersen et al. [41] on welding modeling using ANN was among the first studies that applied PgNN to solid mechanics. Since then, the application of PgNN has been extended to a wide range of problems, e.g., structural analysis, topology optimization, inverse materials design and modeling, and health condition assessment, especially to speed up the traditional forward and inverse modeling methods in computational mechanics.
In the area of structural analysis, Tadesse et al. [148] proposed an ANN to predict the mid-span deflections of a composite bridge with flexible shear connectors. The ANN was tested on six different bridges, yielding a maximum root-mean-square error (RMSE) of 3.79%, which is negligible in practice. In addition, the authors developed closed-form solutions based on the ANN that can be used to quickly predict deflection in everyday design. In a study by Güneyisi et al. [149], an ANN was employed to develop a new formulation for the flexural overstrength factor for steel beams, using 141 experimental data samples with different cross-sectional typologies for model training. The model achieved a comparable training and testing accuracy of 99%, indicating its reliability in estimating beams’ overstrength. Hung et al. [150] used an ANN to predict the ultimate load factor of a nonlinear, inelastic steel truss, using the cross sections of the members as input and the load factor as output. A planar 39-bar steel truss was used to demonstrate the efficiency of the proposed ANN. The ANN-based model yielded a high degree of accuracy, with an average loss of less than 0.02, in predicting the ultimate load factor of the nonlinear inelastic steel truss. Chen et al. [151] also used an ANN to solve a three-dimensional (3D) inverse problem of a collision between an elastoplastic hemispherical metal shell and a rigid impactor. The goal was to predict the position, velocity, and duration of the collision based on the permanent plastic deformation of the shell. For static and dynamic loading, the ANN model predicted location, velocity, and collision duration with high accuracy. Hosseinpour et al. [152] used a PgNN to assess the buckling capacity of castellated steel beams subjected to lateral-distortional buckling. As shown in Fig. 4(a), the ANN-based model provided higher accuracy than well-known design codes, such as AS4100 [153], AISC [154], and EC3 [155], in modeling and predicting the ultimate moment capacities.
PgNNs have also been utilized in the field of topology optimization for materials and meta-materials [156,157]. Topology optimization is a method that identifies the optimal placement of materials within a defined space to achieve the best structural performance [158,159]. Researchers have developed various PgNN models for this purpose; for instance, Abueidda et al. [160] developed a CNN model that performs real-time topology optimization of linear and nonlinear elastic materials under small and large deformations. The trained model can accurately predict optimal designs without an iterative process and with low inference computation time. A two-stage approach was proposed by Yu et al. [161], which integrates a CNN-based encoder and decoder in the first stage and a conditional GAN in the second stage to determine near-optimal topological designs. The resulting model can efficiently produce near-optimal structures in terms of pixel values and compliance with significantly reduced computational time. Banga et al. [162] proposed a 3D encoder–decoder CNN for speeding up 3D topology optimization and determining optimal computational strategies for deployment. The proposed model was found to reduce the overall computation time by 40% while achieving an accuracy of 96%. Lastly, Li et al. [163] presented a GAN-based noniterative near-optimal topology optimizer for conductive heat transfer structures trained on black-and-white density distributions. They combined a GAN for low-resolution topology with a super-resolution generative adversarial network (SRGAN) [164,165] for high-resolution topology solutions in a two-stage hierarchical prediction–refinement pipeline. Compared to conventional topology optimization techniques, this strategy demonstrated clear advantages in terms of computational cost and efficiency.
PgNN has also been applied to inverse design and modeling in solid mechanics. Messner [166] employed a CNN to develop surrogate models that estimate the effective mechanical properties of periodic composites. As an example, the CNN-based model was applied to solve the inverse design problem of finding structures with optimal mechanical properties. The surrogate models were in good agreement with well-established topology optimization methods, such as solid isotropic material with penalization [167], and were sufficiently accurate to recover optimal solutions for topology optimization. Lininger et al. [168] also used a CNN to solve an inverse design problem for meta-materials made of thin film stacks. The authors demonstrated the CNN’s remarkable ability to explore the large global design space (up to 10¹² parameter combinations) and resolve all relationships between meta-material structure and the associated ellipsometric and reflectance/transmittance spectra [168,169]. Kumar et al. [170] proposed a two-stage ANN model, as shown in Fig. 4(b), for the inverse design of meta-materials. The model generates uniform and functionally graded cellular mechanical meta-materials with tailored anisotropic stiffness and density for spinodoid topologies. The first ANN (i.e., the inverse PgNN) takes the query stiffness as input and outputs the design parameters, e.g., Θ; the second ANN (i.e., the forward PgNN) takes the predicted design parameters as input and reconstructs the stiffness to verify the first ANN’s results. The prediction accuracy for the stiffness and the design parameters was validated against ground truth data for both networks; sample comparisons and their corresponding R² values are shown in Fig. 4(b). Ni and Gao [171] proposed a combination of representative sampling spaces and a conditional GAN (cGAN) [172,173] to address the inverse problem of modulus identification in the field of elasticity. They showed that the proposed approach can be deployed with high accuracy, as shown in Fig. 4(c), while avoiding the costly iterative solvers used in conventional methods, such as the adjoint weighted approach [174]. This model is especially suitable for real-time elastography and high-throughput nondestructive testing techniques used in geological exploration, quality control, and composite material evaluation.
PgNN models have also been used to overcome some of the computational limitations of multiscale simulations in solid mechanics. This is achieved by (i) bypassing the costly lower-scale calculations and thereby speeding up the macro-scale simulations [75] or (ii) replacing a step or the complete simulation with surrogate models [75]. For example, Liang et al. [175] developed an ANN model that takes finite element-based aorta geometry as input and directly outputs the aortic wall stress distribution, bypassing the FEM calculation. The difference between the stress calculated by FEM and that estimated by the PgNN model is practically negligible, while the PgNN model produces the output in just a fraction of the FEM computational time. Mozaffar et al. [176] successfully employed RNN-based surrogate models for material modeling by learning the reversible, irreversible, and history-dependent phenomena that occur when studying material plasticity. Mianroodi et al. [2] used a CNN-based solver to predict local stresses in heterogeneous solids with highly nonlinear material response and mechanical contrast features. Compared to common solvers such as FEM, the CNN-based solver offered an acceleration factor of 8300x for elastoplastic materials. Im et al. [6] proposed a PgNN framework to construct a surrogate model for a high-dimensional elasto-plastic FEM model by integrating an LSTM network with the proper orthogonal decomposition (POD) method [177,178]. The suggested POD-LSTM surrogate model allows rapid, precise, and reliable predictions of elasto-plastic structures based exclusively on the provided training dataset. Long et al. [179] were the first to use a CNN to estimate the stress intensity factor of planar cracks. Compared to FEM, the key benefit of the proposed lightweight CNN-based crack evaluation methodology is that it can be installed on an unmanned machine to automatically monitor the severity of a crack in real time.
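The POD-LSTM construction in Ref. [6] can be outlined in a few lines: full-field snapshots are projected onto a handful of proper orthogonal decomposition modes obtained via SVD, and a recurrent network advances the modal coefficients in time. All dimensions, the mode count, and the network sizes in this sketch are illustrative.

```python
# Schematic POD-LSTM surrogate: SVD-based modal reduction + LSTM time-stepping.
import torch

snapshots = torch.rand(200, 5000)       # (time steps, DOFs), placeholder data
X = snapshots - snapshots.mean(dim=0)   # center the snapshot matrix
U, S, Vh = torch.linalg.svd(X, full_matrices=False)
modes = Vh[:8]                          # first 8 POD modes, shape (8, 5000)
coeffs = X @ modes.T                    # modal coefficients, shape (200, 8)

lstm = torch.nn.LSTM(input_size=8, hidden_size=64, batch_first=True)
head = torch.nn.Linear(64, 8)
out, _ = lstm(coeffs[None, :-1])        # sequence of past coefficients
pred = head(out)                        # next-step coefficients; train against coeffs[None, 1:]
```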
Table 3 reports a nonexhaustive list of recent studies that leveraged PgNNs in solid mechanics and materials design problems. These studies collectively concluded that PgNNs can be successfully integrated with conventional solvers (e.g., FEM solvers) or used as standalone surrogate models to develop accurate and yet faster modeling components for scientific computing in solid mechanics. However, PgNNs come with their own limitations and shortcomings that might compromise solutions under different conditions, as discussed in the next section.
| Area of application | NN type | Objective | Reference |
|---|---|---|---|
| Accelerating simulations | ANN | Predicting the aortic wall stress distribution using FEM aorta geometry | [175] |
| | RNN | Developing surrogate models for material modeling by learning reversible, irreversible, and history-dependent phenomena | [176] |
| | CNN | Predicting local stresses in heterogeneous solids with highly nonlinear material response and mechanical contrast features | [2] |
| | CNN | Estimating the stress intensity factor of planar cracks | [179] |
| Topology optimization | CNN | Optimizing topology of linear and nonlinear elastic materials under large and small deformations | [160] |
| | CNN-GAN | Determining near-optimal topological designs | [161] |
| | CNN | Accelerating 3D topology optimization | [162] |
| | GAN-SRGAN | Generating near-optimal topologies for conductive heat transfer structures | [163] |
| Inverse modeling | CNN | Estimating effective mechanical properties of periodic composites | [166] |
| | CNN | Solving an inverse design problem for meta-materials made of thin film stacks | [168] |
| | cGAN | Addressing the inverse problem of modulus identification in elasticity | [171] |
| | CVAE | Designing nano-patterned power splitters for photonic integrated circuits | [171] |
| Structural elements | ANN | Predicting the nonlinear buckling load of an imperfect reticulated shell | [180] |
| | ANN | Optimizing the dynamic behavior of thin-walled laminated cylindrical shells | [181] |
| | ANN | Determining and identifying loading conditions for shell structures | [151] |
| | RNN | Identifying and classifying surface defects in industrial steel | [182] |
| Structural analysis | CNN | Forecasting stress fields in 2D linear elastic cantilevered structures subjected to external static loads | [183] |
| | ANN | Estimating the thickness and length of reinforced walls based on previous architectural projects | [184] |
| Condition assessment | Autoencoder-NN | Learning the mapping between vibration characteristics and structural damage | [185] |
| | CNN | Providing a real-time crack assessment method | [186] |
| | RNN | Nonparametric identification of large civil structures subjected to dynamic loadings | [187] |
| | CNN | Damage identification of truss structures using noisy incomplete modal data | [188] |
2.3 PgNNs’ Limitations.
Even though PgNN-based models show great potential to accelerate the modeling of nonlinear phenomena described by input–output interdependencies, they suffer from several critical limitations and shortcomings. Some of these limitations become more pronounced when training datasets are sparse.
The main limitation of PgNNs is the fact that their training process is based solely on statistics [66]. Even though the training datasets are inherently constrained by physics (e.g., developed by direct numerical simulation, closure laws, and de-noised experimentation), PgNN generates models based on correlations in statistical variations. The outputs (predictions) are thus naturally physics-agnostic [44,189] and may violate the underlying physics [7].
Another important limitation of PgNNs stems from the fact that training datasets are usually sparse, especially in the scientific fields discussed in this article. When the training data are sparse and do not cover the entire range of underlying physical attributes, PgNN-based models fail in blind testing on conditions outside the scope of training [50,190], i.e., they do not offer extrapolation capabilities in terms of spatiotemporal variables and/or other physical attributes.
The predictions of PgNN might be severely compromised, even for inputs within the scope of sparse training datasets [28]. The lack of interpolation capabilities is more pronounced in complex and nonlinear problems where the range of the physicochemical attributes is extremely wide (e.g., the range of Reynolds numbers from creeping flow to turbulent flow).
PgNNs may not fully satisfy the initial and boundary conditions under which the training datasets are generated [44]. The boundary conditions and computational domain vary from one problem to another, making the data generation and training process prohibitively costly. In addition, a significant portion of scientific computing research involves inverse problems in which unknown physicochemical attributes of interest are estimated from measurements or calculations that are only indirectly related to these attributes [11,15,191,192]. For example, in groundwater flow modeling, measurements of the pressure of a fluid immersed in an aquifer are leveraged to estimate the geometry and/or material characteristics of the aquifer [193]. Such requirements further complicate the process of developing a simple neural network that is predictive under any conditions.
PgNN-based models are not resolution invariant by construction [194], and hence, they cannot be trained on a lower resolution and be directly inferred on a higher resolution. This shortcoming is due to the fact that PgNN is only designed to learn the solution of physical phenomena for a single instance (i.e., inputs–outputs).
Through the training process, PgNN-based networks learn the input–output interdependencies across the entire dataset. Such a process could potentially consider slight variations in the functional dependencies between different input and output pairs as noise and produce an average solution. Consequently, while these models are optimal with respect to the entire dataset, they may produce suboptimal results in individual cases.
PgNN models may struggle to learn the underlying process when the training dataset is diverse, i.e., when the interdependencies between different input and output pairs are drastically different [195]. Although this issue can be mitigated by increasing the model size, more data are required to train such a network, making the training costly and, in some cases, impractical.
One way to resolve some of the PgNNs’ limitations is to generate more training data. However, this is not always feasible due to the high cost of data acquisition. Alternatively, PgNNs can be further constrained by the governing physical laws, without any prior assumptions, reducing the need for large datasets. The latter is a plausible solution because, in most cases, the physical phenomenon can be fully or partially described using explicit ODEs, PDEs, and/or closure laws. This approach led to the development of PiNNs [44,51], which are described and reviewed in Sec. 3.
3 Physics-Informed Neural Networks
In scientific computing, physical phenomena are frequently represented mathematically using a strong form consisting of governing differential equations together with initial and boundary conditions. The strong form sets out the constraints that a solution must meet at every point within a domain. The governing equations are often nonlinear or linear PDEs and/or ODEs. Some PDEs, such as the Navier–Stokes equations for fluid flows, and the Föppl–von Kármán equations for solids experiencing large deflections, are particularly challenging to solve [11,196]. There are also other crucial examples of PDEs, including the heat equation [197], wave equation [198], Burgers’ equation [199], Laplace’s equation [200], and Poisson’s equation [201]. This vast body of knowledge can be employed to provide additional constraints on PgNNs while training on available data points [44], if any. To this end, mesh-free PiNNs have been developed [44,51], quickly extended [202–204], and extensively deployed in a variety of scientific and applied fields [205–209]. The readers are referred to the studies by Karniadakis et al. [7] and Cai et al. [11] for the foundational review on how PiNNs function. This section briefly reviews PiNN’s core architecture and its state-of-the-art applications in computational fluid and solid mechanics and discusses some of the major limitations.
The PiNN architecture, as depicted in Fig. 5, incorporates the underlying physics outside the core network to impose the known physical laws during training, yielding outputs that adhere to those laws. The widely adopted strategy is a weakly imposed penalty loss that penalizes the network whenever it fails to comply with the physical constraints during training. The PiNN (see Fig. 5) takes spatiotemporal features, such as x and t, as input parameters and outputs the PDE solution elements, i.e., u, making it capable of emulating any PDE. The remaining components of the PiNN architecture, including the hidden layers and activation functions, are similar to those of PgNNs.
The network’s outputs are then fed into the next layer, which is an automatic differentiation layer. Here, multiple partial derivatives are generated by differentiating the outputs with respect to the input parameters (x and t). These partial derivatives are used to assemble the terms of the loss function that drives the PDE solution. The loss function in PiNN combines the loss due to labeled data (Ld), the governing PDEs (Lp), the applied initial conditions (Li), and the applied boundary conditions (Lb) [11], i.e., L = wdLd + wpLp + wiLi + wbLb, where wd, wp, wi, and wb are the respective weights. The Lb term ensures that PiNN’s solution meets the specified boundary constraints, whereas Ld ensures that the PiNN follows the trend in the training dataset (i.e., historical data, if any). Furthermore, the structure of the PDE is enforced in PiNN through Lp, which is evaluated at the collocation points where the solution to the PDE must hold [44]. At each iteration, the total loss is checked against an accepted tolerance, ε. If the tolerance is not met, the learnable parameters of the network (θ) and any unknown PDE parameters (λ) are updated through error backpropagation. The cycle is repeated for a given number of iterations, or until the PiNN model produces learnable parameters with a loss below ε. Note that training PiNNs is more complicated than training PgNNs, as PiNNs involve sophisticated nonconvex, multi-objective loss functions that can cause instability during optimization [7,11,44].
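To make the assembly of this composite loss concrete, the following minimal PyTorch sketch trains a vanilla PiNN on the 1D viscous Burgers’ equation, u_t + u u_x − (0.01/π) u_xx = 0, with u(x, 0) = −sin(πx) and u(±1, t) = 0. The network size, loss weights, and sampling counts are illustrative assumptions, not values prescribed in the literature.

```python
import math
import torch
import torch.nn as nn

# A minimal PiNN sketch for the 1D viscous Burgers' equation, illustrating
# how the composite loss L = wd*Ld + wp*Lp + wi*Li + wb*Lb is assembled with
# automatic differentiation (Ld is omitted here, i.e., no labeled data).

class PiNN(nn.Module):
    def __init__(self, width=20, depth=4):
        super().__init__()
        layers, in_dim = [], 2                      # inputs: (x, t)
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.Tanh()]
            in_dim = width
        layers += [nn.Linear(width, 1)]             # output: u(x, t)
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def pde_residual(model, x, t, nu=0.01 / math.pi):
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(x, t)
    grad = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y),
                                            create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t + u * u_x - nu * u_xx                # residual at collocation points

model = PiNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
w_p, w_i, w_b = 1.0, 10.0, 10.0                     # illustrative loss weights

for it in range(5000):
    x_c = torch.rand(1000, 1) * 2 - 1               # collocation points in [-1, 1]
    t_c = torch.rand(1000, 1)
    x_i = torch.rand(200, 1) * 2 - 1                # initial-condition points (t = 0)
    t_b = torch.rand(200, 1)                        # boundary points (x = +/-1)

    loss_p = pde_residual(model, x_c, t_c).pow(2).mean()
    loss_i = (model(x_i, torch.zeros_like(x_i))
              + torch.sin(math.pi * x_i)).pow(2).mean()   # u(x, 0) = -sin(pi x)
    loss_b = (model(torch.ones_like(t_b), t_b).pow(2).mean()
              + model(-torch.ones_like(t_b), t_b).pow(2).mean())
    loss = w_p * loss_p + w_i * loss_i + w_b * loss_b  # add wd*Ld when data exist

    opt.zero_grad()
    loss.backward()
    opt.step()
```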
Dissanayake and Phan-Thien [210] were the first to investigate the incorporation of prior knowledge into a neural network. Subsequently, Owhadi [211] introduced the concept of physics-informed learning models, enabled by ever-increasing computing power that allows the use of increasingly complex networks with more learnable parameters and layers. The PiNN, as a new computing paradigm for forward and inverse modeling, was introduced by Raissi et al. in a series of papers [44,51,212]. Raissi et al. [44] deployed two PiNN models, a continuous-time and a discrete-time model, on examples consisting of different boundary conditions, critical nonlinearities, and complex-valued solutions, such as the Burgers’, Schrödinger, and Allen–Cahn equations. The results for Burgers’ equation demonstrated that, given a sufficient number of collocation points (i.e., as the basis for the continuous model), an accurate and data-efficient learning procedure can be obtained [44].
In continuous-time PiNN models, the number of collocation points increases exponentially with problem dimensionality, making the learning process difficult and computationally expensive for higher-dimensional problems [7,44]. Raissi et al. [44] presented a discrete-time model based on the Runge–Kutta technique [213] to address this computational cost issue. The model simply takes a spatial feature as input and, over time-steps, converges to the underlying physics. For all the examples explored by Raissi et al. [44], both continuous and discrete PiNN models were able to satisfactorily build physics-informed surrogate models. Nabian et al. [214] proposed an alternative method for managing collocation points: sampling them according to a distribution proportional to the loss function. This concept requires no additional hyperparameters and is simple to deploy in existing PiNN models. They reported that this sampling approach improves the behavior of the PiNN model during training and validated the hypothesis on PDE problems related to elasticity, diffusion, and plane-stress physics (a minimal sketch of this sampling strategy is given after this paragraph). Yang et al. [215] proposed the Bayesian physics-informed neural network (B-PiNN) to address both forward and inverse nonlinear problems governed by PDEs and noisy data. Within this Bayesian framework, a Bayesian neural network is integrated with a PiNN for PDEs to act as the prior, while either Hamiltonian Monte Carlo or variational inference methods are employed as estimators of the posterior. B-PiNNs can leverage both the constraints of physical laws and scattered noisy measurements to provide predictions and quantify the aleatoric uncertainty stemming from the noisy data.
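A hedged sketch of the residual-proportional sampling idea follows: a large candidate pool of collocation points is scored by the magnitude of the pointwise PDE residual, and the training points are resampled with probability proportional to that score. The `model` and `pde_residual` helpers are assumed to be those of the previous sketch; the pool sizes are arbitrary.

```python
import torch

# Residual-proportional collocation sampling, in the spirit of the approach
# of Nabian et al. discussed above. `model` and `pde_residual` are assumed
# to be defined as in the previous Burgers' sketch.
def sample_collocation(model, pde_residual, n_pool=10000, n_keep=1000):
    x_pool = torch.rand(n_pool, 1) * 2 - 1          # candidate points in [-1, 1]
    t_pool = torch.rand(n_pool, 1)
    r = pde_residual(model, x_pool, t_pool).detach().abs().squeeze()
    # torch.multinomial accepts unnormalized nonnegative weights; the small
    # offset keeps the distribution valid where residuals vanish.
    idx = torch.multinomial(r + 1e-8, n_keep, replacement=True)
    return x_pool[idx], t_pool[idx]
```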
To use PiNN for inverse problems, the loss function of the deep neural network must account for both the measured and unknown values at a set of collocation points distributed throughout the problem domain. Raissi et al. [51] showed the potential of both continuous- and discrete-time PiNN models to solve benchmark inverse problems, such as the propagation of nonlinear shallow-water waves (Korteweg–de Vries equation) [216] and incompressible fluid flows (Navier–Stokes equations) [217].
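In code, the inverse setting differs from the forward one only in that the unknown PDE coefficient becomes a learnable parameter optimized jointly with the network weights against sparse measurements. The sketch below, which reuses the `PiNN` and `pde_residual` helpers from the earlier Burgers’ example, treats the viscosity as unknown; the parameterization and the data placeholders are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Inverse-modeling sketch: learn the Burgers' viscosity jointly with the
# solution. `PiNN` and `pde_residual` are the helpers from the earlier
# sketch; (x_d, t_d, u_d) stand in for sparse measurements.
model = PiNN()
log_nu = nn.Parameter(torch.tensor(0.0))            # optimize log(nu) so nu > 0
opt = torch.optim.Adam(list(model.parameters()) + [log_nu], lr=1e-3)

def inverse_loss(x_d, t_d, u_d, x_c, t_c):
    nu = log_nu.exp()                               # current viscosity estimate
    loss_d = (model(x_d, t_d) - u_d).pow(2).mean()  # misfit to measured data
    loss_p = pde_residual(model, x_c, t_c, nu=nu).pow(2).mean()  # physics misfit
    return loss_d + loss_p
```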
Compared to PgNNs, PiNN models provide more accurate predictions for forward and inverse modeling, particularly in scenarios with strong nonlinearities, limited data, or noisy data [218]. As a result, PiNNs have been implemented in several fundamental scientific and applied fields. In addition to forward and inverse problems, PiNNs can also be used to discover the partial differential equations governing unknown phenomena, provided training data representing the underlying physics are available [51]. Raissi et al. [51] leveraged both continuous- and discrete-time PiNN models to generate universal PDEs depending on the type and structure of the available data. In the remainder of this section, we review the recent literature on PiNN applications in computational fluid and solid mechanics.
3.1 PiNNs for Fluid Mechanics.
The application of PiNNs to problems involving fluid flow is an active and ongoing field of study [219,220]. Raissi et al. [212], in a seminal work, proposed the hidden fluid mechanics (HFM) method, which uses PiNNs to encode the governing physical laws of fluid motion, specifically Navier–Stokes equations. By leveraging conservation laws, they extracted hidden variables of interest, such as velocity and pressure fields, from spatiotemporal data visualizations of a passive scalar concentration (such as dye) transported in complex domains. Their PiNN-based algorithm can solve the data assimilation problem without requiring prior knowledge of the geometry, initial, or boundary conditions. The HFM model has demonstrated successful predictions of 2D and 3D velocity and pressure fields in benchmark problems inspired by real-world fluid flow applications. Figure 6, adapted from the study by Raissi et al. [212], compares the prediction of PiNN with the ground truth for the classical problem of a 2D flow past a cylinder. The model can be used to extract valuable quantitative information such as wall-shear stresses and lift and drag forces, for which direct measurements are difficult to obtain.
Zhang et al. [221] also developed a PiNN framework for incompressible fluid flow past a cylinder governed by the Navier–Stokes equations. The PiNN learns the relationship between the simulation outputs (i.e., velocity and pressure) and the underlying geometry, boundary conditions, initial conditions, and inherent fluid properties. They showed that generalization performance is enhanced in both the temporal domain and the design space by including Fourier features [222], such as frequency and phase-offset parameters. Cheng and Zhang [223] developed Res-PiNN (i.e., ResNet blocks combined with PiNN) to simulate cavity flow and flow past a cylinder governed by the Burgers’ and Navier–Stokes equations. Their results showed that Res-PiNN had better predictive ability than conventional PgNN and vanilla PiNN algorithms. Lou et al. [224] also demonstrated the potential of PiNN to solve inverse multiscale flow problems. They used PiNN for inverse modeling in both the continuum and rarefied regimes represented by the Boltzmann–Bhatnagar–Gross–Krook (BGK) collision model. The results showed that PiNN-BGK is a unified method (i.e., usable for both forward and inverse modeling), easy to implement, and effective in solving ill-posed inverse problems [224]. Wessels et al. [73] utilized PiNN to develop an updated Lagrangian method, the so-called neural particle method (NPM), for the solution of incompressible free-surface flow governed by the inviscid Euler equations. The NPM does not necessitate any of the specific algorithmic adjustments that are typically required to accurately address the incompressibility constraint. It was demonstrated that NPM can accurately compute a pressure field that satisfies the incompressibility condition while avoiding topological constraints in the discretization process [73]. In addition, PiNN has also been employed to model complex non-Newtonian fluid flows involving nonlinear constitutive PDEs that characterize the fluid’s rheological behavior [225].
Haghighat et al. [226] trained a PiNN model to solve the dimensionless form of the governing equations of coupled multiphase flow and deformation in porous media. Almajid and Abu-Al-Saud [227] compared the predictions of PiNN with those of PgNN, i.e., a conventional artificial neural network, for solving the gas drainage problem of water-filled porous media. The study showed that PgNN performs well under certain conditions (i.e., when the observed data consist of early- and late-time saturation profiles), while the PiNN model performs robustly even when the observed data contain only an early-time saturation profile (where extrapolation is needed). Depina et al. [228] applied PiNN to model unsaturated groundwater flow problems governed by the Richards PDE and the van Genuchten constitutive model [229]. They demonstrated that PiNNs can efficiently estimate the van Genuchten model parameters and solve the inverse problem with a relatively accurate approximation of the solution to the Richards equation.
Some of the other PiNN variants employed in fluid mechanics are as follows: nn-PiNN, where PiNN is employed to solve constitutive models in conjunction with the conservation of mass and momentum for non-Newtonian fluids [225]; ViscoelasticNet, where PiNN is used for stress discovery and the selection of viscoelastic flow models [230], such as Oldroyd-B [135], Giesekus, and linear Phan-Thien–Tanner (PTT) [231]; RhINN, a rheology-informed neural network employed to solve constitutive equations for a thixotropic elasto-visco-plastic complex fluid across a series of flow protocols [205]; CAN-PiNN, a coupled-automatic-numerical differentiation framework that combines the benefits of numerical differentiation (ND) and AD for robust and efficient training of PiNN [232]; ModalPiNN, a combination of PiNN with an enforced truncated Fourier decomposition [233] for periodic flow reconstruction [234]; GA-PiNN, a geometry-aware PiNN consisting of a variational autoencoder, a PiNN, and a boundary-constraining network for real-world applications with irregular geometries without parameterization [235]; Spline-PiNN, a combination of PiNN and a Hermite-spline-kernel-based CNN, trained without any precomputed training data to provide fast, continuous solutions that generalize to unseen domains [236]; conservative PiNN (cPiNN), consisting of several PiNNs communicating through flux continuity at subdomain interfaces to solve conservation laws [202]; SA-PiNN, a self-adaptive PiNN that provides the adaptive procedures needed to force PiNN to accurately fit the stubborn spots in the solution of stiff PDEs [57]; and XPiNN, an extended PiNN that enhances the representation and parallelization capacity of PiNN and generalizes to any type of PDE, relative to cPiNN [203].
Table 4 reports a nonexhaustive list of recent studies that leveraged PiNN to model fluid flow problems. Furthermore, Table 5 reports a nonexhaustive list of recent studies that developed other variants of PiNN architectures to improve the overall prediction accuracy and computational cost in fluid flow problems.
Table 4. A nonexhaustive list of recent studies that leveraged PiNNs to model fluid flow problems.

| Area of application | Objectives | Reference |
| --- | --- | --- |
| Incompressible flows | Accelerating the modeling of Navier–Stokes equations to infer the solution for various 2D and 3D flow problems | [51] |
| | Learning the relationship between output and underlying geometry as well as boundary conditions | [221] |
| | Simulating ill-posed (e.g., lacking boundary conditions) or inverse laminar and turbulent flow problems | [237] |
| Turbulent flows | Solving vortex-induced and wake-induced vibration of a cylinder at high Reynolds numbers | [238] |
| | Simulating turbulent incompressible flows without using any specific model or making turbulence assumptions | [239] |
| | Reconstructing Reynolds stress disparities described by Reynolds-averaged Navier–Stokes equations | [240] |
| Geofluid flows | Solving well-based groundwater flow equations without utilizing labeled data | [241] |
| | Predicting high-fidelity multiphysics data from low-fidelity fluid flow and transport phenomena in porous media | [242] |
| | Estimating Darcy’s-law-governed hydraulic conductivity for both saturated and unsaturated flows | [243] |
| | Solving solute transport problems in homogeneous and heterogeneous porous media governed by the advection–dispersion equation | [56] |
| | Predicting fluid flow in porous media from sparse observations using a physics-informed PointNet | [244] |
| Non-Newtonian flows | Solving systems of coupled PDEs adopted for non-Newtonian fluid flow modeling | [225] |
| | Simulating linear viscoelastic flow models such as Oldroyd-B, Giesekus, and linear PTT | [230] |
| | Simulating direct and inverse solutions of rheological constitutive models for complex fluids | [205] |
| Biomedical flows | Enabling the seamless synthesis of noninvasive in vivo measurement techniques and computational flow dynamics models derived from first physical principles | [245] |
| | Enhancing the quantification of near-wall blood flow and wall shear stress in diseased arterial flows | [246] |
| Supersonic flows | Solving inverse supersonic flow problems involving expansion and compression waves | [220] |
| Surface water flows | Solving ill-posed, strongly nonlinear, and weakly dispersive surface water waves governed by the Serre–Green–Naghdi equations using only data on the free-surface elevation and water depth | [247] |
Table 5. A nonexhaustive list of recent studies that developed PiNN variants to improve prediction accuracy and computational cost in fluid flow problems.

| PiNN structure | Objective | Reference |
| --- | --- | --- |
| CAN-PiNN | Providing a PiNN with more accurate and efficient training by integrating ND- and AD-based approaches | [232] |
| ModalPiNN | Providing a simpler representation of PiNN for oscillating phenomena to improve performance with respect to sparsity, noise, and lack of synchronization in the data | [234] |
| GA-PiNN | Enhancing PiNN to develop a parameter-free, irregular-geometry-based surrogate model for fluid flow modeling | [235] |
| Spline-PiNN | Improving the generalization of PiNN by combining it with a Hermite-spline CNN to solve the incompressible Navier–Stokes equations | [236] |
| cPiNN | Enhancing PiNN to solve high-dimensional nonlinear conservation laws with high computational and memory requirements | [202] |
| SA-PiNN | Improving PiNN’s convergence and accuracy for stiff PDEs using self-adaptive weights in training | [57] |
| XPiNN | Improving PiNN and cPiNN in terms of generalization, representation, parallelization capacity, and computational cost | [203] |
| PiPN | Using a point-cloud-based neural network to incorporate geometric features of computational domains in PiNN and generalize the solution to unseen domains | [248] |
| PiGAN | Integrating a physics-informed loss function into SRGAN to address multiphase turbulent fluid flow super-resolution | [249] |
3.2 PiNNs for Solid Mechanics.
The application of PiNNs in computational solid mechanics is also an active field of study [250,251]. The study by Haghighat et al. [252] on modeling linear elasticity using PiNN was among the first to introduce PiNN to the solid mechanics community. Since then, the framework has been extended to other solid mechanics problems (e.g., linear and nonlinear elastoplasticity).
Shukla et al. [253] used PiNN for surrogate modeling of the microstructural properties of polycrystalline nickel. In addition to employing the PiNN model, they applied an adaptive activation function to accelerate convergence. The resulting PiNN-based surrogate model demonstrated a viable strategy for nondestructive material evaluation. Henkes et al. [254] modeled nonlinear stress and displacement fields induced by inhomogeneities in materials with sharp phase transitions using PiNN. To overcome PiNN convergence issues in this problem, they used adaptive training approaches and domain decomposition [73]. According to their results, the domain decomposition approach properly resolves nonlinear stress, displacement, and energy fields in heterogeneous microstructures derived from real-world μCT scan images [254]. Zhang and Gu [255] trained a PiNN model with a loss function based on a minimum-energy criterion to investigate digital materials. Tested on 1D tension, 1D bending, and 2D tensile problems, the model demonstrated performance equivalent to that of supervised DL methods (i.e., PgNNs). By adding a hinge loss on the Jacobian matrix, the PiNN method was able to adequately approximate the logarithmic strain and rectify erroneous deformation gradients.
Rao et al. [256] proposed a PiNN architecture with mixed-variable outputs (displacement and stress components) to handle elastodynamic problems without labeled data. The method was found to boost the network’s accuracy and trainability compared to the purely displacement-based PiNN model. Figure 7 compares the ground-truth stress fields generated by the FEM with those estimated by the mixed-variable PiNN for an elastodynamic problem [256]; the stress components are estimated accurately. Rao et al. [256] also proposed a composite PiNN scheme that enforces the initial and boundary conditions in a hard manner, in contrast to the conventional (vanilla) PiNN with soft initial and boundary condition enforcement. Tested on a series of dynamics problems (e.g., a defected plate under cyclic uniaxial tension and elastic wave propagation), this scheme mitigated the inaccuracies near the boundaries encountered by PiNN (a minimal sketch of such hard enforcement is given after this paragraph). Zhou et al. [257] proposed a framework for predicting fatigue life that uses a probabilistic PiNN to incorporate the underlying physics of fatigue. The framework uses a composite loss function to enforce physical constraints on derivatives and a negative log-likelihood function to ensure that the network learns a continuous function that aligns with both experimental data and physics-based insights. Fang and Zhan [258] proposed a PiNN model to design the electromagnetic metamaterials used in various practical applications, such as cloaking, rotators, and concentrators. They studied PiNN’s inference issues for Maxwell’s equations [259] with high wave numbers in the frequency domain and improved the activation function to overcome the high-wave-number problems. The proposed PiNN recovers not only continuous functions but also piecewise functions, a new contribution to the application of PiNN in practical problems. Zhang et al. [260] employed PiNN to identify nonhomogeneous materials in elastic imaging for application to soft tissues. Two PiNNs were used: one approximating the solution of the forward problem and the other approximating the field of unknown material parameters. The results showed that the unknown distribution of mechanical properties can be accurately recovered using PiNN. Abueidda et al. [261] employed PiNN to simulate 3D hyperelasticity problems. They proposed an enhanced PiNN architecture consisting of the residuals of the strong form and the potential energy [262], producing several loss terms that contribute to the total loss function to be minimized. The enhanced PiNN outperformed both the conventional (vanilla) PiNN and deep energy methods, especially in the presence of areas with high solution gradients.
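The following sketch illustrates the hard-enforcement idea in its simplest form, not the specific composite scheme of Rao et al. [256]: the network output is composed with analytic distance-like functions so that the initial and boundary conditions hold identically and never enter the loss. The ansatz below, for a 1D field on x ∈ [−1, 1] with u(x, 0) = g(x) and u(±1, t) = 0, is an illustrative assumption.

```python
import torch

# "Hard" initial/boundary-condition enforcement: the trainable network `net`
# only contributes where the conditions do not already pin the solution down.
# Here g must itself satisfy g(+/-1) = 0 for the ansatz to be consistent.
def hard_constrained_u(net, x, t, g):
    # At t = 0 the second term vanishes, so u(x, 0) = g(x) exactly;
    # at x = +/-1 the factor (1 - x^2) vanishes, so u(+/-1, t) = g(+/-1) = 0.
    return g(x) + t * (1 - x ** 2) * net(torch.cat([x, t], dim=1))
```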
Haghighat et al. [15] tested a different PiNN variant to handle inverse problems and surrogate modeling in solid mechanics. Instead of employing a single neural network, they implemented a PiNN with multiple neural networks. They deployed the framework on linear elastostatic and nonlinear elastoplasticity problems and showed that the improved PiNN model provides a more reliable representation of the physical parameters. They also investigated transfer learning in PiNN and found that the training phase converges more rapidly when transfer learning is used. Yuan et al. [263] proposed an auxiliary PiNN model (dubbed A-PiNN) to solve inverse problems of nonlinear integro-differential equations (IDEs). A-PiNNs circumvent the limitation of integral discretization by establishing auxiliary output variables in the governing equation to represent the integral(s) and by substituting the integral operator with automatic differentiation of the auxiliary outputs. The A-PiNN’s multi-output network thus determines both primary and auxiliary outputs to approximate both the variables and the integrals in the governing equations. A-PiNNs were used to address inverse problems of nonlinear IDEs, including the Volterra equation [264]. Their findings demonstrated that the unknown parameters can be determined satisfactorily even with noisy data.
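A hedged sketch of the auxiliary-output idea, applied to a toy Volterra-type IDE, u′(x) + ∫₀ˣ u(s) ds = f(x), is given below. The network outputs both u and an auxiliary variable v standing in for the integral; automatic differentiation enforces v′ = u and v(0) = 0, so the integral operator is never discretized. The equation, forcing term, and architecture are illustrative assumptions, not the cases of Ref. [263].

```python
import torch
import torch.nn as nn

# A-PiNN-style auxiliary outputs for a toy IDE: u'(x) + v(x) = f(x),
# with v(x) representing int_0^x u(s) ds, enforced via v' = u and v(0) = 0.
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 2))               # outputs: [u, v]

def ide_residuals(x, f):
    x = x.clone().requires_grad_(True)
    u, v = net(x).split(1, dim=1)
    grad = lambda y: torch.autograd.grad(y, x, torch.ones_like(y),
                                         create_graph=True)[0]
    r_ide = grad(u) + v - f(x)                      # IDE with v replacing the integral
    r_aux = grad(v) - u                             # auxiliary constraint: v' = u
    r_ic = net(torch.zeros(1, 1))[:, 1:]            # initial condition: v(0) = 0
    return r_ide, r_aux, r_ic
```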
Some of the other PiNN variants used in computational solid mechanics are as follows: PhySRNet, a PiNN-based super-resolution framework for reconstructing high-resolution output fields from their low-resolution counterparts without requiring high-resolution labeled data [265]; peridynamic differential operator (PDDO)-PiNN, a combination of PDDO [266] and PiNN to overcome the degrading performance of PiNN under sharp gradients [267]; PiELM, a combination of PiNN and the extreme learning machine (ELM) [268] employed to solve direct problems in linear elasticity [269]; distributed PiNN (DPiNN), which utilizes a piecewise neural network representation of the underlying field instead of the piecewise-polynomial representation commonly used in FEM [58]; and PiNN-FEM, a mixed formulation based on PiNN and FEM for computational mechanics in heterogeneous domains [270].
Table 6 reports a nonexhaustive list of recent studies that leveraged PiNN in computational solid mechanics. Furthermore, Table 7 reports a nonexhaustive list of recent studies that developed other variants of PiNN architectures to improve overall prediction accuracy and computational cost in solid mechanics modeling.
Table 6. A nonexhaustive list of recent studies that leveraged PiNNs in computational solid mechanics.

| Area of application | Objectives | Reference |
| --- | --- | --- |
| Elasticity | Solving forward and inverse problems in linear elastostatics and nonlinear elasticity | [15] |
| | Simulating forward and discovery problems for linear elasticity | [252] |
| | Resolving the nonhomogeneous material identification problem in elasticity imaging | [260] |
| | Estimating elastic properties of tissues using precompression and postcompression images of objects mimicking tissue properties | [271] |
| | Estimating the mechanical response of elastic plates under different loading conditions | [272] |
| | Finding optimal solutions to the reference biharmonic problems of elasticity and elastic plate theory | [273] |
| | Simulating elastodynamic problems (e.g., elastic wave propagation, deflected plate under periodic uniaxial strain) without labeled data | [256] |
| Heterogeneous materials | Inferring the spatial variation of compliance coefficients of materials (e.g., speed of the elastic waves) to identify microstructure | [253] |
| | Resolving nonlinear stress, displacement, and energy fields in heterogeneous microstructures | [254] |
| | Solving coupled thermomechanics problems in composite materials | [274] |
| | Predicting the size, shape, and location of internal structures (e.g., voids, inclusions) using linear elasticity, hyperelasticity, and plasticity constitutive models | [275] |
| Structural elements | Predicting the small-strain response of arbitrarily curved shells | [276] |
| | Solving mechanical problems of elasticity in one-dimensional elements such as rods and beams | [277] |
| | Predicting the creep-fatigue life of components (316 stainless steel) at elevated temperatures | [278] |
| | Quantifying physical parameters (e.g., corrosion-fatigue) not accounted for in cumulative damage models | [279] |
| Structural vibrations | Estimating and optimizing vibration characteristics and system properties in structural mechanics and vibration problems | [280] |
| Digital materials | Resolving the physical behaviors of digital materials to design next-generation composites | [255] |
| Fracture mechanics | Reconstructing the displacement field after damage to predict crack propagation in quasi-brittle materials | [281] |
| Elasto-viscoplasticity | Modeling the strain-rate and temperature dependence of deformation fields (i.e., displacement, stress, plastic strain) | [282] |
| Additive manufacturing | Predicting finite fatigue life in materials containing defects | [283] |
| Solid mechanics (general) | Providing a detailed introduction to programming PiNN-based computational solid mechanics from 1D to 3D problems | [284] |
Table 7. A nonexhaustive list of recent studies that developed PiNN variants to improve prediction accuracy and computational cost in solid mechanics modeling.

| PiNN architecture | Objectives | Reference |
| --- | --- | --- |
| PhySRNet | Enhancing PiNN using super-resolution techniques to reconstruct downscaled fields from their upscaled counterparts | [265] |
| Enhanced PiNN | Improving the interpolation and extrapolation ability of PiNN by integrating potential energy, the collocation method, and deep learning for hyperelastic problems | [261] |
| PDDO-PiNN | Improving the performance of PiNN in the presence of sharp gradients by integrating the PDDO and PiNN methods to solve elastoplastic deformation problems | [267] |
| PiELM | Accelerating PiNN’s training process by integrating the PiNN and ELM methods to model high-dimensional shell structures | [269] |
| PiELM | Integrating the PiNN and ELM methods to accelerate PiNN’s training process for modeling biharmonic equations | [285] |
| DPiNN | Providing a unified framework for solid mechanics capable of solving high-dimensional inverse problems in heterogeneous media such as linear elasticity | [58] |
| A-PiNN | Improving the PiNN model to solve inverse problems of nonlinear integro-differential equations | [263] |
| PiNN-FEM | Hybridizing FEM and PiNN to solve heterogeneous-media problems such as the elasticity and Poisson equations | [270] |
3.3 PiNNs’ Limitations.
PiNNs demonstrate significant potential in modeling dynamic systems described by ODEs and PDEs. Nonetheless, they have some limitations and shortcomings that must be considered:
Vanilla PiNNs use deep networks consisting of a series of fully connected layers and a variant of gradient-descent optimization. The learning process and hyperparameter tuning are conducted manually and are sample-size and problem dependent. Their training may therefore face vanishing-gradient problems and can be prohibitively slow for practical three-dimensional problems [286]. In addition, vanilla PiNNs impose limitations on low-dimensional spatiotemporal parameterization due to their use of fully connected layers [46].
For linear, elliptic, and parabolic PDEs, Shin et al. [287] provided the first convergence theory with respect to the number of training data. They also discussed a set of conditions under which convergence can be guaranteed. However, there is no “solid” theoretical proof of convergence for PiNNs when applied to problems governed by nonlinear PDEs. Note that deep learning models generally fail to realize theoretically established global minima; thus, this limitation is not specific to PiNNs and holds for all deep learning models [7].
PiNNs contain several terms in the loss function, with relative weights that greatly affect the predicted solution. Currently, there are no guidelines for optimal weight selection [58]. Different terms in the loss function may compete with each other during training, and this competition may reduce the stability of the training process (a minimal sketch of one mitigation, self-adaptive weighting, follows this list). PiNNs also suffer during training when confronted with an ill-posed optimization problem due to their dependence on soft physical constraints [46].
PiNNs may face challenges when handling multiscale problems governed by low-frequency features (e.g., large-scale vortices or polymer relaxation that varies relatively slowly or smoothly over space or time) or high-frequency features (e.g., small-scale turbulence or permeability variations in porous media that vary relatively fast or abruptly over time or space) [50,288]. The former occurs because low-frequency features necessitate large domains to be entirely resolved, and PiNNs have difficulty propagating information from initial or boundary conditions to unobserved areas of a large domain or to future time-steps [289,290].
PiNNs also face challenges when applied to stiff [291,292] and high-dimensional dynamical systems [293]. Stiff problems may lead to discontinuous solutions, making it difficult for NNs with continuous activation functions to learn [294]. To resolve the issue to some extent, methods such as stiff-PiNNs can be used [295,296]. For high-dimensional problems, the required number of collocation points increases exponentially, making the learning process difficult and computationally expensive [7]. To address this issue to some extent, the importance sampling approach for collocation points [214] can be used.
PiNNs are solution-learning algorithms, i.e., they learn the solution to a given PDE for a single instance. For any new instance of functional parameters or coefficients, PiNNs require training a new neural network [56]. This is because, by construction, PiNNs cannot learn the physical operator of a given phenomenon, which limits their generalization (e.g., spatiotemporal extrapolation). The PiNN approach thus suffers from the same computational drawback as classical solvers (e.g., FEM, FVM), especially in 3D problems, as the optimization problem must be solved anew for every instance of the PDE parameters, boundary conditions, and initial conditions [64].
PiNNs encounter difficulties when learning the solutions of inverse problems in heterogeneous media, e.g., a composite slab composed of several materials [286]. In such cases, the parameters of the underlying PDE (e.g., conductivity or permeability coefficients) change across the domain; yet, due to its inherent design, a PiNN outputs a single set of parameter values over the whole domain.
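Before turning to remedies, it is worth illustrating one widely used mitigation of the loss-weighting issue noted above. The sketch below follows the spirit of the self-adaptive weighting of SA-PiNN [57]: trainable, pointwise nonnegative weights multiply the squared residuals and are updated by gradient ascent while the network parameters are updated by descent, steering capacity toward stubborn regions. The `model` and `pde_residual` helpers from the earlier sketches are assumed, and all shapes and learning rates are illustrative.

```python
import torch

# Self-adaptive weighting sketch: one weight per (fixed) collocation point is
# maximized while the network loss is minimized. `model` and `pde_residual`
# are assumed from the earlier Burgers' sketch.
x_c = torch.rand(1000, 1) * 2 - 1                   # fixed collocation set
t_c = torch.rand(1000, 1)
lam = torch.nn.Parameter(torch.ones(1000, 1))       # one weight per point
opt_net = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_lam = torch.optim.Adam([lam], lr=5e-3, maximize=True)  # gradient ascent

for it in range(5000):
    loss = (lam.relu() * pde_residual(model, x_c, t_c).pow(2)).mean()
    opt_net.zero_grad()
    opt_lam.zero_grad()
    loss.backward()
    opt_net.step()                                  # descent on network weights
    opt_lam.step()                                  # ascent on adaptive weights
```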
Despite these shortcomings, PiNNs offer strong promise for complex domains that are hard to mesh and for practical problems where data acquisition is expensive. Several techniques have been proposed to circumvent some of the limitations of vanilla PiNN. For instance, to address the first limitation listed earlier, discrete learning techniques using convolutional filters, such as HybridNet [297], the dense convolutional encoder–decoder network [298], the autoregressive encoder–decoder model [299], TF-Net [300], DiscretizationNet [301], and PhyGeoNet [302], have been employed that exceed vanilla PiNN in computational efficiency. As another example, to address the last limitation listed earlier, Dwivedi et al. [286] proposed a DPiNN that has potential advantages over existing PiNNs for solving inverse problems in heterogeneous media, which are most likely to be encountered in engineering practice. Other solutions for high-dimensional inverse problems include cPiNN [202] and the self-adaptive PiNN [57]. Further, XPiNN [203], with its intrinsic parallelization capability of deploying multiple neural networks in smaller subdomains, can be used to considerably reduce the computational cost of PiNNs in large (three-dimensional) domains. However, these modifications and alternatives do not solve the generalization problem of PiNNs, as the resulting models lack the ability to enforce existing physical knowledge. To this end, PeNNs have started to emerge. In Sec. 4, we review the recent literature on physics-encoded neural networks.
4 Physics-Encoded Neural Networks
PeNNs are another family of mesh-free algorithms used in scientific computing, mostly in fluid and solid mechanics, that strive to hard-encode the underlying physics (i.e., prior knowledge) into the core architecture of neural networks. Note that, by construction, PeNN-based models extend the learning capability of a neural network from instance learning (imposed by the PgNN and PiNN architectures) to continuous learning [46,60,303]. Different approaches have recently been proposed to hard-encode physical laws (in terms of ODEs, PDEs, closure laws, etc.) into neural network components [9,46,60,194]. PeNN is not a completely new notion, as a long trajectory of research has proposed building physics constraints constructively into network architectures. Examples include preserving convexity using a deterministic annealing neural network [304], preserving positivity [305], enforcing symmetries in physics using Lagrangian neural networks (LaNNs) [306,307], capturing trajectories using symplectic recurrent neural networks (SRNNs) [308,309], enforcing exact physics and extracting structure-preserving surrogate models using data-driven exterior calculus (DDEC) on graphs [310], and imposing different types of physics and probabilistic constraints, including mean values, variances, and derivative constraints, on neural network biases and weights for robust curve estimation and uncertainty quantification [311–313]. In this section, we review the two most prominent approaches for encoding physics into neural network architectures and their applications in computational fluid and solid mechanics: (i) the physics-encoded recurrent convolutional neural network (PeRCNN) [46,303] and (ii) differentiable programming (DP), or neural ordinary differential equations (NeuralODEs) [9,60].
4.1 Physics-Encoded Recurrent Convolutional Neural Network.
Rao et al. [46] introduced the PeRCNN model, which hard encodes prior knowledge of the governing nonlinear system into a neural network. The PeRCNN architecture, shown in Fig. 8, facilitates learning in a data-driven manner while forcibly encoding the known physics. This model exceeds the capabilities of PgNNs and PiNNs for phenomena in which an explicit formulation of the PDEs does not exist and only very limited measurement data are available (e.g., Earth or climate system modeling [59]). The physics-encoding mechanism proposed in this model is distinct from the penalty-based physics-informed learning approach and ensures strict adherence of the network to the given physics. The model uses a novel element-wise product operation to achieve nonlinearity, instead of traditional nonlinear activation functions. Numerical experiments showed that this physics-encoded learning paradigm produces a highly robust and generalizable model, even in the presence of data noise or scarcity, outperforming some existing state-of-the-art data-driven modeling approaches.
As shown in Fig. 8, PeRCNN consists of an input layer constituted by low-resolution, noisy initial-state measurements X = [X1, X2, X3, …, Xn], followed by a fully convolutional network acting as an initial-state generator, which upsamples the low-resolution initial state to a full-resolution initial state, dubbed the modified Xo, used as the input to subsequent recurrent computations. For recurrent computing, an unconventional convolutional block, dubbed the π block, is employed [46]. PeRCNN uses the π block as its main component: it applies multiple convolutional layers to the modified input Xo and combines their feature maps using an element-wise product layer. A 1 × 1 convolutional layer is added after the product operation to aggregate the output channels. Multiplying the output of the 1 × 1 convolutional layer by the time spacing δt yields the residual of the dynamical system at time tk, denoted δUk. Ultimately, the last layer generates the predictions Y′ = [Y′1, Y′2, Y′3, …, Y′n] by element-wise addition. These operations are shown schematically in Fig. 8.
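A hedged PyTorch sketch of the recurrence just described is given below: parallel convolutions feed an element-wise product (supplying the nonlinearity), a 1 × 1 convolution aggregates channels, and the result, scaled by δt, is added back to the state as a forward-Euler-style residual update. The channel counts, kernel sizes, and roll-out length are illustrative assumptions, not the settings of Ref. [46].

```python
import torch
import torch.nn as nn

# Sketch of a PeRCNN-style "pi block": product of parallel conv feature maps,
# 1x1 aggregation, and a residual (Euler-like) state update scaled by dt.
class PiBlock(nn.Module):
    def __init__(self, channels=2, hidden=8, n_branches=3, dt=0.01):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, hidden, kernel_size=5, padding=2)
            for _ in range(n_branches)
        )
        self.aggregate = nn.Conv2d(hidden, channels, kernel_size=1)  # 1x1 conv
        self.dt = dt

    def forward(self, u):
        prod = self.branches[0](u)
        for conv in self.branches[1:]:
            prod = prod * conv(u)                   # element-wise product layer
        return u + self.dt * self.aggregate(prod)   # u_{k+1} = u_k + dt * dU_k

# Rolling the block forward in time from a stand-in full-resolution state:
block = PiBlock()
u = torch.randn(1, 2, 64, 64)                       # stand-in for the modified Xo
states = []
for _ in range(100):
    u = block(u)
    states.append(u)
```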
The PeRCNN architecture was tested on two datasets representing the 2D Burgers’ and 3D Gray–Scott reaction–diffusion equations [46]. In both cases, PeRCNN was compared with convolutional LSTM (ConvLSTM) [314], a deep residual network [315], and deep hidden physics models [189] in terms of accuracy (RMSE), robustness to data noise/scarcity, and generalization. The comparison for the 2D Burgers’ dataset is shown in Fig. 9(a), adapted from the study by Rao et al. [46]. The cumulative RMSE for PeRCNN began at a higher value in the training region (due to 10% Gaussian noise in the data) and decreased as additional time-steps were assessed. The cumulative RMSE for PeRCNN increases slightly in the extrapolation phase (a measure of the model’s generalization) but clearly surpasses all other algorithms in long-term extrapolation. Rao et al. [303] also used PeRCNN to discover spatiotemporal PDEs from scarce and noisy data and demonstrated its effectiveness and superiority over baseline models.
Ren et al. [316] proposed a hybrid algorithm combining PeRCNN and PiNN to address the limitations in low-dimensional spatiotemporal parameterization encountered by PgNNs and PiNNs. In the resulting physics-informed convolutional-recurrent network, dubbed PhyCRNet, an encoder–decoder convolutional LSTM network is used for low-dimensional spatial feature extraction and temporal evolution learning. In PhyCRNet, the loss function is specified as the aggregated discretized PDE residuals; the boundary conditions are hard-coded into the network via designated padding, and the initial conditions are defined as the first input state variable of the network. Autoregressive and residual connections that explicitly simulate time marching are used to enhance the network. This method ensures generalization to a variety of initial and boundary condition scenarios and yields a well-posed optimization problem in network training. Using PhyCRNet, it is also possible to simultaneously enforce known conservation laws in the network (e.g., mass conservation can be enforced by applying a stream function as the solution variable for fluid dynamics) [316]. Ren et al. [316] evaluated and validated the performance of PhyCRNet on several nonlinear PDEs against state-of-the-art baselines such as PiNN and the autoregressive dense encoder–decoder model [299]. A comparison between PhyCRNet and PiNN for solving Burgers’ equation is shown in Fig. 9(b) [316]. The results clearly demonstrated the superiority of the PhyCRNet methodology in terms of solution accuracy, extrapolability, and generalizability.
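The discretized-residual loss that distinguishes PhyCRNet-style training from pointwise automatic differentiation can be sketched with fixed finite-difference stencils implemented as convolution kernels; the 5-point Laplacian, heat-equation residual, and grid spacing below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Discretized PDE residual on a grid: a fixed finite-difference stencil,
# applied as a convolution, estimates spatial derivatives of the predicted
# field, so no pointwise automatic differentiation is needed.
def laplacian(u, dx):
    # u: (batch, 1, H, W) predicted field snapshot
    k = torch.tensor([[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]]).view(1, 1, 3, 3) / dx ** 2
    return F.conv2d(u, k, padding=1)

def heat_residual(u_next, u_prev, dt, dx, alpha=0.1):
    # residual of u_t = alpha * Laplacian(u), discretized in time and space
    return (u_next - u_prev) / dt - alpha * laplacian(u_prev, dx)

loss_pde = heat_residual(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64),
                         dt=0.01, dx=1.0 / 64).pow(2).mean()
```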
4.2 Neural Ordinary Differential Equations.
The NeuralODE method is another family of PeNN models in which the hidden state of the neural network is transformed from a discrete sequence to a continuous nonlinear function by parametrizing the derivative of the hidden state using a differentiable function [60]. The output of the network is then computed using a traditional differential equation solver. During training, the error is backpropagated through the network as well as through the ODE solver, without access to the solver’s internal operations. This architecture is feasible because numerical linear algebra is the common underlying infrastructure of both scientific computing and deep learning, bridged by automatic differentiation (AD) [317]. Because both differential equations and neural networks are differentiable, standard optimization and error backpropagation techniques can be used to optimize the network’s weights during training. Instead of learning the nonlinear transformation directly from the training data, the NeuralODE model learns the structure of the nonlinear transformation. Because the optimization equations are differentiable, the physical differential equations can be encoded directly into a layer, as opposed to adding more layers (i.e., deeper networks). This results in a shallower network that mimics an infinitely deep model and can be continuously inferred at any desired accuracy, at reduced memory and computational cost [318].
These continuous-depth models offer features that PiNNs and PgNNs lack, such as (i) a reduced number of parameters for supervised learning, (ii) constant memory cost as a function of depth, and (iii) continuous time-series learning (i.e., training with datasets acquired at arbitrary time intervals) [60]. However, error backpropagation may cause technical difficulties when training such continuous-depth networks. Chen et al. [60] computed gradients using the adjoint sensitivity method [319] while treating the ODE solver as a black box. They demonstrated that this method uses minimal memory, can directly control numerical error, and, most importantly, scales linearly with the size of the problem.
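A minimal sketch of this pattern using the torchdiffeq package released alongside Ref. [60] is shown below: the network parameterizes dy/dt, a black-box solver produces the trajectory, and gradients flow through the solver via the adjoint method. The layer sizes and the synthetic target trajectory are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint    # pip install torchdiffeq

# NeuralODE sketch: the module defines dy/dt = f_theta(t, y); the
# adjoint-based odeint backpropagates through the solve at low memory cost.
class ODEFunc(nn.Module):
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
opt = torch.optim.Adam(func.parameters(), lr=1e-3)
t = torch.linspace(0.0, 1.0, 20)
y0 = torch.tensor([[1.0, 0.0]])
# Synthetic target: the trajectory of a unit harmonic oscillator from y0.
y_true = torch.stack([torch.cos(t), -torch.sin(t)], dim=-1).unsqueeze(1)

for it in range(500):
    pred = odeint(func, y0, t)                      # shape: (len(t), batch, dim)
    loss = (pred - y_true).pow(2).mean()
    opt.zero_grad()
    loss.backward()                                 # adjoint solve under the hood
    opt.step()
```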
Ma et al. [320] compared the performance of discrete and continuous adjoint sensitivity analysis. They indicated that forward-mode discrete local sensitivity analysis implemented via AD is more efficient than reverse-mode and continuous forward and/or adjoint sensitivity analysis for problems with fewer than approximately 100 parameters. In terms of scalability, however, they showed that the continuous adjoint method is more efficient than the discrete adjoint and forward methods.
Several computational libraries have been implemented to facilitate the practical application of NeuralODE. Poli et al. [321] implemented the TorchDyn library to make training NeuralODE models as accessible as regular plug-and-play deep learning primitives. Innes et al. [9] and Rackauckas et al. [318] developed the GPU-accelerated Zygote and DiffEqFlux libraries in the Julia ecosystem to bring differentiable programming and universal differential equation solver capabilities together. As an example, they encoded the ordinary differential equation of motion, as the transformation function, into a neural network to simulate the inverse dynamics of a trebuchet [9]. As shown in Fig. 10, the network with classical layers takes the target location and wind speed as inputs and estimates the weight and release angle required to hit the target. These outputs are fed into the ODE solver to calculate the achieved distance. The model compares the predicted value with the target location and backpropagates the error through the entire chain to adjust the weights of the network. This PeNN model solves the trebuchet’s inverse dynamics on a personal computer 100× faster than a classical optimization algorithm. Once trained, the network can aim at any blind target, not just those it was trained on; hence, the model is both accelerated and predictive.
NeuralODE has also been integrated with PiNN models, dubbed PiNODE, to further constrain the network with known governing physics during training. Such an architecture consists of a neural network whose hidden state is parameterized by an ODE, with a loss function similar to that of PiNN (see Fig. 5). The loss function penalizes the algorithm based on the data and the strong form of the governing ODEs, and backpropagates the error using adjoint sensitivity methods [320] to update the learnable parameters of the architecture. PiNODE can be deployed to overcome both high bias (due to the use of first principles alone in scientific modeling) and high variance (due to the use of purely data-driven models). In other words, with PiNODE, prior physics knowledge in the form of ODEs is integrated where it is available, and function approximation (e.g., neural networks) is used where it is not. Lai et al. [322] used PiNODE to model governing equations in the field of structural dynamics (e.g., the free vibration of a 4-degree-of-freedom dynamical system with cubic nonlinearity). They showed that PiNODE provides an adaptable framework for structural health monitoring (e.g., damage detection) problems. Roehrl et al. [323] tested PiNODE using a forward model of an inverted pendulum on a cart and showed that the approach can learn the nonconservative forces of a real-world physical system with substantial uncertainty.
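The hybrid flavor of this idea can be sketched by composing a known first-principles term with a neural correction inside the ODE right-hand side and training the composite through the solver; the oscillator setup and all constants below are illustrative assumptions, not the configurations of Refs. [322,323].

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint

# Hybrid physics + learning sketch: a known restoring force plus a learned
# residual force. Trained through odeint against measured trajectories, the
# correction term absorbs unmodeled (e.g., nonconservative) effects.
class HybridFunc(nn.Module):
    def __init__(self, omega=2.0, hidden=32):
        super().__init__()
        self.omega = omega                          # known physics: y'' = -omega^2 y
        self.correction = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                                        nn.Linear(hidden, 1))

    def forward(self, t, state):                    # state = (y, y_dot)
        y, v = state[..., :1], state[..., 1:]
        a = -self.omega ** 2 * y + self.correction(state)  # physics + residual
        return torch.cat([v, a], dim=-1)

# Usage: pred = odeint(HybridFunc(), y0, t); fit pred to data with an MSE
# loss, optionally adding a PiNN-style residual penalty on known terms.
```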
The application of neural differential equations has also been extended to learning the dynamics of PDE-described systems. Dulny et al. [324] proposed NeuralPDE by combining the method of lines (which represents arbitrarily complex PDEs as a system of ODEs) and NeuralODE through a multilayer convolutional neural network. They tested NeuralPDE on several spatiotemporal datasets generated from the advection–diffusion equation, Burgers’ equation, the wave propagation equation, climate modeling, and so on. They found NeuralPDE to be competitive with other DL-based approaches, e.g., ResNet [325]. NeuralPDE’s limitations are set by those of the method of lines; e.g., it cannot be used to solve elliptic second-order PDEs. Tables 8 and 9 report a nonexhaustive list of leading studies that leveraged PeNNs (Table 8) and neural operators (Table 9) to model different scientific problems.
Table 8. A nonexhaustive list of leading studies that leveraged PeNNs to model different scientific problems.

| PeNN structure | Objective | Reference |
| --- | --- | --- |
| PeRCNN | Hard-encoding prior knowledge into a neural network to model nonlinear systems | [46] |
| PhyCRNet | Leveraging the benefits of PeRCNN and PiNN in a single architecture | [316] |
| NeuralODE | Developing a continuous-depth network by parametrizing the hidden-state derivative using a differentiable function | [60] |
| Zygote/DiffEqFlux | Providing libraries in the Julia ecosystem to facilitate the practical application of NeuralODE | [9,318] |
| PiNODE | Integrating PiNN and NeuralODE to provide an adaptable framework for structural health monitoring | [322] |
| NeuralPDE | Combining the method of lines and NeuralODE to learn the dynamics of PDE-described systems | [324] |
| LaNN | Capturing symmetries in physical problems such as the relativistic particle and double pendulum | [306] |
| SRNNs | Capturing the dynamics of Hamiltonian systems, such as three-body and spring-chain systems, from observed trajectories | [308] |
| DDEC | Enforcing exact physics and extracting structure-preserving surrogate models | [310] |
| Probabilistic NN | Formulating appropriate neural network architectures with weight and bias constraints | [311,312] |
Table 9. A nonexhaustive list of leading studies that leveraged NOs to model different scientific problems

| NO structure | Objective | Reference |
|---|---|---|
| DeepONets | Learning nonlinear operators accurately and efficiently with low generalization error | [63] |
| CA-DeepONet | Learning the dynamic development of a two-phase mixture and reducing the solution time for predicting microstructure evolution | [332] |
| Pi-DeepONets | Integrating DeepONets and PiNN to relax the requirement for a large training dataset while improving generalization and predictive accuracy | [65] |
| V-DeepONet | Generating a fast and generalizable surrogate model for brittle fracture mechanics | [333] |
| FNO | Providing a mesh- and discretization-invariant model to be inferred at any arbitrary spatiotemporal resolution | [194] |
| Parallelized FNO | Extending FNO based on domain decomposition to model large-scale three-dimensional problems | [336] |
| U-FNO | Enhancing the training and testing accuracy of FNO for large-scale, highly heterogeneous geological formations | [338] |
| IFNO | Capturing the long-range dependencies in the feature space, yielding an implicit neural operator that models the complex responses of materials due to their heterogeneity and defects | [339] |
| PINO | Integrating PiNN and FNO to relax the requirement for a large training dataset while enhancing generalization and optimization | [50] |
4.3 PeNNs’ Limitations.
Despite the advancement of numerous PeNN models and their success in modeling complex physical systems, these new architectures face several challenges. The most important one is attributed to training. PeNN-based models promote continuous learning through continuous-depth networks, which makes them more difficult to train than PgNNs and PiNNs. Accordingly, most of the limitations faced by PgNNs and PiNNs (e.g., convergence rate, stability, scalability, sample size dependency, and problem dependency) also apply to PeNNs. In addition, PeNNs usually have complex architectures, and their implementation is not as straightforward as that of PiNNs or PgNNs. In spite of this implementation complexity, PeNNs' efficient algorithms in the finite-dimensional setting, their ability to provide transferable solutions, their robustness against data scarcity, and their generalizability compared to PgNNs and PiNNs give them great potential to significantly accelerate traditional scientific computing for applications in computational fluid and solid mechanics.
5 Neural Operators
Most of the scientific deep learning methods discussed so far, e.g., PgNNs, PiNNs, and PeNNs, are generally designed to map the solution of a physical phenomenon for a single instance (e.g., a certain spatiotemporal domain and boundary conditions to solve a PDE using PiNN) and thus must be retrained or further trained (e.g., via transfer learning [326]) to map the solution for a different instance. Another way to alleviate this problem is to use neural operators, which learn nonlinear mappings between function spaces [45,327,328]. Neural operators thus form another simulation paradigm that learns the underlying linear and nonlinear continuous operators using advanced architectures. These models, similar to PgNNs, enforce the physics of the problem using labeled input–output dataset pairs, but they provide enhanced generalization, interpretability, continuous learning, and computational efficiency compared to PgNNs, PiNNs, and PeNNs [50,60,194].
This new paradigm uses mesh-invariant, infinite-dimensional operators based on neural networks that do not require a prior understanding of PDEs. Neural operators work with data to learn a resolution-invariant solution to the problem of interest [50]. In other words, neural operators can be trained on one spatiotemporal resolution and successfully inferred on any other [328]. This resolution invariance stems from the fact that a neural operator learns continuous functions rather than discretized vectors, by parameterizing the model in function spaces [50,328]. Note that PgNNs and PiNNs using, for example, MLPs may also guarantee a small generalization error, but only with sufficiently large networks. One distinct feature of neural operators is their robustness in applications that require real-time inference, especially when combined with PiNN [64,65]. Three main neural operators have recently been proposed, namely, (i) DeepONets [63], (ii) the Fourier neural operator (FNO) [194], and (iii) the graph neural operator (GNO) [328,329]. A recent review by Goswami et al. [64] extensively compared these neural operators. In this section, we briefly review DeepONet and FNO as the two prominent neural operators that can be applied in computational fluid and solid mechanics.
5.1 Deep Operator Networks.
Lu et al. [45] developed DeepONets based on the universal approximation theorem for operators [330]; they can be used to learn operators accurately and efficiently with very small generalization errors. Lu et al. [63] proposed two architectures for DeepONet, known as stacked and unstacked. The stacked DeepONet architecture is shown in Fig. 11(a), which consists of one trunk network and multiple stacked branch networks, k = 1, 2, …, p. The stacked DeepONet is formed by selecting the trunk network as a one-layer network with width p and each branch network as a one-hidden-layer network with width n. To learn an operator G: s → G(s), the stacked DeepONet architecture takes the function s as input to the branch networks and y (i.e., points in the domain of G(s)) as input to the trunk network. Here, the vector [s(x1), s(x2), …, s(xm)] represents the function s evaluated at a finite number of locations x1, x2, …, xm, alternatively dubbed sensors. The trunk network outputs [t1, t2, …, tp], and each branch network outputs a scalar bk, where k = 1, 2, …, p. Next, the outputs generated by the trunk and branch networks are combined as G(s)(y) ≈ Σ_{k=1}^{p} bk tk. The unstacked DeepONet architecture is also shown in Fig. 11(a); it consists of only one branch network (depicted in dark blue) and one trunk network. An unstacked DeepONet can be considered a stacked DeepONet in which all branch networks share the same set of parameters [63]. DeepONet was first used to learn several explicit operators, including integral and fractional Laplacians, along with implicit operators that represent deterministic and stochastic differential equations [63]. The two main advantages of DeepONets discussed by Lu et al. [63] are (i) small generalization errors and (ii) rapid convergence of training and testing errors with respect to the quantity of training data.
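A minimal unstacked DeepONet sketch in PyTorch, following the G(s)(y) ≈ Σ_{k=1}^{p} bk tk construction above, is shown below; the layer widths, the number of sensors m, and the added bias term are illustrative choices.

```python
# Minimal unstacked DeepONet: a branch net encodes the input function s
# (sampled at m fixed sensors) and a trunk net encodes the query point y.
import torch

class DeepONet(torch.nn.Module):
    def __init__(self, m=100, p=64):
        super().__init__()
        # Branch net: s at m sensors -> p coefficients b_k.
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(m, 128), torch.nn.Tanh(), torch.nn.Linear(128, p))
        # Trunk net: query point y -> p basis values t_k(y).
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(1, 128), torch.nn.Tanh(), torch.nn.Linear(128, p))
        self.bias = torch.nn.Parameter(torch.zeros(1))   # optional b_0 term

    def forward(self, s_sensors, y):
        b = self.branch(s_sensors)          # (batch, p)
        t = self.trunk(y)                   # (batch, p)
        return (b * t).sum(-1, keepdim=True) + self.bias  # Σ_k b_k t_k + b_0

# Usage: s sampled at m = 100 sensors, evaluated at scalar query points y.
model = DeepONet()
G_sy = model(torch.randn(8, 100), torch.rand(8, 1))      # (8, 1) predictions
```

The dot product between the branch coefficients and the trunk basis is what allows predictions at arbitrary query points y, rather than on a fixed output mesh.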
Lin et al. [331] examined the robustness of DeepONet with respect to data density and placement, which is advantageous when no prior knowledge is available on how much training data are required or when there are strict limits on data acquisition (e.g., location accessibility). To this end, they employed DeepONet and an LSTM (i.e., PgNN) to model datasets representing the formation of a single bubble in response to time-varying changes in the ambient liquid pressure. To generate the datasets, they used the Rayleigh–Plesset (R–P) equation as a macroscopic model and dissipative particle dynamics as a microscopic model. They used Gaussian random fields (GRFs) to generate different pressure fields, which serve as input signals for this dynamical system. The results of the comparison are shown in Fig. 11(b). The top row shows the prediction results for the liquid pressure trajectory when only 20 data points are known per trajectory (i.e., sparse training data), and the bottom row shows the same when 200 data points are known per trajectory (i.e., dense training data). As shown, regardless of how sparse the training data were, DeepONet outperformed the LSTM in predicting the liquid pressure trajectory.
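For reference, smooth random input functions of this kind can be drawn from a GRF as sketched below; the squared-exponential kernel, length scale, and grid are assumptions for illustration, not the exact settings of Ref. [331].

```python
# Sampling 1D Gaussian-random-field input signals via a Cholesky factor of
# an RBF covariance matrix (assumed kernel and length scale).
import torch

def sample_grf(t, length_scale=0.2, n_samples=8, jitter=1e-6):
    # K_ij = exp(-|t_i - t_j|^2 / (2 l^2)); jitter keeps K positive-definite.
    d2 = (t[:, None] - t[None, :]) ** 2
    K = torch.exp(-0.5 * d2 / length_scale**2) + jitter * torch.eye(len(t))
    L = torch.linalg.cholesky(K)                     # K = L L^T
    return (L @ torch.randn(len(t), n_samples)).T    # (n_samples, len(t))

t = torch.linspace(0.0, 1.0, 200)
pressure_signals = sample_grf(t)    # smooth random input functions over t
```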
In addition, Lin et al. [331] examined a case where the input was not contained within the training input range, i.e., when the correlation length of the pressure field was outside of the training range. In this case, they were initially unable to make accurate predictions but mitigated the issue by transferring a pretrained DeepONet trunk network and fine-tuning it with only a few additional data points. They also demonstrated that DeepONet can learn the mean component of the noisy raw data for the microscopic model without any additional data processing and that the computational time can be reduced from 48 CPU hours to a fraction of a second. These results confirm that the DeepONet model can be applied across the macroscopic and microscopic regimes of bubble growth dynamics, establishing the foundation for a unified neural network model that can seamlessly predict physics interacting across scales.
Oommen et al. [332] proposed a method called convolutional autoencoder DeepONet (CA-DeepONet), which combines a CA with DeepONet to accelerate the prediction of microstructure evolution by learning the dynamic development of a two-phase mixture. The CA-DeepONet approach uses the low-dimensional latent space of the CA to obtain a compressed representation of the microstructure data, while the DeepONet learns the mesoscale dynamics of microstructure evolution from the latent space of the autoencoder. The decoder component of the CA then reconstructs the evolution of the microstructure based on the DeepONet's predictions. The trained DeepONet architecture can then be used to speed up the numerical solver in extrapolation tasks or to substitute for the high-fidelity phase-field numerical solver in interpolation problems. Taking inspiration from PiNNs for sparse data domains, DeepONets can also be trained with very sparse labeled datasets while incorporating the known differential equations into the loss function. This approach results in physics-informed DeepONets (Pi-DeepONets) [65,333]. Wang et al. [65] employed Pi-DeepONets for benchmark problems such as the diffusion reaction, Burgers' equation, the advection equation, and the eikonal equation. Compared to the vanilla DeepONet, the results reveal significant improvements in predictive accuracy, generalization performance, and data efficiency. Furthermore, Pi-DeepONets can learn a solution operator without any paired input–output training data, allowing them to simulate nonlinear and nonequilibrium processes in computational mechanics up to three orders of magnitude faster than traditional solvers [65].
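The physics-informed part of such a loss can be sketched for a simple case, the 1D advection equation u_t + c u_x = 0 with an assumed constant speed c; here `model` is a DeepONet like the earlier sketch but with a two-dimensional trunk input y = (x, t), and the initial/boundary-condition terms are omitted for brevity.

```python
# Sketch of a physics-informed DeepONet residual loss for u_t + c u_x = 0.
# The PDE residual is obtained by differentiating the network output with
# respect to the trunk inputs via autograd, as in PiNN-style training.
import torch

def pi_deeponet_loss(model, s_sensors, xt_col, c=1.0):
    xt = xt_col.clone().requires_grad_(True)      # collocation points (x, t)
    u = model(s_sensors, xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u),
                                create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]         # columns: x first, t second
    residual = u_t + c * u_x                      # strong-form PDE residual
    # An initial-condition term matching u(x, 0) to the input function s(x)
    # and any boundary terms would be added here; omitted for brevity.
    return (residual ** 2).mean()
```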
Goswami et al. [333] used a physics-informed variational formulation of DeepONet (Pi-V-DeepONet) for brittle fracture mechanics. Training of the Pi-V-DeepONet was performed using the governing equations in a variational form along with some labeled data. They used the Pi-V-DeepONet framework to determine failure pathways, failure zones, and damage along the failure path in the brittle fracture of quasi-brittle materials. They trained the model to map the initial configuration of a defect (e.g., a crack) to the relevant fields of interest (e.g., damage and displacements, see Fig. 12). They showed that their model can rapidly predict the solution for any initial crack configuration and loading step. In brittle fracture mechanics, the proposed model can be used to improve design, assess reliability, and quantify uncertainty.
Due to the high cost of evaluating integral operators, it is difficult for DeepONet-style approaches to yield effective numerical algorithms capable of replacing convolutional or recurrent neural networks in an infinite-dimensional context. Li et al. [194] made an effort along this line and developed an operator regression method that parameterizes the integral kernel in the Fourier space, termed the FNO. In Sec. 5.2, we discuss the core architecture of FNO and recent developments around it.
5.2 Fourier Neural Operator.
In order to benefit from neural operators in infinite-dimensional spaces, Li et al. [194] developed a neural operator in the Fourier space, dubbed FNO, with the core architecture schematically shown in Fig. 13. Training starts with an input X, which is first elevated to a higher-dimensional space by a neural network S. The second phase involves the use of several Fourier layers of integral operators and activation functions. In each Fourier layer, the input is transformed using (i) a Fourier transform, F; (ii) a linear transform, T, on the lower Fourier modes, which filters out the higher modes; and (iii) an inverse Fourier transform, F−1. The input is also transformed using a local linear transform, W, before the application of the activation function, σ. The Fourier layers are designed to be discretization-invariant because they learn from functions that are discretized arbitrarily. Indeed, the integral operator is applied as a convolution and is represented as a linear transformation in the Fourier domain, allowing the FNO to learn the mapping over infinite-dimensional spaces. In the third phase, the result of the Fourier layers is projected back to the target dimension by another neural network M, which eventually outputs the desired output Y′ [194]. Unlike other DL methods, the FNO model's error is consistent regardless of the input and output resolutions (e.g., in PgNN methods, the error grows with the resolution).
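A minimal one-dimensional Fourier layer sketch in PyTorch, following the F, T, F−1, and W operations described above, is given below; the channel width and the number of retained modes are illustrative choices.

```python
# Minimal 1D Fourier layer: FFT, linear transform T on the lowest `modes`
# frequencies (higher modes filtered out), inverse FFT, plus a local
# pointwise path W, followed by the activation.
import torch

class FourierLayer1d(torch.nn.Module):
    def __init__(self, width=64, modes=16):
        super().__init__()
        self.modes = modes                     # assumes modes <= n_grid//2 + 1
        # Complex weights T acting on the retained low-frequency modes.
        self.T = torch.nn.Parameter(
            0.02 * torch.randn(width, width, modes, dtype=torch.cfloat))
        self.W = torch.nn.Conv1d(width, width, 1)   # local linear transform

    def forward(self, v):                      # v: (batch, width, n_grid)
        v_hat = torch.fft.rfft(v)                          # F
        out_hat = torch.zeros_like(v_hat)
        out_hat[:, :, :self.modes] = torch.einsum(         # T on low modes
            "iok,bik->bok", self.T, v_hat[:, :, :self.modes])
        v_spec = torch.fft.irfft(out_hat, n=v.size(-1))    # F^{-1}
        return torch.relu(v_spec + self.W(v))              # σ(spectral + W v)
```

Because the learned weights act on Fourier modes rather than on grid points, the same layer can be evaluated on any grid resolution, which is the mechanism behind the discretization invariance discussed above.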
Li et al. [194] employed FNO on three different test cases, including the 1D Burgers' equation, the 2D Darcy flow equation, and the 2D Navier–Stokes equations. For each test case, FNO was compared with state-of-the-art models. In particular, for the Burgers' and Darcy test cases, the methods used for comparison were a conventional ANN (i.e., PgNN), the reduced basis method [334], fully convolutional networks [335], principal component analysis as an encoder in the neural network [327], the graph neural operator [328], and the low-rank decomposition neural operator (i.e., an unstacked DeepONet [45]). In all test cases, FNO yielded the lowest relative error. The model error comparisons for the 1D Burgers' and 2D Darcy flow equations are depicted in Fig. 14, adapted from the study by Li et al. [194].
The FNO model, as stated, can be trained on one resolution and tested on a different resolution. Li et al. [194] demonstrated this claim by training an FNO on the Navier–Stokes equations for a 2D test case with a resolution of 64 × 64 × 20 (nx, ny, nt), standing for the spatial (x, y) and temporal resolutions, and then evaluating it with a resolution of 256 × 256 × 80, as shown in Fig. 15(a). Compared to other models, the FNO was the only technique capable of performing resolution downscaling both spatially and temporally [194]. FNOs can also achieve several orders of magnitude speedup over conventional numerical PDE solvers. However, they have only been used for 2D or small 3D problems due to the large dimensionality of their input data, which significantly increases the number of network weights. With this problem in mind, Grady et al. [336] proposed a parallelized version of FNO based on domain decomposition to resolve this limitation. Using this extension, they were able to apply FNO to large-scale modeling, e.g., simulating the transient evolution of the CO2 plume in subsurface heterogeneous reservoirs as part of carbon capture and storage technology [337], see Fig. 15(b). The input to the network (with an architecture similar to that proposed by Li et al. [194]) was designed to be a tensor containing both the permeability and topography fields at each 3D spatial position using a 60 × 60 × 64 (nx, ny, nz) resolution, and the output was 60 × 60 × 64 × nt. For a temporal resolution of nt = 30, they found that the parallelized FNO model was 271 times faster (even without leveraging GPUs) than the conventional porous media solver while achieving comparable accuracy. Wen et al. [338] proposed U-FNO, an extended version of FNO, to simulate multiphase flows in porous media. Their study focused on CO2–water multiphase flow through a heterogeneous medium under a wide range of reservoir conditions, injection configurations, flowrates, and multiphase flow properties. U-FNO was compared to FNO and a CNN (i.e., PgNN) and was found to perform better in predicting gas saturation and pressure buildup in highly heterogeneous geological formations. However, the authors noted that while U-FNO improved the training accuracy of FNO, it did not inherently allow flexibility in training and testing at different discretizations.
You et al. [339] proposed the implicit Fourier neural operator (IFNO) to model the behavior of materials with defects and heterogeneity without relying on traditional constitutive models. As the depth of the network increases, the IFNO model can represent a fixed-point equation that generates an implicit neural operator and can capture long-range dependencies within the feature space (e.g., it can mimic displacement/damage fields). You et al. [339] demonstrated the performance of IFNO using a series of test cases involving hyperelastic, anisotropic, and brittle materials. Figure 16 shows a comparison between IFNO and FNO for the transient propagation of a glass-ceramic crack [339]. As demonstrated, IFNO outperforms FNO (in terms of accuracy) and conventional constitutive models (in terms of computational cost) in predicting the displacement field.
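The parameter-sharing idea behind IFNO can be sketched by reusing a single Fourier layer as a damped, ODE-like update toward a fixed point; this is a schematic reading of the approach, not the authors' exact formulation, and it reuses the FourierLayer1d from the earlier sketch.

```python
# Schematic IFNO-style block: one shared Fourier layer applied L times as
# h <- h + (1/L) * layer(h), so effective depth grows without adding
# trainable parameters (the fixed-point/implicit-operator idea).
import torch

class IFNOBlock(torch.nn.Module):
    def __init__(self, width=64, modes=16, n_iters=8):
        super().__init__()
        self.layer = FourierLayer1d(width, modes)   # single shared layer
        self.n_iters = n_iters

    def forward(self, h):
        for _ in range(self.n_iters):               # damped update toward
            h = h + self.layer(h) / self.n_iters    # a fixed point
        return h
```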
The FNO model has also been hybridized with PiNN to create the so-called physics-informed neural operator (PiNO) [50]. The PiNO framework is a combination of operator-learning (i.e., FNO) and function-optimization (i.e., PiNN) frameworks that improves convergence rates and accuracy over both PiNN and FNO models. This integration was suggested to address the challenges in PiNN (e.g., generalization and optimization, especially for multiscale dynamical systems) and the challenges in FNO (e.g., the need for expensive and impractical large training datasets) [50]. Li et al. [50] deployed the PiNO model on several benchmark problems (e.g., Kolmogorov flow, lid-driven cavity flow) to show that PiNO can outperform PiNN and FNO models while maintaining FNO's exceptional speedup factor over other solvers.
5.3 NOs’ Limitations.
DeepONet [45] and FNO [194], as the two most common neural operators to date, share some commonalities but also have significant differences. The DeepONet architecture was developed based on the universal approximation theorem for operators proposed by Chen and Chen [330], while the FNO architecture is based on parameterizing the integral kernel in the Fourier space. However, the continuous form of FNO can be viewed as a DeepONet with a specific trunk architecture (using a trigonometric basis) and branch networks [340]. Unlike DeepONet, FNO discretizes both the input and output functions via pointwise evaluations on an evenly spaced mesh. As a result, after network training, FNO can only predict the solution on the same mesh as the input function, whereas DeepONet can make predictions at any arbitrary location. In addition, FNO requires a full field of observation data for training, while DeepONet is more flexible in this regard, except for POD-DeepONet [341], which requires a full field of observation data to calculate the POD modes. DeepONet, FNO, and their variants still face some limitations, especially when applied to large multiphysics problems, that necessitate further investigation.
Neural operators are purely data-driven and require relatively large training datasets; therefore, they face constraints when applied to problems where data acquisition is complex and/or costly [341]. Integration with PiNN can resolve this issue to some extent for problems where the underlying physics is fully known and can be integrated into the loss function [65]. Also, for practical applications, training NOs based solely on the governing equations in the loss function can produce inaccurate predictions; instead, hybrid physics-data training is recommended [65].
Neural operators encounter challenges when learning from diverse datasets, especially when the input functions are sampled from various GRFs with different length scales [342]. To address this problem, one can analyze the extrapolation behavior of neural operators by measuring the complexity of the extrapolation with the 2-Wasserstein distance [343] and taking into account the bias–variance trade-off for extrapolation with respect to the capacity of the model [344].
Neural operators, especially DeepONets, can face limitations when dealing with large time domains due to high dimensionality, leading to training challenges and overfitting [345]. To tackle this limitation, one can restructure DeepONet learning so that it recursively learns the solution trajectories of dynamical systems with time-dependent inputs [346].
DeepONet and FNO are typically limited to basic geometries and/or structured data (e.g., 2D or small 3D problems) due to the large dimensionality of their input data, which significantly increases the number of network weights [341]. They are also prone to overfitting as the number of trainable parameters increases, making the training process more difficult [347]. IFNOs have addressed this challenge to some extent [339]. In IFNO, the solution operator is first formulated as an implicitly defined mapping and then modeled as a fixed point. The latter aims to overcome the challenge of network training in the case of deep layers, and the former minimizes the number of trainable parameters and memory costs. However, due to the finite size of the neural network architecture, the convergence of the NO (e.g., DeepONet) error with respect to the size of the training data becomes algebraic for large datasets, whereas exponential convergence is desired [341].
FNO might not be reliable for discontinuous functions, as it relies on the Fourier transform. This is mitigated to some extent by DeepONet, which was shown to perform well for functions with discontinuities (e.g., the compressible Euler equations) [341].
Despite these limitations, neural operators are the leading algorithms in a variety of real-time inference applications (in vanilla form [63,65,344] or combined with PiNN [64]), including autonomous systems, surrogates in design problems, and uncertainty quantification. For the latter, the variational Bayes DeepONet (VB-DeepONet) was introduced to provide a Bayesian operator learning framework that improves the robustness and generalization capabilities of DeepONets while providing essential predictive uncertainty information [348]. Yang et al. [349] proposed an approach to quantify uncertainty in DeepONets, showing its efficacy in improving prediction robustness, estimating uncertainty in data-scarce scenarios, and identifying out-of-distribution examples. Additionally, they introduced a specialized library, UQDeepONet, designed to facilitate large-scale applications of uncertainty quantification in DeepONets. The significance of uncertainty quantification in neural operators lies in its ability to provide methodologies and tools to meet the increasing demand for dependable and interpretable uncertainty quantification when integrating neural networks with mathematical principles in scientific and engineering applications. Lin et al. [350] also proposed a Bayesian DeepONet, which leverages replica exchange Langevin diffusion to handle noisy data. They showed that Bayesian DeepONet, compared to DeepONets trained with gradient-based algorithms, significantly improves training convergence and provides accurate uncertainty estimates for noisy scenarios.
6 Conclusions and Future Research Directions
A considerable number of research topics collectively support the efficacy of combining scientific computing and deep learning approaches. In particular, this combination improves the efficiency of both forward and inverse modeling for high-dimensional problems that are prohibitively expensive, contain noisy data, require complex meshes, and are governed by nonlinear, ill-posed differential equations. Ever-increasing computer power will continue to fuel this combination by allowing the use of deeper neural networks and the consideration of higher-dimensional interdependencies and design spaces. This can enhance predictive capabilities for fluid and solid mechanics problems and help discover new physics and hidden patterns governing fluid and solid behaviors.
The combination of scientific computing and deep learning approaches also surpasses traditional computational mechanics solvers in a number of scenarios prevalent in practical engineering. For example, a sparse dataset obtained experimentally for a complex phenomenon (i.e., hard-to-acquire data) cannot simply be integrated with traditional solvers. Using DL, the following tasks can be performed: (i) PgNN-based models can be applied to the sparse data to extract latent interdependencies and conduct spatiotemporal downscaling or upscaling (i.e., producing interpolated data); (ii) PiNN-based models can be applied to the interpolated data to deduce the governing equations and potentially unknown boundary or initial conditions of the phenomenon (i.e., a strong mathematical form); (iii) PeNN-based models can be used to combine the interpolated data and the strong mathematical form to conduct extrapolatory exploration; and (iv) NO-based models can be applied to make real-time predictions of the complex dynamics. Therefore, the combination of DL-based methods and traditional scientific computing methods provides scientists with a cost-effective toolbox to explore problems across different scales that were previously deemed computationally out of reach.
To this end, several other breakthroughs in DL are required to enable the use of PgNNs, PiNNs, PeNNs, and NOs in large-scale three-dimensional (or multidimensional) problems. For instance, the training of complex DL models (e.g., PiNNs, PeNNs, and NOs) should be accelerated using different parallelization paradigms, and the trained models should generalize well to unseen conditions, geometries, and material properties. Table 10 compares the main characteristics of PgNNs, PiNNs, PeNNs, and NOs. PgNN-based models suffer mainly from their statistical training process, for which they require large datasets. They map carefully curated training datasets based only on correlations in statistical variations, and hence their predictions are naturally physics-agnostic. PiNN-based models suffer mainly from the presence of competing loss terms that can destabilize the training process. PiNN is also a solution-learning algorithm with limited generalizability due to its inability to learn the physical operator underlying a specific phenomenon. Models based on PeNNs and NOs, on the other hand, may experience low convergence rates and require large volumes of paired, structured datasets, leading to highly expensive training.
Table 10. Comparison of the main characteristics of PgNNs, PiNNs, PeNNs, and NOs

| Feature | PgNNs | PiNNs | PeNNs | NOs |
|---|---|---|---|---|
| Accelerating capability | ✓ | ✓ | ✓ | ✓ |
| Mesh-free simulation | ✓ | ✓ | ✓ | ✓ |
| Straightforward network training | ✓ | ✗ | ✗ | ✗ |
| Training without labeled data | ✗ | ✓ | ✗ | ✗ |
| Physics-informed loss function | ✗ | ✓ | ✓ | ✓ |
| Continuous solution | ✗ | ✓ | ✓ | ✓ |
| Spatiotemporal interpolation | ✗ | ✓ | ✓ | ✓ |
| Physics encoding | ✗ | ✗ | ✓ | ✗ |
| Efficient operator learning | ✗ | ✗ | ✗ | ✓ |
| Continuous-depth models | ✗ | ✗ | ✓ | ✗ |
| Spatiotemporal extrapolation | ✗ | ✗ | ✓ | ✓ |
| Solution transferability | ✗ | ✗ | ✓ | ✓ |
| Efficient real-time predictions | ✗ | ✗ | ✗ | ✓ |
Considering the effectiveness of this combination of scientific computing and DL, future studies can be divided into four distinct categories:
(i) Improving algorithms: developing advanced variants of PgNNs, PiNNs, PeNNs, and NOs that offer simpler implementation with enhanced convergence rates; faster training for multidimensional and multiphysics problems; higher accuracy and generalization to unseen conditions while using sparse training datasets; more robustness for real-time forecasting; better adaptability to multiple spatiotemporal resolutions; more flexibility to encode various types of governing equations (e.g., all PDE types, closure laws, data-driven laws, etc.); and closer ties with the plethora of traditional solvers.
(ii) Considering causalities: developing causal training algorithms (e.g., causal Q-learning [351]) that restore physical causality during the training of PgNN, PiNN, and PeNN models by re-weighting the residual loss of the governing equations (e.g., PDEs) at each iteration. This line of research will allow for the development of causality-conforming variants of these algorithms that can bring new opportunities for their application to a wider variety of complex scenarios across diverse domains.
(iii) Expanding applications: leveraging the potential of PgNNs, PiNNs, PeNNs, and NOs in problems with complex anisotropic materials (e.g., flow in highly heterogeneous porous media, metal and nonmetal particulate composites); problems with multiscale multiphysics phenomena (e.g., magnetorheological fluids, particle-laden fluids, dry powder dynamics, reactive transport, unsaturated soil dynamics); problems with multiresolution objectives and extensive spatiotemporal downscaling or upscaling (e.g., global and regional climate modeling, geosystem reservoir modeling); and structural health monitoring (e.g., crack identification and propagation, hydrogen pipeline leakage, CO2 plume detection).
(iv) Coupling solvers to form hybrid models: coupling PgNNs, PiNNs, PeNNs, and NOs with open-source computational mechanics packages such as OpenIFEM, OpenFOAM, Palabos, LAMMPS, LIGGGHTS, and MOOSE. Integrating NNs seamlessly with existing simulation tools and workflows is a nontrivial task. This line of research will allow for faster surrogate modeling and, hence, faster development of next-generation solvers. It will also accelerate community and industry adoption of the combined scientific-DL computational paradigm.
Acknowledgment
S. A. F. would like to acknowledge support from the Department of Energy Biological and Environmental Research (BER) (Award No. DE-SC0023044), the National Science Foundation Partnership for Research and Education in Materials (PREM) (Award No. DMR-2122041), the American Chemical Society Petroleum Research Fund (Award No. 66452-DNI9), and the Texas State University Multidisciplinary Internal Research Grant (MIRG) (Award No. 9000003028). C. F. would like to acknowledge support from FEDER funds through the COMPETE 2020 Programme and National Funds through FCT (Portuguese Foundation for Science and Technology) under the projects UID-B/05256/2020, UID-P/05256/2020, and MIT-EXPL/TDI/0038/2019-APROVA-Deep learning for particle-laden viscoelastic flow modelling (POCI-01-0145-FEDER-016665) under the MIT Portugal program.
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The authors attest that all data for this study are included in the paper.