## Abstract

Integrated computational materials engineering (ICME) models have been a crucial building block for modern materials development, relieving heavy reliance on experiments and significantly accelerating the materials design process. However, ICME models are also computationally expensive, particularly with respect to time integration for dynamics, which hinders the ability to study statistical ensembles and thermodynamic properties of large systems for long time scales. To alleviate the computational bottleneck, we propose to model the evolution of statistical microstructure descriptors as a continuous-time stochastic process using a non-linear Langevin equation, where the probability density function (PDF) of the statistical microstructure descriptors, which are also the quantities of interests (QoIs), is modeled by the Fokker–Planck equation. We discuss how to calibrate the drift and diffusion terms of the Fokker–Planck equation from the theoretical and computational perspectives. The calibrated Fokker–Planck equation can be used as a stochastic reduced-order model to simulate the microstructure evolution of statistical microstructure descriptors PDF. Considering statistical microstructure descriptors in the microstructure evolution as QoIs, we demonstrate our proposed methodology in three integrated computational materials engineering (ICME) models: kinetic Monte Carlo, phase field, and molecular dynamics simulations.

## 1 Introduction

Integrated computational materials engineering (ICME) models are powerful tools to study the process–structure–property linkages at multiple length and time scales. Numerous ICME models have been developed to study material systems from quantum to continuum scales. Quantifying local distributions of microstructures or configurations and mapping statistical ensembles to the measurable properties at a larger scale become an important methodology in the simulations of multiscale systems. Describing microstructures statistically and modeling their evolution have been one of the important topics in computational materials science. Simulation and prediction of long-term behaviors of materials are important for many problems, such as in predicting high-cycle fatigue where voids are slowly nucleated and grown over a long time, and nuclear material degradation which may take place for decades. However, the expensive computational cost of ICME models usually does not allow us to simulate the microstructure evolution over a long period of time. Although some methods to accelerate molecular dynamics (MD) and kinetic Monte Carlo (kMC) simulations have been developed, more generic and fast predictions of statistical quantities of interests (QoIs) have not been studied.

Here, we propose a stochastic modeling approach to accelerate the predictions of statistical QoIs in material evolution, specifically the statistical descriptors of microstructures such as grain size distribution of polycrystalline solids and chord-length distribution of multi-phase liquids. Mathematically, the propagation of statistical distributions can be modeled as continuous-time stochastic processes. We apply the Kramers–Moyal expansion to model the evolution of probability density function (PDF) of the QoI along time. When only the first and second orders of the expansion are considered, the model is simplified to the well-known Fokker–Planck equation. The Fokker–Planck equation is a deterministic partial differential equation of PDFs, which is equivalent to the stochastic differential equation usually employed to model Langevin dynamics. The dynamics of a statistical QoI is modeled by the Fokker–Planck equation as a one-dimensional continuous-time stochastic process, where the drift and diffusion coefficients in the equation are estimated and calibrated using the available time-series dataset, collected from ICME simulations for some period of time. Once calibrated, the Fokker–Planck equation can be applied to predict the dynamics of PDFs for the statistical QoIs directly and very efficiently for a much longer time period, instead of continuously relying on the actual ICME simulations. Thus the Fokker–Planck equation can be regarded as a stochastic reduced-order model (ROM) in our approach. The advantage of the proposed stochastic ROM approach is that the predictions of statistical QoIs in the simulated material systems can be significantly accelerated because the time-scale used in the Fokker–Planck equation can be higher than the ones in the ICME model, whichever ICME model has been used to generate the training dataset.

The parameters of the stochastic ROM are the drift and diffusion coefficients. They need to be calibrated based on the ICME models. With well-calibrated parameters, the stochastic ROM can predict the evolution of QoIs. In this work, the distributions of QoIs from the ICME models as time series are divided into two time periods or stages. The data from the first stage are used to train and calibrate the ROM, whereas the second stage is used to test the ROM performance. Several assumptions are made in using the Langevin equation to describe the QoI (cf. Sec. 4.2), which is considered as a continuous-time stochastic process in this paper.

In the rest of this paper, we denote *x*(*t*) as a one-dimensional QoI in an ICME model, where *x*(*t*) is considered as a continuous-time stochastic process and modeled using the Langevin equation. The outline of the paper is as follows. Section 2 reviews the literature on stochastic reduced-order model methods and scale-bridging methods for computational materials science. Section 3 provides the background and derivation of the Fokker–Planck equation. The mathematical foundation of the proposed methodology and the numerical procedure in applying the proposed method is described in Sec. 4. Section 5 provides three examples of kinetic Monte Carlo (kMC), phase field (PF), and molecular dynamics (MD) simulations. The advantages and limitations of the proposed methods are discussed in Sec. 6. Section 7 concludes the paper.

## 2 Literature Reviews

### 2.1 Stochastic Reduced-Order Models.

Some researchers have applied the Fokker–Planck equation as the stochastic ROM for system dynamics simulation. For instance, Grigoriu [1] constructed a stochastic ROM with simple random functions to approximate an arbitrary random function, where statistical discrepancy are minimized. Sarkar et al. [2] applied the method of Grigoriu [1] to quantify uncertainty in a corroding system and compare against sampling-based approaches. Mignolet and Soize [3] proposed another stochastic ROM for both model and parameter uncertainty in a stochastic finite element approach. This nonparametric approach accounts for both model and parameter uncertainty, compared to only parameter uncertainty in Ref. [4]. Goudon and Monasse [5] modified the Lifshitz–Slyozov equation based on the Fokker–Planck equation and demonstrated the approach with polymer systems. Ganapathysubramanian and Zabaras [6] proposed a data-driven ROM and solved it through stochastic collocation and variational methods for thermal diffusion in random heterogeneous media applications. Bhattacharjee and Matouš [7] proposed a ROM based on Isomap, a non-linear manifold learning technique, in concert with neural networks, for heterogeneous hyperelastic materials. Latypov and Kalidindi [8] proposed a ROM based on two-point statistics correlation for two-phase composite materials.

### 2.2 Scale-Bridging Methods.

Multiscale methods aiming at solving multi-physics problem multiple length-scale are available. For example, Yotov et al. [9,10] proposed and implemented the mortar finite element method for second-order elliptic equations. Engquist and collaborators [11,12] proposed the heterogeneous methods to efficiently approximate the macroscopic state of the system when the microscopic model is readily available. Variational methods is another approach that has received much attention in solving computational fluid dynamics problems, for instance, Refs. [13–15]. Another approach that couples atomistic to continuum level is the atomistic-to-continuum methods [16–18], which has found many users in computational materials science. However, all of the aforementioned methods are rather limited to multiple length-scale, leaving time-scale issues unresolved.

## 3 Background of the Fokker–Planck Equation

In this section, the background of uncertainty propagation using the Kramers–Moyal expansion and Fokker–Planck equation is provided, as it is the backbone of our proposed methodology.

The one-dimensional (1D) non-linear Langevin equation for stochastic variable *x*(*t*), which is the QoI in this paper, provides the preliminary context which gives rise to the Fokker–Planck equation. Following the notation of Risken [19] and Frank [20], the 1D non-linear Langevin equation reads

*t*) is assumed to be a Gaussian distributed, white noise term with zero mean and

*δ*correlation function [21], i.e.,

*δ*(

*t*−

*t*′) is the Dirac-delta function, and the intensity of noise is 1 by convention [22]. If

*h*(

*x*,

*t*) = 0 and

*g*(

*x*,

*t*) = 1, Eq. (1) describes the Wiener process. There are two alternative ways to interpret the drift and diffusion coefficients of the Fokker–Planck equation. The first is based on Itô calculus and the second is based on Stratonovich calculus, depending on the existence of spurious or noise-induced drift [19]. For Itô calculus,

*D*

^{(1)}(

*x*,

*t*) =

*h*(

*x*,

*t*), whereas for Stratonovich calculus,

*D*

^{(1)}(

*x*,

*t*) =

*h*(

*x*,

*t*) + (∂

*g*(

*x*,

*t*)/∂

*x*)

*D*

^{(2)}(

*x*,

*t*). For both Itô and Stratonovich calculus,

*D*

^{(2)}(

*x*,

*t*) =

*g*

^{2}(

*x*,

*t*). With Itô calculus, Eq. (1) can be rewritten as

*dx*(

*t*) =

*h*(

*x*,

*t*)

*dt*+

*g*(

*x*,

*t*)

*dW*

_{t}, where

*W*

_{t}is the Wiener process, which is the source of randomness [22].

*x*(

*t*) is the QoI dependent on time, also referred to as the state variable. We restrict

*x*(

*t*) to a scalar, even though it can be generalized for higher dimensional QoIs. In the scope of this paper, the Itô calculus is used to interpret the stochastic process in Eq. (1), that is,

*W*(

*t*),

*t*≥ 0, is a scalar Wiener process, and

*D*

^{(1)}(

*x*,

*t*) and

*D*

^{(2)}(

*x*,

*t*) are the drift and diffusion coefficients, respectively. The stochastic process described in Eq. (1) is a Markov process with

*δ*-correlated force; this Markov property is destroyed if Γ(

*t*) is no longer

*δ*-correlated [19].

*f*(

*x*,

*t*) associated with stochastic variable

*x*(

*t*) can be written as [19]

*n*is the number of truncated terms in Kramers–Moyal expansion and

*D*

^{(j)}(

*x*,

*t*) is the Kramers–Moyal expansion coefficient. The Pawula’s theorem [19] states that the Kramers–Moyal expansion may stop either after the first term or after the second term; if it does not stop after the second term, then it must contain an infinite number of terms. With only the first two terms, the Kramers–Moyal expansion is reduced to the Fokker–Planck equation as

*D*

^{(1)}and

*D*

^{(2)}are the drift and diffusion coefficients, respectively, and both are spatio-temporal functions.

Assume the evolution of the stochastic variable *x*(*t*), which is the QoI, can be modeled by the Fokker–Planck equation. This assumption holds if *x*(*t*) obeys a Langevin equation with Gaussian *δ*-correlated noise, it can be shown that all coefficients other than drift and diffusion coefficients vanish, and the Kramer–Moyal expansion simply reduces to the Fokker–Planck equation (cf. Ref. [19, Sec. 1.2.7]). The Fokker–Planck equation is sometimes referred to as the backwards or second Kolmogorov equation in the literature [23]. Furthermore, for the case of one stochastic variable QoI *x*(*t*) in Eq. (1), it is always possible to convert the Langevin equation from a multiplicative noise force *g* to an additive noise force, through a transformation of variable (cf. Ref. [19, Sec. 3.3]).

### [21,24,25]

*Denote the*

*nth central moment as*

*M*

^{(n)}(

*x*,

*t*),

*then*

It is noted that Theorem 1 follows from the Taylor series expansion in deriving Kramers–Moyal expansion. The evolution of mean and variance can be modeled through the two following corollaries:

*The expectation of*

*x*(

*t*),

*denoted as*$E[x(t)]$,

*D*

^{(1)}(

*x*,

*t*) =

*D*

^{(1)}(

*t*), then

*Assume that the drift coefficient is a temporal function, i.e.,*

*D*

^{(1)}(

*x*,

*t*) =

*D*

^{(1)}(

*t*),

*then the variance of*

*x*(

*t*),

*denoted as*$Var[x(t)]$,

*D*

^{(2)}(

*x*,

*t*) =

*D*

^{(2)}(

*t*), then

## 4 Methodology

In this section, the proposed stochastic ROM method with time upscaling is introduced. The organization of this section is as follows. Section 4.1 presents the problem statement in terms of mathematics, along with the assumptions in Sec. 4.2. In Sec. 4.3, we introduce the calibration and training procedure for the proposed stochastic ROM. Section 4.4 describes a numerical treatment via Tikhonov regularization to prevent divergent solutions. In Sec. 4.5, the finite difference method in solving the Fokker–Planck equation is discussed.

### 4.1 Problem Statement.

Figure 1 provides a schematic overview of the proposed stochastic ROM method. First, the ROM needs to be trained, where the drift and diffusion coefficients are calibrated using the predicted QoIs from ICME models of materials. After test and validation with additional materials simulation data, the stochastic ROM then can predict the uncertainty propagation efficiently with a much longer time-step than the time-steps used in the ICME models.

### 4.2 Assumptions.

Several assumptions are made in the ROM formulation. The assumptions are explained and justified as follows.

First, we assume that there is sufficiently enough data to train the ROM. The dataset can be obtained either experimentally through data acquisition or computationally through running simulations repetitively on a high-performance computing platform. Some notable work to estimate the drift-diffusion parameters from experimental and computational time-series dataset include noisy electrical circuit [26], stochastic dynamics of metal cutting [27–29], meteorological data for sea surface wind [30,31], to name a few. Interested readers are referred to the review paper of Friedrich et al. [32] for extensive multi-disciplinary applications across many scientific and engineering fields. Here, we applied and extended the method into the field of computational materials science with applications to microstructure evolution.

Second, we assume that the noise associated with QoIs are *δ*-correlated, in order to preserve the Markov property of the Langevin model. Furthermore, the noise is also assumed to be independent of the QoI *x*(*t*). In practice, the noise is not strictly *δ*-correlated, which results in Markov–Einstein time *τ*_{ME}, such that for sampling intervals *τ* < *τ*_{ME}, the Markov property does not hold [21]. However, there has been proofs that the Markov assumption is a valid assumption, for example, in the field of fluid mechanics with small-scale turbulence [23]. Some recent efforts are also noted in adopting the Fokker–Planck equation for the non-Markovian process [33,34].

Third, we assume that the drift and diffusion coefficients are a function of time, *D*^{(n)}(*x*, *t*) = *D*^{(n)}(*t*). In the literature, both time-independent [35] and time-dependent coefficients [36] have been studied. Generally speaking, the drift and diffusion coefficients do not have to be a temporal function but can be a spatio-temporal function, i.e., no restriction. This assumption is simply made for the computational convenience of the analytical calibration approach described in Sec. 4.3 but can simply be removed if the optimization approach in Sec. 4.3 is considered, as the optimization problem is considered as a black-box function and does not impose any restriction on the parameterization of the coefficients. This assumption only applies for the analytical approach, but not the optimization approach, during the calibration process for drift and diffusion coefficients.

Fourth, the proposed ROM is purely data-driven and does not have any underlying physical assumption. The purpose of this assumption is to retain the generalization of the proposed ROM to a wide range of ICME applications, without being restricted to a certain set of problems. However, it is possible to impose some physical conditions on the drift- and diffusion-coefficient, if it is desirable. The optional choice of imposing physical constraints depends on specific applications and is left to users. If the physical constraints are added, the ROM considered becomes a physics-constrained machine learning model, which is more restricted than the ROM considered in this paper.

Finally, we assume that only one QoI *x*(*t*) is considered for a ROM. Theoretically, high-dimensional Fokker–Planck equations exist; practically, the Fokker–Planck equation is often solved in 2D, 3D [37], 4D [38], and 6D [39], where the dimensionality corresponds to the spatial dimensionality of the Fokker–Planck equation. State-of-the-art mesh-based methods to solve Fokker–Planck equation through finite difference and finite element methods are severely limited by the curse of dimensionality for more QoIs, which has a profound effect on the computing memory and speed. This assumption can be considered for computational convenience, opening up for future works to include more QoI in the same ROM. The assumptions for ROM construction are summarized as follows.

#### Mathematical

*The noise associated with the QoIs are δ-correlated and independent of the QoIs, which preserves the Markov property.*

#### Modeling

*There are sufficiently enough data, either experimentally or computationally, to train the ROM.*

#### Modeling

*The drift and diffusion are functions of time,**D*^{(n)}(*x*, *t*) = *D*^{(n)}(*t*), *but not a function of**x*.

#### Modeling

*The general ROM is data-driven, where there is no physical constraints on the drift and diffusion, although certain parameterization may be imposed depending on specific applications.*

#### Modeling

For each ROM, only one QoI is considered.

Mousavi et al. [40] and Anvari et al. [41] discussed in great detail regarding the Markov–Einstein time-scale *τ*_{M}—the minimum time interval over which the time series *x*(*t*) can be considered as a Markov process. Friedrich et al. [32] (cf. Sec. 2.2.5) also pointed out that the Markovian property is “usually violated by many physical systems.”

### 4.3 Training Drift and Diffusion Coefficients in the Fokker–Planck Equation.

The PDF for a QoI is numerically propagated along time, using the Fokker–Planck equation with calibrated coefficients and initial conditions. There are two important elements in constructing the stochastic ROM. First, the Fokker–Planck equation coefficients must be trained. Second, the initial conditions must be constructed with numerical stability consideration. The Fokker–Planck equation then can be solved using the calibrated coefficients and the initial conditions, and the QoI evolution along time can be predicted. During the training, the initial PDFs and evolution of PDFs associated with the QoIs are obtained by running the original materials simulation models, and the drift and diffusion coefficients are calibrated based on the training PDFs. After calibration, the ROMs can be used to predict the PDFs of QoIs for longer periods of time, independently from the original material models. We propose two approaches to calibrate the stochastic ROM.

The first approach to analytically train or estimate coefficients is based on Corollaries 1 and 2 using linear regression. Based on Theorem 1, the coefficients can be estimated from the time-series dataset. However, in practice, direct application of Theorem 1 faces challenges from both spatio and temporal dimension. First, the number of QoI observations is often not sufficient in practice to approximate the central moments well enough along the spatial dimension. Second, the sampling time is often sparse along time dimension. As a result, numerical estimations based on derivatives to approximate coefficients, based on Theorem 1, are often noisy and oscillatory, creating numerical challenges to construct the ROM model. Observe that the QoI can be noisy in both temporal and spatial dimensions.

One approach to reduce the effect of noise is to perform homogenization for the spatial variable *x* in the coefficients and simplify the coefficients as functions of time *t* only, i.e., *D*^{(n)}(*x*, *t*) = *D*^{(n)}(*t*). Excluding the spatial variable implies that the drift and diffusion coefficients are assumed to constant throughout the modeled spatial domain. This assumption is reasonable if the spatial domain is small. Materials distribution can be stable if a clear trend with respect to time is observed and can be anticipated in the future. The analytical approach built on Corollary 1 and Corollary 2 states that if the first two central moments are well constructed, then the drift and diffusion coefficients can be approximated by linear regression of mean and variance with respect to time, respectively. The numerical approximations converge to the central moment in probability by the weak law of large number [42].

A further challenge comes from the data used in calibration. It has been shown that the sampling rate of data is important in estimating the Fokker–Planck drift and diffusion coefficients from time-series data in previous studies (for example, [21,24,25,43–46]). The data with low sampling rates are insufficient to estimate the coefficients accurately.

To circumvent the challenging problems posed by the first calibration approach, we also propose a second calibration approach to bypass the technical challenge of sampling rate by solving an inverse problem. Here, the coefficients are first parameterized, then optimized by minimizing the difference between the simulated and calibrated PDFs. The coefficients are trained and calibrated based on the minimization of a loss function. Compared to the first analytical approach described above, the second numerical approach is more robust.

*t*=

*τ*for a fixed time-step

*τ*or

*t*=

*τ*

_{j}for multiple time-steps $(\tau j)j=1n$, where

*d*(·, ·) is the distance between two PDFs. The distances

*d*(·, ·) can be defined in different ways, such as ℓ

^{p}norms, Kullback–Leibler divergence, Wasserstein distance, or others [47]. In this work, the Kullback–Leibler divergence is used to measure the difference between two PDFs as

*q*(

*x*) in Eq. (16) to avoid dividing by zero error.

### 4.4 Regularization for Initial Conditions.

After the ROMs are calibrated, the evolution of PDFs for QoIs can be efficiently simulated with much longer time-steps. The numerical stability of the ordinary differential equations however is sensitive to the initial PDFs. “Noisy” empirical PDFs of QoIs from ICME models need to be processed with regularization, before they are used as the initial conditions in solving the Fokker–Planck equations.

*λ*is the regularization parameter, and

*d*is the order of derivative.

*Q*can be expressed as

**E**is the derivative matrix of an arbitrary order,

**M**is the mapping matrix,

**B**is the midpoint rule integration matrix, $B~$ is a subset of

**B**, and

**F**= diag(

**f**) is the observation matrix in diagonal form. In this paper, we have found that setting

*λ*= 10

^{−6}and

**E**as the second derivative matrix has worked well.

*Q*/∂

*f*= 0, we obtain the simplest form of smoothing by regularization as

### 4.5 Finite Difference Fokker–Planck Equation Solver.

Numerically, the Fokker–Planck equation is commonly solved in two ways, finite element method and finite difference method. Both of them suffer the curse of dimensionality for large problems with high-dimensional state space. However, in the scope of this paper, only 1D stochastic process is concerned. Finite difference method is applied here.

The numerical implementation of the finite difference method is developed based on the algorithm of Hassan et al. [49], which calculates derivatives of any degree with any arbitrary order of accuracy over a uniform grid. The Fokker–Planck equation in Eq. (5) is discretized and implemented with a matrix form as

**f**(

*x*,

*t*=

*τ*

_{0}),

**E**

^{(1)}and

**E**

^{(2)}are the first and second derivative matrices, respectively, with a specified order of accuracy, and

*D*

^{(1)}(

*x*,

*t*) and

*D*

^{(2)}(

*x*,

*t*) are the parameterized drift and diffusion coefficients, respectively.

## 5 Applications and Demonstrations

In this section, the proposed ROM is demonstrated using three examples: kMC simulation in Sec. 5.1, PF simulation in Sec. 5.2, and MD simulation in Sec. 5.3. In the kMC example, the grain growth is simulated with a hybrid Potts-phase field model. The selected QoI is the grain area. In the PF example, the evolution of Fe–Cr microstructures and phases is simulated. The QoI is the chord-length. In the MD example, a simple liquid argon system is simulated, and the selected QoIs are the total mean-displacements and enthalpy of the simulation cell. In these examples, the ICME models are ran for a period of time. Then, the QoI PDFs are obtained by post-processing the simulations. A beginning portion with respect to time of the PDFs collection is considered as the training dataset, as illustrated in Fig. 1, whereas the last portion of this PDFs collection is considered as the testing dataset. The comparison between the evolving PDF of the Fokker–Planck equation after calibrated using the training dataset and the last PDF in the testing dataset is a reasonable measure on the numerical performance of the proposed stochastic ROM method.

### 5.1 Kinetic Monte Carlo Simulation: Hybrid Potts-Phase Field Simulation for Grain Growth.

*λ*= 0.3,

*C*

_{1}= 0.25,

*C*

_{2}= 0.75,

*C*

_{3}= 0.05,

*C*

_{4}= 0.95, and

*α*= 0.5. A computational domain of 5000 pixel × 5000 pixel is used to perform the 2D grain growth kMC simulation. Fifty kMC simulations are performed for 20,000 Monte Carlo steps (mcs), where 40 Monte Carlo events are observed, with the last microstructure obtained at 16,681.1 mcs. Time-step in kMC is an exponentially distributed random variable [53], which is inversely proportional with the rate constant and typically generated from the uniformly distributed seed.

Exploiting the fact that the log-normal distribution is one of the most common used distribution to characterize the grain size, as well as the fact that in the kinetic Monte Carlo method, time-step is an exponentially distributed random variable, we apply the log transformation to both grain size and time, i.e., *x* → log *x*, *t* → log *t*, before modeling the QoI (i.e., *x* after transformation) using the Fokker–Planck equation.

The drift and diffusion coefficients are calibrated using the first analytical approach based on Corollaries 1 and 2.

The initial and training PDFs are constructed using the kernel density estimation method with the normal kernel distribution. The selected bandwidth is optimal for the normal kernel density [54]. The Tikhonov regularization, described in Sec. 4.4, is applied to the initial PDF to reduce the chance of numerical divergent for the Fokker–Planck solver. The Monte Carlo events from 46.5 mcs to 599.5 mcs are used as the training dataset, while the Monte Carlo events from 744.375 mcs to 16681.1 mcs are used as the testing dataset. Using the first approach described in Sec. 4.3, by simple linear regression, we obtained *D*^{(1)}(*t*) = 0.7320, whereas *D*^{(2)}(*t*) = −0.02931.

Figure 2 shows the evolution of microstructure using kMC. In parallel, Fig. 5 shows the evolution of the QoI using the Fokker–Planck equation at the same steps with Fig. 2, where the QoI is the log of the grain area based on the kMC simulation results. Figure 3 presents the evolution of the original grain area, showing a diffusion-dominant type of Fokker–Planck equation. Figure 4 shows the evolution of the log of the grain area, where densities are closer to Gaussian distribution. The first training PDF is at 46.5 mcs, the last training PDF is at 599.5 mcs, and the last testing PDF is at 16,681.1 mcs. The calibrated density is the last PDF used to train the stochastic ROM. The final density denotes the last PDF, obtained from direct simulations, to evaluate the performance of the trained stochastic ROM and is not used for training the stochastic ROM.

Figure 5 shows the solution of the Fokker–Planck equation after the log transformation at three different snapshots: Fig. 5(a) at the beginning of the training dataset, Fig. 5(b) at the end of the training dataset, and Fig. 5(c) at the end of the testing dataset. Excellent agreement with the testing dataset is obtained.

From the solution of the Fokker–Planck equation shown in Fig. 5, we apply rejection sampling algorithm to draw samples and reconstruct the PDF of the grain size, by inverting the log transformation, i.e., *x* → exp(*x*). Figure 6 shows the comparison between the reconstructed PDFs from Fokker–Planck solution and the original PDFs from SPPARKS. Figure 6(a) shows the beginning of the training dataset. Figure 6(b) shows the end of the training dataset. Figure 6(c) shows the end of the testing dataset. It is observed that even though the Fokker–Planck solution has a longer tail distribution, in general, the last testing PDF at 16,681.1 mcs agrees relatively well with the reconstructed PDF from the Fokker–Planck solution. This demonstrates if the Fokker–Planck coefficients are well-trained, a prediction about the evolution of the microstructural descriptor using the trained ROM can be made with a good level of accuracy.

### 5.2 Phase Field Simulation: Spinodal Decomposition.

^{2}is used to demonstrate the approach. The PF simulation is used to model the spinodal decomposition of an Fe–Cr alloy on 250 nm × 250 nm computational surface domain at 500 °C over a period of 7 days (604,800 s) in physical time. The system is described by the Cahn–Hillard equation with no external energy source as

*c*is the mole fraction of Cr (dimensionless),

*M*(

*c*) is the mobility of Cr (m

^{2}mol/J s),

*f*

_{loc}(

*c*) is the free energy density (J/mol), and

*κ*= 8.125 · 10

^{−16}is the gradient energy coefficient (J m

^{2}/mol). The mobility term

*M*(

*c*) and the free energy term

*f*

_{loc}(

*c*) are, respectively, described by

*g*(

*c*) is described by

f variable | f value | Cr variable | Cr value | Fe variable | Fe value |
---|---|---|---|---|---|

A_{f} | −24,468.31 | A_{Cr} | −32.770969 | A_{Fe} | −31.687117 |

B_{f} | −28,275.33 | B_{Cr} | −25.8186669 | B_{Fe} | −26.0291774 |

C_{z} | 4167.994 | C_{Cr} | −3.29612744 | C_{Fe} | 0.2286581 |

D_{f} | 7052.907 | D_{Cr} | 17.669757 | D_{Fe} | 24.3633544 |

E_{f} | 12,089.93 | E_{Cr} | 37.6197853 | E_{Fe} | 44.3334237 |

F_{f} | 2568.625 | F_{Cr} | 20.6941796 | F_{Fe} | 8.72990497 |

G_{f} | −2345.293 | G_{Cr} | 10.8095813 | G_{Fe} | 20.956768 |

f variable | f value | Cr variable | Cr value | Fe variable | Fe value |
---|---|---|---|---|---|

A_{f} | −24,468.31 | A_{Cr} | −32.770969 | A_{Fe} | −31.687117 |

B_{f} | −28,275.33 | B_{Cr} | −25.8186669 | B_{Fe} | −26.0291774 |

C_{z} | 4167.994 | C_{Cr} | −3.29612744 | C_{Fe} | 0.2286581 |

D_{f} | 7052.907 | D_{Cr} | 17.669757 | D_{Fe} | 24.3633544 |

E_{f} | 12,089.93 | E_{Cr} | 37.6197853 | E_{Fe} | 44.3334237 |

F_{f} | 2568.625 | F_{Cr} | 20.6941796 | F_{Fe} | 8.72990497 |

G_{f} | −2345.293 | G_{Cr} | 10.8095813 | G_{Fe} | 20.956768 |

In this example, the QoI is the chord-length distribution, which is another statistical microstructural descriptor. Chord-length distribution is also an anisotropic statistical microstructure descriptor [56] to characterize multi-phase microstructures. At a particular angle, a set of parallel lines are drawn, where intersections with the microstructure are recorded. Segments corresponding to the same phase are collected and measured by their lengths, which then characterizes a microstructure sample. The training and initial PDFs are also constructed using the kernel density estimation method with the normal kernel distribution. The optimal bandwidth is selected for the normal kernel density [54]. The initial PDF is regularized using the Tikhonov regularization as described in Sec. 4.4 to reduce the probability of divergence Fokker–Planck solution.

The training dataset is the PDFs collection from 0 steps to 1700 steps, where the rest, which are the PDFs collection from 1700 steps to 2405 steps, is the testing dataset. Due to the noise and instability observed at the beginning of the simulation, the training dataset is truncated from 1400 steps to 1700 steps to exclude extreme variations at the beginning of the PF simulation. This variation is typically observed in many dynamic ICME models, including MD and PF simulations.

Figure 7 shows the microstructure evolution of Fe–Cr spinodal decomposition simulations. The system is modeled using the Cahn–Hilliard equation with no external energy sources. The initial concentration of Cr is randomly generated within the interval [44.774%, 48.774%] with the expectation of 46.774%. The coarsening effect is observed, and the clusters slowly expand as the simulation advances.

The coefficients of the Fokker–Planck equation are calibrated using the Bayesian optimization method [57–60]. Here, the drift and diffusion coefficients are parameterized in a polynomial approximation manner with low-degree polynomials, and the batch-parallel Bayesian optimization is applied to minimize the Kullback–Leibler divergence between the training PDF and the predicted Fokker–Planck PDF, as KL(*p*_{training}‖*p*_{predicted}), where the *p*_{predicted} is obtained from solving the forward Fokker–Planck equation with certain coefficients. The optimized drift and diffusion coefficients are 0.001328 and 0.001928, respectively (Fig. 8).

Figure 9 presents the evolution of the calibrated Fokker–Planck equation to capture the evolution of QoI. The first training PDF is at 1400*τ* steps, the last training PDF is at 1700*τ* steps, and the last testing PDF is at 2405*τ* steps. Figure 7(d) shows the comparison between the PDF obtained by calibrated and trained Fokker–Planck equation and the testing PDF from the ICME model, which is the PF simulation in this case.

### 5.3 Molecular Dynamics Simulation: Equilibrium Liquid Argon.

In this example, MD simulation of liquid argon at 85 K is performed using LAMMPS [61] to assess the total mean-square displacement and enthalpy of the simulated system. The system consists of 4000 atoms, where the interatomic potential is described by Lennard–Jones model with uncertain well-depth *ɛ* and well-location *σ*. Different *ɛ* and *σ* for argon have been used in the literature, for example, Refs. [62–67]. The uncertain *ɛ* and *σ* here are modeled with truncated normal distributions. The mean *μ*, the standard deviation *σ*, the support lower and upper bounds are (0.2383, 0.0667, 0.2376, 0.2390) for *ɛ* and (3.4000, 0.6670, 3.3000, 3.5000) for *σ*, respectively. The microcanonical ensemble (NVE) is used, where a Langevin thermostat is also used to couple with the system. The Langevin thermostat, which has a random noise generator [68], can be thought of a source for aleatory uncertainty. A total of 25,062 equilibrium simulations are performed, and the QoIs are analyzed using log files of the simulation. The sampling time is 50 fs, the time-step is 1 fs, and the total simulation time is 20 ps. The collection of PDFs from 0 ps to 160 ps is used as the training dataset, whereas the collection of PDFs from 160 ps to 200 ps is used as the testing dataset. Due to the instability of the MD simulation, only a part of the training dataset from 140 ps to 160 ps is used to exclude the noise.

Monte Carlo sampling is used to assess the *a posteriori* distribution of the QoIs. The training and initial PDFs are reconstructed using kernel density estimation method with the normal kernel distribution. The selected bandwidth is optimal for the normal kernel density [54]. For the enthalpy, Laplace approximation, which is equivalent to moment matching, is used to fit a Gaussian distribution instead of kernel density estimation. The initial PDF for total mean-square displacement is regularized using the Tikhonov regularization as described in Sec. 4.4 to avoid divergent solutions in solving the Fokker–Planck equation (Figs. 10 and 11).

Figures 12 and 13 show the evolution of the QoIs’ PDFs in different snapshots at 140, 160, 180, and 200 ps. The first training PDF is at 140 ps, the last training PDF is at 160 ps, and the last testing PDF is at 200 ps. In Fig. 13, all the PDFs are fitted to normal distributions instead of approximating by the kernel density estimation method. The Fokker–Planck coefficients for the total mean-square displacement are trained by minimizing the Kullback–Leibler divergence, whereas the coefficients for enthalpy are trained using Corollary 1 and Corollary 2. For the total mean-square displacement, the optimized drift and diffusion coefficients are 5.4309 − 1.3013 *t* and 7.3099 + 10^{−9}*t*, respectively. For the enthalpy, the optimized drift and diffusion coefficients are 3.06 and −23.525, respectively. The comparison between the testing PDF and evolved Fokker–Planck PDF shows a fairly good agreement after the Fokker–Planck coefficients are calibrated.

## 6 Discussions

We compare the computational cost on a AMD A10-6700 CPU at 3.7GHz on Ubuntu 18.04 platform between optimized C/C++ packages for ICME models and a preliminary non-optimized implementation on matlab, where the results are tabulated in Table 2. Efficient and optimized C++ implementation of the proposed ROM in Trilinos [69,70] or PETSc [71,72] would increase the speedup factor significantly. The speedup depends on various factors, such as the computational time of the ICME model (which varies depending on the fidelity of the model), how many times the simulation repeats, and the implementation of ROM (the time-step used in the integrator, as well as the robustness of the partial differential equation linear solver for the ROM). For kMC, 50 large-scale simulations are performed. For PF, one large-scale simulation is performed. For MD, 25,062 small-scale simulations are performed, with each simulation costs around 0.8254 h, because each MD simulation can only sample one trajectory. The proposed ROM is particularly competitive with a large speedup when ICME models are computationally expensive, such as high-fidelity large-scale simulations.

ICME packages | ICME comp. | ROM comp. | Speedup | |
---|---|---|---|---|

kMC | SPPARKS | 76.3486 h | 0.1375 h | 555.2625× |

PF | MOOSE | 97.8667 h | 1.2189 h | 80.2909× |

MD | LAMMPS | 20,686.1748 h | 65.8581 h | 314.1022× |

ICME packages | ICME comp. | ROM comp. | Speedup | |
---|---|---|---|---|

kMC | SPPARKS | 76.3486 h | 0.1375 h | 555.2625× |

PF | MOOSE | 97.8667 h | 1.2189 h | 80.2909× |

MD | LAMMPS | 20,686.1748 h | 65.8581 h | 314.1022× |

In this paper, we present a stochastic ROM that accelerates the uncertainty propagation in materials modeling of system dynamics, by modeling the evolution of uncertain QoIs using the Fokker–Planck equation. The proposed method is demonstrated by both drift-dominated (as in the cases of MD and kMC) and diffusion-dominated (as in the case of PF) examples. It is shown that if the Fokker–Planck equation coefficients are appropriately parameterized and well calibrated, the proposed method has a predictive capacity to estimate the QoI distributions for longer time scales, without using the computationally expensive material simulations.

The diffusion process modeling by the Fokker–Planck equation is used as a stochastic ROM in this paper. If there is no diffusion, then the diffusion coefficient becomes zero, while the drift coefficient is non-zero. We assume that the QoI can be modeled by the diffusion process. However, this is only true for certain applications, such as those driven by diffusion phenomena in materials science, where the materials behaviors are experimentally justified.

In addition, we also assume that the drift and diffusion coefficients are only a function of time and not a function of QoI itself. This assumption conveniently simplifies the estimation of the drift and diffusion coefficients, yet also imposes the same effect regardless of the QoI magnitudes, posing a drawback of the proposed method and a potential future study. Interested readers are referred to the works of Friedrich et al. [21,24,26,32,46] for various applications of stochastic method in time-series and estimating the drift and diffusion coefficients of Fokker–Planck equations in low- and high-sampling rates.

A limitation of the proposed framework is that the solution of the stochastic ROM does not always describe the true solution of the underlying ICME model. The proposed stochastic ROM is purely data-driven and meant to provide an approximate solution to the true solution of ICME models. Furthermore, there is a limitation in extrapolating the stochastic ROM beyond the training regime, as the approximation error between the stochastic ROM and the true solution is expected to grow. The only case where the stochastic ROM can match perfectly with the true solution is when both are Gaussian, and when the mean and variance of the true solution belong to the parameterization of the stochastic ROM.

The efficiency is mainly based on the difference between the time-scale of solving the Fokker–Planck equation and the time-scale of solving the ICME models. The efficiency depends on the cost of solving the Fokker–Planck equation, as well as the cost of simulating the ICME models. The cost of solving the Fokker–Planck equation can be significantly improved by an implicit time-integrator, as well as a pre-conditioner for the Fokker–Planck solver. The time-scale of solving the Fokker–Planck equation depends on the scale of ICME models, which can be large scale in both length-scale and time-scale. Because these two time-scales (one of the ICME models and one of the Fokker–Planck equation) are completely independent, the benefits of using the stochastic ROM depend on many factors. In general, the benefits are maximized if the ICME models are very computationally expensive, and if the Fokker–Planck equation can be efficiently solved. The improvement of Fokker–Planck solver is posed as open questions for further research.

In this paper, only one QoI is considered in a ROM. Building ROMs for coupled or correlated QoIs requires the high-dimensional Fokker–Planck equation, which will be a topic of future study. If the QoIs are structural descriptors, their predicted values can also be used to reconstruct the microstructure. Examples of statistical and deterministic microstructural descriptors are discussed in Refs. [56,73–76]. The statistical descriptors tend to outnumber the deterministic descriptors, due to the random nature of materials. The proposed framework in this paper can potentially be used to predict the evolution of microstructures for a much longer time-scale than the direct simulations can achieve.

The Kramers–Moyal expansion allows one to propagate higher-order moments in theory, but in practice, estimations of the Kramers–Moyal coefficients can also be approached using optimization methods. Furthermore, higher-order derivative terms make the partial differential equation harder to numerically solve for the forward Kramers–Moyal expansion. The proposed method is extensible in two directions. The former extension includes more variables in a high-dimensional Fokker–Planck equation, whereas the later extension would capture the QoIs evolution with smaller approximation errors by including higher-order terms for the same QoI. A caveat of the later extension is that with higher-order derivatives, it is likely that the numerical solution of the forward Kramers–Moyal expansion would diverge at some points, and some numerical treatments are required in order to obtain a convergent solution. A robust implicit integration solver may be required to stabilize in solving higher-order Kramer–Moyal expansion. Tikhonov regularization can also be applied to smooth out the PDFs.

Microstructures, in general, are stochastic by nature and are arguably best described by statistical microstructure descriptors. The proposed Fokker–Planck ROM performs best when the underlying PDF is normal, e.g., in equiaxed microstructures. If the underlying PDF is not normal, then only the evolution of the first two moments of the PDF are captured, where higher-ordered moments, including skewness and kurtosis, are not. This is clearly a limitation of the Fokker–Planck model and opens up the opportunity for future work.

Solid-state phase transformations [77], including martensitic transformation and precipitation reactions, are probably best computationally captured by PF models. Even though the proposed Fokker–Planck method has only been demonstrated with spinodal decomposition using the PF model, it can also be applied to other solid-state phase transformations, as long as the QoI is a homogenized, scalar, and stochastic variable.

Analytically, the solutions to some microstructure descriptors are well known in the field of materials science. In the first case study of kinetic Monte Carlo for grain growth problem, it is known that the average grain size (isotropic grain boundary energies and mobilities) grows as *t*^{1/2}, as described in Ref. [78]. Therefore, the average size $R\xaf(t)$ and the grain size distribution *R*(*t*) are known for any arbitrary time. In the second case study of phase field for spinodal decomposition problem, it is also known that the microstructure during spinodal decomposition is self-similar. That is, the microstructure at one time is related to an earlier time by only a change in the length-scale, which grows as *t*^{1/3}, as described in the work of Tateno and Tanaka [79]. In the third case study of molecular dynamics for liquid argon, the mean-square displacement varies linearly with time with the slope proportional to the diffusion coefficient [80]. However, whether an analytical distribution for mean-square displacement exist is unclear. We would like to emphasize that the mean/average of the QoI is also naturally controlled by the drift coefficient, which we assume to be a function of time in this paper, i.e., *D*^{(1)}(*t*). Such physical insights can be naturally incorporated by modifying this *D*^{(1)}(*t*) term, depending on a specific problem, as the proposed approach could be considered as a physics-informed machine learning approach for microstructure evolution. Lastly, while it is helpful to understand the underlying physics, not every microstructure descriptors are available analytically. Data-driven approaches, in contrast, provide a numerically available solution to problems which have not been solved previously. It is important to emphasize that the prediction obtained by the Fokker–Planck equation does not always agree well with observations, and how well they can approximate is specific to applications of interest. For QoIs that are typically Gaussian, the Fokker–Planck equation may be an excellent model to adopt, and for QoIs that are not typically Gaussian, one may expect a substantial loss of accuracy.

## 7 Conclusion

In this paper, a stochastic ROM method is proposed to solve the time-scale issue of uncertainty propagation in materials modeling. The ROM coefficients are trained either analytically or numerically, so that the evolution of QoIs can be accurately captured using the ROMs. The Ornstein–Uhlenbeck stochastic process is modeled using 1D generalized Langevin equation. With the formulation of Stratonovich calculus, the stochastic variable can be modeled using the Fokker–Planck equation.

Three examples are used to demonstrate the stochastic ROM framework, including kMC, PF, and MD, where the statistical microstructural descriptors are the QoIs. The results show a good agreement between the prediction from the trained ROMs and the direct simulations, when the ROMs are parameterized and calibrated appropriately.

## Footnote

## Acknowledgment

The research was supported by NSF under grant number CMMI-1306996 and the George W. Woodruff Faculty Fellowship. The research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology are appreciated.

The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.

The authors thank several reviewers for their constructive criticism and valuable comments to improve the paper.

## Conflict of Interest

On behalf of all the authors, the corresponding author declares that there is no conflict of interest within the scope of this paper.

### Appendix: Analytical Solutions of the Fokker–Planck Equation

Here we present two families of Gaussian distribution, with three simple analytical examples that can capture a wide range of phenomenon with increasing complexity. These examples are the analytical solution of the one-dimensional Fokker–Planck equation as in Eq. (5).

*f*

_{1}(

*x*,

*t*), $E[x(t)]=0$ and $D(1)=\u2202E[x(t)]/\u2202t=0$. $Var[x(t)]=2Dt$ and $D(2)=(1/2)$$(\u2202Var[x(t)]/\u2202t)=D$.

*f*

_{2}(

*x*,

*t*) case, $E[x(t)]=\mu t$, $D(1)=\u2202E[x(t)]/\u2202t=\mu $, $Var[x(t)]=\sigma 2$, and $D(2)=(1/2)(\u2202Var[x(t)]/\u2202t)=0$.

*f*

_{3}(

*x*,

*t*), $E[x(t)]=\mu t$, $D(1)=\u2202E[x(t)]/\u2202t=\mu $, $Var[x(t)]=2Dt$, and $D(2)=(1/2)(\u2202Var[x(t)]/\u2202t)=D$.