Abstract
Two-photon lithography (TPL) is a direct laser writing process that enables the fabrication of cm-scale complex three-dimensional polymeric structures with submicrometer resolution. In contrast to the slow and serial writing scheme of conventional TPL, projection TPL (P-TPL) enables rapid printing of entire layers at once. However, process prediction remains a significant challenge in P-TPL due to the lack of computationally efficient models. In this work, we present machine learning-based surrogate models to predict the outcomes of P-TPL to >98% of the accuracy of a physics-based reaction-diffusion finite element simulation. A classification neural network was trained using data generated from the physics-based simulations. This enabled computationally efficient and accurate prediction of whether a set of printing conditions will result in precise and controllable polymerization (i.e., the desired printing), in no printing, or in runaway polymerization. We interrogate this surrogate model to investigate the parameter regimes that are promising for successful printing. We predict the combinations of photoresist reaction rate constants that are necessary for printing under a given set of processing conditions, thereby generating a set of printability maps. The surrogate models reduced the computational time required to generate these maps from more than 10 months to less than a second. Thus, these models can enable rapid and informed selection of photoresists and printing parameters during process control and optimization.
1 Introduction
Two-photon lithography (TPL) is at the forefront of nanofabrication technologies under development today, promising true three-dimensional fabrication capabilities with feature resolution below 100 nm [1–3]. TPL systems typically achieve such subdiffraction-limit resolutions by focusing an intense femtosecond laser beam into a small spot within a curable photopolymer material [4]. This focusing of intense femtosecond light enables nonlinear two-photon absorption, wherein appreciable light absorption and polymerization-based curing occur in only a small region within the focal volume [5]. This powerful capability of TPL has been widely applied to fabricate a variety of functional nano-enabled devices, such as microelectronics [6,7], microfluidics [8,9], micro- and nanophotonics [10–13], bioscaffolds and grafts [14,15], microrobots [16,17], and mechanical metamaterials [18,19].
Given the resolution and versatility of TPL, there have been substantial efforts to scale up the method in terms of throughput, while also reducing the cost per unit mass [20–25]. Conventional TPL is slow due to its point-by-point serial writing scheme. One approach to parallelization-based scale-up, demonstrated by our group, uses a layer-by-layer projection-based TPL (P-TPL) writing scheme to effectively overcome the traditional rate-versus-resolution tradeoff. In our past work, we have demonstrated P-TPL printing of nanowires with widths and heights smaller than 175 nm at throughputs 1000 times higher than those of serial techniques [20]. We have also demonstrated that it is possible to computationally model the P-TPL process using a finite element (FE) based reaction-diffusion simulation that is capable of handling the time and length scales of P-TPL [26].
For further development of P-TPL, it is crucial to not only advance the printing technology to improve performance and reduce cost but to also accurately characterize the parameter space associated with the printing process. In the absence of reliable characterization data and robust modeling tools, one must rely on iterative trial-and-error printing runs to fabricate the desired features. This imposes a significant barrier to the adoption of the technology for scalable manufacturing of nano-enabled devices. In addition, identifying the process limits is impractical through an iterative empirical approach. For example, choosing the printing conditions and photopolymer material that will produce the highest resolution (i.e., generate the smallest features) is challenging without an accurate model of the process. This is incompatible with a vision of TPL as a mature, scaled-up technology. Computational modeling of the process offers a path toward process scalability.
While we have previously demonstrated the ability to predict the feature size using FE simulations [26], in the present work we approach the modeling problem at a higher level. Here, our aim is to rapidly characterize printability over broad parameter spaces instead of predicting the printed geometry under a specific set of conditions. Physics-based FE simulation is an appropriate choice when prediction of the printed geometry is desired for a specific combination of printing setup and photoresist material. Under such a specific set of operating conditions, several model parameters can be empirically calibrated to generate an accurate model over a narrow parameter space. However, it is impractical to use FE simulations to model broad parameter spaces wherein multiple parameters can widely vary. This is because it takes several minutes to solve an FE analysis problem for P-TPL. For instance, in this study, we are interested in investigating the process outcome corresponding to 10 input parameters with several parameters varying over multiple orders of magnitude. Producing a process map over such a broad range would require tens of thousands of data points and would take thousands of central processing unit (CPU)-hours using FE modeling on cluster computing. Even if such a simulation were to be performed once, it is impractical to repeat this simulation every time a new process map is desired. Here, we adopt a machine learning (ML)-based surrogate modeling approach to solve this challenge. By training a classification neural network (CNN) on data points generated from FE simulations, we can accurately investigate printability regimes with solve times that are at least seven orders of magnitude shorter than those of the full FE simulation. The surrogate model is presented in Sec. 2 and the results of the printability mapping are presented in Sec. 3.
2 Modeling Methods
2.1 Projection Two-Photon Lithography.
Here, Δ[R*] represents the increase in the concentration of the primary radicals R*, measured in mol/dm3, over the duration of a single laser pulse. The primary radicals refer to those radicals that are formed from the photoinitiator molecules. The dosage term represents the optical dosage delivered during a single laser pulse, defined as the square of the instantaneous light intensity integrated over the duration of the pulse. The intensity of light increases linearly with the optical power of the beam. The physical significance of the optical dosage is that it is proportional to the amount of energy absorbed per unit volume of the material in the two-photon absorption mode. The parameter σ(2) represents the two-photon cross section of the photoinitiator molecule, and the parameter Φ represents the quantum yield of radical formation, i.e., it quantifies the fraction of excited molecules that generate radicals. The parameter h represents Planck's constant and ν is the frequency of the light that illuminates the photoresist material.
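For reference, a radical generation relation that is consistent with these definitions takes the following proportionality form; the exact prefactor is omitted here because it depends on unit conventions and should be taken from Ref. [26], and the appearance of the photoinitiator concentration [PI] is our assumption based on the later description of photoinitiator consumption:

\[
\Delta[\mathrm{R}^{*}] \;\propto\; \frac{\sigma^{(2)}\,\Phi\,[\mathrm{PI}]}{(h\nu)^{2}} \int_{\mathrm{pulse}} I(t)^{2}\,\mathrm{d}t
\]

where the integral over a single pulse is the per-pulse optical dosage defined above.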
In a serial TPL process, the dosage is delivered to the photoresist material sequentially one point at a time. However, in the P-TPL process, an entire 2D light sheet can be illuminated at once, as shown in Fig. 1. A digital micromirror device (DMD) is used to selectively illuminate the desired region, acting as a digital mask. Through the simultaneous spatial and temporal focusing of femtosecond light, the entire 2D image can be brought into focus and a whole 2D layer can be printed at once [20].
2.2 Reaction-Diffusion Model.
Here Eq. (2) represents inhibition of the primary free radical, R*, by oxygen dissolved in the photoresist medium [30]. Those free radicals that are not immediately quenched go on to start polymerization chain reactions, modeled in Eqs. (3) and (4) as the addition of monomer molecules to a growing polymer chain. The primary radicals react with the monomer molecules (represented by P here) in the photoresist material to generate the secondary radicals P*. The secondary radicals continue to react with more monomer molecules to form a growing chain of polymer. The number of secondary radicals is preserved during the growth process as each secondary radical forms a new secondary radical upon incorporation into a polymer chain. The polymer chains keep growing until they are terminated through reaction with an oxygen molecule, per Eq. (5). Other termination mechanisms have been neglected here because oxygen termination has been shown to be the dominant termination mechanism for TPL [31]. The reaction rate constants kq, kp, and kt correspond to the rates of the quenching, polymerization, and termination reactions, respectively.
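For clarity, the reaction scheme described above can be summarized as follows. This is a schematic restatement of Eqs. (2)–(5) based on the text; in particular, the use of the same rate constant kp for the initiation and propagation steps is our reading of the description:

\begin{align*}
\mathrm{R}^{*} + \mathrm{O_2} &\xrightarrow{\;k_q\;} \text{inactive species} &&\text{(quenching)}\\
\mathrm{R}^{*} + \mathrm{P} &\xrightarrow{\;k_p\;} \mathrm{P}^{*} &&\text{(initiation of a chain)}\\
\mathrm{P}^{*} + \mathrm{P} &\xrightarrow{\;k_p\;} \mathrm{P}^{*} &&\text{(chain propagation)}\\
\mathrm{P}^{*} + \mathrm{O_2} &\xrightarrow{\;k_t\;} \text{terminated polymer} &&\text{(termination)}
\end{align*}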
The resulting system of PDEs, Eqs. (6)–(11), represents the rate of change of the concentration of each reacting chemical species. In addition to the changes in concentration caused by the aforementioned chemical reactions, the diffusion of O2 and R* from regions of high concentration to regions of low concentration is also captured through the spatial derivatives in Eqs. (6) and (9). The slower diffusion of the other, larger molecules is neglected. Thus, Eqs. (6)–(11) govern the reaction-diffusion mechanism underlying photopolymerization in P-TPL. The initial conditions and the parameters for the simulation are provided in Table 1. For these initial conditions, the properties of the pentaerythritol triacrylate (PETA) monomer were used because PETA has been widely used for TPL printing in the past [31,32]. It is noteworthy that, despite these fixed initial conditions, a large set of different photoresist material properties could be simulated because the reaction rate constants, the concentration of the photoinitiator, and the quantum yield of the photoinitiator were allowed to vary over large ranges.
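As an illustration of the structure of these governing equations, the balances for the two diffusing species take the following reaction-diffusion form. This is a sketch consistent with the scheme above rather than a verbatim restatement of Eqs. (6) and (9):

\begin{align*}
\frac{\partial [\mathrm{O_2}]}{\partial t} &= \nabla\cdot\big(D_{\mathrm{O_2}}\,\nabla[\mathrm{O_2}]\big) \;-\; k_q\,[\mathrm{R}^{*}][\mathrm{O_2}] \;-\; k_t\,[\mathrm{P}^{*}][\mathrm{O_2}]\\
\frac{\partial [\mathrm{R}^{*}]}{\partial t} &= \nabla\cdot\big(D_{\mathrm{R}^{*}}\,\nabla[\mathrm{R}^{*}]\big) \;-\; k_q\,[\mathrm{R}^{*}][\mathrm{O_2}] \;-\; k_p\,[\mathrm{R}^{*}][\mathrm{P}]
\end{align*}

with analogous ordinary rate equations (without diffusion terms) for the monomer, the secondary radicals, and the terminated species.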
The reaction-diffusion finite element simulation was performed with the commercially available COMSOL Multiphysics software package. The PDE system was solved over a given geometry and for a specified amount of time using a direct time-dependent solver. All simulations in this work were performed for a 2D geometry representing the centermost XZ plane when printing a set of nanowires. It is reasonable to use a 2D geometry to model the formation of the nanowires due to the inherent symmetry of the system along the length of the nanowires. The simulation has been updated from the version described in more detail previously [26]; the notable changes are discussed below.
The mesh over which the equations were solved has been updated to the one shown in Fig. 2. Instead of a mesh with uniformly sized elements, this mesh uses progressively finer elements near the focal plane. This mesh has been found to increase the accuracy of the printing predictions. Mesh convergence was verified by successively refining the mesh; a mesh with n = 40 was found to be sufficiently fine for simulation of nanowires under most parameter conditions. For challenging parameter regimes (such as at extremely high doses or extremely rapid reactions), a mesh with n = 50 was used.
where A and B are real-valued shape parameters. This allows the rate constants kp and kt to decrease from their initial values toward zero as polymerization progresses and the viscosity of the photoresist increases. Similarly, the diffusivity of oxygen is modeled to decrease linearly with increasing DOC as the reaction progresses. The diffusivity of R* is approximated as a constant since this species is consumed rapidly at the start of the polymerization.
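The functional forms implied by this description, and by the shape parameters A and B listed in the Nomenclature, are of the following kind. These expressions are our hedged reading of the model; the exact forms should be taken from Ref. [26], and the symbol for the initial oxygen diffusivity is ours:

\begin{align*}
k_p(\mathrm{DOC}) &= k_{p-0}\, e^{-A\,\mathrm{DOC}}, \qquad k_t(\mathrm{DOC}) = k_{t-0}\, e^{-B\,\mathrm{DOC}},\\
D_{\mathrm{O_2}}(\mathrm{DOC}) &= D_{\mathrm{O_2}\text{-}0}\,\big(1 - \mathrm{DOC}\big)
\end{align*}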
While the previous version of the simulation was initialized by evaluating the dosage as the product of the square of time-averaged light intensity and a constant pulse duration [26], this version uses dosage values calculated by performing a time-integral of the square of the instantaneous intensity, with integration performed over the duration of the entire pulse. The intensity and pulse durations were evaluated through an optics simulation similar to the one described in the supporting materials of Saha et al. [20]. Since P-TPL uses temporal focusing, dosage values that account for the different pulse durations at different locations are more accurate than dosage values calculated by assuming the same pulse duration at all locations. The model was also updated to account for the consumption of the photoinitiator after each pulse, thereby increasing the accuracy of predictions. Furthermore, the optical parameters (such as magnification and laser repetition rates) were changed to reflect our new P-TPL printing system located at Georgia Tech.
The output of the FE reaction-diffusion simulation is the spatio-temporal evolution of the concentration of each chemical species. From these data, the DOC at the final time is calculated. The DOC refers to the fraction of the monomer that has converted to polymer; therefore, it characterizes the progress of the polymerization process. From these simulated data, the regions of the photoresist that have cured can be predicted by applying a threshold on the DOC. This thresholding replicates the experimental observation that the photoresist becomes insoluble in solvents when the DOC exceeds the threshold DOC (DOCth). Regions with DOC above this threshold are retained intact after development, i.e., after cleaning the photoresist with solvents. Thus, by applying a threshold, as shown in Fig. 3, the shape and size of the printed regions can be discerned. The width and height of the features are measured from these geometric data.
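As an illustration, the thresholding and measurement step can be carried out on the exported DOC field along the lines of the following MATLAB sketch. The variable names are hypothetical, the use of the Image Processing Toolbox is our choice, and the actual postprocessing was performed on the COMSOL solution through LiveLink:

% DOC: 2D array of the degree of conversion exported on a regular x-z grid
% x, z: coordinate vectors of the grid; DOCth: threshold degree of conversion
cured = DOC >= DOCth;                               % logical mask of cured (insoluble) photoresist
cc    = bwconncomp(cured);                          % connected cured regions (printed features)
stats = regionprops(cc, 'BoundingBox');             % bounding box of each feature, in pixels
dx = x(2) - x(1);   dz = z(2) - z(1);               % grid spacing
widths  = arrayfun(@(s) s.BoundingBox(3) * dx, stats);   % lateral width of each feature
heights = arrayfun(@(s) s.BoundingBox(4) * dz, stats);   % height of each feature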
2.3 Training Data Generation.
To train a neural network (NN) capable of acting as a surrogate model for the FE simulation, it was necessary to first generate a large amount of training data. This was accomplished through the use of the PACE high-performance computing resource at Georgia Tech. Solving a single model takes on the order of ∼5 min on a workstation computer with an Intel® i7-9700K CPU and 64 GB of RAM; this time can be longer or shorter depending on the simulation parameters. While using high-performance computing does not necessarily speed up the runtime of a single model, it does allow multiple models to be solved in parallel. This vastly reduces the time needed to generate a large dataset. After some optimization, it was possible to perform on the order of 100 FE model solves per hour, depending on the amount of compute resources allocated.
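For example, once a wrapper function around the LiveLink solve is available, a batch of parameter sets can be dispatched in parallel on a multicore node roughly as follows. Here runPTPLSimulation is a hypothetical wrapper around a single FE model solve, and the actual job scheduling on PACE may have differed:

% X: nSamples-by-10 matrix of input parameter sets to be simulated
results = cell(size(X, 1), 1);
parfor i = 1:size(X, 1)                         % requires the Parallel Computing Toolbox; runs serially otherwise
    results{i} = runPTPLSimulation(X(i, :));    % hypothetical wrapper that solves one FE model
end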
All simulations for this work were performed with the same projection comprising a set of five lines, with a width of five pixels and a period of 15 pixels on the DMD. The chemical parameters as well as the duration of exposure and the optical power were varied to capture a wide range of conditions. The 10 input parameters that were varied are listed in Table 2, along with the maximum and minimum values they were allowed to take. The parameters “number of pulses” and “optical power” represent the illumination-based process parameters that quantify the total amount of optical dosage. The parameters [PI] and Φ represent the process parameters that are determined by the photoinitiator, whereas the rest of the parameters represent the material properties of the monomer in the photoresist. Thus, by varying these three sets of process parameters, one may realize a vast number of different P-TPL process setups. It is impractical to attempt to map printability over this entire design space using FE simulations. Instead, samples were generated using FE simulations to train the NN models.
A full-factorial sampling scheme was inappropriate for a ten-dimensional parameter space such as this because sampling each parameter at just five levels would require 5¹⁰ (nearly 10 million) simulations to be performed. This is impractical given the compute time per simulation. Instead, a modified Latin hypercube sampling (LHS) scheme was adopted using the lhsdesign() function built into MATLAB's Statistics and Machine Learning Toolbox. The LHS method divides each parameter range into segments and chooses n random points equally distributed among the segments. As a result, it is guaranteed to be more representative of the sample space than pure random selection, such as Monte Carlo sampling [33]. LHS thus allows far denser sampling than factorial sampling, at the cost of not generating all possible combinations of parameter levels.
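A minimal sketch of this sampling step is shown below; the bounds are placeholders for the values in Table 2, and the log-uniform scaling of the parameters that span several orders of magnitude is our assumption:

n = 1000;   p = 10;                          % number of samples and number of varied parameters
u  = lhsdesign(n, p);                        % Latin hypercube sample on the unit hypercube
lo = repmat(1e-2, 1, p);                     % placeholder lower bounds (see Table 2)
hi = repmat(1e6,  1, p);                     % placeholder upper bounds (see Table 2)
useLog = true(1, p);                         % parameters to be scaled log-uniformly (assumption)
X = zeros(n, p);
for j = 1:p
    if useLog(j)
        X(:, j) = 10.^(log10(lo(j)) + u(:, j) * (log10(hi(j)) - log10(lo(j))));
    else
        X(:, j) = lo(j) + u(:, j) * (hi(j) - lo(j));
    end
end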
While the ranges of the rate constants span eight orders of magnitude, it was thought likely that large portions of this space might not generate any printing. Since there is no benefit to sampling these regions densely, the simulations were performed in batches, with each batch focused on the parameter space deemed to be most promising for printing by the previous batches.
The simulation process was automated through the use of the MATLAB LiveLink for COMSOL Multiphysics, which allows algorithmic control of the COMSOL simulation by a MATLAB script. The generation of parameters, the batch model solves, and the postprocessing to find the width and height of the resulting print were automated in this way. In total, 7084 model solves were completed. Data augmentation was then performed on these results to increase the size of the dataset that was used for training the neural networks. Due to the nature of the process, it was possible to generate several datapoints from each model solve by varying the DOCth parameter and taking measurements without re-solving the model, since the underlying DOC information remains the same. In addition, it was possible to apply physics-based prior process knowledge to augment the data. For example, it is known that if a given set of parameters failed to print, then decreasing the parameters that are positively correlated with printing (e.g., kp-0, power), while keeping the other parameters constant, would also lead to no printing. Here, positive correlation means that individually increasing the input parameter will lead to an increase in the amount of printing. Similarly, increasing the parameters that are negatively correlated with printing (e.g., kq, kt-0), while keeping the other parameters constant, would also lead to no printing. The converse is true for those datapoints that resulted in runaway printing (i.e., over-polymerization). Because the data were created by a deterministic physics-based simulation, it was known with certainty that increasing or decreasing these parameters would always change the outcome in the same direction; that direction can be deduced from the definition of the parameter and the process equations. In this way, synthetic datapoints were generated with high confidence without performing full model solves. The augmented dataset had 26,769 points, i.e., about four times the number of models solved. The entire training dataset is available elsewhere [34].
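A sketch of the monotonicity-based portion of this augmentation is given below; the perturbation factors and the column indices of the positively and negatively correlated parameters are illustrative assumptions:

% X: n-by-10 matrix of simulated parameter sets; y: n-by-1 outcome labels (-1, 0, or 1)
posIdx = [1 2];          % hypothetical columns positively correlated with printing (e.g., power, kp-0)
negIdx = [3 4];          % hypothetical columns negatively correlated with printing (e.g., kq, kt-0)
Xaug = [];   yaug = [];
for i = 1:size(X, 1)
    xNew = X(i, :);
    if y(i) == -1                                   % no printing: weaken the drivers, strengthen the inhibitors
        xNew(posIdx) = 0.5 * xNew(posIdx);
        xNew(negIdx) = 2.0 * xNew(negIdx);
        Xaug = [Xaug; xNew];   yaug = [yaug; -1];   %#ok<AGROW>
    elseif y(i) == 1                                % overprinting: the converse perturbation
        xNew(posIdx) = 2.0 * xNew(posIdx);
        xNew(negIdx) = 0.5 * xNew(negIdx);
        Xaug = [Xaug; xNew];   yaug = [yaug; 1];    %#ok<AGROW>
    end
end
Xfull = [X; Xaug];   yfull = [y; yaug];             % augmented training set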
2.4 Neural Network Surrogate Modeling.
A surrogate model, or a metamodel, is a computationally inexpensive approximation of a high-fidelity model. Surrogate models are fit to datasets of the type generated here; they are generally not trained on the underlying model physics but solely on input–output relationships. While regression and response surface models have been traditionally popular choices for surrogate models, artificial NNs trained through backpropagation are an increasingly popular choice for high dimensional data [35,36]. Since this is a high dimensional parameter space with large ranges for each parameter, a NN was chosen over a traditional regression model.
An NN is a machine learning method loosely modeled on the human brain [37,38]. A network of nodes is set up where each node receives inputs, assigns them weights, and passes them through a transfer function to generate an output. Neural network topologies can include many layers with thousands of nodes. However, for the purpose of this work, a shallow NN with a few layers will suffice [39]. A CNN applies this principle to classification problems, assigning each input data vector to one of several output classes.
Since a surrogate model inherently has some approximation error, it was not expected that a surrogate model could perform feature size prediction at accuracies comparable to the FE model. Rather, the surrogate model was tasked with predicting whether a given set of printing conditions would result in no printing, printing, or overprinting. These three categories were assigned labels of −1, 0, and 1, respectively. The ground truth labels for the training dataset were generated by analyzing the results of the FE simulations. Prints with connected features or a marked change in aspect ratio were classified as “overprinted.” Prints with no regions above DOCth were categorized as having “no printing.” Prints with intermediate values of DOC and distinct line features were categorized as “printing.” Figure 4 illustrates these categories for a representative value of DOCth = 0.30. The DOC of 0.30 is not achieved at any spatial location in the “no printing” outcome. In the overprinting case, there is runaway polymerization leading to merging of the line features, i.e., a contiguous patch of material with DOC > 0.30 exists between the centers of adjacent line features.
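One way to encode this three-class labeling on the thresholded DOC field is sketched below, building on the thresholding snippet in Sec. 2.2; the rule used to detect merged lines is an illustrative simplification of the connectivity and aspect-ratio criteria described above:

% cured: logical mask of regions with DOC >= DOCth; nLines: number of lines in the projection (five here)
cc = bwconncomp(cured);
if cc.NumObjects == 0
    label = -1;          % no printing: no region exceeds the threshold DOC
elseif cc.NumObjects < nLines
    label = 1;           % overprinting: adjacent lines have merged into fewer, connected features
else
    label = 0;           % printing: the expected number of distinct line features is resolved
end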
Neural network training was performed in MATLAB using the fitcnet() function, which allows for the training of a shallow feedforward NN for classification problems. Fifteen percent of the datapoints were reserved for testing. The rest were used to train a classification neural network with four layers of sizes 15, 9, 5, and 3. ReLU (rectified linear unit) activation functions were used for the hidden layers and a softmax activation function was used for the output layer. All input and output data were treated as real numbers, including the number of pulses (which only took integer values in the training dataset). k-fold cross-validation with k = 5 was performed to avoid overfitting. The network was trained on an Intel i7-1165G7 system with 16 GB of RAM. Training the network took an average of 4 min. A schematic representation of the neural network is shown in Fig. 5.
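A minimal MATLAB sketch of this training setup is shown below. Note that fitcnet() automatically appends a final fully connected layer sized to the number of classes followed by a softmax layer, so the trailing layer of size 3 described above is generated implicitly; the standardization option and the random seed are our choices:

% X: 26,769-by-10 matrix of input parameters; y: class labels in {-1, 0, 1}
rng(1);                                              % arbitrary seed for reproducibility
cv = cvpartition(y, 'HoldOut', 0.15);                % reserve 15% of the datapoints for testing
Xtrain = X(training(cv), :);   ytrain = y(training(cv));
Xtest  = X(test(cv), :);       ytest  = y(test(cv));

mdl = fitcnet(Xtrain, ytrain, ...
    'LayerSizes', [15 9 5], ...                      % hidden fully connected layers
    'Activations', 'relu', ...                       % ReLU activations in the hidden layers
    'Standardize', true);                            % standardize the inputs (our assumption)

cvmdl   = crossval(mdl, 'KFold', 5);                 % 5-fold cross-validation to check for overfitting
cvError = kfoldLoss(cvmdl);                          % cross-validated classification error
testAcc = 1 - loss(mdl, Xtest, ytest);               % accuracy on the held-out test set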
3 Results and Discussion
3.1 Performance of Neural Network.
The NN was tested on the 15% of reserved datapoints, and it was found to predict the correct result 98.1% of the time. This is possibly a slight overestimation of the true performance since slight correlations in the dataset might have been introduced during the data augmentation phase. Nonetheless, this performance is generally considered satisfactory for a classification model of this type [39]. The confusion matrix shown in Fig. 6 demonstrates that the NN excels at correctly identifying overprinting and nonprinting conditions. The accuracy for the printing class is slightly lower, with printable conditions being correctly recognized 94.6% of the time. The training dataset contained 39% of its datapoints in class −1, 9% in class 0, and 52% in class 1. Because the training dataset contained fewer datapoints of class 0 (printing) than of class −1 (no printing) and class 1 (overprinting), the slightly lower accuracy for class 0 was expected.
Although the accuracy of the NN model is lower than that of the physics-based FE simulations, the NN model has a significant advantage in computational efficiency. The evaluation time for the NN model is almost instantaneous, taking an average of 0.32 μs for a single model evaluation. This is significantly shorter than the several minutes taken to solve even a simple 2D FE simulation, with more complex simulations taking even longer. The rapid solve times enable evaluation of tens of thousands of parameter combinations within seconds, accomplishing the main aim of surrogate modeling. The NN model also has an advantage over the FE simulations in the amount of data storage that is required. Each solved FE simulation generates a vast amount of simulation data (hundreds of megabytes) that must be parsed to determine the output class corresponding to the printing outcome. In contrast, the NN model directly generates the output class (∼1 byte) for a given set of input parameters (∼100 bytes). Thus, such a surrogate model can be deployed on low-cost computing infrastructure, such as edge computing devices, to perform real-time manufacturing process control and optimization.
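A per-evaluation time of this magnitude is most readily measured by amortizing a large batched prediction, for example as follows, where Xbatch is a hypothetical matrix of input parameter sets:

t       = timeit(@() predict(mdl, Xbatch));    % wall-clock time for one batched prediction
perEval = t / size(Xbatch, 1);                 % average time per single model evaluation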
3.2 Printability Maps.
Having generated the surrogate model, it is possible to perform some tests to demonstrate the utility of the model. First, printability mapping was performed by varying two reaction rate constants at a time while holding the other parameters constant. The aim is to verify whether the surrogate model is capable of accurately exploring the material design space, thus enabling the selection of reagents with the rate constants required to print under a given set of conditions. To validate this, a random sample of FE simulations was also performed over the same parameter range to provide a coarse printability map to benchmark against.
The printability map corresponding to varying combinations of the polymerization and termination rate constants (i.e., kp−0 and kt−0) is shown in Fig. 7. All other input process parameters were held constant at a randomly chosen set of values. A DOCth value of 0.20 was used. The square symbols represent the results from the NN, evaluated at densely sampled points, whereas the triangles represent the FE results for comparison. In this parameter range, it is observed that the NN has captured the essential behavior of the process, albeit with some inaccuracies. The NN predicts a gradual shift from no printing to printing to overprinting as kp−0 is increased and kt−0 is decreased. This trend agrees with the FE model; however, the exact boundaries between the different regimes are not perfectly accurate, as indicated by some FE data points lying within a different NN-predicted region. Nonetheless, the NN has predicted a wide printing band and narrow bands for no printing and overprinting in approximately the correct parameter locations. While an FE model would be necessary to quantify the exact boundaries of printability, the NN can be used to qualitatively assess printability behavior over a broad range of input parameters.
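As an illustration, a map of this kind can be generated from the trained network along the following lines; the grid ranges are placeholders, and baselineParams, iKp0, and iKt0 (the fixed parameter vector and the column indices of kp−0 and kt−0) are hypothetical names:

kp0 = logspace(2, 6, 100);                      % candidate kp-0 values (placeholder range)
kt0 = logspace(4, 9, 100);                      % candidate kt-0 values (placeholder range)
[KP, KT] = meshgrid(kp0, kt0);

Xgrid = repmat(baselineParams, numel(KP), 1);   % hold all other parameters at fixed values
Xgrid(:, iKp0) = KP(:);
Xgrid(:, iKt0) = KT(:);

labels   = predict(mdl, Xgrid);                 % -1: no printing, 0: printing, 1: overprinting
printMap = reshape(labels, size(KP));

contourf(KP, KT, printMap);                     % visualize the predicted printability regimes
set(gca, 'XScale', 'log', 'YScale', 'log');
xlabel('k_{p-0}');   ylabel('k_{t-0}');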
Similar results were observed when combinations of polymerization and quenching rate constants (i.e., kp−0 and kq) were varied, as illustrated in Fig. 8. This printability map shows kq versus kp−0 with the other input parameters held constant at a different set of values than in Fig. 7. Over this parameter space, only printing and no printing outcomes are observed. While the NN results do not exactly match the FE results, the NN is nonetheless able to approximately locate the boundary between the printable and nonprintable region. It is interesting to note that despite the inaccuracy in quantitatively locating the boundary, the NN model can qualitatively predict that the boundary gradually shifts toward the top-right corner as the value of kp−0 increases. The dotted circles were drawn in Fig. 8 to highlight this trend. Thus, the NN surrogate model can be an efficient and effective tool for the prediction of printability over a broad material design space.
A comparison of the compute times required to generate the FE and NN predictions for the two printability maps is summarized in Table 3. Despite the FE sampling being much coarser, generation of the FE datasets takes over an hour, compared to under a second for the NN. Furthermore, the FE solve time depends on how challenging the parameter regime is, which results in an exceedingly long solve time of over 7 h for the parameter combination in Fig. 7. In contrast, the NN solve time does not depend on the input parameter combination. Generation of the printability map of Fig. 7 with 100,000 FE simulations would have taken more than 10 months of computation compared to the <1 s taken by the NN model. Thus, the NN surrogate model enables rapid prediction of printability in P-TPL over broad operating regimes, thereby solving a problem that was intractable in the past.
Although our work here has been limited to predicting printability in the P-TPL process, we anticipate that our ML-based approach can also be applied to the printability prediction of other TPL implementations. Our approach is particularly well suited for those TPL implementations that use low repetition rate femtosecond lasers (i.e., rates of ∼1 kHz), such as the high-throughput implementation demonstrated by Ouyang et al. [40]. Our model of photopolymerization can be adapted to such studies with minimal changes due to the similar time scale of illumination and polymerization reactions. To adapt our work to the more prevalent TPL implementations that use high repetition rate lasers (i.e., rates of ∼100 MHz), one must modify the polymerization rate equations to include the illumination term within the differential rate equations. Nevertheless, our work demonstrates that ML-based surrogate modeling is an effective and resource efficient approach to predicting printability over a wide range of operating conditions.
As our goal here was to generate ML-based surrogate models of the physics-based FE models, our ML models were trained exclusively with computational data from the FE models. The FE models were themselves validated against experiments [26], but data from experiments were not used to directly train the ML models. In the absence of validated FE models, it is tempting to train the ML models exclusively on experimental data. However, such an approach is more challenging to implement due to the need for a large amount of experimental data and the presence of experimental uncertainty in the data. More importantly, FE models can generate data over large operating spaces that cannot be readily accessed through physical experiments with specific systems and photoresist materials. Thus, we anticipate that FE modeling will be an important source of training data for future ML models. Nevertheless, we believe that the accuracy of the ML models can be further improved in the future by fusing the computational dataset with data from physical experiments, as has been demonstrated in ML-based modeling of other AM processes [41].
4 Conclusion
In summary, we have demonstrated that neural network-based surrogate modeling is capable of capturing the essential features of the P-TPL parameter space. We have successfully trained a classification model that is capable of evaluating printability for a set of parameters with solve times of the order of microseconds rather than minutes. While the surrogate model currently lacks the accuracy needed for precise feature size prediction, it nonetheless enables computationally cheap qualitative mapping of printability over broad operating regimes. Therefore, it significantly reduces the barriers to the practical adoption of P-TPL as a high-throughput nanoscale additive manufacturing capability by enabling rapid process planning and optimization. Thus, we anticipate that this work will enable scalable nanomanufacturing of complex three-dimensional structures for a variety of devices with applications in computing, energy, transportation, and human health.
Acknowledgment
This work was supported by the National Science Foundation (NSF) CAREER grant 2045147 and in part through research cyberinfrastructure resources and services provided by the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology. This research was previously presented at the 2023 Manufacturing Science and Engineering Conference (MSEC2023).
Funding Data
National Science Foundation (NSF) (Grant No. 2045147; Funder ID: 10.13039/100000001).
Conflict of Interest
The co-authors are inventors on pending and/or issued patent applications on P-TPL for which the intellectual property rights are assigned to either Georgia Tech Research Corporation or Lawrence Livermore National Security, LLC.
Data Availability Statement
The data sets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.
Nomenclature
- A = kp exponential function shape parameter
- B = kt exponential function shape parameter
- DO2 = diffusivity of O2
- = optical dosage per pulse
- DR* = diffusivity of R*
- DOC = degree of polymer conversion
- DOCth = threshold degree of polymer conversion
- h = Planck's constant
- kp = polymerization rate constant at a specific degree of polymer conversion
- kp−0 = polymerization rate constant at the beginning of polymer conversion
- kq = radical quenching rate constant
- kt = termination rate constant at a specific degree of polymer conversion
- kt−0 = termination rate constant at the beginning of polymer conversion
- ν = frequency of light
- σ(2) = two-photon cross section of photoinitiator
- Φ = quantum yield of photoinitiator