Yichen Guo¹, Shuyu Qin¹, Xinqiao Zhang², Joshua Agar²
¹Lehigh University, ²Drexel University
Scientific discoveries rely on extracting understanding from experimental results; however, the rate of data collection regularly outpaces human analysis capabilities. As a result, computational methods are commonly used to extract actionable information from scientific data. A fundamental problem with computational methods is that they are bounded by logical rules and thus cannot apply generalized concepts and sentiment. In materials physics, symmetry is one of the most pervasive predictors of structure-property relations. Human identification of symmetry is time-consuming and error-prone, and it cannot be done at scale. It is therefore essential to create models that can understand this concept. Here, we develop datasets, benchmarks, and a deep learning (DL) workflow to classify wallpaper group symmetry.

First, we developed three image datasets based on the 17 wallpaper group symmetries. These datasets are generated by applying symmetry operations to sections of images from the ImageNet dataset, to randomly generated noise, and to artificial atomically resolved images. We add Gaussian noise to mimic real data. Together with their generation metadata, these datasets provide a predictable benchmark for validating the efficacy of machine learning models.

We benchmarked the ability of deep learning models to identify the symmetries in our three datasets. We trained ResNet34 and VGG-19 models using two strategies: transfer learning and training from random initialization. Results show that current deep learning models can classify images by symmetry class with 99.38% accuracy (ResNet34 with transfer learning). While at first pass these models appear to learn a generalized concept of symmetry, this is an illusion. When a model trained on one dataset is evaluated on a dataset of a different type, its accuracy is only slightly better than random guessing (about 5.9% for 17 equally likely classes).
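The abstract does not spell out how symmetry operations are applied to image sections. The following is a minimal sketch of one common approach, assuming a square motif (a crop from ImageNet, random noise, or a synthetic atomic image) is copied under a group's point operations into a unit cell, which is then tiled and optionally corrupted with Gaussian noise. `unit_cell`, `wallpaper_image`, and `noise_sigma` are hypothetical names introduced for illustration, and only p1, p2, pm, and p4 are covered.

```python
import numpy as np


def unit_cell(motif: np.ndarray, group: str) -> np.ndarray:
    """Build one unit cell from a square motif (hypothetical helper).

    Only p1, p2, pm and p4 of the 17 wallpaper groups are sketched here.
    """
    if group == "p1":
        # translations only: the motif itself is the cell
        return motif.copy()
    if group == "p2":
        # motif stacked on its 180-degree rotation -> 2-fold rotation centres
        return np.vstack([motif, np.rot90(motif, 2)])
    if group == "pm":
        # motif beside its left-right mirror image -> vertical mirror lines
        return np.hstack([motif, np.fliplr(motif)])
    if group == "p4":
        # motif rotated by 0, -90, +90 and 180 degrees around the cell centre
        top = np.hstack([motif, np.rot90(motif, -1)])
        bottom = np.hstack([np.rot90(motif, 1), np.rot90(motif, 2)])
        return np.vstack([top, bottom])
    raise ValueError(f"group {group!r} is not covered by this sketch")


def wallpaper_image(motif, group, reps=4, noise_sigma=0.0, rng=None):
    """Tile the unit cell reps x reps times, then add optional Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.tile(unit_cell(motif, group), (reps, reps))
    if noise_sigma > 0:
        # additive zero-mean Gaussian noise to mimic experimental images
        image = image + rng.normal(0.0, noise_sigma, size=image.shape)
    return image


rng = np.random.default_rng(0)
motif = rng.random((8, 8))  # stand-in for a random-noise (or ImageNet) crop
clean = wallpaper_image(motif, "p4")
noisy = wallpaper_image(motif, "p4", noise_sigma=0.1, rng=rng)
print(clean.shape, np.array_equal(np.rot90(clean), clean))  # (64, 64) True
```

The noise-free image is exactly invariant under a 90-degree rotation, while the noisy copy is not, which is what makes such synthetic data a controllable benchmark. A complete generator would also need the remaining groups, including the five with 3- or 6-fold rotations (p3, p3m1, p31m, p6, p6m), which require a hexagonal lattice rather than square tiles.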
This failure to generalize across dataset types highlights the challenges of deploying generalized machine learning models for scientific inference, particularly when access to training data is limited. As a first step, we demonstrate few-shot learning using neural networks made symmetry-aware by pre-training on symmetry datasets of other types. The model achieves a 7.55% error rate when trained on a dataset of 50 images per class, making it as accurate as models trained on a dataset of 1000 images per class.

While adding transforms such as the fast Fourier transform and the Radon transform does not significantly improve model performance, adding symmetry-constrained layers yields marked improvements. We developed a customized preprocessing layer that can detect 2-, 3-, and 4-fold rotations, mirrors, and glide reflections. To better test the performance and limitations of this layer, we designed stepwise training experiments to classify 2D images transformed by 2-, 3-, and 4-fold rotations and mirrors. Training accuracy improved over the benchmark only when we preprocessed a manually defined image region. On a dataset of the p1, p2, and pm symmetries, ResNet34 trained from scratch reaches 47.6% accuracy, whereas training on the preprocessed dataset reaches 85% accuracy. This improvement indicates that the preprocessing layer can detect 2-fold rotations and mirror transformations. Still, more work is required to learn symmetry parsimoniously with neural networks.

All told, we demonstrate the nuanced challenges of designing and benchmarking machine learning models for detecting symmetry. We show that machine learning models can readily give the illusion of solving a benchmark task without actually learning any physically relevant concepts. This work demonstrates the importance of including physics constraints in machine learning models to improve performance.
Developing a model capable of parsimoniously detecting symmetry in images would have a tremendous impact on crystallographic and microstructural analysis, and such a model could be pipelined into other machine learning models for materials discovery and design.
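The abstract does not describe the internals of the symmetry-detecting preprocessing layer. One simple way to score whether an image region carries a given symmetry, sketched here purely as an assumption, is to correlate the region with its rotated and mirrored copies: a normalized cross-correlation near 1 suggests the operation is present. `symmetry_scores` and `_ncc` are hypothetical helpers, not the authors' layer. They cover only 2- and 4-fold rotations and mirrors (3-fold rotations need interpolation on a square pixel grid, and glide reflections need the lattice period), and they assume a hand-chosen square region centred on the symmetry element, analogous to the manually defined region mentioned above.

```python
import numpy as np


def _ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation (Pearson r) of two equal-shape arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def symmetry_scores(image: np.ndarray, region=None) -> dict:
    """Score how well a square region matches its rotated/mirrored copies.

    region: optional (row_slice, col_slice) chosen by hand; it must be square
    and centred on the symmetry element for the transformed copies to align.
    """
    patch = image if region is None else image[region]
    if patch.shape[0] != patch.shape[1]:
        raise ValueError("region must be square so 90-degree rotations align")
    return {
        "2-fold": _ncc(patch, np.rot90(patch, 2)),
        "4-fold": _ncc(patch, np.rot90(patch, 1)),
        "mirror_v": _ncc(patch, np.fliplr(patch)),  # vertical mirror line
        "mirror_h": _ncc(patch, np.flipud(patch)),  # horizontal mirror line
    }


rng = np.random.default_rng(0)
a = rng.random((16, 16))
patch = a + np.rot90(a, 2)  # 2-fold symmetric by construction
scores = symmetry_scores(patch)
# scores["2-fold"] equals 1.0 up to rounding; the other scores should be far from 1
```

Scores like these could be stacked as extra input features or channels for a classifier; the sketch only illustrates why the choice of region matters, since a region not centred on a rotation axis or mirror line will not correlate with its transformed copy.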