Amanda Parker1
The Australian National University1
In order to apply machine learning to the study of structure-property relationships, domain knowledge is typically required for feature extraction. However, this process may introduce bias if it focuses on known aspects of structure, thereby impeding the discovery of new science. Here, we develop an approach that uses only atomic Cartesian coordinates to predict the electronic properties of simulated graphene nanoflakes (from a publicly available data set). Our approach addresses the limits of current methods by greatly extending the degree of material complexity, asymmetry, surface detail and size difference that can be encoded by a graph embedding.

The workflow describes graphene nanoflakes with graphs that are more representative than the ball-and-stick atom-bond representation that is intuitive to researchers. We generate fixed-size embeddings of these graphs using a neural embedding framework. Pairing the graph embeddings with a convolutional neural network produces a highly accurate predictive model for electron affinities, band gap energies, Fermi energies and ionization potentials. On the hold-out test set, the model fit has $R^2$ between $0.9$ and $0.96$ for nanoflakes that vary widely in size, from tens to thousands of atoms. These predictions were benchmarked against optimized predictive models built on geometric, domain-driven features: they exceeded the benchmark accuracy for Fermi energy, electron affinity and ionization potential, and matched it for band gap energy. We also introduce and optimize a model hyperparameter that gives insight into the relevant length scales of interactions in the modelled material.
\end{abstract}
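As a rough illustration of the first step summarized in the abstract, turning atomic Cartesian coordinates into a graph that is not restricted to chemical bonds, the sketch below connects every pair of atoms closer than a distance cutoff. The function name `build_distance_graph`, the `cutoff` parameter, and the toy coordinates are hypothetical and chosen only for illustration; in particular, treating the length-scale hyperparameter as a simple distance cutoff is an assumption, not a statement of the construction used in this work.

```python
import numpy as np


def build_distance_graph(coords, cutoff):
    """Return edges (i, j, distance) for all atom pairs within `cutoff`.

    Hypothetical helper: a plain distance-threshold graph, assumed here
    only to illustrate a graph built from Cartesian coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors, then Euclidean distances, shape (n, n).
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each unordered pair once (i < j) and drop pairs beyond the cutoff.
    i_idx, j_idx = np.where(np.triu(dists <= cutoff, k=1))
    return [(int(i), int(j), float(dists[i, j])) for i, j in zip(i_idx, j_idx)]


# Toy input (made up): four atoms on a line, spaced 1.42 apart.
coords = [[0.0, 0.0, 0.0], [1.42, 0.0, 0.0], [2.84, 0.0, 0.0], [4.26, 0.0, 0.0]]

# A short cutoff recovers only nearest-neighbour, bond-like edges.
bond_like = build_distance_graph(coords, cutoff=1.5)
# A longer cutoff also links second neighbours, encoding longer-range structure.
longer_range = build_distance_graph(coords, cutoff=3.0)
```

Under this assumption, varying `cutoff` changes which interatomic distances the graph can represent, which is one way a single hyperparameter could probe the length scales that matter for a given property.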