The tutorial presented novel computational analytic methods for unlocking the human knowledge documented and archived in the unstructured text of hundreds of millions of scientific publications, with the aim of extending scientific discovery beyond human capacity, as well as ways to automate experimental knowledge generation. The instructors explored pathways for visualizing and understanding how knowledge propagates and evolves, assessing scientific research fronts, generating data-based hypotheses, and quantifying research impact within the scientific community and beyond.
All three parts of this tutorial are available for free viewing via MRS OnDemand:
Part I | Part II | Part III
8:30 am – 10:00 am—Part I: Ichiro Takeuchi
Ichiro Takeuchi discussed the use of informatics techniques to effectively handle, visualize, and analyze the large amounts of data generated by combinatorial experiments, and the potential of data mining of publications to establish knowledge-driven research paradigms. In addition to the use of multivariate statistical analysis and machine learning techniques, he discussed the need for text-based knowledge extraction for further progress.
10:00 am – 10:30 am—Break
10:30 am – 12:00 pm—Part II: Rama Vasudevan
Rama Vasudevan presented the case for text analysis in the field of materials growth and focused on a specific use case: text mining of papers on epitaxial thin films of complex oxides to determine relationships between growth conditions and functional properties. An open-source annotation tool was modified for this purpose, using regular expressions on text from selected papers to automatically annotate passages associated with growth conditions and functional properties. Through crowdsourcing, the annotations were checked and matched with the materials of interest to populate a database containing the type of material grown, the substrate, the growth conditions, and the functional properties. The methods shown are general and can be applied to a wide variety of growth methods and material types.
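As a rough illustration of the regular-expression annotation step described above, the following Python sketch tags temperature and pressure mentions in a sentence. It is not the tool shown in the tutorial: the patterns, the `annotate` function, the label names, and the sample sentence are all hypothetical, chosen only to show the general idea.

```python
import re

# Hypothetical patterns (assumptions, not the expressions used in the tutorial).
# Each one captures a numeric value followed by a unit.
PATTERNS = {
    "temperature": re.compile(r"(\d+(?:\.\d+)?)\s*(°C|K)(?![A-Za-z])"),
    "pressure": re.compile(r"(\d+(?:\.\d+)?)\s*(mTorr|Torr|mbar|Pa)\b"),
}


def annotate(text):
    """Return annotations for growth-condition mentions, sorted by position."""
    annotations = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append({
                "label": label,
                "value": float(match.group(1)),
                "unit": match.group(2),
                "span": match.span(),  # character offsets into the text
            })
    return sorted(annotations, key=lambda a: a["span"][0])


# Made-up sample sentence in the style of a thin-film growth paper.
sample = "Films were grown on SrTiO3 substrates at 700 °C in 100 mTorr of oxygen."

for annotation in annotate(sample):
    print(annotation)
```

Keeping the character span with each annotation is one way to let a human reviewer see exactly which text was tagged, which fits the crowdsourced checking step described above.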
12:00 pm – 1:30 pm—Break
1:30 pm – 2:30 pm—Part III: Justin Fessler
Justin Fessler introduced the natural language processing tools of IBM Watson. Through an exploration of specific test cases, he showed how Watson's natural language processing can be used to determine latent connections between different data, identify trends, and suggest links between disparate domains. These tools can help both established researchers in a field and newcomers explore a domain quickly.
2:30 pm – 3:00 pm—Open Discussion