
Tutorial CM7—21st Century Tools for Accelerating Scientific Research—From Combinatorial Synthesis and Text Mining to Artificial Intelligence

Monday, April 17
8:30 am – 3:00 pm
PCC North, 100 Level, Room 124 B

The tutorial presented novel computational methods for unlocking the human knowledge documented and archived in the unstructured text of hundreds of millions of scientific publications, with the aim of extending scientific discovery beyond human capacity. It also covered ways to automate the generation of experimental knowledge. The instructors explored approaches for visualizing and understanding how knowledge propagates and evolves, for assessing scientific research fronts, for generating hypotheses from data, and for quantifying research impact within the scientific community and beyond.

All three parts of this tutorial are available for free viewing via MRS OnDemand:

Part I  |  Part II  |  Part III

Instructors:
  • Rama Vasudevan, Oak Ridge National Laboratory
  • Ichiro Takeuchi, University of Maryland
  • Justin Fessler, IBM Federal Software


8:30 am – 10:00 am—Part I: Ichiro Takeuchi

Ichiro Takeuchi discussed informatics techniques for handling, visualizing, and analyzing the large amounts of data generated by combinatorial experiments. He also discussed the potential of mining publications to establish knowledge-driven research paradigms. Alongside multivariate statistical analysis and machine learning techniques, he addressed the need for text-based knowledge extraction to make further progress.

10:00 am – 10:30 am—Break

10:30 am – 12:00 pm—Part II: Rama Vasudevan

Rama Vasudevan made the case for text analysis in the field of materials growth. He focused on a specific use case: mining papers on epitaxial thin films of complex oxides to determine relationships between growth conditions and functional properties. An open-source annotation tool was modified for this purpose. It applies regular expressions to text from selected papers to automatically annotate passages describing growth conditions and functional properties. The annotations were then checked through crowdsourcing and matched with the materials of interest. This populated a database containing the type of material grown, the substrate, the growth conditions, and functional property information. The methods shown are general and can be applied to a wide variety of growth methods and material types.
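The regex-annotation step described above can be illustrated with a minimal sketch. It is not the modified open-source tool from the talk, which is not named here. It is a standalone example using Python's `re` module, and the patterns, field names (`substrate`, `growth_temperature`, `oxygen_pressure`), helper functions, and example sentence are all hypothetical. Functional-property patterns are omitted. The crowdsourced review is not modeled, so every candidate annotation is simply accepted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only; real papers phrase
# growth conditions in far more ways than these cover.
PATTERNS = {
    "substrate": re.compile(
        r"\bon\s+(?P<value>[A-Z][A-Za-z0-9]+(?:\s*\(\d{3}\))?)\s+substrates?\b"
    ),
    "growth_temperature": re.compile(r"(?P<value>\d+(?:\.\d+)?\s*(?:°C|K))\b"),
    "oxygen_pressure": re.compile(r"(?P<value>\d+(?:\.\d+)?\s*(?:mTorr|Torr|Pa))\b"),
}


@dataclass
class Annotation:
    label: str  # which field the span is a candidate for
    text: str   # the matched span
    start: int  # character offsets in the sentence
    end: int


def annotate(sentence: str) -> list[Annotation]:
    """Propose candidate annotations; a human reviewer would accept or reject each."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(sentence):
            found.append(Annotation(label, m.group("value"), m.start("value"), m.end("value")))
    return sorted(found, key=lambda a: a.start)


@dataclass
class GrowthRecord:
    material: str
    substrate: Optional[str] = None
    growth_temperature: Optional[str] = None
    oxygen_pressure: Optional[str] = None


def to_record(material: str, accepted: list[Annotation]) -> GrowthRecord:
    """Fill one hypothetical database row from reviewed annotations (first value per field wins)."""
    record = GrowthRecord(material=material)
    for a in accepted:
        if getattr(record, a.label) is None:
            setattr(record, a.label, a.text)
    return record


if __name__ == "__main__":
    text = ("LaNiO3 films were grown by pulsed laser deposition on SrTiO3 (001) "
            "substrates at 700 °C under an oxygen pressure of 100 mTorr.")
    annotations = annotate(text)
    for a in annotations:
        print(a.label, "->", a.text)
    # Every candidate is accepted here; in the described workflow,
    # crowdsourced reviewers would perform this check.
    print(to_record("LaNiO3", annotations))
```

Patterns this naive will also flag irrelevant values, such as a measurement temperature of "10 K". This is one reason a human review step sits between automatic annotation and the database.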

12:00 pm – 1:30 pm—Break

1:30 pm – 2:30 pm—Part III: Justin Fessler

Justin Fessler introduced the natural language processing tools of IBM Watson. Using specific test cases, he showed how Watson's natural language processing can uncover latent connections between different data, identify trends, and suggest links between disparate domains. These tools can help both established researchers in a field and newcomers to explore a domain quickly.

2:30 pm – 3:00 pm—Open Discussion