MRS Meetings and Events

 

DS06.07.02 2023 MRS Fall Meeting

Property-Structure-Process Relationship Trees and Contextual Understanding of Scientific Manuscripts using Large Language Models of Artificial Intelligence

When and Where

Nov 29, 2023
8:15am - 8:30am

Sheraton, Second Floor, Back Bay A

Presenter

Co-Author(s)

Maciej Tomczak¹, Daniel Cieslinski¹, Payden Brown², Yang Park², Ju Li², Stefanos Papanikolaou¹

¹National Centre for Nuclear Research, ²Massachusetts Institute of Technology

Abstract

The ever-expanding corpus of scientific manuscripts contains a vast amount of knowledge and descriptions of various experimental settings. As new articles are published daily, it is impractical for any single individual to grasp all of that information. For each manuscript, however, a scientist mentally builds a tree of connections between Properties (e.g. hardness, strength, conductivity of materials), Structures (e.g. crystalline type, defect content, composition) and Processes (e.g. annealing, cold work). This tree of PSPs represents the human strategy for dimensional reduction in processing scientific manuscripts, and it requires the ability to quickly and efficiently find the needed information in published works and compare it with similar texts or other sources.

Recent advancements in natural language processing (NLP) have given rise to high-performing foundation models, in particular large language models (LLMs) such as OpenAI's GPT-4. These powerful models are capable of complex tasks such as text summarization and creative content generation. However, using these models comes at a significant computational cost, and their contextual understanding is commonly limited. In this work, we develop small and efficient fine-tunable models for capturing PSPs in scientific manuscripts, using Elsevier's database. We also propose to incorporate the Retrieval Augmented Generation (RAG) approach alongside our fine-tuning methods to make the capture of PSPs more robust and reliable.

We investigate the statistics of various ways of fine-tuning LLMs, and also extract PSPs in pre-defined sets. Using texts focused only on nuclear materials research from the Journal of Nuclear Materials, we evaluate LLMs of different sizes and different strategies to determine their suitability as knowledge bases for scientists.
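As a rough illustration of what a PSP tree built from pre-defined sets might look like as a data structure, the sketch below links Processes to Structures to Properties whenever terms from each set co-occur in a sentence. Everything in it is an assumption made for illustration: the vocabularies, the `PSPTree` class, the helper functions, and the two example sentences are invented, and plain keyword matching stands in for the fine-tuned language models described in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical pre-defined vocabularies; the abstract does not list the actual sets.
PROPERTIES = {"hardness", "strength", "conductivity"}
STRUCTURES = {"grain size", "dislocation density", "composition"}
PROCESSES = {"annealing", "cold work"}


@dataclass
class PSPTree:
    """Nested mapping: process -> structure -> set of properties."""
    branches: dict = field(default_factory=dict)

    def add(self, process, structure, prop):
        self.branches.setdefault(process, {}).setdefault(structure, set()).add(prop)

    def properties_for(self, process):
        """Collect every property reachable from the given process."""
        found = set()
        for props in self.branches.get(process, {}).values():
            found |= props
        return found


def find_terms(sentence, vocabulary):
    """Return vocabulary terms occurring in the sentence (case-insensitive substring match)."""
    lowered = sentence.lower()
    return sorted(term for term in vocabulary if term in lowered)


def build_tree(sentences):
    """Link each process, structure, and property that co-occur in one sentence."""
    tree = PSPTree()
    for sentence in sentences:
        for process in find_terms(sentence, PROCESSES):
            for structure in find_terms(sentence, STRUCTURES):
                for prop in find_terms(sentence, PROPERTIES):
                    tree.add(process, structure, prop)
    return tree


# Invented example sentences, not taken from any manuscript.
sentences = [
    "Annealing increased the grain size and reduced hardness.",
    "Cold work raised the dislocation density, improving strength.",
]
tree = build_tree(sentences)
print(tree.branches)
```

Sentence-level co-occurrence is a deliberately crude linking rule; in the setting the abstract describes, a language model would presumably decide which terms are actually related, and the nested mapping would only serve as the container for its output.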

Keywords

nuclear materials

Symposium Organizers

Mathieu Bauchy, University of California, Los Angeles
Ekin Dogus Cubuk, Google
Grace Gu, University of California, Berkeley
N M Anoop Krishnan, Indian Institute of Technology Delhi

Symposium Support

Bronze
Patterns and Matter | Cell Press

Session Chairs

Mathieu Bauchy
Binquan Luan
