Recorded webinar

Domain-specific knowledge extraction from scientific texts using LLMs

This webinar explores how Large Language Models (LLMs) can streamline the extraction of key information from scientific texts, focusing on patient-derived cancer models (PDCMs). PDCMs are vital tools for cancer research and preclinical studies, and the literature describing them is growing. However, manually extracting and curating information from that literature is labor-intensive and slow. In this session, we introduce two approaches: direct prompting and soft prompting. Direct prompting uses manually written instructions to extract PDCM-related entities, while soft prompting uses machine learning to train continuous vector prompts. We discuss our comparative evaluation using state-of-the-art proprietary and open-source LLMs, showing how tailored prompt engineering can raise the performance of smaller, open models to match that of proprietary counterparts. The session highlights the potential of LLMs to enhance domain-specific knowledge extraction and accelerate research workflows.
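
To make the distinction between the two approaches concrete, here is a minimal Python sketch. It is not the presenters' implementation: the entity types, function names, prompt wording, and initialisation scale are all hypothetical, and no real LLM is called. It only illustrates that a direct prompt is hand-written text, whereas a soft prompt is a small matrix of trainable vectors placed in front of the input embeddings.

```python
import random

# Hypothetical entity types; the webinar abstract does not list the actual schema.
ENTITY_TYPES = ["diagnosis", "model_type", "treatment"]


def build_direct_prompt(text, entity_types=ENTITY_TYPES):
    """Direct prompting: wrap the passage in a manually written instruction."""
    wanted = ", ".join(entity_types)
    return (
        "Extract the following entities from the passage about "
        "patient-derived cancer models: " + wanted + ".\n"
        "Return one 'type: value' pair per line.\n\n"
        "Passage: " + text
    )


def init_soft_prompt(num_virtual_tokens, embed_dim, seed=0):
    """Soft prompting: create one trainable vector per 'virtual token'.

    The small Gaussian initialisation is an assumption made for illustration.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
        for _ in range(num_virtual_tokens)
    ]


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Place the soft prompt vectors in front of the input token embeddings."""
    return soft_prompt + token_embeddings
```

In the usual formulation of prompt tuning, only the soft prompt vectors are updated during training while the model weights stay frozen; whether the webinar's setup follows exactly that scheme is not stated in the abstract.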

Resource type: Recorded webinar

Scientific topics: Machine learning

