Beyond Information Generation: Leveraging Large Language Models For Business Automation

By Pankaj Manon, CTO at ThoughtSphere Inc.

March 2024

What Large Language Models (LLMs) and their applications can deliver remains a key focus for innovators across industries. While LLMs are widely recognized for answering questions with well-composed generated text, the next phase of usage involves using LLMs to create operational plans and begin executing operational workflows accordingly.

LLMs can be leveraged for automation tasks that extend into various practical applications. Tasks such as text generation (causal LLMs), predicting a masked token in a sentence (fill-mask), summarization, feature extraction, and more can be tailored to meet specific business needs. Identifying the right application and composition of LLMs to deliver a business outcome is key to unlocking the full potential of language models.

To illustrate the practical application of LLMs to support business processes, let’s consider the example of extracting information from a clinical trial protocol to automate downstream trial delivery processes. In this context, the objective is not merely to generate summary protocol text, but to read the protocol and retrieve targeted information from it to answer key questions. The extracted free-text data must then be transformed into a defined structure so it can be used to automate business processes within a clinical trial. A few examples of such automation opportunities are recommending study budget scenarios, automating the case report form (CRF) build, identifying operational risks, and recommending tables, listings, and figures (TLFs) for regulatory submission.

While the business automation possibilities are exciting, starting off on the right foot is important. Below are two core tenets that must be carefully determined and crafted to maximize success, along with a real-world example of how an LLM can support business automation.

Identify the Right LLM(s) for the Job:

Choosing the right LLM for the business solution is critical. There are various open-source LLMs (Llama 2, BLOOM, BERT, etc.) and pay-as-you-go LLMs (OpenAI ChatGPT, Google Gemini, etc.) that can be leveraged within the application. There are also domain-specific models, such as Google Med-PaLM, which is tuned for healthcare and medical use cases. The key factors to consider when selecting the right model include domain-specific training, cost, size, and extensibility.

Craft Meaningful Prompts to Support Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique that gives the LLM context from an authoritative knowledge base to use when generating a response. A RAG pipeline takes a prompt, retrieves the relevant context, and passes both to the LLM. The LLM is then instructed to use the context as a reference and generate factual information from it. Thus, crafting meaningful prompts is essential to ensure correct and complete information retrieval.

So exactly how does the RAG technique work and support the automation of business processes? The first step is to split the information in the protocol document into data chunks. An embedding model generates embeddings for these data chunks, and the embedding vectors are stored in a vector database. Developers create relevant prompts to retrieve information from the source protocol document. The embeddings for these prompts are used to query the vector database and find protocol document chunks that are semantically relevant. A context is created using the best matches of protocol data returned by the query. The prompt and the context are then passed to the LLM, and the LLM is instructed to retrieve information from the context in a structured format. The information retrieved is leveraged to support the automation of specific business use cases.
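
The retrieval half of these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the word-count `embed` function is a toy stand-in for a real embedding model, `InMemoryVectorStore` stands in for a vector database, and every name in the block is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts.
    A real pipeline would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        # Store the embedding alongside the original chunk text.
        self._rows.append((embed(chunk), chunk))

    def query(self, prompt: str, top_k: int = 2) -> list[str]:
        # Rank stored chunks by similarity to the prompt embedding.
        q = embed(prompt)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def build_context(store: InMemoryVectorStore, prompt: str, top_k: int = 2) -> str:
    """Join the best-matching chunks into the context passed to the LLM."""
    return "\n".join(store.query(prompt, top_k))
```

In use, each chunk returned by `chunk_text` would be passed to `store.add`, and `build_context` would be called once per prompt. The resulting context string, together with the prompt, is what gets sent to the LLM.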

RAG Technique Applied to Study Budget Creation

Going back to the clinical protocol example, let’s discuss the study budget operational use case. Setting up a study budget requires quite a few details, including the number of subjects to be screened and enrolled, study countries, therapeutic area, indication, schedule of visits along with procedures, and the expected overall study duration. Applying the RAG approach, one or more prompts must be developed to query the text in a protocol document for the required information. The LLM uses these prompts along with the context from the protocol document to retrieve the relevant information and return it in a pre-defined structure. Below is a sample representation of a prompt and the context that gets passed to the LLM for retrieving the number of subjects from the document:

Prompt:

You are a clinical project manager working on creating a study budget. You need to identify the clinical trial budget parameters from the protocol document. From the following context information, and not prior knowledge answer the question:
Total Number of subjects planned across cohort groups for this study are?
Context: Cohort 1 (Drug1): A maximum of 50 patients were planned to be enrolled to yield 25 pairs of biopsies Cohort 2 (Comparator) : A Maximum of 15 patients will be enrolled for Cohort 2

LLM Response:

The total number of subjects planned across cohort groups for this study is calculated by summing the planned enrollments for each cohort group:

Total Subjects = Cohort 1 (Drug1) + Cohort 2 (Comparator)

Total Subjects = 50 (Cohort 1) + 15 (Cohort 2) = 65 subjects.

Structured Format:

ProtocolDocumentObject: {
    NumberOfSubjects: 65
}
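
The example above can be sketched end to end in Python: assemble the prompt from a question and retrieved context, then turn the free-text response into the structured object. This is a minimal sketch, not a definitive implementation. The function names, the regular expressions, and the dataclass are assumptions introduced for illustration, and the response string is hard-coded from the sample above in place of a real model call.

```python
import json
import re
from dataclasses import asdict, dataclass

# Template modeled on the sample prompt above (wording is illustrative).
PROMPT_TEMPLATE = (
    "You are a clinical project manager working on creating a study budget. "
    "From the following context information, and not prior knowledge, "
    "answer the question:\n{question}\nContext: {context}"
)


@dataclass
class ProtocolDocumentObject:
    NumberOfSubjects: int


def build_prompt(question: str, context: str) -> str:
    """Combine the question and the retrieved context into one prompt."""
    return PROMPT_TEMPLATE.format(question=question, context=context)


def parse_number_of_subjects(llm_response: str) -> ProtocolDocumentObject:
    """Take the last 'N subjects' figure in the free-text response."""
    matches = re.findall(r"(\d+)\s+subjects", llm_response)
    if not matches:
        raise ValueError("no subject count found in LLM response")
    return ProtocolDocumentObject(NumberOfSubjects=int(matches[-1]))


def cross_check(context: str, parsed: ProtocolDocumentObject) -> bool:
    """Recompute the total from the context as a simple validation step.
    Assumes the 'maximum of N patients' phrasing seen in the sample."""
    counts = re.findall(r"maximum of (\d+) patients", context, flags=re.IGNORECASE)
    return sum(int(n) for n in counts) == parsed.NumberOfSubjects


context = (
    "Cohort 1 (Drug1): A maximum of 50 patients were planned to be enrolled "
    "to yield 25 pairs of biopsies Cohort 2 (Comparator) : A Maximum of 15 "
    "patients will be enrolled for Cohort 2"
)
prompt = build_prompt(
    "Total Number of subjects planned across cohort groups for this study are?",
    context,
)
# Stand-in for the model call: the response text from the sample above.
response = "Total Subjects = 50 (Cohort 1) + 15 (Cohort 2) = 65 subjects."
result = parse_number_of_subjects(response)
print(json.dumps({"ProtocolDocumentObject": asdict(result)}))
print("cross-check passed:", cross_check(context, result))
```

Parsing free text with a regular expression is brittle. In practice, the prompt would more likely instruct the LLM to return the structure directly, with a validation step like `cross_check` kept as a safeguard before the value feeds a downstream budget process.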

Conclusion:

As businesses explore the potential of Large Language Models for automation, it is imperative to recognize the breadth of applications beyond generative tasks. By identifying specific use cases, leveraging advanced techniques, and validating results, organizations can harness the power of LLMs to enhance efficiency, streamline workflows, and extract valuable insights from textual data. The journey from generative capabilities to practical business automation is one paved with innovation and strategic implementation.
