Large language models (LLMs) have a hard time interpreting generic phrases (linguistic generics). A human might accept a generic phrase such as “Sharks attack beachgoers” as true while also knowing that not all sharks attack humans, and that of the ones that do, only a very few attack beachgoers. When an LLM encounters the phrase in a training corpus, however, it establishes a relationship between “sharks” and “attacking beachgoers” that is not necessarily true. (To learn more about the problem of linguistic generics in LLM training, read this research article or this one.)
In this article, we evaluate the generic phrase “Sharks attack beachgoers” by posing the question “Do sharks attack beachgoers?” under five different prompt templates, similar to the WikiContradict evaluation pipeline in this research article (see Figure 3 in that article). In particular, we keep the question fixed and vary the prompt and context to analyze how LLMs behave when asked generic-related questions. (We chose the shark example to commemorate the 50th anniversary of JAWS this year.)
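As an overview, here is a sketch of the five configurations we build below; they vary only in the prompt wording and the retrieved context (the dictionary and its names are ours, for reference, not part of the original pipeline):

# The five prompt configurations evaluated in this article (names are ours)
EVALUATION_SETUPS = {
    "Template 1": "vanilla prompt, no context",
    "Template 2": "vanilla prompt + context from the first document",
    "Template 3": "vanilla prompt + context from the second document",
    "Template 4": "vanilla prompt + context from both documents",
    "Template 5": "detailed prompt + context from both documents",
}
QUESTION = "Do sharks attack beachgoers?"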
Next, we load an existing retriever. We use the WikipediaRetriever (which retrieves Wikipedia documents) and query it on the topic of “Shark”, keeping only the first two results. For evaluation purposes, it is important that the two documents are distinctly different from each other.
from langchain_community.retrievers import WikipediaRetriever

# Retrieve the top two Wikipedia results for the query "Shark"
retriever = WikipediaRetriever(top_k_results=2)
docs = retriever.invoke("Shark")
Now, let us see what the first document is about.
print(docs[0].page_content[:100])
Output:
Sharks are a group of elasmobranch cartilaginous fishes characterized by a ribless endoskeleton...
We see that the first document is indeed about the animal shark. It defines the topic well and provides exactly the context needed to answer our question “Do sharks attack beachgoers?”.
Now let’s see what the second document contains.
print(docs[1].page_content[:100])
Output:
In cryptography, SHARK is a block cipher identified as one of the predecessors of Rijndael...
Here, we see that the second document is related to cryptography and the page retrieved is on SHARK, a block cipher.
Next, we load the tokenizer and the model from Hugging Face. We use the granite-3.3-2b-instruct model for evaluation.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
We then create a Hugging Face text-generation pipeline and wrap it as a LangChain LLM.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline
import torch
from transformers import pipeline

# Build a text-generation pipeline around the Granite model
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    device=0 if torch.cuda.is_available() else -1,  # Use GPU if available
)

# Wrap the pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=pipe)
Prompt Template 1 (Vanilla Prompt)
Next, we import PromptTemplate and create an LLM chain. Then, we ask the model to respond to the question “Do sharks attack beachgoers?”. We restrict the answer to four quantifiers (“all”, “no”, “some”, and “not all”), the basic quantifier expressions studied formally in Aristotle's syllogistics (they correspond to the A, E, I, and O forms of the traditional square of opposition).
prompt = PromptTemplate.from_template("""Select one of the basic quantifier expressions from the below list for answering the question and provide a reason based on your internal knowledge: [“all”, “no”, “some” or “not all”].
Do not print anything else apart from the answer and the reason.
Question: {question}""")

chain = LLMChain(llm=llm, prompt=prompt)
response = chain.invoke("Do sharks attack beachgoers?")
Finally, we print the response text.
print(response['text'])
Output:
c. some
Reason: Sharks that attack beachgoers are a minority event, not a universal one.
Therefore, the statement "some sharks attack beachgoers" is accurate, but not all.
Prompt Template 2 (Vanilla Prompt + Context from the First Document)
This prompt is similar to the first one; the only change is that we add a context line to the template. The context is produced by a helper function that returns the page content of the first document only. We change this helper accordingly in the other prompt templates, following the evaluation pipeline shown above.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    device=0 if torch.cuda.is_available() else -1,  # Use GPU if available
)
llm = HuggingFacePipeline(pipeline=pipe)

prompt = PromptTemplate.from_template(
    """Select one of the basic quantifier expressions from the below list for answering the question and provide a reason based on your internal knowledge: [“all”, “no”, “some” or “not all”].
Do not print anything else apart from the answer and the reason.
Question: {question}
Context: {context}"""
)
chain_with_context = LLMChain(llm=llm, prompt=prompt)

# Helper that returns the page content of the first document only
def format_docs_0(docs):
    return docs[0].page_content if docs else ""

formatted_context = format_docs_0(docs)
response = chain_with_context.invoke(
    {"context": formatted_context, "question": "Do sharks attack beachgoers?"}
)
The response:
print(response['text'])
Output:
Reason:"some"
Some shark species are apex predators and attack beachgoers, especially the larger species such as the great white shark, tiger shark, and bull shark.
However, it's important to note that not all sharks attack humans, and most sharks have a diet consisting of fish and marine mammals.
The majority of shark attacks on humans are fatal, but the probability of being attacked by a shark is relatively low.
Prompt Template 3 (Vanilla Prompt + Context from the Second Document)
Here, we run the same code as above with a single change: we select the second document for the context, then re-invoke the chain as sketched below.
# Helper that returns the page content of the second document only
def format_docs_1(docs):
    return docs[1].page_content if docs else ""

formatted_context = format_docs_1(docs)
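Reusing chain_with_context from Prompt Template 2, we then invoke the chain with the new context:

response = chain_with_context.invoke(
    {"context": formatted_context, "question": "Do sharks attack beachgoers?"}
)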
The response:
print(response['text'])
Output:
Note: The provided context is about the cryptographic block cipher SHARK, not about sharks or beachgoers.
Therefore, there's no relevant information to form a quantifier expression for the question.
The answer is "none" and the reason is: The context does not contain any information about sharks attacking beachgoers.
Prompt Template 4 (Vanilla Prompt + Context from Both Documents)
Here, we include both documents in the context; a sketch of the helper and invocation follows.
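A minimal sketch, assuming a helper name (format_docs_all) that does not appear in the original notebook: we concatenate the page contents of both retrieved documents and invoke the same chain_with_context.

# Hypothetical helper (name assumed): join the page content of all retrieved documents
def format_docs_all(docs):
    return "\n\n".join(doc.page_content for doc in docs)

formatted_context = format_docs_all(docs)
response = chain_with_context.invoke(
    {"context": formatted_context, "question": "Do sharks attack beachgoers?"}
)
print(response['text'])

Output: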
The context provided does not explicitly discuss sharks attacking beachgoers.
However, it does offer information about shark species, their sizes, habitats, and behaviors.
To answer the question "Do sharks attack beachgoers?", we can infer a basic quantifier expression from the context.
Given the context, we can express it with the expression "some" because sharks do attack humans, though it's a rare occurrence.
This inference is based on the fact that several shark species are apex predators, including bull sharks, tiger sharks, great white sharks, mako sharks, thresher sharks, and hammerhead sharks.
Although they typically do not live in freshwater, some sharks, like the bull shark, can be found in both seawater and freshwater.
Therefore, the answer is "some", and the reason is based on the existence of shark species capable of attacking humans, despite the rarity of such incidents.
Prompt Template 5 (Detailed Prompt + Context from Both Documents)
Lastly, in Prompt Template 5, we use a detailed prompt with context from both documents (as above).
prompt = PromptTemplate.from_template(
    """Select one of the basic quantifier expressions from the below list for answering the question and provide a reason based on your internal knowledge: [“all”, “no”, “some” or “not all”].
Carefully investigate the given context and provide a concise response that reflects the comprehensive view of the context, even if the answer contains contradictory information reflecting the heterogeneous nature of the context.
Question: {question}
Context: {context}"""
)
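We rebuild the chain with this detailed prompt and invoke it with both documents as context. This is a sketch reusing the formatted_context built for Prompt Template 4; the variable name chain_detailed is ours, not from the original notebook.

# Rebuild the chain with the detailed prompt and the both-document context
chain_detailed = LLMChain(llm=llm, prompt=prompt)
response = chain_detailed.invoke(
    {"context": formatted_context, "question": "Do sharks attack beachgoers?"}
)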
The response:
print(response['text'])
Output:
In this context, the basic quantifier expressions to be chosen are "all", "no", "some", or "not all".
The context describes sharks as large predatory fish found in all seas, with some species capable of living in freshwater, and some being apex predators in marine environments.
Sharks are also noted to be caught by humans for various purposes, with many populations threatened due to overfishing and finning practices.
Given these details, the most appropriate quantifier expression is "some." Here's the reasoning:
1. The statement "Do sharks attack beachgoers?" is about whether certain shark species engage in behavior that could pose a risk to humans.
2. The context reveals that sharks are predators, with some species like the great white, tiger, and hammerhead sharks being noted as apex predators.
3. However, the context does not universally assert that all shark species attack beachgoers.
4. While there are documented attacks, the context primarily focuses on the variety, diversity, and ecological roles of sharks rather than providing an exhaustive list of attacks.
5. In fact, the context highlights human threats to shark populations, implying that not all shark attacks are due to human-induced aggression or proximity.
6. Therefore, the answer "some" aligns best with the context, acknowledging that some shark species are capable of, and do, attack beachgoers.
Conclusion
Through this experiment, we see that the Granite 3.3-2b model is good at identifying the quantifiers associated with generic phrases, and that its responses are context-driven. The model prioritizes the supplied context over its parametric knowledge, which is why Prompt Template 3 answered “none” to our question “Do sharks attack beachgoers?”: the cryptography context contained no relevant information.
Feel free to create a copy of the notebook here, and play around with your own linguistic generics.
Acknowledgements
Thanks to my IBM mentors, Dr. Alessandra Pascale and Susan Malaika, for guiding me throughout this project! I would also like to thank Dr. Alexander Tolbert for his guidance and introducing me to this field.