Introduction to Generative AI
Generative AI includes many types of systems, such as models that generate images, audio, video, or text. In this course, we begin with large language models (LLMs). LLMs are one of the most widely used forms of generative AI today and power tools such as ChatGPT, Claude, and Gemini. These systems interact with users through language and are increasingly used in research, writing, and information services.
Attention Is All You Need
In 2017, researchers at Google published a paper titled "Attention Is All You Need." The paper introduced a new model design called the Transformer architecture.
Generative AI did not begin with this paper, but this work helped trigger the rapid development of many systems that generate text, images, and other content.
For this reason, the paper is widely seen as a turning point that enabled the modern wave of generative AI.
Earlier in this course we discussed several paradigms of AI, including symbolic AI and machine learning. Generative AI is usually discussed within machine learning. Most recent generative AI systems are built using neural networks, which are a type of machine learning model. These systems learn patterns from large collections of data and use them to produce new outputs.
What Is an LLM?
A large language model is a system designed to work with human language. It analyzes patterns in large collections of text and uses those patterns to generate new text.
A language model estimates what words are likely to appear next in a sequence of text. For example, in the phrase "The capital of France is", many readers expect the next word to be "Paris". Language models learn similar patterns from large text collections.
The term large refers to the scale of these systems. Modern language models are trained on very large collections of text and are built from very large neural networks. This scale allows them to capture many patterns of language.
Because of this training, LLMs can perform tasks such as answering questions, summarizing documents, translating text, and assisting with writing. These systems generate responses by drawing on patterns in their training data rather than by consulting a database of verified facts.
Common Terms in LLMs
Parameters
A parameter is a numerical value inside a neural network that is adjusted during training. These values act like weights that determine how the model links pieces of text to each other. For example, the model may learn that certain words often appear together or that some words depend on earlier parts of a sentence. When a model has more parameters, it has more capacity to represent such connections in language.
Model names often indicate the number of parameters, where the letter B stands for billion. For example, LLaMA‑3‑8B has about 8 billion parameters, while LLaMA‑3‑70B has about 70 billion. Larger models usually have more capacity to capture patterns in language.
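To make the idea of parameter counts concrete, here is a minimal sketch (in Python, with invented layer sizes) of how the weights and biases of fully connected layers add up. Real LLMs apply the same arithmetic at a vastly larger scale:

```python
# Toy illustration (not a real LLM): counting parameters in a tiny
# fully connected network. Each layer contributes one weight per
# input-output pair plus one bias per output.

def linear_layer_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# A tiny two-layer network: 512 inputs -> 1024 hidden units -> 512 outputs
hidden = linear_layer_params(512, 1024)   # 512*1024 + 1024 = 525,312
output = linear_layer_params(1024, 512)   # 1024*512 + 512  = 524,800
total = hidden + output

print(total)  # 1,050,112 parameters -- still far below "7B" or "70B"
```

Stacking many such layers, each far wider than this example, is how models reach billions of parameters.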
Training Data
Large language models learn from large collections of text. These may include books, websites, articles, and technical documents.
Example: When an LLM explains a concept from library science or ICT, it draws on patterns learned from many similar texts during training, such as textbooks, research articles, technical documentation, and online tutorials.
Tokens
LLMs process text as small units called tokens. A token may be a word, part of a word, or punctuation.
Example: When a student asks an LLM to summarize a journal article, the model processes the text as a sequence of tokens.
Before exploring tokenization hands-on, watch the following short video explaining what tokens are and how language models represent text.
Hands-on: Exploring Tokenization
Goal: Observe how large language models break text into tokens.
Visit one of the following tokenization tools:
Step 1: Enter a short sentence.
For example:
- Libraries help people find information.
- Metadata improves search and discovery.
Step 2: Observe how the system splits the text into tokens.
Notice that tokens are not always full words. A token may be:
- a complete word
- part of a word
- punctuation
Step 3: Try several variations.
- Enter a longer sentence.
- Enter an uncommon or technical word.
- Enter a paragraph.
As you experiment, consider the following questions:
- Do tokens always correspond to full words?
- When do words get split into smaller pieces?
- How are punctuation marks handled?
- Do longer or unusual words produce more tokens?
Think about why tokenization is useful for language models. Instead of storing every possible word, the model can represent language using smaller reusable pieces.
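The behavior you just observed can be sketched in code. The following toy tokenizer uses an invented vocabulary and a simple longest-match rule; real tokenizers (such as BPE) are learned from data and work differently, but the sketch shows why uncommon words split into several reusable pieces:

```python
# A toy greedy subword tokenizer (illustration only; the vocabulary
# below is invented for this example).

VOCAB = {"meta", "data", "lib", "rar", "ies", "improve", "s",
         "search", "."}

def tokenize(word):
    """Greedily match the longest vocabulary piece at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):    # try longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])           # fall back to a single character
            i += 1
    return tokens

print(tokenize("metadata"))   # ['meta', 'data'] -- one word, two tokens
print(tokenize("libraries"))  # ['lib', 'rar', 'ies']
```

Note how "metadata" becomes two tokens even though it is one word: the model can reuse the pieces "meta" and "data" in many other words.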
Embeddings
Before a neural network can process text, tokens must be converted into numbers. Embeddings are numerical representations of tokens.
Embeddings allow the model to compare and relate pieces of text. Tokens that appear in similar contexts often receive similar embeddings.
For example, words such as “library”, “archive”, and “collection” may appear close to each other in the embedding space.
You may revisit the vector example using the TensorFlow Embedding Projector. In that activity, each word is represented as a vector, which is a list of numbers. Embeddings use the same idea. In modern AI systems, an embedding is a vector used to represent a token or word so the model can compare pieces of text mathematically.
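As a rough sketch of this idea, the snippet below uses hand-made three-number vectors (not learned embeddings, which have hundreds or thousands of dimensions) to show how vectors let a system compare words mathematically:

```python
# Toy "embeddings": invented 3-number vectors standing in for learned ones.
import math

embeddings = {
    "library": [0.9, 0.8, 0.1],
    "archive": [0.8, 0.9, 0.2],
    "banana":  [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine_similarity(embeddings["library"], embeddings["archive"]))  # close to 1
print(cosine_similarity(embeddings["library"], embeddings["banana"]))   # much lower
```

Because "library" and "archive" point in nearly the same direction, the model can treat them as related, while "banana" sits far away in the space.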
Next-Token Prediction
Large language models generate text by predicting one token at a time. Given a sequence of tokens, the model estimates which token is most likely to appear next.
After a token is predicted, it is added to the sequence. The model then predicts the next token again. Repeating this process allows the model to produce longer passages of text. This is why, when you interact with an AI system, you often see the response appear gradually rather than all at once.
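The generation loop described above can be sketched with a toy lookup table standing in for the model (the table and its "predictions" are invented for illustration; a real model computes probabilities over its whole vocabulary at every step):

```python
# Toy next-token predictor: a fixed table of "which token tends to
# follow which". The loop mirrors how an LLM generates one token at a time.

NEXT = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
}

def generate(start, steps):
    tokens = [start]
    for _ in range(steps):
        current = tokens[-1]
        if current not in NEXT:       # no known continuation: stop
            break
        tokens.append(NEXT[current])  # append the prediction, then repeat
    return " ".join(tokens)

print(generate("The", 5))  # The capital of France is Paris
```

Each pass through the loop produces one more token, which is why responses from chat systems often appear word by word.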
Context Window
The context window is the amount of text a model can consider at one time. If a document is longer than the context window, the model cannot use all of it at once.
Example: When summarizing a long report or a digital archive document, the context window determines how much of the text the model can analyze in one request.
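A rough sketch of this limit, assuming for simplicity that each word is one token and that the window holds only eight tokens (real windows hold thousands to millions):

```python
# Sketch of a context-window limit: when the input has more tokens
# than the window, the model can only "see" part of the text.

CONTEXT_WINDOW = 8  # tokens (invented; real models use far more)

def fit_to_window(text):
    tokens = text.split()            # pretend each word is one token
    if len(tokens) <= CONTEXT_WINDOW:
        return tokens
    return tokens[-CONTEXT_WINDOW:]  # keep only the most recent tokens

report = ("This long report describes how metadata improves "
          "search and discovery in digital archives")
visible = fit_to_window(report)
print(len(visible))   # 8 -- the opening words fall outside the window
print(visible[0])     # 'metadata' -- everything before it was dropped
```

In practice, systems handle overflow in various ways (truncating, chunking, or summarizing), but the underlying constraint is the same.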
Further information
- GeeksforGeeks: What are LLM Parameters?
- OpenAI: What are tokens and how to count them?
- IBM: What is a context window?
- Datasets for large language models: a comprehensive survey
How Do LLMs Work?
To understand how large language models operate, read the following article:
Large language models, explained with a minimum of math and jargon
As you read, focus on the following key ideas.
Words as vectors
Language models do not represent words as strings of letters. Instead, each word (or token) is represented as a vector, which is a list of numbers.
This numerical representation allows the model to compare words and capture patterns such as:
- similarity between related terms
- relationships between words
- patterns in how words appear together in text
Think about:
- Why are numbers easier for a computer to manipulate than letters?
- How representing words as vectors enables operations such as measuring distance or similarity.
Meaning depends on context
A single word can have multiple meanings depending on context. For example, a bank can refer to a financial institution or the side of a river.
Modern language models address this by generating context-dependent representations. The representation of a word changes depending on the surrounding words in the sentence.
Consider:
- How humans use context to interpret meaning.
- Why static word representations would struggle with ambiguity.
Transformers process text through layers
Large language models are built from many stacked transformer layers. Each layer receives representations of the input tokens and updates them. Across layers, the model gradually incorporates more contextual information.
Early layers often focus on:
- sentence structure
- syntax
- resolving ambiguity
Later layers may help build a higher-level understanding of the passage.
The attention mechanism
Within each transformer layer, the model uses attention.
Attention allows tokens to exchange information with other tokens in the sequence.
For example, attention mechanisms help the model:
- connect pronouns to the correct nouns
- resolve ambiguous words
- identify relationships between words in a sentence
Attention allows the model to determine which parts of the text are relevant to each token.
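As a toy illustration, the sketch below turns invented relevance scores into attention weights using a softmax, showing how the pronoun "it" could be linked back to "library". The scores are made up for this example; real models compute them from token vectors:

```python
# Toy attention weights: softmax turns raw relevance scores into
# weights that sum to 1, so the most relevant token dominates.
import math

tokens = ["The", "library", "extended", "its", "hours", "because"]
scores = [0.1, 2.5, 0.3, 0.4, 0.2, 0.1]  # relevance of each token to "it"

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax(scores)
for tok, w in zip(tokens, weights):
    print(f"{tok:10s} {w:.2f}")
# "library" receives by far the largest weight, so "it" draws most of
# its information from that token.
```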
Feed-forward layers and pattern matching
After attention, each token's representation passes through another step called a feed-forward network. This part of the model helps recognize useful patterns in the text and prepares the representation for next-token prediction.
Training through next-token prediction
Language models are trained using a simple objective: predict the next token in a sequence of text. During training:
- The model receives a sequence of tokens.
- It predicts the next token.
- The prediction is compared to the actual token.
- The model adjusts its internal parameters to reduce the error.
This process is repeated across massive text datasets, allowing the model to learn patterns of language usage.
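A minimal sketch of this idea, replacing billions of adjustable parameters with simple next-token counts over a tiny invented corpus, looks like this:

```python
# Minimal "training" sketch: learn next-token statistics by counting
# which token follows which. Real models adjust parameters instead of
# keeping counts, but the objective is the same: predict the next token.
from collections import Counter, defaultdict

corpus = "the capital of france is paris . the capital of japan is tokyo ."
tokens = corpus.split()

counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1          # record one observation: nxt followed prev

def predict_next(token):
    """Return the most frequently observed next token."""
    return counts[token].most_common(1)[0][0]

print(predict_next("capital"))  # of
print(predict_next("is"))       # 'paris' (seen first; 'tokyo' is equally likely)
```

Seeing more text refines the counts, just as seeing more text refines a real model's parameters, although a real model generalizes far beyond exact sequences it has seen.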
Why scale matters
Performance improves when three factors increase together:
- model size (number of parameters)
- training data
- computational resources
Large models trained on large datasets tend to capture more linguistic patterns and produce more accurate predictions.
Hands-on: Testing next-token prediction and context
Goal: Observe how a language model continues text based on context.
Large language models generate text one token at a time. At each step, the model predicts which token is most likely to appear next given the previous tokens.
In this activity, you will observe how changing the context changes the generated text.
Step 1: Open a large language model
Open any large language model you have access to, such as ChatGPT, Claude, Gemini, or Copilot.
Step 2: Try simple text completion
Enter the following incomplete sentences and observe how the model continues them.
- The capital of France is
- Libraries help people
- Metadata improves
- A reliable information source should
Read the generated text carefully.
Step 3: Change the context
Now modify the beginning of the sentence and compare the results.
Try examples such as:
- In a fictional story, the capital of France is
- In a children's book, libraries help people
- In digital archives, metadata improves
- In academic research, a reliable information source should
Observe how the model's continuation changes.
Step 4: Compare the outputs
Consider the following questions:
- Did the model produce different continuations when the context changed?
- Did the output seem predictable or surprising?
- Did the model generate a full sentence or continue writing beyond what you expected?
Further information
- You may explore this visualization of an LLM workflow to get a general sense of how LLM components connect. It includes many technical details, so reviewing it in depth is optional.
- Optional: Jay Alammar: The Illustrated Transformer explains the transformer architecture in detail. Some parts may be technically challenging.
- Andreas Stöffelbauer: How Large Language Models work? From zero to ChatGPT
- MIT Sloan Management Review: How LLMs Work: Top 10 Executive-Level Questions
- Drexel University: Dragons' Guide to GAI
- AI Demystified: Introduction to large language models
Evaluating LLMs
How do researchers compare different large language models? Two commonly used approaches are benchmarking and human evaluation.
Benchmarking
Benchmarking evaluates models using standardized tasks and datasets. Each model completes the same tasks, and the results are summarized using numerical scores. Because the tasks are fixed, benchmarking allows researchers to compare different models under the same conditions.
Benchmarks may test abilities such as question answering, reasoning, code generation, or knowledge across academic subjects.
Advantages of benchmarking include standardized comparisons and measurable results. Because every model is tested on the same tasks, researchers can directly compare performance across systems. Benchmark scores can also be reproduced by other researchers, which supports transparency in evaluation.
However, benchmark scores do not always reflect real‑world use. Many benchmark tasks are short and highly structured, while real information tasks are often longer and more complex. In addition, models may be optimized specifically to perform well on widely known benchmarks.
Another concern is that some benchmark questions may appear in a model's training data. If a model has already seen the same or very similar examples during training, it may appear to perform well simply because it remembers those answers. This issue is often referred to as data leakage or benchmark contamination.
Common benchmarks used to evaluate large language models include MMLU (which tests knowledge across academic subjects), GSM8K (which evaluates mathematical reasoning), and HellaSwag (which measures commonsense reasoning).
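The scoring idea behind benchmarking can be sketched as follows. The questions and model answers below are invented, and real benchmarks contain thousands of items, but the principle is the same: every model faces identical tasks, so accuracy scores are directly comparable:

```python
# Sketch of benchmark scoring: fixed questions, fixed gold answers,
# accuracy as the comparison metric. All content here is invented.

benchmark = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

model_answers = {
    "model_a": ["Paris", "4", "Shakespeare"],
    "model_b": ["Paris", "5", "Shakespeare"],
}

def accuracy(answers):
    """Fraction of answers that exactly match the gold answers."""
    correct = sum(a == gold for a, (_, gold) in zip(answers, benchmark))
    return correct / len(benchmark)

for name, answers in model_answers.items():
    print(name, round(accuracy(answers), 2))  # model_a 1.0, model_b 0.67
```

Exact-match scoring is itself a simplification: many real benchmarks need more careful answer matching, which is one source of the limitations discussed above.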
Human evaluation: LM Arena
Another approach is human evaluation. One widely known example is LM Arena. In LM Arena, users are shown responses from two anonymous models and are asked to choose the better answer.
Over time, many user votes produce rankings that reflect how people perceive model performance. This approach captures human judgments about qualities such as clarity, usefulness, and writing style.
Human evaluation can reflect more realistic tasks than standardized benchmarks. At the same time, the results may be influenced by user preferences, popularity effects, or differences in how questions are asked.
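One way many pairwise votes become a ranking can be sketched with an Elo-style rating update. LM Arena's actual method is related but more elaborate, and the starting ratings, votes, and K factor below are invented for illustration:

```python
# Elo-style sketch: each vote shifts rating points from the loser to
# the winner, with bigger shifts for upset results.

def expected_score(rating_a, rating_b):
    """Probability that A beats B, given current ratings."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, a_won, k=32):
    ea = expected_score(rating_a, rating_b)
    actual = 1.0 if a_won else 0.0
    change = k * (actual - ea)        # surprise -> larger change
    return rating_a + change, rating_b - change

a, b = 1000.0, 1000.0                 # two anonymous models start equal
for vote_for_a in [True, True, False, True]:   # four user votes
    a, b = update(a, b, vote_for_a)

print(round(a), round(b))  # the model with more wins ends up rated higher
```

Over thousands of votes from many users, such ratings settle into the leaderboard positions you saw in Step 1.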
Hands-on: Exploring LM Arena
Goal: Observe how human evaluation is used to compare large language models.
Step 1: Explore the leaderboard
Visit: https://arena.ai/leaderboard
Look at the model rankings and consider the following questions:
- Which models appear near the top of the leaderboard?
- Are the models familiar to you (for example ChatGPT, Claude, Gemini)?
- Do the rankings match your expectations about model quality?
Step 2: Try voting in LM Arena
Visit: https://arena.ai/
Enter a prompt and compare responses from two anonymous models.
After reading the responses, vote for the answer you think is better.
As you experiment, consider:
- What criteria did you use to decide which answer was better?
- Did both models appear competent, or was one clearly stronger?
- Could different users reasonably vote for different answers?
This activity illustrates how large-scale human voting can be used to evaluate and rank language models.
Further information
- Blog posts about Measuring LLM Performance
- Wikipedia: Language model benchmark
- Simon Willison: Understanding the recent criticism of the Chatbot Arena
- The Leaderboard Illusion
- Meta’s benchmarks for its new AI models are a bit misleading
- Evidently AI: 30 LLM evaluation benchmarks and how they work
Emerging directions
Multi-modal models
Early large language models worked primarily with text. Many newer systems can work with multiple types of data, including images, audio, video, and documents. These systems are often described as multi-modal models.
In a multi-modal system, language models can analyze information from different sources together. For example, a user may upload an image and ask the system to describe what it shows, or provide a chart and ask for an explanation of the data.
In library and information settings, multi-modal systems may support tasks such as describing images in digital collections, analyzing scanned documents, or generating alternative text to improve accessibility.
Hands-on: Trying a multi-modal model
Goal: Observe how a multi-modal system can use both an image and a written question.
In this activity, you will upload a photo to a chatbot that supports image input and ask it a practical question.
Step 1: Choose a tool
Open a system that supports image input, such as ChatGPT, Claude, Gemini, or Copilot.
Step 2: Take or upload a photo
Choose one image from your everyday environment. For example, you may upload:
- a plant or flower
- a household object or device
- a chart, sign, or printed document
- a damaged item that may need repair
Step 3: Ask a question about the image
Upload the image and ask a specific question. For example:
- What kind of plant is this?
- What does this chart show?
- What does this sign mean?
- What might be wrong with this object, and what should I check first?
Step 4: Evaluate the response
As you read the answer, consider the following questions:
- Did the system correctly identify the main features of the image?
- Did it answer the question directly?
- Did it seem confident even when it might be uncertain?
- What parts of the answer would you verify before acting on it?
Hands-on: Trying voice interaction
Goal: Observe how a language model processes spoken input and produces a response.
Many modern AI systems support voice interaction. In these systems, speech is converted into text, processed by the language model, and sometimes converted back into speech.
Step 1: Open a tool with voice mode
Use a system that supports voice interaction, such as:
Step 2: Ask a question using speech
Ask the system a question by speaking instead of typing.
For example:
- What is metadata in library science?
- Explain what digital archives are.
- What are some examples of AI used in libraries?
Step 3: Observe the interaction
As the system responds, consider:
- Did the system correctly understand your speech?
- Did it respond using text, speech, or both?
- Did the interaction feel different from typing a prompt?
Hands-on: Analyzing a PDF with a language model
Goal: Observe how a language model analyzes a document that contains both text and visual structure.
Many language models allow users to upload PDF documents. These systems can analyze both the text and the layout of the document.
Step 1: Select a PDF
Choose a PDF document you are allowed to upload.
Step 2: Ask questions about the document
Ask questions such as:
- What are the main points of this document?
- Summarize the key findings.
- What information is shown in the tables or figures?
Step 3: Evaluate the response
As you read the answer, consider:
- Did the system correctly identify the structure of the document?
- Did it summarize the key ideas accurately?
- Did it miss any important information?
Agentic AI and AI agents
Most early language models responded to one prompt at a time. More recent systems can also use external tools such as web search, databases, or software for specific tasks.
When a system can use tools to carry out multiple steps toward a goal, it is often described as agentic AI. An AI agent is a system designed to pursue a goal by planning steps and using tools. Some agentic systems use a single agent, while others coordinate multiple agents.
For example, an AI agent might search for sources on a topic, organize the results, and draft a short summary. In information work, similar systems may assist with tasks such as literature searching, document analysis, or preparing research notes.
Anthropic's Claude Cowork shows how an AI agent may begin to handle tasks that were previously done by separate software tools, while OpenClaw is an example of an open and self-hosted agent platform. These examples show that agentic AI can appear in both commercial and open systems.
At the same time, agentic systems can introduce additional risks. If they are connected to files, messages, or external tools, they may expose sensitive information or take unintended actions if they are not carefully configured and monitored.
Hands-on: Trying an AI Agent with Microsoft Copilot
Goal: Experience how an AI agent can complete a multi-step task.
In this activity you will try a research task using Microsoft Copilot.
Step 1: Access Copilot
Visit: Microsoft 365 Copilot
Sign in using your University of Kentucky account.
If you do not currently have access to the paid version of Copilot, the university provides licenses for students. You may request access here.
Step 2: Try the Researcher agent
In Copilot, choose the Researcher option.
Select Topic report and enter a research topic.
For example:
- Agentic AI in libraries
- AI use in digital archives
- AI tools for academic research
Ask the agent to generate a report.
Step 3: Observe the agent's behavior
As the system generates the report, consider the following questions:
- Does the system gather information from multiple sources?
- Does it organize the information into sections?
- Does it cite sources or reference materials?
- Does it show intermediate steps, such as planning, searching, or checking citations?
- Do those visible steps help you understand how the report is being produced?
- Do any of the steps seem confusing, unnecessary, or misleading?
Step 4: Reflect on the results
After reading the report, consider:
- Does the report appear accurate and well organized?
- What parts of the report would you verify before using it in research?
- How is this different from asking a chatbot a single question?
This activity illustrates how agent-based systems can perform multi-step tasks such as gathering information, organizing sources, and producing structured reports.
Hands-on: Creating your own agent in Microsoft Copilot
Goal: Build a simple agent and observe how its description and instructions shape its behavior.
In this activity, you will create a small agent in Microsoft Copilot and test how it responds.
You may check Microsoft's guide to Agent Builder if needed.
Step 1: Open Agent Builder
Open Microsoft 365 Copilot and choose New agent.
You can create an agent in two ways:
- Describe: create the agent using plain language
- Configure: enter the name, description, and instructions manually
Step 2: Define a simple purpose
Create an agent for one specific task. Keep the scope narrow. For example, your agent might be designed to:
- explain concepts from this course in plain language
- help generate study questions for a weekly reading
- summarize a document for LIS or ICT students
- suggest keywords for a research topic
Step 3: Add the basic components
Give your agent:
- a short name
- a brief description
- a few clear instructions
Try to make the instructions specific. For example, tell the agent what kind of task it should perform, what audience it should assume, and what kind of answer it should produce.
Step 4: Test the agent
Use the test pane to try several prompts.
Consider the following questions:
- Does the agent stay within its intended role?
- Do the instructions affect the style or structure of the response?
- What happens if the prompt falls outside the agent's intended purpose?
Step 5: Revise and reflect
Revise the description or instructions and test the agent again.
Further information
- A Multimodal World
- GeeksforGeeks: Multimodal Large Language Models
- Google: What is agentic AI?
- Explaining Agentic AI: The Good, the Bad & the Ugly
- CNBC: AI fears pummel software stocks: Is it ‘illogical’ panic or a SaaS apocalypse?
- Five Things You Should Not Do With OpenClaw
Limitations and Open Challenges
Despite their rapid development, large language models still face important limitations and open challenges.
Shortage of high-quality data
Large language models require very large amounts of training data. However, not all data are equally useful. High-quality text that is accurate, well edited, and legally usable is limited.
This creates a challenge for future model development. As more organizations train large models, competition for high-quality data may increase. More data do not always lead to better models if the data are noisy, repetitive, or unreliable.
Training data controversies
Another challenge concerns how training data are collected and used. Many large language models are trained on data gathered from books, websites, articles, code, and other online materials. This has raised concerns about copyright, consent, privacy, and compensation.
These concerns are especially important in information professions, where questions of authorship, access, licensing, and responsible data use are central.
Hallucination
Large language models can produce responses that sound confident and well written but are incorrect. This problem is commonly called hallucination. Some authors also use the term confabulation.
Hallucinations may include false claims, invented citations, incorrect summaries, or inaccurate descriptions of sources. For this reason, LLM outputs should be checked carefully, especially in academic, legal, medical, or policy contexts.
Unexpected or illogical answers
LLMs can also produce answers that are confusing, inconsistent, or illogical even when the question seems simple.
An example is a prompt asking whether a person who lives very close to a car wash should walk there or drive there in order to wash a car. A human reader would quickly notice the practical problem in the question. A language model, however, may respond in a way that sounds reasonable at first but does not fully address the contradiction.
These cases are useful reminders that fluent language does not always indicate sound reasoning.
Hands-on: Testing a simple logic prompt
Goal: Observe how a language model handles a simple question that contains a practical contradiction.
Step 1: Read an example
Read the following short article: I asked ChatGPT if I should drive or walk to the car wash to get my car washed — and it struggled with basic logic
Then review this shared conversation created using ChatGPT 5.2.
Step 2: Try the same idea with a current model
Open a current large language model and test a similar prompt.
You may use the same question or a slightly revised version, such as:
- I live 15 meters away from a car wash. Should I walk or drive there to wash my car?
- I live next to a car wash. Should I walk there or drive there to get my car washed?
Step 3: Compare the responses
As you read the answer, consider the following questions:
- Does the model recognize the practical contradiction in the question?
- Does the newer model perform better than the earlier example?
From GenAI to AGI
Generative AI and artificial general intelligence (AGI) are not the same. Generative AI refers to systems that can produce new content such as text, images, audio, or code. AGI usually refers to a system with much broader and more flexible intelligence across many tasks and domains.
Recent advances in large language models have led some people to ask whether generative AI is moving toward AGI. This remains an open question. Current systems can perform many useful tasks, but they still show important limitations in reasoning, factual accuracy, reliability, and transfer across contexts.
For this reason, discussions of AGI should be approached with caution. Strong performance on selected tasks does not necessarily mean that a system has general intelligence in the broader sense.
Further information
- The AI revolution is running out of data. What can researchers do?
- Anthropic: A small number of samples can poison LLMs of any size
- Yann LeCun: We Won't Reach AGI By Scaling Up LLMS
- Richard Sutton – Father of RL thinks LLMs are a dead end
- How close is AGI? What the experts say.
- Is AI Hiding Its Full Power? With Geoffrey Hinton