Schedule & syllabus

The lecture slides, labs, and assignments will be posted here as the course progresses.
Lecture times are 3pm-4:20pm Pacific Time on Tuesdays and Thursdays. All deadlines are at 11:59pm Pacific Time.

This schedule is subject to change according to the pace of the class.

Date Description Materials Events
Part I: Background (Week 1)
Week 1
Tue Sept 26 Trustworthiness of LLMs
Course Overview
Projects
Big picture: LLM tech stack
Slides
Thu Sept 28 Guest Lecture:
Jerry Liu (LlamaIndex)

LlamaIndex for building LLM apps
TruLens for LLM app evaluation
Intro to Homework 1
Slides
Homework 1 Introduction

Supplemental Materials:
LlamaIndex
TruLens
Description: Homework 1 is designed to get you started building an LLM prototype and to set you up for your project.
Homework 1 Due Oct 9th on Gradescope
Part II: Key LLM Application Areas (Weeks 2, 3, and 4)
Week 2
Tue Oct 3 Guest Lecture:
Isabelle Hau (Stanford GSE)
Josh Weiss (Stanford GSE)

Application areas (education)
Project ideas: evaluations
Slides
Thu Oct 5 Guest Lecture:
Nicholas Carlini (Google DeepMind)
Zifan Wang (Center for AI Safety)

Application areas (security)
Adversarial attacks on security
LLMs for security
Project ideas: evaluations
Slides
Zifan Wang's Slides
Homework 1 Due Oct 9th
Week 3
Tue Oct 10 Guest Lecture:
Monica Agrawal (LayerHealth)
Divya Gopinath (LayerHealth)

Application areas (healthcare)
Project ideas: evaluations
Guest Lecture Slides
Thu Oct 12 Space of evaluations
    Groundedness, Consistency, Confidence and Uncertainty, Adversarial attacks, Privacy, Fairness
Summarize application areas and explore trustworthiness angles
Project ideas
Slides
Final project groups must be formed by Saturday, Oct 14th. See Ed for more information.
Week 4
Tue Oct 17 Project proposals and feedback
Thu Oct 19 Project proposals and feedback
Part III: LLM Evaluations (Weeks 5 and beyond)
Week 5
Tue Oct 24 RAG triad
    Context relevance, groundedness, QA relevance
Relevance
Groundedness evaluations
    Definition, Techniques, Tools
Slides
References:
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
TRUE: Re-evaluating Factual Consistency Evaluation
Do Language Models Know When They're Hallucinating References?
RARR: Researching and Revising What Language Models Say, Using Language Models
The Internal State of an LLM Knows When It's Lying
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Measuring Reliability of Large Language Models through Semantic Consistency
Thu Oct 26 Guest Lecture:
Eric Mitchell (Stanford University)
Confidence, Calibration, Uncertainty
    Chelsea Finn’s work on Calibration
    Yarin Gal’s work on Uncertainty
    Self-Consistency, GD-Consistency, Prompt-Consistency and other topics
Slides
References:
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Teaching models to express their uncertainty in words
Reducing conversational agents’ overconfidence through linguistic calibration
Week 6
Tue Oct 31 Guest Lecture:
Juhan Bae, Cem Anil (University of Toronto, Anthropic)
Explainability: influence functions
LLM training data privacy: membership inference
Slides
References:
Studying Large Language Model Generalization with Influence Functions
Understanding Black-box Predictions via Influence Functions
Estimating Training Data Influence by Tracing Gradient Descent
Representer Point Selection for Explaining Deep Neural Networks
Thu Nov 2 Explainability: Attributions (Mechanistic interpretability)
    Integrated Gradients (IG) for text
    Influence patterns for BERT models
Slides
References:
Axiomatic Attribution for Deep Networks
The Explanation Game: Explaining Machine Learning Models Using Shapley Values
Influence Patterns for Explaining Information Flow in BERT
Week 7
Tue Nov 7 No class (Democracy Day)
Thu Nov 9 Project mid-term presentations and feedback
Week 8
Tue Nov 14 Project mid-term presentations and feedback
Thu Nov 16 Lecture on LLM agents and multi-modal LLMs
Slides: Evaluating Agents
Slides: Evaluating Multi-Modal RAGs
Thanksgiving Break (no class Tue Nov 21 or Thu Nov 23)
Part IV: Project Presentations
Week 9
Tue Nov 28 No class, extra time to work on final project
Thu Nov 30 Final project presentations (3pm-5pm, Allen 101X)
Week 10
Tue Dec 5 Poster and demo session (Fujitsu Conference Room, 4th floor of Gates)
Thu Dec 7 No class