The lecture slides, labs, and assignments will be posted here as the course progresses.
Lecture times are 3pm-4:20pm PST on Tuesdays and Thursdays. All deadlines are at 11:59pm PST.
This schedule is subject to change according to the pace of the class.
Date | Description | Materials | Events |
---|---|---|---|
Part I: Background (Week 1) | |||
Week 1 | |||
Tue Sept 26 | Trustworthiness of LLMs; Course Overview; Projects; Big picture: LLM tech stack | Slides | |
Thu Sept 28 | Guest Lecture: Jerry Liu (LlamaIndex); LlamaIndex for building LLM apps; TruLens for LLM app evaluation; Intro to Homework 1 | Slides; Homework 1 Introduction; Supplemental Materials: LlamaIndex, TruLens | Homework 1 helps you bootstrap an LLM prototype and sets you up for a project. Homework 1 is due Oct 9th on Gradescope. |
Part II: Key LLM Application Areas (Weeks 2, 3, and 4) | |||
Week 2 | |||
Tue Oct 3 | Guest Lecture: Isabelle Hau (Stanford GSE), Josh Weiss (Stanford GSE); Application areas (Education); Project ideas: evaluations | Slides | |
Thu Oct 5 | Guest Lecture: Nicholas Carlini (Google DeepMind), Zifan Wang (Center for AI Safety); Application areas (Security); Adversarial attacks on security; LLMs for security; Project ideas: evals | Slides; Zifan Wang's Slides | Homework 1 due Oct 9th |
Week 3 | |||
Tue Oct 10 | Guest Lecture: Monica Agrawal (LayerHealth), Divya Gopinath (LayerHealth); Application areas (Healthcare); Project ideas: evals | Guest Lecture Slides | |
Thu Oct 12 | Space of evaluations; Project ideas | Slides | Final project group formations due Saturday, Oct 14. Refer to Ed for more info. |
Week 4 | |||
Tue Oct 17 | Project proposals and feedback | | |
Thu Oct 19 | Project proposals and feedback | | |
Part III: LLM Evaluations (Weeks 5 and beyond) | |||
Week 5 | |||
Tue Oct 24 | RAG triad; Groundedness evaluations | Slides. References: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; TRUE: Re-evaluating Factual Consistency Evaluation; Do Language Models Know When They're Hallucinating References?; RARR: Researching and Revising What Language Models Say, Using Language Models; The Internal State of an LLM Knows When It's Lying; SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models; Measuring Reliability of Large Language Models through Semantic Consistency | |
Thu Oct 26 | Guest Lecture: Eric Mitchell (Stanford University); Confidence, Calibration, Uncertainty; Yarin Gal’s work on Uncertainty; Self-Consistency, GD-Consistency, Prompt-Consistency and other topics | Slides. References: Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models; Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation; Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback; Teaching models to express their uncertainty in words; Reducing conversational agents’ overconfidence through linguistic calibration | |
Week 6 | |||
Tue Oct 31 | Guest Lecture: Juhan Bae, Cem Anil (University of Toronto, Anthropic); Explainability: Influence functions; LLM training data privacy: membership inference | Slides. References: Studying Large Language Model Generalization with Influence Functions; Understanding Black-box Predictions via Influence Functions; Estimating Training Data Influence by Tracing Gradient Descent; Representer Point Selection for Explaining Deep Neural Networks | |
Thu Nov 2 | Explainability: Attributions (Mechanistic interpretability); Influence patterns for BERT models | Slides. References: Axiomatic Attribution for Deep Networks; The Explanation Game: Explaining Machine Learning Models Using Shapley Values; Influence Patterns for Explaining Information Flow in BERT | |
Week 7 | |||
Tue Nov 7 | No Class (Democracy Day) | | |
Thu Nov 9 | Project mid-term presentations and feedback | | |
Week 8 | |||
Tue Nov 14 | Project mid-term presentations and feedback | | |
Thu Nov 16 | Lecture on LLM agents and multi-modal LLMs | Slides: Evaluating Agents; Slides: Evaluating Multi-Modal RAGs | |
Thanksgiving Break (Nov 21, Nov 23) | |||
Part IV: Project Presentations | |||
Week 9 | |||
Tue Nov 28 | No class, extra time to work on final project | ||
Thu Nov 30 | Final project presentations (3 - 5 pm, in Allen 101X) | ||
Week 10 | |||
Tue Dec 5 | Poster and demo session (Fujitsu Conference Room, 4th floor of Gates) | ||
Thu Dec 7 | No class | ||