Juni_DEV
[CS285: Deep RL 2023] Lecture 1, Introduction
junni :p · 2025. 7. 21. 17:01

I've finally picked up the CS285 reinforcement learning lectures again, after putting them off for a while...
Let’s get it!
What is reinforcement learning?
- Mathematical formalism for learning-based decision making
- Approach for learning decision making and control from experience

How is this different from other machine learning topics?
- Standard (supervised) machine learning
  - Usually assumes i.i.d. data
  - Ground truth outputs are known during training
- Reinforcement learning
  - Data is not i.i.d.: previous outputs influence future inputs!
  - Ground truth answer is not known; we only know whether we succeeded or failed
    - More generally, we know the reward
A good policy is one that maximizes the cumulative total reward: not just the reward at a single point in time, but the total reward the agent receives over the whole trajectory.
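The "cumulative total reward" idea can be made concrete in a few lines of NumPy. The reward sequences and discount factor below are invented for illustration, not taken from the lecture:

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward: G = sum_t gamma^t * r_t."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * np.asarray(rewards)))

# A policy that grabs a small reward now can score worse than one
# that defers toward a larger total reward:
greedy = [1.0, 0.0, 0.0, 0.0]   # reward immediately, then nothing
patient = [0.0, 0.0, 0.0, 2.0]  # defers, but larger total
print(discounted_return(greedy, gamma=0.9))   # 1.0
print(discounted_return(patient, gamma=0.9))  # 2 * 0.9^3 ≈ 1.458
```

Judging the "patient" sequence by its reward at any single early timestep would rank it last; the cumulative return is what reveals it is the better policy.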
So where does that leave us?
- Data-Driven AI ⇒ All about using data
  - Pros: Learns about the real world from data
  - Cons: Doesn't try to do better than the data
- Reinforcement Learning ⇒ All about optimization
  - Pros: Optimizes a goal with emergent behavior
  - Cons: But we need to figure out how to use it at scale!

⇒ Data without optimization doesn't allow us to solve new problems in new ways
"The Bitter Lesson" - Richard Sutton
- Learning
  - Use data to extract patterns
  - Allows us to understand the world
- Search
  - Use computation to extract inferences
  - Leverages that understanding for emergence
  - ⇒ Some optimization process that uses (typically iterative) computation to make rational decisions

⇒ Optimization without data is hard to apply to the real world outside of simulators
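The learning-plus-search split above can be sketched with a toy decision problem: fit a model from noisy data (learning), then spend iterative computation searching over that model to pick a good action (search). Everything here, the quadratic reward and its optimum at 0.7, is an invented illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Learning": use data to extract a pattern. We observe noisy rewards
# for 200 random actions; the (unknown to the learner) optimum is 0.7.
actions = rng.uniform(-2, 2, size=200)
rewards = -(actions - 0.7) ** 2 + rng.normal(0, 0.05, size=200)
coeffs = np.polyfit(actions, rewards, deg=2)  # learned quadratic model

# "Search": use computation on the learned model to make a decision,
# here brute-force evaluation of many candidate actions.
candidates = rng.uniform(-2, 2, size=10_000)
predicted = np.polyval(coeffs, candidates)
best = candidates[np.argmax(predicted)]
print(best)  # should land near 0.7
```

Neither half suffices alone: the fitted model without search never produces a decision, and the search without the learned model has nothing real-world to optimize against.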
What other problems do we need to solve to enable real-world sequential decision making?
Beyond learning from reward
- Basic reinforcement learning deals with maximizing rewards
- This is not the only problem that matters for sequential decision making!
- We will cover more advanced topics:
  - Learning reward functions from example (inverse reinforcement learning)
  - Transferring knowledge between domains (transfer learning, meta-learning)
  - Learning to predict and using prediction to act
Are there other forms of supervision?
- Learning from demonstrations
  - Directly copying observed behavior
  - Inferring rewards from observed behavior (inverse reinforcement learning)
- Learning from observing the world
  - Learning to predict
  - Unsupervised learning
- Learning from other tasks
  - Transfer learning
  - Meta-Learning: learning to learn
How do we build intelligent machines?
- Imagine you have to build an intelligent machine. Where do you start?
Learning as the basis of Intelligence
- Some things we can all do (e.g. walking)
- Some things we can only learn (e.g. driving a car)
- We can learn a huge variety of things, including very difficult things
- Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence
- But it may still be very convenient to "hard-code" a few really important bits
A single algorithm?
- An algorithm for each “module”?
- Or a single flexible algorithm?
What must that single algorithm do?
- Interpret rich sensory inputs
- Choose complex actions
Why deep reinforcement learning?
- Deep = scalable learning from large, complex datasets
- Reinforcement learning = optimization
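As a minimal sketch of "deep RL = scalable learning + optimization", here is a tiny REINFORCE-style policy-gradient loop on a two-armed bandit. The "network" is reduced to a two-logit softmax, and all constants (arm means, learning rate, step count) are illustrative choices, not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

logits = np.zeros(2)                # policy parameters (stand-in for a network)
true_means = np.array([0.2, 0.8])   # arm 1 pays more on average
lr = 0.1

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    a = rng.choice(2, p=probs)                     # sample an action
    r = rng.normal(true_means[a], 0.1)             # environment returns a reward
    grad = -probs
    grad[a] += 1.0                                 # d log pi(a) / d logits
    logits += lr * r * grad                        # gradient ascent on reward

probs = np.exp(logits) / np.exp(logits).sum()
print(probs)  # policy should have shifted to prefer arm 1
```

The same loop structure scales up: replace the two logits with a deep network over rich sensory inputs, and the gradient step with backpropagation, and you have the skeleton of deep policy-gradient methods.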
What challenges still remain?
- We have great methods that can learn from huge amounts of data
- We have great optimization methods for RL
- We don’t (yet) have amazing methods that both use data and RL
- Humans can learn incredibly quickly, deep RL methods are usually slow
- Humans reuse past knowledge, transfer learning in RL is an open problem
- Not clear what the reward function should be
- Not clear what the role of prediction should be
https://rail.eecs.berkeley.edu/deeprlcourse/
https://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf
https://www.youtube.com/playlist?list=PL_iWQOsE6TfVYGEGiAOMaOzzv41Jfm_Ps
CS 285: Deep RL, 2023
Playlist for videos for the UC Berkeley CS 285: Deep Reinforcement Learning course, fall 2023.