Juni_DEV
[CS285: Deep RL 2023] Lecture 1, Introduction
junni :p · 2025. 7. 21. 17:01

I've finally picked up the CS285 reinforcement learning lectures again, after putting them off for a while...
Let’s get it!
What is reinforcement learning?
- Mathematical formalism for learning-based decision making
- Approach for learning decision making and control from experience

How is this different from other machine learning topics?
- Standard (supervised) machine learning
  - Usually assumes i.i.d. data
  - Ground truth outputs are known during training
- Reinforcement learning
  - Data is not i.i.d.: previous outputs influence future inputs!
  - Ground truth answer is not known; we only know whether we succeeded or failed
    - More generally, we know the reward
A good policy is one that maximizes the cumulative total reward: not just the reward at a single point in time, but the total reward the agent receives over the whole trajectory.
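The "cumulative total reward" idea can be made concrete in a few lines of NumPy. The reward sequences and discount factor below are invented for illustration, not taken from the lecture:

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward: G = sum_t gamma^t * r_t."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * np.asarray(rewards)))

# A policy that grabs a small reward now can score worse than one
# that defers toward a larger total reward:
greedy = [1.0, 0.0, 0.0, 0.0]   # reward immediately, then nothing
patient = [0.0, 0.0, 0.0, 2.0]  # defers, but larger total
print(discounted_return(greedy, gamma=0.9))   # 1.0
print(discounted_return(patient, gamma=0.9))  # 2 * 0.9^3 ≈ 1.458
```

Judging the "patient" sequence by its reward at any single early timestep would rank it last; the cumulative return is what reveals it is the better policy.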
So where does that leave us?
- Data-Driven AI ⇒ All about using data
  - Pros: Learns about the real world from data
  - Cons: Doesn't try to do better than the data
- Reinforcement Learning ⇒ All about optimization
  - Pros: Optimizes a goal with emergent behavior
  - Cons: But we need to figure out how to use it at scale!

⇒ Data without optimization doesn't allow us to solve new problems in new ways
"The Bitter Lesson" - Richard Sutton
- Learning
  - Use data to extract patterns
  - Allows us to understand the world
- Search
  - Use computation to extract inferences
  - Leverages that understanding for emergence
  - ⇒ Some optimization process that uses (typically iterative) computation to make rational decisions

⇒ Optimization without data is hard to apply to the real world outside of simulators
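The learning-plus-search split above can be sketched with a toy decision problem: fit a model from noisy data (learning), then spend iterative computation searching over that model to pick a good action (search). Everything here, the quadratic reward and its optimum at 0.7, is an invented illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Learning": use data to extract a pattern. We observe noisy rewards
# for 200 random actions; the (unknown to the learner) optimum is 0.7.
actions = rng.uniform(-2, 2, size=200)
rewards = -(actions - 0.7) ** 2 + rng.normal(0, 0.05, size=200)
coeffs = np.polyfit(actions, rewards, deg=2)  # learned quadratic model

# "Search": use computation on the learned model to make a decision,
# here brute-force evaluation of many candidate actions.
candidates = rng.uniform(-2, 2, size=10_000)
predicted = np.polyval(coeffs, candidates)
best = candidates[np.argmax(predicted)]
print(best)  # should land near 0.7
```

Neither half suffices alone: the fitted model without search never produces a decision, and the search without the learned model has nothing real-world to optimize against.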
What other problems do we need to solve to enable real-world sequential decision making?
Beyond learning from reward
- Basic reinforcement learning deals with maximizing rewards
- This is not the only problem that matters for sequential decision making!
- We will cover more advanced topics:
  - Learning reward functions from example (inverse reinforcement learning)
  - Transferring knowledge between domains (transfer learning, meta-learning)
  - Learning to predict and using prediction to act
Are there other forms of supervision?
- Learning from demonstrations
  - Directly copying observed behavior
  - Inferring rewards from observed behavior (inverse reinforcement learning)
- Learning from observing the world
  - Learning to predict
  - Unsupervised learning
- Learning from other tasks
  - Transfer learning
  - Meta-Learning: learning to learn
How do we build intelligent machines?
- Imagine you have to build an intelligent machine. Where do you start?
Learning as the basis of Intelligence
- Some things we can all do (e.g. walking)
- Some things we can only learn (e.g. driving a car)
- We can learn a huge variety of things, including very difficult things
- Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence
- But it may still be very convenient to "hard-code" a few really important bits
A single algorithm?
- An algorithm for each “module”?
- Or a single flexible algorithm?
What must that single algorithm do?
- Interpret rich sensory inputs
- Choose complex actions
Why deep reinforcement learning?
- Deep = scalable learning from large, complex datasets
- Reinforcement learning = optimization
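As a minimal sketch of "deep RL = scalable learning + optimization", here is a tiny REINFORCE-style policy-gradient loop on a two-armed bandit. The "network" is reduced to a two-logit softmax, and all constants (arm means, learning rate, step count) are illustrative choices, not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

logits = np.zeros(2)                # policy parameters (stand-in for a network)
true_means = np.array([0.2, 0.8])   # arm 1 pays more on average
lr = 0.1

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    a = rng.choice(2, p=probs)                     # sample an action
    r = rng.normal(true_means[a], 0.1)             # environment returns a reward
    grad = -probs
    grad[a] += 1.0                                 # d log pi(a) / d logits
    logits += lr * r * grad                        # gradient ascent on reward

probs = np.exp(logits) / np.exp(logits).sum()
print(probs)  # policy should have shifted to prefer arm 1
```

The same loop structure scales up: replace the two logits with a deep network over rich sensory inputs, and the gradient step with backpropagation, and you have the skeleton of deep policy-gradient methods.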
What challenges still remain?
- We have great methods that can learn from huge amounts of data
- We have great optimization methods for RL
- We don’t (yet) have amazing methods that both use data and RL
- Humans can learn incredibly quickly, deep RL methods are usually slow
- Humans reuse past knowledge, transfer learning in RL is an open problem
- Not clear what the reward function should be
- Not clear what the role of prediction should be
https://rail.eecs.berkeley.edu/deeprlcourse/
https://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf
https://www.youtube.com/playlist?list=PL_iWQOsE6TfVYGEGiAOMaOzzv41Jfm_Ps
CS 285: Deep RL, 2023
Playlist for videos for the UC Berkeley CS 285: Deep Reinforcement Learning course, fall 2023.