Juni_DEV

[CS285: Deep RL 2023] Lecture 1, Introduction

junni :p 2025. 7. 21. 17:01

I’ve finally picked up the CS285 reinforcement learning lectures again, after putting them off for a while...
Let’s get it!

What is reinforcement learning?

  • Mathematical formalism for learning-based decision making
  • Approach for learning decision making and control from experience

How is this different from other machine learning topics?

  1. Standard (supervised) machine learning
    1. Usually assumes i.i.d. data
    2. Ground-truth outputs are known during training
  2. Reinforcement learning
    1. Data is not i.i.d.: previous outputs influence future inputs!
    2. The ground-truth answer is not known; we only know whether we succeeded or failed
      1. More generally, we know the reward
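To make the "previous outputs influence future inputs" point concrete, here is a minimal sketch of an agent-environment loop. The `LineWorld` environment, the `run_episode` helper, and the two policies are all hypothetical names made up for illustration; they are not from the lecture. The agent never sees a correct action, only a reward, and each observation it receives depends on the action it just took.

```python
class LineWorld:
    """Hypothetical 1-D world: start at position 0, goal at position `goal`."""

    def __init__(self, goal=4):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, goal].
        self.pos = max(0, min(self.goal, self.pos + action))
        done = self.pos == self.goal
        # No label says which action was "correct"; only the reward is observed.
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def run_episode(env, policy, max_steps=20):
    """Roll out one episode and return the visited (obs, action, reward) tuples."""
    obs = env.reset()
    trajectory = []
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total += reward
        obs = next_obs  # the next input depends on the action just taken
        if done:
            break
    return trajectory, total


def always_right(obs):
    return +1


def always_left(obs):
    return -1


right_traj, right_total = run_episode(LineWorld(), always_right)
left_traj, left_total = run_episode(LineWorld(), always_left)
print([obs for obs, _, _ in right_traj], right_total)
print([obs for obs, _, _ in left_traj][:5], left_total)
```

The two policies see completely different sequences of observations, which is exactly why the data cannot be treated as i.i.d.: what the agent does determines what it gets to see next.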

A good policy is one that maximizes the cumulative reward:
not just the reward at a single point in time, but the total reward the agent receives over time.
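As a small illustration of "total reward, not just the reward right now", the sketch below compares two made-up reward sequences. The function name `total_reward` and the numbers are my own assumptions for the example. The optional `gamma` discount factor is also my addition and is not part of these notes; with the default `gamma=1.0` the function is a plain sum.

```python
def total_reward(rewards, gamma=1.0):
    """Sum the rewards of one episode.

    gamma is an optional discount factor (an assumption added for
    illustration); gamma=1.0 gives the plain, undiscounted sum.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical episodes: one grabs a small reward immediately,
# the other waits and collects a larger reward at the end.
greedy_now = [5, 0, 0, 0]
patient = [0, 0, 0, 10]

print(total_reward(greedy_now))  # best reward at the first step
print(total_reward(patient))     # best cumulative reward
```

Looking only at the first time step, `greedy_now` seems better, but the `patient` sequence has the higher cumulative reward, so a policy that produces it is the better one under this objective.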

So where does that leave us?

  1. Data-Driven AI ⇒ All about using data
    • Pros: Learns about the real world from data
    • Cons: Doesn’t try to do better than the data
  2. Reinforcement Learning ⇒ All about optimization
    • Pros: Optimizes a goal with emergent behavior
    • Cons: We still need to figure out how to use it at scale!

Data without Optimization doesn’t allow us to solve new problems in new ways

“The Bitter Lesson” - Richard Sutton

Learning

  • Use data to extract patterns
  • allows us to understand the world

Search

  • Use computation to extract inferences
  • leverages that understanding for emergence
  • ⇒ Some optimization process that uses (typically iterative) computation to make rational decisions

Optimization without data is hard to apply to the real world outside of simulators

What other problems do we need to solve to enable real-world sequential decision making?

Beyond learning from reward

  • Basic reinforcement learning deals with maximizing rewards
  • This is not the only problem that matters for sequential decision making!
  • We will cover more advanced topics
    • Learning reward functions from example (inverse reinforcement learning)
    • Transferring knowledge between domains (transfer learning, meta-learning)
    • Learning to predict and using prediction to act

Are there other forms of supervision?

  • Learning from demonstrations
    • Directly copying observed behavior
    • Inferring rewards from observed behavior (inverse reinforcement learning)
  • Learning from observing the world
    • Learning to predict
    • Unsupervised learning
  • Learning from other tasks
    • Transfer learning
    • Meta-Learning: learning to learn

How do we build intelligent machines?

  • Imagine you have to build an intelligent machine. Where do you start?

Learning as the basis of Intelligence

  • Some things we can all do (e.g. walking)
  • Some things we can only learn (e.g. driving a car)
  • We can learn a huge variety of things, including very difficult things
  • Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence
    • But it may still be very convenient to “hard-code” a few really important bits

A single algorithm?

  • An algorithm for each “module”?
  • Or a single flexible algorithm?

What must that single algorithm do?

  • Interpret rich sensory inputs
  • Choose complex actions

Why deep reinforcement learning?

  • Deep = scalable learning from large, complex datasets
  • Reinforcement learning = optimization

What challenges still remain?

  • We have great methods that can learn from huge amounts of data
  • We have great optimization methods for RL
  • We don’t (yet) have amazing methods that both use data and RL
  • Humans can learn incredibly quickly; deep RL methods are usually slow
  • Humans reuse past knowledge; transfer learning in RL is an open problem
  • Not clear what the reward function should be
  • Not clear what the role of prediction should be

https://rail.eecs.berkeley.edu/deeprlcourse/

https://rail.eecs.berkeley.edu/deeprlcourse/deeprlcourse/static/slides/lec-1.pdf

https://www.youtube.com/playlist?list=PL_iWQOsE6TfVYGEGiAOMaOzzv41Jfm_Ps

 
