
180: Reinforcement Learning
About this audio
Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
  - https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
  - https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
- NASA has a list of 10 rules for software development
  - https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
  - https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre
Book of the Show
- Patrick:
  - The Player of Games (Iain M. Banks)
    - https://a.co/d/1ZpUhGl (non-affiliate)
- Jason:
  - Basic Roleplaying Universal Game Engine
    - https://amzn.to/3ES4p5i
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
  - Pokemon Sword and Shield
- Jason:
  - Features and Labels (https://fal.ai)
Topic: Reinforcement Learning
- Three types of AI
  - Supervised Learning
  - Unsupervised Learning
  - Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
  - Value optimization
    - SARSA
    - Q-Learning
  - Policy optimization
    - Policy Gradients
    - Actor-Critic
    - Proximal Policy Optimization
- Value vs Policy Optimization
  - Value optimization is more intuitive (Value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
- Imitation Learning
  - Supervised policy learning
  - Often used to bootstrap reinforcement learning
- Policy Evaluation
  - Propensity scoring versus model-based
- Challenges of training RL models
  - Two optimization loops
    - Collecting feedback vs updating the model
  - Difficult optimization target
  - Policy evaluation
- RLHF & GRPO
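
For readers who want something concrete to go with the value optimization items above, here is a minimal Python sketch of the SARSA and Q-Learning updates on a lookup table. It is not taken from the episode: the function names, the two-state setup, and the `alpha` (learning rate) and `gamma` (discount) values are all assumptions picked for illustration.

```python
from collections import defaultdict

# Hypothetical two-action problem; the state and action names are made up.
ACTIONS = ["left", "right"]


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


def greedy_policy(q, state, actions):
    """Turn values into a policy by picking the highest-valued action."""
    return max(actions, key=lambda a: q[(state, a)])


def make_table():
    """Value table where only ('s1', 'left') starts with a nonzero value."""
    q = defaultdict(float)
    q[("s1", "left")] = 1.0
    return q


# Same transition (s0, right, reward 0, s1) fed to both update rules.
q_ql = make_table()
q_learning_update(q_ql, "s0", "right", 0.0, "s1", ACTIONS, alpha=0.5, gamma=0.9)

q_sarsa = make_table()
# Assume the behavior policy happened to pick "right" in s1.
sarsa_update(q_sarsa, "s0", "right", 0.0, "s1", "right", alpha=0.5, gamma=0.9)

print(q_ql[("s0", "right")])     # 0.45
print(q_sarsa[("s0", "right")])  # 0.0
```

The two results differ because Q-Learning assumes the best next action will be taken, while SARSA uses the next action that was actually chosen. With a small table, `greedy_policy` makes converting values to a policy trivial; the outline's point is that this step gets harder with deep networks and large action spaces.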