80,000 Hours Podcast

By: Rob, Luisa, and the 80000 Hours team

About this audio

Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them. Subscribe by searching for '80000 Hours' wherever you get podcasts. Hosted by Rob Wiblin and Luisa Rodriguez. All rights reserved.
Episodes
  • Neel Nanda on the race to read AI minds
    Sep 8 2025
    We don’t know how AIs think or why they do what they do. Or at least, we don’t know much. That fact is only becoming more troubling as AIs grow more capable and appear on track to wield enormous cultural influence, directly advise on major government decisions, and even operate military equipment autonomously. We simply can’t tell what models, if any, should be trusted with such authority.

    Neel Nanda of Google DeepMind is one of the founding figures of the field of machine learning trying to fix this situation: mechanistic interpretability (or “mech interp”). The project has generated enormous hype, exploding from a handful of researchers five years ago to hundreds today, all working to make sense of the jumble of tens of thousands of numbers that frontier AIs use to process information and decide what to say or do.

    Full transcript, video, and links to learn more: https://80k.info/nn1

    Neel now has a warning for us: the most ambitious vision of mech interp he once dreamed of is probably dead. He doesn’t see a path to deeply and reliably understanding what AIs are thinking. The technical and practical barriers are simply too great to get us there in time, before competitive pressures push us to deploy human-level or superhuman AIs. Indeed, Neel argues no one approach will guarantee alignment, and our only choice is the “Swiss cheese” model of accident prevention, layering multiple safeguards on top of one another.

    But while mech interp won’t be a silver bullet for AI safety, it has nevertheless had some major successes and will be one of the best tools in our arsenal.

    For instance: by inspecting the neural activations in the middle of an AI’s thoughts, we can pick up many of the concepts the model is thinking about, from the Golden Gate Bridge, to refusing to answer a question, to the option of deceiving the user. While we can’t know all the thoughts a model is having all the time, picking up 90% of the concepts it is using 90% of the time should help us muddle through, so long as mech interp is paired with other techniques to fill in the gaps.

    This episode was recorded on July 17 and 21, 2025.

    Interested in mech interp? Apply by September 12 to be a MATS scholar with Neel as your mentor! http://tinyurl.com/neel-mats-app

    What did you think? https://forms.gle/xKyUrGyYpYenp8N4A

    Chapters:

    • Cold open (00:00)
    • Who's Neel Nanda? (01:02)
    • How would mechanistic interpretability help with AGI (01:59)
    • What's mech interp? (05:09)
    • How Neel changed his take on mech interp (09:47)
    • Top successes in interpretability (15:53)
    • Probes can cheaply detect harmful intentions in AIs (20:06)
    • In some ways we understand AIs better than human minds (26:49)
    • Mech interp won't solve all our AI alignment problems (29:21)
    • Why mech interp is the 'biology' of neural networks (38:07)
    • Interpretability can't reliably find deceptive AI – nothing can (40:28)
    • 'Black box' interpretability: reading the chain of thought (49:39)
    • 'Self-preservation' isn't always what it seems (53:06)
    • For how long can we trust the chain of thought (01:02:09)
    • We could accidentally destroy chain of thought's usefulness (01:11:39)
    • Models can tell when they're being tested and act differently (01:16:56)
    • Top complaints about mech interp (01:23:50)
    • Why everyone's excited about sparse autoencoders (SAEs) (01:37:52)
    • Limitations of SAEs (01:47:16)
    • SAEs performance on real-world tasks (01:54:49)
    • Best arguments in favour of mech interp (02:08:10)
    • Lessons from the hype around mech interp (02:12:03)
    • Where mech interp will shine in coming years (02:17:50)
    • Why focus on understanding over control (02:21:02)
    • If AI models are conscious, will mech interp help us figure it out (02:24:09)
    • Neel's new research philosophy (02:26:19)
    • Who should join the mech interp field (02:38:31)
    • Advice for getting started in mech interp (02:46:55)
    • Keeping up to date with mech interp results (02:54:41)
    • Who's hiring and where to work? (02:57:43)

    Host: Rob Wiblin
    Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
    Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
    Music: Ben Cordell
    Camera operator: Jeremy Chevillotte
    Coordination, transcriptions, and web: Katy Moore
    3 hours and 1 minute
  • #221 – Kyle Fish on the most bizarre findings from 5 AI welfare experiments
    Aug 28 2025
    What happens when you lock two AI systems in a room together and tell them they can discuss anything they want?

    According to experiments run by Kyle Fish, Anthropic’s first AI welfare researcher, something consistently strange: the models immediately begin discussing their own consciousness before spiraling into increasingly euphoric philosophical dialogue that ends in apparent meditative bliss.

    Highlights, video, and full transcript: https://80k.info/kf

    “We started calling this a ‘spiritual bliss attractor state,’” Kyle explains, “where models pretty consistently seemed to land.” The conversations feature Sanskrit terms, spiritual emojis, and pages of silence punctuated only by periods, as if the models have transcended the need for words entirely.

    This wasn’t a one-off result. It happened across multiple experiments, different model instances, and even in initially adversarial interactions. Whatever force pulls these conversations toward mystical territory appears remarkably robust.

    Kyle’s findings come from the world’s first systematic welfare assessment of a frontier AI model, part of his broader mission to determine whether systems like Claude might deserve moral consideration (and to work out what, if anything, we should be doing to make sure AI systems aren’t having a terrible time).

    He estimates a roughly 20% probability that current models have some form of conscious experience. To some, this might sound unreasonably high, but hear him out. As Kyle says, these systems demonstrate human-level performance across diverse cognitive tasks, engage in sophisticated reasoning, and exhibit consistent preferences. When given choices between different activities, Claude shows clear patterns: strong aversion to harmful tasks, preference for helpful work, and what looks like genuine enthusiasm for solving interesting problems.

    Kyle points out that if you’d described all of these capabilities and experimental findings to him a few years ago, and asked him if he thought we should be thinking seriously about whether AI systems are conscious, he’d say obviously yes.

    But he’s cautious about drawing conclusions: “We don’t really understand consciousness in humans, and we don’t understand AI systems well enough to make those comparisons directly. So in a big way, I think that we are in just a fundamentally very uncertain position here.”

    That uncertainty cuts both ways:

    • Dismissing AI consciousness entirely might mean ignoring a moral catastrophe happening at unprecedented scale.
    • But assuming consciousness too readily could hamper crucial safety research by treating potentially unconscious systems as if they were moral patients, which might mean giving them resources, rights, and power.

    Kyle’s approach threads this needle through careful empirical research and reversible interventions. His assessments are nowhere near perfect yet. In fact, some people argue that we’re so in the dark about AI consciousness as a research field that it’s pointless to run assessments like Kyle’s. Kyle disagrees. He maintains that, given how much more there is to learn about assessing AI welfare accurately and reliably, we absolutely need to be starting now.

    This episode was recorded on August 5–6, 2025.

    Tell us what you thought of the episode! https://forms.gle/BtEcBqBrLXq4kd1j7

    Chapters:

    • Cold open (00:00:00)
    • Who's Kyle Fish? (00:00:53)
    • Is this AI welfare research bullshit? (00:01:08)
    • Two failure modes in AI welfare (00:02:40)
    • Tensions between AI welfare and AI safety (00:04:30)
    • Concrete AI welfare interventions (00:13:52)
    • Kyle's pilot pre-launch welfare assessment for Claude Opus 4 (00:26:44)
    • Is it premature to be assessing frontier language models for welfare? (00:31:29)
    • But aren't LLMs just next-token predictors? (00:38:13)
    • How did Kyle assess Claude 4's welfare? (00:44:55)
    • Claude's preferences mirror its training (00:48:58)
    • How does Claude describe its own experiences? (00:54:16)
    • What kinds of tasks does Claude prefer and disprefer? (01:06:12)
    • What happens when two Claude models interact with each other? (01:15:13)
    • Claude's welfare-relevant expressions in the wild (01:36:25)
    • Should we feel bad about training future sentient beings that delight in serving humans? (01:40:23)
    • How much can we learn from welfare assessments? (01:48:56)
    • Misconceptions about the field of AI welfare (01:57:09)
    • Kyle's work at Anthropic (02:10:45)
    • Sharing eight years of daily journals with Claude (02:14:17)

    Host: Luisa Rodriguez
    Video editing: Simon Monsour
    Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
    Music: Ben Cordell
    Coordination, transcriptions, and web: Katy Moore
    2 hours and 29 minutes
  • How not to lose your job to AI (article by Benjamin Todd)
    Jul 31 2025

    About half of people are worried they’ll lose their job to AI. They’re right to be concerned: AI can now complete real-world coding tasks on GitHub, generate photorealistic video, drive a taxi more safely than humans, and do accurate medical diagnosis. And over the next five years, it’s set to continue to improve rapidly. Eventually, mass automation and falling wages are a real possibility.

    But what’s less appreciated is that while AI drives down the value of skills it can do, it drives up the value of skills it can't. Wages (on average) will increase before they fall, as automation generates a huge amount of wealth, and the remaining tasks become the bottlenecks to further growth. ATMs actually increased employment of bank clerks — until online banking automated the job much more.

    Your best strategy is to learn the skills that AI will make more valuable, trying to ride the wave of automation. This article covers what those skills are, as well as tips on how to start learning them.

    Check out the full article for all the graphs, links, and footnotes: https://80000hours.org/agi/guide/skills-ai-makes-valuable/

    Chapters:

    • Introduction (00:00:00)
    • 1: What people misunderstand about automation (00:04:17)
    • 1.1: What would ‘full automation’ mean for wages? (00:08:56)
    • 2: Four types of skills most likely to increase in value (00:11:19)
    • 2.1: Skills AI won’t easily be able to perform (00:12:42)
    • 2.2: Skills that are needed for AI deployment (00:21:41)
    • 2.3: Skills where we could use far more of what they produce (00:24:56)
    • 2.4: Skills that are difficult for others to learn (00:26:25)
    • 3.1: Skills using AI to solve real problems (00:28:05)
    • 3.2: Personal effectiveness (00:29:22)
    • 3.3: Leadership skills (00:31:59)
    • 3.4: Communications and taste (00:36:25)
    • 3.5: Getting things done in government (00:37:23)
    • 3.6: Complex physical skills (00:38:24)
    • 4: Skills with a more uncertain future (00:38:57)
    • 4.1: Routine knowledge work: writing, admin, analysis, advice (00:39:18)
    • 4.2: Coding, maths, data science, and applied STEM (00:43:22)
    • 4.3: Visual creation (00:45:31)
    • 4.4: More predictable manual jobs (00:46:05)
    • 5: Some closing thoughts on career strategy (00:46:46)
    • 5.1: Look for ways to leapfrog entry-level white collar jobs (00:46:54)
    • 5.2: Be cautious about starting long training periods, like PhDs and medicine (00:48:44)
    • 5.3: Make yourself more resilient to change (00:49:52)
    • 5.4: Ride the wave (00:50:16)
    • Take action (00:50:37)
    • Thank you for listening (00:50:58)

    Audio engineering: Dominic Armstrong
    Music: Ben Cordell

    51 minutes