LLMs Are a Dead End
Published on: 02 October 2025
Tags: #llm #ai #reinforcement-learning
Mimicry vs. True Understanding
graph TD
subgraph "Large Language Models (LLMs): The Mimicry Loop"
direction TB
A[Internet Text Dataset] --> B{LLM Training};
B --> C[Predict Next Word];
C --> D((Generate Text));
style A fill:#f9f,stroke:#333,stroke-width:2px
end
subgraph "Reinforcement Learning (RL): The Experiential Loop"
direction TB
E[Environment] -- Sensation/State --> F{RL Agent};
F -- Action --> E;
E -- Reward/Feedback --> F;
style E fill:#ccf,stroke:#333,stroke-width:2px
end
subgraph "Sutton's Core Argument"
direction TB
G[Mimicking People] --x H(Lacks World Model & Goals);
I[Learning from Experience] --> J(Develops World Model & Achieves Goals);
end
style G fill:#FFDDC1
style H fill:#FFDDC1
style I fill:#D4E4F7
style J fill:#D4E4F7
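The two loops in the diagram above can be contrasted in a few lines of Python. This is only a minimal sketch under stated assumptions: the bigram counter stands in for "Predict Next Word" (a real LLM is far larger, but the training signal is still the next token in a fixed dataset), and the epsilon-greedy bandit stands in for the experiential loop. The function names, the toy corpus, and the Bernoulli reward probabilities are hypothetical, chosen purely for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Mimicry loop: count which word follows which in a fixed text dataset."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower seen in training, or None if unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Experiential loop: act, receive a reward, update value estimates."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # estimated reward per action
    pulls = [0] * len(reward_probs)     # how often each action was tried
    for _ in range(steps):
        # Action: explore occasionally, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))
        else:
            action = max(range(len(values)), key=values.__getitem__)
        # Reward/feedback from the environment (hypothetical Bernoulli arms).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Incremental sample-average update of the chosen action's estimate.
        pulls[action] += 1
        values[action] += (reward - values[action]) / pulls[action]
    return values, pulls
```

The bigram model can only repeat patterns already present in its corpus and has no notion of whether a prediction was useful. The bandit agent starts knowing nothing, but its estimates are corrected by the consequences of its own actions.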
Revisiting "The Bitter Lesson"
graph LR
subgraph "Current LLM Approach"
A[Massive Compute] & B["Massive Human Knowledge<br/>(Internet Text)"] --> C(Powerful LLMs);
end
subgraph "Sutton's Predicted Future (The Bitter Lesson Applied)"
D[Massive Compute] & E["Learning from Raw Experience<br/>(Interaction, Trial & Error)"] --> F(More Scalable & Capable AI);
end
C -- "Will be Superseded by" --> F;
style A fill:#D4E4F7
style B fill:#FFDDC1
style E fill:#D4E4F7
The Continual Learning Agent
graph TD
subgraph "The Experiential Paradigm (Refined)"
direction LR
A(Agent)
B(Environment)
A -- "Action" --> B;
B -- "Sensation & Reward" --> A;
subgraph "Agent's Internal Components"
C["Policy<br/>(What to do)"]
D["Value Function<br/>(How well it's going)"]
E["World Model<br/>(Predicts consequences)"]
end
A --> C
A --> D
A --> E
B -- "Updates" --> E
D -- "Guides" --> C
%% --- New Connections ---
E -- "Enables Planning to Refine" --> C;
E -- "Provides Simulated Experience to Improve" --> D;
end
style B fill:#ccf,stroke:#333,stroke-width:2px
style A fill:#f9f,stroke:#333,stroke-width:2px
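One way to make the agent in this diagram concrete is a tabular Dyna-Q loop, sketched below. Dyna-Q is a technique swapped in here for illustration, not something the diagram itself specifies. The six-state corridor environment, the hyperparameters, and all function names are assumptions. The Q-table plays the role of the value function, the epsilon-greedy rule is the policy, and a dictionary of remembered transitions serves as the world model that supplies simulated experience for planning.

```python
import random

N_STATES, GOAL = 6, 5  # hypothetical corridor: states 0..5, goal at state 5
ACTIONS = (0, 1)       # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (reward, next_state, done) for one move."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return (1.0 if nxt == GOAL else 0.0), nxt, nxt == GOAL


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Value function: estimated return for each (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    # World model: remembered outcome of each (state, action) tried so far.
    model = {}

    def policy(state):
        """Epsilon-greedy policy guided by the value function."""
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        """One-step Q-learning update toward r + gamma * max Q(s2, .)."""
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = policy(state)
            reward, nxt, done = step(state, action)       # real experience
            update(state, action, reward, nxt, done)
            model[(state, action)] = (reward, nxt, done)  # environment updates the model
            for _ in range(planning_steps):               # simulated experience
                s, a = rng.choice(list(model))
                r, s2, d = model[(s, a)]
                update(s, a, r, s2, d)
            state = nxt
    return q
```

In this sketch the planning loop replays transitions from the model, so each real step is followed by several value updates that cost no further interaction with the environment. That corresponds to the "Provides Simulated Experience to Improve" and "Enables Planning to Refine" edges, since the improved values immediately change what the greedy policy chooses.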
Sources: