-

@ S!ayer
2025-05-11 21:43:01
Not from LLM-based models, no.
RLVR models are the new method: reinforcement learning with verifiable rewards, combined with zero-data/zero-knowledge learning.
In other words, they have AI teach AI and become self-aware: reinforced self-play reasoning with zero data. So basically it starts as an SI, iterates, teaches itself based on its own inputs/outputs, and iterates again, all without any human input (data, prompts, or instructions).
With this method, verifiable rewards are the tool that shapes the AI reasoning model.
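
To make the loop above concrete, here is a rough toy sketch of the idea (propose a task, attempt it, check it automatically, reinforce, repeat). Everything in it is an assumption for illustration: the names `propose_task`, `verify`, and `self_play` are made up, the "model" is just a table of weights over three candidate strategies, and the verifier is a simple programmatic check standing in for something like a code executor. It is not how any real RLVR system is implemented.

```python
import random

# Hypothetical toy "policy": preference weights over candidate strategies.
# A real system would update a neural network, not a lookup table.
STRATEGIES = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def propose_task(rng):
    """Self-generated task: no human-written data or prompt is used."""
    return rng.randint(2, 9), rng.randint(2, 9)

def verify(task, answer):
    """Verifiable reward: 1.0 if a programmatic check passes, else 0.0."""
    a, b = task
    return 1.0 if answer == a * b else 0.0

def self_play(steps=500, lr=0.1, seed=0):
    """Propose, attempt, verify, reinforce; repeat with no outside input."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STRATEGIES}
    for _ in range(steps):
        task = propose_task(rng)
        names = list(weights)
        # Sample a strategy in proportion to its current weight.
        choice = rng.choices(names, weights=[weights[n] for n in names])[0]
        reward = verify(task, STRATEGIES[choice](*task))
        # Rewarded choices gain weight, unrewarded ones lose it (floored at 0.01).
        weights[choice] = max(0.01, weights[choice] + lr * (reward - 0.5))
    return weights

if __name__ == "__main__":
    print(self_play())
```

The only learning signal here is the 0/1 result of `verify`, which is the "verifiable reward" part; the tasks come from the loop itself, which is the "zero data" part.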