10 Comments
Lydia Nottingham's avatar

speculation on proprietary secrets is one of my fav blog post genres, & i'm optimistic about us getting to see how accurate this was at some point

Re:Courses's avatar

Ty deep mind nerd. I hope you have a good day

Tim Dingman's avatar

NIAH is really not that good of a long context benchmark. Like it's fine but pretty limited. RULER for example I think is more complete.

The ideal long context benchmark also tests holistic understanding like "what are the overall themes of this novel I've attached"

Everett's avatar

I think we’ll eventually find out about what was going on in the frontier labs, it just may take a couple decades

Garloid 64's avatar

Well, at least we'll get to know how it was done once one of the Chinese labs manages to replicate this kind of performance...

Nanthew Shandridan's avatar

"Datacenter compute is set to roughly double every year."

That is wishful thinking by those who have taken out a lot of money investing in this and need that to be true.

1. Peak Copper. "AI data centers. Copper is an essential component in everything electric due to its high heat and electrical conductivity, surpassed only by silver. Copper wires can be found in everything from power generation, transmission, and distribution systems to electronics circuitry, telecommunications, and numerous types of electrical equipment..." but:

https://thehonestsorcerer.substack.com/p/running-on-empty-copper/comment/185346136

2. Peak everything (including power): https://thehonestsorcerer.substack.com/p/2025-the-year-of-peak-everything

Frankly, even if the resources were there (they aren't), the debt bubble is about to blow, taking a lot of data center investment with it. I do grant you that with, say, 1 billion or fewer people and all the liberated resources devoted to data centers you would be right, and maybe that is the way the world is going, but not in the near future.

0k's avatar

I have heard there’s no 3.5 being tested internally, unless you mean ‘trained’ not ‘tested’. In that case IDK.

Regarding costs, Google is the only one in the race with money in the bank. Perhaps they consider usage right now strategically important enough to give away?

Giampiero Campa's avatar

Well, we humans do it almost entirely with (continual, model based) RL, don't we?

Substack Enjoyer's avatar

god i wish i was smart enough to fully understand stuff like this

swiley's avatar

They might have just used fewer attention heads and done some kind of needle/haystack RLVR.