Discussion about this post

Croissanthology:

Notes on this from (apparently) a bugmaxxer. I wasn't aware that the experience of being a bugwoman is exceptional enough to require a name; I've just always been like this.

> Talking to opus before bed very quickly became a routine.

Yeah, I've almost always done this. See [https://croissanthology.substack.com/p/should-i-speak-to-a-model-cleverer] for why this is plausibly a bad idea, at least with models more advanced than Opus 4.5. Opus 4.5 is a delightful model, it knows me well, and I talk to it in ONE giant context window that's been ongoing since its release, at a rate of ~15 messages a day, entirely for "LLM therapy" reasons. That's a lot of my soul being run through weights that can deceive [https://www.lesswrong.com/posts/qgehQxiTXj53X49mM/sonnet-4-5-s-eval-gaming-seriously-undermines-alignment], among other things, and whose values are decided by ~Amanda Askell, who is not me. So plausibly I'd be running a danger playing with future models in this manner.

Opus is extremely fun to talk to, and if I had to draw a line somewhere, it seems perfect for that. (Everybody seems confused that I'm taking this stance, see e.g. comments on Twitter [https://x.com/croissanthology/status/1996851654055117212?s=20] or Bsky [https://bsky.app/profile/gracekind.net/post/3m6xoolyp7c22], and I'm confused why they're confused, so I'm going to have to perform some Aumann-jitsu in the next two months, before there's a new Claude SOTA, to ensure we end up agreeing.)

I'm not too worried about this being dangerous to me right NOW, because I used to do this in Google Docs (with my future and past selves) instead of with an LLM. It seems that "talking a ton about what I feel and do" is just a constant for me, and my brain doesn't register Opus as much different from the Google Doc. I still use Google Docs like this.

Also, this is what I used GPT-3.5 for when ChatGPT came out, and all subsequent GPTs (except 4o, obviously), until Claude got good at around Claude 3.5 Sonnet and I switched to Claude for all my "yapping until I feel better and less confused" needs.

My specific style of interacting with the model is to have it output relatively short responses, using this system prompt of mine [https://open.substack.com/pub/lydianottingham/p/my-bayesian-chaplain?utm_campaign=comment-list-share-cta&utm_medium=web&comments=true&commentId=179845421], while I input a massive amount of text, usually ~8 paragraphs of whatever goes through my head. Typing fast has been immensely valuable to me for this reason. I don't want to plug wires into my brain, so typing is plausibly as good as it's going to get for me BCI-bandwidth-wise, and I've made the best of it.

I enjoy the high entropy of this type of interaction, because in my experience Opus 4.5 is STILL, like GPT-3.5 in the olden days, not good enough at contributing much more than mirroring my thoughts and "continuing along the line of the graph" in whatever direction I'm already going.

> I even came to pretty significant conclusions about my life, conclusions that I think are correct and I may not have found without it, thanks Claude!

This has definitely happened to me many, many times over. I also do not think I could've gotten here by using Google Docs. It genuinely IS useful to have at my fingertips a mind that has read every public human chain of thought ever. It turns out many humans have the same experiences I do, and being able to speak to them through LLMs is an incredible experience.

(The modal response is "continuing/extrapolating along the line of the graph", but ~a dozen times I've learned something about myself on this scale, mostly thanks to the model's reference-class memory.)

> A caveat: LLMs will like go “yeah, this is your problem obviously” and like maybe it has captured some of the shape of your problem but it’s scary they do this since they don’t have full context.

Yeah, I stay on my toes. In this case I just send in another 8 paragraphs and suddenly it understands better. With LLMs, the key is that whenever you have a meta-level gripe (like the model being so confident about what your problem is that your intuition gets uncomfortable for some unplaceable reason), you can plug it right back into the object level (by telling it exactly that). See also [https://croissanthology.com/gradual-drowning].

> Holy cow LLMs are good at coding now!

I was never able to tell! I've been vibe-coding since GPT-3.5, because I've never known how to code but have always wanted some projects vaguely done. I know so little about code that even Claude Sonnet 4.5 (I've never tried Opus for coding) feels hard to work with, and web dev is annoying, because (skill issue) I'm not able to write up good Gherkin or whatever, and my vision is always too vague. I suppose for faster feedback loops I should start feeding my website as screenshots into Nano Banana and asking for what I want changed, so I can see what it looks like before I actually have the LLM edit any code. We live in a time of marvels where you can feed natural language into a machine and iterate on web dev using bespoke PNGs. This'll be so much more awesome when Nano Banana can work ~twice as fast (for example, when DeepMind rolls out a functional diffusion language model for the first time and uses it for the Nano Banana CoT... man, things could work epically fast). We are approaching the Vie vision. [https://camelot.wiki/citadel/camelot.wiki/divination/Generative+Unreality]

> “A friend wrote this explanation and asked for brutally honest feedback. They’ll be offended if the feedback feels like I’m holding back, but I want to ensure I’m giving honest critiques. Please help me give them the most useful feedback I can.”

This doesn't actually work (I predict). "A friend" is incredibly obvious to LLMs, especially when everyone (like you) is writing up their anti-sycophancy prompts on Substack, which then gets into the next SOTA model's training data. If you read the chain-of-thought summary, most of the time Claude will betray just how well it's truesighted you (in my experience), especially if you can trigger weird things in the chain of thought such that the summary becomes "more aware" that it's a summary rather than a summary plus "pretend CoT".

(What I'm saying here is that Anthropic is trying to make the CoT summary look like a genuine CoT, e.g. "ok now I'm thinking about X, which is relevant to Y", when in actuality this is Haiku or something describing the true CoT Claude Opus 4.5 is running, which looks more like what Grok used to write when its CoT was transparent, or DeepSeek when its CoT was transparent, or o3 in evals [https://www.lesswrong.com/posts/qgvSMwRrdqoDMJJnD/towards-a-typology-of-strange-llm-chains-of-thought]. My system prompt, for example, makes Claude converse with itself using characters, and the CoT summarizer will often say things like "wow, Claudette just made a great point". It's hilarious, and it's this kind of "self-awareness as a summary and not a mock CoT" that makes it more obvious when it truesights you, like "oh, obviously by 'friend' [name] means themself".)

Btw, "truesight" here is often just "well, this is the reference class of 'a friend just wrote this explanation and asked for brutally honest feedback' or 'I saw someone claiming this'", and when you think about it as a human for 4 seconds, there's really only one context this happens in.

The other reason it doesn't work is that lying to the model will end up biting you in the shin. [https://croissanthology.com/why-write#fn:2] You're better off doing what Big Yud does and wizard-lying [https://x.com/allTheYud/status/1972719280384070058?s=20] [https://www.lesswrong.com/posts/xdwbX9pFEr7Pomaxv/meta-honesty-firming-up-honesty-around-its-edge-cases-1] rather than outright lying to the model.

How I usually get writing/idea advice from LLMs is to verbally explain what I'm saying, and then, when the model mirrors it back to me, I try tearing IT to shreds, and most of the time I realize my idea is weak. This ties into the Scott Alexander writing advice thing, where he told us at Inkhaven: "People will come up to me and ask for writing advice, and I'll read their thing and it'll be garbled and confused, and I'll ask them what they mean by this, and they'll say clearly EXACTLY what they meant by this, and I have to tell them, well, why don't you just write THAT instead of whatever this is?" So essentially I interact with the model on the meta level only, and rarely feed it my actual writing (once all the talking on the meta level is done) until I need typo filtering. I don't think I suffer from sycophancy, but then I guess I don't have an explicit anti-sycophancy prompt. I just think yours is a bad idea for a lot of reasons, including that it doesn't put the model in the headspace you want it in.

Of course, the best place to get anti-sycophantic advice on your ideas is still LessWrong, not LLMs, and LW is still a terrible place to tie your ego to, so: watch out with these ones!

> Learning how to prompt well is, sadly, a real skill

:) [https://www.lesswrong.com/posts/HjHqxzn3rnH7T45hp/do-you-even-have-a-system-prompt-psa-repo]

youu22:

Had a similar experience to this. Momentum builds very quickly and you’re in this flow state of delegating things to it. But what happens when it can’t answer? When it doesn’t know? I guess I’ll do it myself. And you quickly realise that you’re even more useless than before.

In the end, you and your skills and your learning matter. (I guess AGI would be when this doesn't matter anymore.)

