brian-m 11 minutes ago [-]
I haven't done it with Claude Code itself, but I've built a deterministic + inferential engine that works inside a defined discipline with defined purposes. Broad enough to be practical, niche enough for most of the world to look at me funny.
The short version: let logic decide, and if there are multiple solutions let the model reason within a fixed range grounded by GraphRAG. Test every output against the logic, re-parse on contradiction, and emit 'unsure' after a couple of iterations rather than guessing.
It's no use for general knowledge. But where the judgement is largely codifiable it holds up well. There's an edge case out there that'll turn it to custard, I just haven't found it yet.
I've connected Claude Desktop to it over MCP and the results are good, not great. I designed the thing so I'm working in the sweet spot and there's still the occasional WTF.
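The loop described above can be sketched roughly as follows. This is a minimal illustration, not the commenter's engine: `decide`, `is_consistent`, and `model_pick` are hypothetical names, the rule check stands in for whatever codified logic the discipline provides, and the GraphRAG grounding step is left out entirely.

```python
from typing import Callable, Sequence

UNSURE = "unsure"


def decide(
    candidates: Sequence[str],
    is_consistent: Callable[[str], bool],
    model_pick: Callable[[Sequence[str]], str],
    max_attempts: int = 2,
) -> str:
    """Let the rules decide; consult the model only to break ties."""
    # Deterministic step: keep only the candidates the rules accept.
    allowed = [c for c in candidates if is_consistent(c)]
    if not allowed:
        return UNSURE
    if len(allowed) == 1:
        return allowed[0]
    # Inferential step: the model chooses, but only within the allowed range.
    for _ in range(max_attempts):
        answer = model_pick(allowed)
        # Anything outside the allowed range counts as a contradiction: retry.
        if answer in allowed:
            return answer
    # Give up explicitly rather than guess.
    return UNSURE
```

When the rules leave exactly one candidate, the model is never called, so that path is fully deterministic; only the tie-breaking path inherits the model's variability.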
Leftium 23 hours ago [-]
I've wondered if it would be possible (and beneficial) to make LLMs deterministic via a seed, the way a PRNG takes a seed to produce a repeatable pseudorandom sequence.
Theoretically, if you could specify a seed and the exact version of the model, the output should always be the same. I wonder if this is possible with any open-weight models today?
---
On a more practical level, scripts (small programs) are deterministic so having the coding agent write (and possibly reuse) scripts might help.
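As a toy illustration of the seed idea, here is seeded sampling from a fixed next-token distribution using only Python's standard library. The function `sample_tokens` and the distribution are made up for the example. It only shows that the sampling step is repeatable for a given seed; in a real inference stack the probabilities themselves may still vary slightly between runs (for example from floating-point ordering on GPUs or batching), so a seed alone may not guarantee identical output.

```python
import random


def sample_tokens(probs: dict[str, float], n: int, seed: int) -> list[str]:
    """Draw n tokens from a fixed distribution with a private, seeded RNG."""
    rng = random.Random(seed)  # independent of the global random state
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


# Hypothetical distribution, for illustration only.
dist = {"yes": 0.6, "no": 0.3, "maybe": 0.1}
first = sample_tokens(dist, 10, seed=42)
second = sample_tokens(dist, 10, seed=42)
print(first == second)  # same seed, same distribution: same draws
```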
anonym29 1 days ago [-]
Anyone telling you they have tamed LLMs into producing 100% deterministic answers has either scoped the problem space so narrowly as to border on meaningless (e.g. "Is earth flat?" with a structured output schema of a single JSON boolean value), hasn't done robust statistical validation to actually confirm truly deterministic outputs, or both.
LLMs are fundamentally non-deterministic. Trying to use them to solve deterministic problem spaces is selecting the wrong tool for the job, and expecting them to be 100% reliable is the wrong mindset for working with them.
hbarka 1 days ago [-]
I agree with your example about a deterministic answer, but what I’m looking for is a deterministic process. I seek the LLM’s opinion, not a boolean answer. For example, an agentic skill or hook that does a SWOT analysis may one day (after 1000 consistent days) result in the agent producing just S-W-O and no T, simply because its context was muddled that day.
anonym29 1 days ago [-]
In that case, you don't want to be using Claude Code, which is more of a consumer product; you instead want to control the inference stack yourself. What you are looking for is structured output (you give the inference engine a JSON schema you define, and the response must conform to it) plus a JSON schema validator that parses the output and checks that it is valid JSON matching the schema. If it is, you're good to go; if not, run the inference again. llama.cpp supports structured outputs, as do some more consumer-oriented tools that wrap it, like LM Studio. If you don't want to buy hardware yourself or pay exorbitant cloud rental prices, p2p GPU rental marketplaces like vast.ai can offer much more economical options.
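A rough sketch of the validate-and-retry loop, under some loud assumptions: `generate` is a hypothetical zero-argument callable wrapping whatever inference call you use, and the hand-rolled key/type check stands in for a real JSON Schema validator. If the engine already constrains output to the schema, this check is just a second line of defense.

```python
import json
from typing import Callable, Optional


def parse_if_valid(raw: str, required: dict[str, type]) -> Optional[dict]:
    """Return the parsed object if every required key has the right type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if all(isinstance(data.get(key), typ) for key, typ in required.items()):
        return data
    return None


def generate_validated(
    generate: Callable[[], str],  # hypothetical wrapper around the model call
    required: dict[str, type],
    max_tries: int = 3,
) -> Optional[dict]:
    """Re-run inference until the output validates, up to max_tries."""
    for _ in range(max_tries):
        parsed = parse_if_valid(generate(), required)
        if parsed is not None:
            return parsed
    return None  # caller decides what to do after repeated failures
```

Capping the retries matters: a prompt the model consistently gets wrong would otherwise loop forever.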
magicalhippo 1 days ago [-]
Right, but do you care about how the sausage was made, or just how it looks and tastes?
You can get Claude Code to fulfill some interface contract with near certainty. Exactly how it does that will vary between runs.
So to me the more interesting question is, what exactly is it you care about inside the sausage, and how do you verify that it's there in the right amounts?
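One way to make "what's inside, and in what amounts" concrete is a content check that runs on every output regardless of how it was produced. The sketch below uses the SWOT example from upthread; the dict layout, the function name `contract_violations`, and the `min_items` threshold are all assumptions made for illustration.

```python
SWOT_SECTIONS = ("strengths", "weaknesses", "opportunities", "threats")


def contract_violations(report: dict, min_items: int = 1) -> list[str]:
    """List every way the report falls short of the expected contents."""
    problems = []
    for section in SWOT_SECTIONS:
        items = report.get(section)
        if not isinstance(items, list):
            problems.append(f"missing section: {section}")
            continue
        # Count only non-blank string entries as real content.
        filled = [i for i in items if isinstance(i, str) and i.strip()]
        if len(filled) < min_items:
            problems.append(
                f"too few items in {section}: {len(filled)} < {min_items}"
            )
    return problems
```

An empty list means the contract held for that run; anything else can trigger a retry or a human look, which catches the "S-W-O and no T" day without caring how the agent got there.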