The Executives Brief

Springboards’ Flint pushes LLMs past “groupthink” by targeting randomness where it matters

Instead of cranking temperature globally, Flint injects “oddballs” at key moments to diversify outputs.

By Omar Al-Balawi, Technology Correspondent, The Executives Brief
Executive summary

Springboards, led by CEO and cofounder Pip Bingemann with CTO Kieran Browne, built an LLM called Flint on top of Qwen 3 to reduce repetitive, consensus-like responses. For decision-makers, it signals a new product lever in LLM systems: controlled novelty without sacrificing reliability.

If your favorite chatbot keeps handing you the same “creative” answers as everyone else, you are not imagining it. Springboards’ Flint is designed to break that pattern by injecting variety into the exact places where an LLM can afford to be different, rather than just turning up randomness and hoping for the best.

The clearest demo comes from a simple game Bingemann ran with Claude, ChatGPT, and Gemini: “Give me a random number between 1 and 10.” Claude and ChatGPT returned 7, then Bingemann prompted Flint, and it also returned 7. But when Bingemann restarted and prompted again, the mainstream models converged on 7 again while Flint returned 3.7916. Same question. Same models. Different outcome. The point is not that Flint is “more correct.” The point is that mainstream LLM behavior can converge into familiar, high-probability responses, even when the prompt sounds open-ended.

That convergence matters because the tasks people most want help with are often the ones where repetition hurts. Coding and research can tolerate predictability, but brainstorming, planning, and ideation are where groupthink becomes a problem. Bingemann frames the issue bluntly through his sales trick: predicting what the mainstream models will say before they answer. He says most language models have a narrower “groove” than users assume, and when you watch the outputs back-to-back, the groove is visible. When the models were prompted to name a type of car, he predicted the mainstream ones would go for Toyota or Honda, and they did. Flint suggested a Ford F-150. When asked for a New Balance campaign tagline, Claude and ChatGPT both answered “Run your way.” Flint came back with “Built to last, run to win.”

This isn’t just a vibes-based complaint. In November, researchers released a paper titled “Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond),” which exposed repetition not only within individual models but also across models. The authors speculated that today’s LLMs are often trained in similar ways on similar data for similar tasks, which encourages convergence. The paper won the best paper award at NeurIPS. In one test described in the article, the researchers asked 25 different LLMs, including models from major US firms and open-source models from China and elsewhere, to write a metaphor about time 50 times each. Across 1,250 responses, they found most of the answers were variations of “Time is a river” or “Time is a weaver.”

When you look for this kind of repetition, it pops up everywhere, and Springboards has a front-row seat. Browne, cofounder and CTO at Springboards, points to how chat interfaces can feel like personal conversation even when users are seeing the same material as everyone else. His other example is “What should I name my band?” Most models, he says, steer toward themes involving “glass,” “neon,” “velvet,” or “static.” In the article’s own try, ChatGPT produced a list of 56 band names, with “Glass Harbor” at the top, followed by options like “Static Empire,” “Neon Hearts,” and “Velvet Echo.” Gemini generated 15, including “Static Horizon.” Even when the suggestions are stylish, the structure can look copy-paste across models.

Springboards is turning this observation into a product workflow. The company built a tool backed by a selection of LLMs, including ChatGPT and Claude, that lets creative professionals in advertising and marketing brainstorm by dragging around text from multiple models and combining the parts they like into something new. In that workflow, Flint is pitched as a way to get more variety when the “normal” models tend to cluster. Zoe Scaman, founder of Bodacious and chief strategy officer at 77X, has been trying it. She describes using it to “catapult” herself into different directions, and she ran a test in the article: she gave Flint, Claude, Gemini, and ChatGPT a classic MBA case study prompt about reinventing a finance company for today’s youth. The mainstream models, she says, all went down the same path about teaching financial literacy in fun and funky ways. Flint suggested a different framing: rebranding the concept of wealth accumulation. Scaman adds an important caveat: Flint is still a prototype, does not work all the time, and sometimes “falls over” when pushed.

So how does Flint actually achieve the variety? Springboards built Flint on top of Qwen 3, an open-source model from Alibaba. Browne emphasizes the economic constraint: Springboards is a small team, and “training a foundation model is not on the table” because it is too expensive. The company instead targets a lever most LLM users never think about: the temperature setting, the common control for randomness. Browne says the temptation is to raise temperature to get creativity, but that can also destroy coherence. He offers an example from OpenAI models: dialing temperature to its maximum caused the model to switch from English into code halfway through a sentence.
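For readers who have not worked with the temperature setting directly, the sketch below shows the standard way it reshapes a model's next-word probabilities. It is a generic illustration, not Springboards' or OpenAI's code; the function name `softmax_with_temperature` and the four scores are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate next words (illustrative only).
logits = [4.0, 2.0, 1.0, 0.5]

cool = softmax_with_temperature(logits, 0.5)  # low temperature: favorite dominates
warm = softmax_with_temperature(logits, 2.0)  # high temperature: long shots gain ground
```

At the low setting almost all of the probability sits on the top candidate, which is the "groove" described earlier. At the high setting the unlikely candidates gain weight at every step of the output, which is why a globally high temperature can erode coherence.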

Flint’s approach is to treat randomness like a scalpel, not a volume knob. Springboards realized it does not make sense to boost randomness across the entire output; it wants more variety only at specific points. Browne explains the logic using the prompt “Where should I go in Europe?” The model needs extra randomness only at the point where it names a destination, not for every word. To do that, Springboards trained its version of Qwen 3 to identify the points in its output where more variety is possible, then fill those spots with words or phrases that are more random. Browne describes it as being “programmed to throw an oddball in,” more like an invitation to think wider than a forced extreme.
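The idea of raising randomness only at chosen positions can be sketched as a toy sampler. This is not Springboards' implementation: Flint is described as a trained model that finds the open points itself, whereas here the open slot is flagged by hand. The candidate words, scores, and the names `generate`, `base_temp`, and `open_temp` are all assumptions for illustration.

```python
import math
import random


def sample(candidates, logits, temperature, rng):
    """Pick one candidate with probability proportional to exp(logit / T)."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(steps, base_temp=0.1, open_temp=1.5, seed=None):
    """Use a low temperature everywhere except at steps flagged as open."""
    rng = random.Random(seed)
    words = []
    for candidates, logits, is_open in steps:
        temperature = open_temp if is_open else base_temp
        words.append(sample(candidates, logits, temperature, rng))
    return " ".join(words)


# Hand-built toy reply to "Where should I go in Europe?" (all values hypothetical).
# Only the destination slot is flagged as a place where variety is welcome.
STEPS = [
    (["Try", "Visit"], [5.0, 1.0], False),
    (["Lisbon", "Paris", "Ljubljana", "Tallinn"], [3.0, 2.8, 1.0, 0.8], True),
    (["this", "next"], [5.0, 1.0], False),
    (["spring", "autumn"], [5.0, 1.0], False),
]

if __name__ == "__main__":
    for seed in range(3):
        print(generate(STEPS, seed=seed))
```

In this toy setup the surrounding words stay fixed from run to run while the destination varies, which is the behavior the "scalpel" framing describes. Passing the same high temperature to every step would let the filler words drift as well.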

For executives and boards, the subtext is bigger than creative writing. When LLM outputs converge, teams can waste time reviewing near-duplicates, brands can sound “samey,” and strategy work can become a mechanical extension of what the model already expects. Meanwhile, the industry is already aware of the tradeoff: OpenAI says training models to give reliable and coherent answers can lead them to converge around familiar, high-probability responses, and pushing harder for novelty can produce weaker or less reliable responses. Springboards is effectively betting that you do not need global novelty. You need targeted novelty, delivered safely at the moments that matter.

If you are building products on top of LLMs, Flint is a reminder that model capability is not just about who trained on what data. It is also about how you steer the generation process. The strategic stakes are simple: if “good enough” output becomes indistinguishable from everyone else’s, differentiation shifts from model choice to control architecture. Flint is one attempt to make that shift concrete, by translating the research-backed problem of open-ended homogeneity into a deployable product behavior.
