Startup Springboards builds Flint, an LLM designed to stop repeating its rivals

Springboards trains an LLM to inject novelty at key moments rather than cranking up randomness everywhere.

By Lama Al-Rashid, Technology Correspondent, The Executives Brief

about 4 hours ago·4 min read


Executive summary

Springboards built an LLM called Flint on top of Qwen 3 to produce more varied responses than mainstream models like ChatGPT and Claude. For decision-makers, it signals a shift from “turn up creativity” sliders to targeted control of where novelty appears.

Open a chatbot and ask for a “random number between 1 and 10.” You’ll probably get 7. Ask a few more times and you might see 3 or 4, then 8 or 9. The output is not perfectly predictable, but the pattern is familiar enough that the model seems to be playing the same song in a different key.

Springboards’ point is that this kind of repetition is not limited to numbers. The startup built an LLM called Flint to avoid the “groupthink rut” that shows up in open-ended questions. In one quick showdown, after ChatGPT and Claude both returned 7, Flint returned 3.7916. It then produced noticeably different outputs on other prompts, such as naming cars and writing a New Balance tagline. The promise is simple: not that Flint will be “more creative” in a vague sense, but that it will deliberately widen the response distribution when it matters.

Executives should care because this behavior shows up as a product problem. In creative workflows, brainstorming, and planning, “safe and familiar” responses can become a competitive disadvantage. You can ship copy that sounds correct and still miss the best idea because everyone used the same model families and got pushed toward the same high-probability phrases. Springboards cofounder and CTO Kieran Browne describes it as a mismatch between how chat feels (like a personal conversation) and what it often delivers (the same material as everyone else). That gap is exactly where differentiation can get stuck.

There is now hard evidence for the repetition, too. In November, researchers released a paper titled "Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)," reporting “remarkable” repetition not just within one model, but between different models. The team won the best paper award at NeurIPS. Their experiments included prompting more than 70 different LLMs 50 times each to write a metaphor about time. Out of 3,500 responses, more than half were variations of “Time is a river” and the rest were variations of “Time is a weaver.” The researchers also found a similar convergence when asking broad open-ended questions.

Springboards is trying to turn this from an academic finding into a practical tool. Its business is built around an advertising and marketing brainstorming workflow that uses a selection of LLMs, including ChatGPT and Claude, where creative professionals can drag, combine, and select the best bits from different model outputs. Flint is pitched as an additional option inside that tool because it is designed to produce wider variety in response to prompts like “Where should I go in Europe?” or “What should I name my band?”

Flint also shows its edge in controlled tests. Marketing strategist Zoë Scaman, founder of Bodacious and chief strategy officer at 77X, says she tested Flint against Claude, Gemini, and ChatGPT using a classic MBA-style case study: “How would you reinvent a finance company for today’s youth?” The mainstream models went down a similar path about teaching financial literacy in a “fun and funky” way. Flint suggested a different framing: rebrand the concept of wealth accumulation. Scaman also cautions that Flint is still a prototype and can “fall over” when pushed too far, meaning it is useful for sparking divergent thinking but is not a flawless generator.

Under the hood, Flint is built on a specific strategy for controlling randomness. Springboards trained its own version of Qwen 3, an open-source model from Alibaba, because training a foundation model from scratch is “not on the table” for a small team due to cost. Most LLM interfaces expose a parameter called temperature, a common knob for randomness and creativity. Browne says the team explored temperature early because that is the standard advice: if you want more creativity, turn it up. The problem is that raising temperature globally can make outputs incoherent. Browne describes an example where dialing up the temperature on one of OpenAI’s models to its maximum setting caused the response to switch from English into code mid-sentence.
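For readers who want to see what the temperature knob does mechanically, here is a minimal Python sketch of the standard technique: dividing a model's raw scores (logits) by a temperature before converting them to probabilities. The scores for the "random number" prompt are invented for illustration and are not measurements from any real model.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for answers to "a random number between 1 and 10".
answers = ["7", "3", "4", "8"]
logits = [4.0, 2.5, 2.0, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, dict(zip(answers, (round(p, 2) for p in probs))))
```

A low temperature concentrates probability on the favorite answer, while a high temperature flattens the distribution for every token in the response, including the ones that hold a sentence together. That is one plausible reading of why a maximum setting can tip output into incoherence.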

So Flint uses a more surgical approach. Instead of boosting randomness everywhere, Springboards realized you only want novelty at specific points in the output. Their approach trains Flint to identify “points in its output where more variety was possible,” then fill those spots with words or phrases that are a little more random. The concept is simpler than it sounds: when you ask “Where should I go in Europe?” the model mostly needs to stay coherent while discussing context, and only needs to be creatively flexible right before it names a destination. Browne frames Flint’s behavior as “an invitation to think wider,” essentially throwing an “oddball” into the right places rather than letting entropy run wild.
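The article does not say how Flint finds those points, and Springboards bakes the behavior in through training. Purely as an illustration of the idea, the sketch below applies it at sampling time instead: it raises the temperature only at positions where the model is already undecided, using the entropy of the next-word distribution as a stand-in signal. The function names, thresholds, and scores are all hypothetical assumptions, not Springboards' method.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: higher means the model is less decided."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_temperature(logits, base=0.7, boosted=1.5, entropy_threshold=1.0):
    """Assumed heuristic: boost randomness only where several options are live."""
    undecided = entropy(softmax(logits)) >= entropy_threshold
    return boosted if undecided else base

def sample_token(candidates, logits, rng):
    temperature = pick_temperature(logits)
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0], temperature

rng = random.Random(0)

# Connective position: one word clearly dominates, so stay conservative.
print(sample_token(["to", "at", "by"], [6.0, 1.0, 0.5], rng))

# Destination position: many plausible answers, so widen the spread.
print(sample_token(["Paris", "Lisbon", "Tbilisi", "Ghent"], [2.0, 1.8, 1.7, 1.5], rng))
```

In this toy version the connective words keep the cautious setting and only the destination slot gets the boost, which mirrors the "oddball in the right places" idea without raising randomness across the whole sentence.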

There are second-order implications here for anyone managing AI strategy, from product leaders to board-level risk owners. OpenAI notes that training models for reliability and coherence can lead them to converge around familiar, high-probability responses, and that pushing harder for novelty can weaken reliability. The “Artificial Hivemind” paper also studied models from 2024 that have since been updated, so this is an evolving baseline, not a fossilized rule. Flint’s bet is that you can capture some of the benefits of novelty without paying the full incoherence tax by targeting where variability enters the text.

For executives, the strategic stakes are straightforward. If your teams rely on mainstream chatbots for ideation, you may be systematically optimizing for sameness. Flint suggests a new control philosophy: creativity should be treated like a governed capability, not a global dial. And because the competitive advantage in marketing and product planning often comes from showing up with better options, a model that produces genuinely different starting points could change the throughput of creative teams. The benefit is not that it “wins awards,” as Flint’s tagline example implies, but that it gives you more decision-grade variety, more often, at the moment it matters.
