Bright Data’s Or Lenchner pushes web-data infra as the AI bottleneck
Enterprises need real-time, trustworthy web inputs at scale, or AI answers get stale, slower, and less usable.

Or Lenchner, CEO of Bright Data, argues that enterprises hit a data bottleneck for AI because the web was not built for automated, real-time discovery and retrieval. His proposed fix is a new “web data infrastructure layer” that can navigate the web's vast number of domains while enforcing compliance, so AI systems stay current and trustworthy.
AI is booming, but the real constraint is increasingly not the model. It is the feed. Or Lenchner, CEO of Bright Data, is betting the next frontier in AI is a “web data infrastructure layer” that can reliably discover and map the expanding public web, in real time, at the scale enterprises now demand.
Why this matters immediately: Lenchner frames the situation as a knowledge problem. “The data suggests there's far more data out there,” he says. “Think of the universe: It's out there, but you don't know what you don't know.” That line lands because the web itself was not designed for the kind of automated, always-on retrieval that modern AI applications want. Enterprises often find the information they need blocked, unstructured, or technically hard to access. Without infrastructure that can navigate the messy reality of websites at speed, AI output tends to degrade into stale context, and stale answers lead to bad decisions and disappointed consumers. The stakes get sharper because companies need to track live changes like competitor pricing, consumer sentiment, and market trends, not the “snapshot” picture you get from training on static datasets.
This is where the infrastructure conversation gets practical. The source makes the point that AI performance depends not only on model architecture, but also on compute, networking, retrieval, and data engineering capabilities. Traditional AI training relies on snapshots of information collected at a point in time, which quickly stops matching the real world. In operational settings, delayed retrieval can cut usefulness even if the model is otherwise sophisticated. The promise of real-time, high-quality web data is twofold: it supplies fresh context, and it can reduce hallucinations because the model has a more relevant knowledge base. A survey cited in the piece adds a measurable trust angle, finding that 56% of AI practitioners said businesses need access to real-time web data to improve trust in AI outputs.
But “retrieving the internet” is not the same as retrieving the right internet, fast enough to be usable. The source explicitly calls out that many AI systems still struggle to deliver outputs that are current, contextually relevant, and trustworthy in operation, even with retrieval-augmented generation (RAG), where models pull in external data at the moment of a query. Large-scale retrieval alone does not solve it. Gartner is cited with a blunt statistic: 60% of AI projects that are not supported by AI-ready data (accurate, structured, organized, and contextualized) will be abandoned by the end of the year. Translation: teams cannot treat data readiness as an afterthought, and they cannot assume “we can fetch something” equals “we can ground decisions.”
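To make the gap between “we can fetch something” and “we can ground decisions” concrete, here is a minimal sketch of a freshness gate placed in front of a RAG prompt. Everything in it is an assumption for illustration: the `Snippet` type, the `select_fresh` and `build_context` helpers, and the one-hour cutoff are hypothetical and do not describe Bright Data's product or any specific library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Snippet:
    """A retrieved piece of web text (hypothetical type for this sketch)."""
    source: str
    text: str
    fetched_at: datetime


def select_fresh(snippets, now, max_age):
    """Keep snippets no older than max_age, ordered newest first."""
    fresh = [s for s in snippets if now - s.fetched_at <= max_age]
    return sorted(fresh, key=lambda s: s.fetched_at, reverse=True)


def build_context(snippets, now, max_age=timedelta(hours=1)):
    """Return grounding text for a prompt, or None if nothing is fresh enough.

    Returning None lets the caller decline to answer or trigger a new
    retrieval instead of grounding on stale data.
    """
    fresh = select_fresh(snippets, now, max_age)
    if not fresh:
        return None
    return "\n".join(f"[{s.source}] {s.text}" for s in fresh)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    snippets = [
        Snippet("pricing-page", "Competitor price: $19", now - timedelta(minutes=5)),
        Snippet("old-crawl", "Competitor price: $25", now - timedelta(days=2)),
    ]
    print(build_context(snippets, now))
```

The design choice worth noting is the explicit `None` path: a system that can tell “nothing current is available” apart from “here is an answer” is easier to trust than one that silently falls back on an old snapshot.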
The infrastructure described here aims to solve that mismatch by handling scale, latency, and access friction together. Lenchner says the layer should enable models to navigate “hundreds of millions of existing web domains and billions of new URLs created each week,” while delivering real-time information. He also describes the approach as emulating human browsing behavior to access available content and transform raw code into structured data feeds. That includes working with websites that may not play nicely with traditional scraping tools, such as those heavy in JavaScript or protected by aggressive antibot defenses. The source gets very specific about what “mimic a web user” can mean operationally, including using “identifying information: IP address, location, and 1,000 more parameters,” at enormous volume, repeatedly: “Think of doing that 80 billion times a day for millions of websites.” The point is not trivia. The point is that retrieval at scale is a systems engineering problem, not a one-time script.
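For a sense of what that volume implies, the quoted daily figure can be converted into an average per-second rate. This is a back-of-the-envelope sketch only: it assumes the 80 billion operations are spread evenly over a 24-hour day, which real traffic would not be, and the constant and function names are made up for the example.

```python
REQUESTS_PER_DAY = 80_000_000_000  # "80 billion times a day", as quoted
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def average_rate_per_second(requests_per_day: int) -> float:
    """Average rate, assuming a perfectly even spread across the day."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_rate_per_second(REQUESTS_PER_DAY)
    print(f"{rate:,.0f} requests per second on average")
```

Under that even-spread assumption the average works out to roughly 926,000 operations per second, which is why a one-off script is the wrong mental model.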
And then comes the part that boards and compliance leaders should care about just as much as engineering. Continuous retrieval creates data governance challenges. The piece says platforms can enforce strict compliance aligned with global privacy frameworks, including the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). It also notes design limits like restricting access to openly accessible, public information, avoiding paywalls or private logins. Networks can be vetted and consent-based, and incentives can be provided to owners of IP addresses. The message is that if this is “critical infrastructure for a company,” doing it in-house becomes “a full-time engineering problem that competes with the actual AI work.” So the business case is partly speed to capability, and partly risk containment.
There is also a market-structure reason the “data infrastructure layer” framing is gaining traction. Many enterprise systems already combine public web retrieval with APIs, licensed datasets, and proprietary internal data. Integrating those fragmented sources into a timely, usable knowledge layer requires specialized capabilities. The source cites research that 97% of AI organizations depend on real-time web data infrastructure, while 90% feel boxed in by various restrictions. That combination explains why infrastructure vendors can matter: they can become the orchestration and observability layer that makes data actually usable for models.
Finally, the strategic implication is almost philosophical, but it lands like an operating memo. Lenchner uses a metaphor: think of the trained model as intelligence and relevant data as knowledge. A “powerful intelligence layer sitting on top of a hollow knowledge layer” is like “a genius who knows nothing, useless in practice.” Over time, he suggests, the distinction between AI models and the infrastructure that feeds them may even begin to disappear. The world is changing, and “everything that is happening in the world is being uploaded to the public web.” For executives building AI systems now, the question is not whether AI needs data; it is whether your architecture can keep pace with real-world change without turning governance and latency into project-killers.