Perplexity’s Aravind Srinivas pushes AI routing onto the PC

Perplexity says its new hybrid orchestrator decides, task by task, what stays local and what hits the cloud, which could reshape privacy, latency, and chip demand.

By Lama Al-Rashid, Technology Correspondent, The Executives Brief

about 2 months ago · 5 min read


Executive summary

Perplexity AI and CEO Aravind Srinivas unveiled a hybrid local-cloud inference orchestrator at Computex 2026, showing software that routes AI work between a user’s device and frontier models in real time. For decision-makers, the bigger issue is that the routing layer may become as strategic as the model layer, with implications for privacy, infrastructure spend, and AI hardware buying decisions.

Perplexity AI, led by CEO Aravind Srinivas, used Computex 2026 to show off something it says no product has done before: a hybrid local-server inference orchestrator that decides, in real time and mid-task, whether an AI workload stays on a user’s device or gets sent to the cloud. The demo mattered because it was not just a local AI brag. It was a live attempt to make the machine itself decide where computation happens, which is a much bigger deal for privacy, cost, and enterprise control than simply running a model offline.

The company put that claim on stage during Intel’s keynote, with Srinivas demonstrating Perplexity’s Personal Computer agent while Intel CEO Lip-Bu Tan stood alongside him. In the demo, local models running on Intel Core Ultra Series 3 were shown determining what information should remain on the device and what could be routed to cloud-based models. Perplexity says the system balances intelligence, accuracy, privacy, and cost. More importantly, it shifts the user experience from “pick local or cloud” to “let the system decide as it goes,” which is the kind of workflow change that can quietly reshape product expectations if it works in the wild.

There is a big caveat here: the product is not yet available to users. Perplexity says the hybrid inference feature will launch in the coming weeks. But the company is clearly telegraphing where it thinks the market is heading. The useful contrast is with the state of AI tools today. Plenty can run on-device. Plenty can call frontier models in the cloud. What Perplexity is pitching is orchestration, the software brain that chooses, task by task, which environment should handle which piece of work. That distinction matters because it turns infrastructure choice into software logic rather than a manual user setting.

This is also the next step in a product line Perplexity has been building all year. On February 25, the company launched Computer, a multi-model AI agent that orchestrates 19 different AI models to complete complex, long-running tasks on behalf of users. That system ran entirely in the cloud and broke goals into subtasks, routing each to the best model for the job, including Claude, Gemini, GPT, Grok, and others. Then in March, Perplexity introduced Personal Computer at its inaugural Ask 2026 developer conference. That Mac app added a hybrid local-cloud AI agent, which Perplexity described as a “personal orchestrator” that hybridizes local and server environments for security and productivity. It could access the Mac’s file system and native Mac apps, create and execute workflows, and keep files in a secure sandbox with actions that were auditable and reversible.

What Computex changes is where the intelligence sits in the stack. Previously, the split was relatively clean: local file access on the device, heavy computation on Perplexity’s servers. Now the system itself is supposed to reason about location, not just model selection. In other words, the orchestrator is no longer just choosing which AI is best for the task. It is deciding which physical place should process each part of the task. The system reportedly asks for user permission before sending sensitive tasks to the cloud, which addresses one of the biggest objections enterprises have to agentic AI: who sees what, and where does it go?

That privacy angle is not a side note. It is the business case. In Perplexity’s design, sensitive data such as financial records or health information stays on the local machine, while heavier reasoning tasks that need frontier-scale models are sent to the cloud. For executives, that is appealing because it offers a cleaner answer to data governance concerns without giving up access to more capable models when they are actually needed. It also means the economics of AI agents could become tied to the quality of the local silicon under the hood. The more capable the chip, the more inference can happen locally, which can reduce cloud costs and improve latency for sensitive workloads. Perplexity says the system is chip-agnostic, but the initial demo ran on Intel silicon, which is not exactly a subtle message.
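To make the routing idea concrete, here is a minimal Python sketch of a per-subtask placement rule. It is not Perplexity's implementation, which has not been published. The names `Subtask`, `route`, `local_capacity`, and `ask_permission`, along with the 0-to-1 complexity score, are all hypothetical and exist only to illustrate the behavior described above: work stays local when the device can handle it, heavy non-sensitive work goes to the cloud, and heavy sensitive work goes to the cloud only with user consent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Location(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Subtask:
    description: str
    sensitive: bool    # e.g. touches financial or health records
    complexity: float  # hypothetical score: 0 = trivial, 1 = needs a frontier model


def route(
    task: Subtask,
    local_capacity: float,
    ask_permission: Callable[[Subtask], bool],
) -> Location:
    """Pick where one subtask runs. Scores and thresholds are illustrative only."""
    # Anything the local model can handle stays on the device.
    if task.complexity <= local_capacity:
        return Location.LOCAL
    # Heavy work with no sensitive data can go straight to the cloud.
    if not task.sensitive:
        return Location.CLOUD
    # Heavy and sensitive: leave the device only with explicit consent.
    if ask_permission(task):
        return Location.CLOUD
    # Consent denied: fall back to a best-effort local answer.
    return Location.LOCAL


def deny(task: Subtask) -> bool:
    return False


def allow(task: Subtask) -> bool:
    return True


tax_plan = Subtask("draft a tax strategy from bank statements", True, 0.9)
print(route(tax_plan, local_capacity=0.5, ask_permission=deny))   # Location.LOCAL
print(route(tax_plan, local_capacity=0.5, ask_permission=allow))  # Location.CLOUD
```

Raising `local_capacity` in this sketch stands in for a more capable chip: the same subtask that went to the cloud at 0.5 stays on the device at 0.95, which is the link between silicon and cloud spend that the paragraph above describes.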

The timing was deliberate. Computex 2026 has been dominated by on-device AI. Just hours before Intel’s keynote, Nvidia CEO Jensen Huang unveiled the RTX Spark, a new Arm-based superchip that the company says is the foundation for a new generation of AI-native Windows PCs. Nvidia says the RTX Spark Superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth, enough for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. Nvidia says RTX Spark systems will begin arriving in the fall. Intel, for its part, used its keynote to show Xeon 6+ processors with 288 efficiency cores built on 18A technology for the data center, and positioned Core Ultra Series 3 as the client silicon that makes hybrid inference possible on the PC. Perplexity’s demo sits right between those pitches.
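A rough back-of-envelope check helps put the 120-billion-parameter figure in context. The sketch below estimates weight storage at different precisions against the 128GB quoted for RTX Spark, plus a simple ceiling on generation speed from the quoted 300 GB/s of bandwidth. The function names are hypothetical, and the assumptions are deliberately crude: only weights are counted (no context cache or activations), and the speed ceiling assumes a dense model that reads every weight once per generated token, which mixture-of-experts models do not.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on tokens/s if every weight is read once per token (dense model)."""
    return bandwidth_gb_s / weight_gb


MEMORY_GB = 128        # figure quoted by Nvidia for RTX Spark
BANDWIDTH_GB_S = 300   # "up to" figure quoted by Nvidia

for bits in (16, 8, 4):
    size = weights_gb(120, bits)
    print(f"{bits}-bit: {size:.0f} GB, fits in {MEMORY_GB} GB: {size <= MEMORY_GB}")
# 16-bit: 240 GB, fits in 128 GB: False
# 8-bit: 120 GB, fits in 128 GB: True
# 4-bit: 60 GB, fits in 128 GB: True

print(decode_ceiling_tokens_per_s(BANDWIDTH_GB_S, weights_gb(120, 4)))  # 5.0
```

Under these assumptions, a model of that size has to be quantized to fit, and the 8-bit case leaves little headroom for a long context. Real-world throughput depends on the model architecture and runtime, so the speed figure is a ceiling for the dense case, not a prediction.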

The company’s own framing makes the strategic implication hard to miss. As a Perplexity spokesperson told VentureBeat, “As chips become more powerful, more intelligence moves onto a person's machine, alongside server inference for the complex tasks that still need frontier models.” The spokesperson added, “Sensitive and sovereign work can stay local, which changes the need for massive country-level infrastructure.” That last point is the one that should make policymakers and infrastructure investors pay attention. Countries including the UAE, France, and India have been spending heavily on domestic AI compute capacity partly because sensitive data is expected to stay within borders, which has driven demand for local data centers. If meaningful inference can happen on an end user’s device without data leaving the machine, some of that urgency could soften. It would not kill the data center buildout. But it could change the pace and size of the bet.

Beneath all of this is Perplexity’s broader thesis that the orchestration layer matters more than any single model. The company has been leaning into the idea that models are specializing, not commoditizing, and that the real value sits in a system that can coordinate multiple third-party models, tools, and now physical compute locations. That is an attractive pitch for a world where the best answer may require a cheap local model for one step and a frontier cloud model for the next. It is also a technically hard one. The orchestrator has to judge task complexity, detect data sensitivity, understand the user’s hardware, and keep state intact while work bounces between environments. If it works, Perplexity could help normalize a new expectation for AI products: the smartest system is the one that knows when not to leave the room. If it fails, the privacy and latency promise becomes just another nice slide deck. Either way, rivals building AI agents, PC software, or infrastructure need to watch this closely, because the battle is no longer just about which model answers best. It is about who controls where the answer gets made.
