Deployment Mode · 1 of 5
Public Cloud AI Deployment
The fastest way to run enterprise AI. Frontier models delivered through API endpoints from Anthropic, OpenAI, Google, and others. Production-ready in days, not quarters — with enterprise-grade SLAs, billion-dollar infrastructure, and the newest models the moment they ship.
For workloads where speed matters more than data sovereignty — this is the right mode. For workloads where it does not — it is not. BrainPack manages the difference.
Public Cloud Is Where AI Starts. It Should Not Be Where It Stays.
Almost every enterprise AI initiative begins on public cloud. The reasons are obvious — frontier models are there, the integration is API-call simple, costs are pay-per-use, and the security posture of the major providers is genuinely good. So most teams ship the first wave on public cloud, see real results, and move on to the next priority. That works. Until it doesn't.
Eight months in, a data scientist quietly feeds customer PII into a public model. Someone in finance runs an analysis on un-redacted board materials. A regulator asks a question nobody can fully answer. Suddenly the conversation is no longer "should we use public cloud AI" — it's "how do we figure out which workloads should not have been there in the first place."
The right answer is not to abandon public cloud. The right answer is to use it for the workloads where it fits, route the rest to other deployment modes, and have one governance layer that enforces the boundary automatically. Public cloud is one of five deployment modes BrainPack runs across. The decision of which workload goes where is a policy decision — made once, enforced forever.
A Data-Classification Decision, Not A Security-Trust One.
Public cloud AI means running inference on the standard API endpoints provided by major AI vendors. The data leaves your infrastructure during the inference call, processes on the vendor's servers, and returns. The vendor's standard terms govern what they can do with the data — generally including content filtering, abuse monitoring, and (in default configurations) potential use for service improvement.
The major providers in this category are Anthropic (Claude), OpenAI (GPT and o-series), Google (Gemini), Meta (Llama via cloud APIs), Mistral (cloud), and a long tail of specialized providers for voice (ElevenLabs, Deepgram), embeddings, and other capabilities.
Public cloud AI is not insecure by default — providers operate at SOC 2, ISO 27001, and HIPAA-eligible service tiers, with encryption in transit and at rest. But "secure infrastructure" and "data appropriate for that infrastructure" are different questions. Public cloud is appropriate for some data classes and inappropriate for others. The deployment decision is a data-classification decision, not a security-trust decision.
BrainPack treats Public Cloud as one execution surface among five. The Connect, Orchestrate, and Govern layers do not change. What changes is where the inference actually executes — and which data is allowed to be sent there.
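The routing decision this describes can be sketched as a simple policy lookup. The classification labels, mode names, and fail-closed behavior below are illustrative assumptions, not BrainPack's actual schema:

```python
# Hypothetical sketch: data classification decides the deployment mode,
# and an unclassified workload fails closed rather than defaulting to
# public cloud. Labels and mode names are made up for illustration.
POLICY = {
    "public":     "public_cloud",  # external or already-public data
    "internal":   "public_cloud",  # internal-but-not-regulated
    "pii":        "zdr",           # zero-data-retention endpoints
    "core_ip":    "self_hosted",   # proprietary code, trade secrets
    "regulated":  "on_premise",    # residency / sovereignty rules
    "classified": "air_gapped",    # highest isolation tier
}

def route(data_class: str) -> str:
    """Return the deployment mode for a workload, failing closed."""
    try:
        return POLICY[data_class]
    except KeyError:
        raise ValueError(f"no policy for {data_class!r}; refusing to route")

print(route("pii"))       # routes to ZDR, never public cloud
print(route("internal"))  # routes to public cloud
```

The important property is the `ValueError`: a workload with no classification never silently lands on the most permissive surface.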
How It Actually Works — Govern Layer.
When Public Cloud Is The Right Mode.
Five Workloads Where It Wins.
Five workload categories where public cloud is the better choice — even if you also run other modes for sensitive data.
General Productivity & Knowledge Work
Drafting emails, summarizing documents, brainstorming, generating code, analyzing publicly available information. Internal-but-not-regulated data is appropriate for public cloud. The latency, model quality, and cost benefits are real.
External or Already-Public Data
Market research, competitor analysis, public regulatory filings, news monitoring, social media analytics. Data that is not your IP and is not subject to confidentiality obligations is the natural fit for public cloud inference.
Proof-of-Concept & Pilots
Early experiments where the goal is to validate whether AI can solve a business problem at all. Public cloud lets you ship a working prototype in days. If the problem is solvable, graduate the production deployment to whichever mode the data class requires.
Non-Regulated Customer-Facing Apps
Marketing copy generation, public-facing chatbots that do not handle PII, content moderation on public posts. Exposure is limited and the value of frontier-model quality is high.
When You Need the Newest Models the Day They Ship
Frontier model releases happen on public cloud first. Self-hosted open source lags by months on capability. If your workload requires the latest reasoning, multimodal, or coding capabilities, public cloud is where they live first.
When Public Cloud Is The Wrong Mode.
And Where The Workload Should Go Instead.
Five workload categories where public cloud is the wrong answer — and where BrainPack routes work to ZDR, self-hosted, on-premise, or air-gapped instead.
Customer PII & Protected Health Data
Even with provider HIPAA eligibility, most legal and compliance teams prefer to keep PHI and detailed PII out of public cloud inference. Use ZDR endpoints with a signed BAA, self-hosted, or on-premise, depending on the regulatory framework.
Financial Detail That Moves Markets
Pre-announcement earnings, M&A documents, board materials, executive compensation, trading strategy. The marginal value of using a frontier model on this data does not justify the risk profile of public cloud.
Core IP & Competitive Technology
Source code for products you sell, proprietary algorithms, trade secrets, manufacturing process documentation. Self-hosted open source on dedicated GPU is the right choice — full control, no third-party AI vendor in the data path.
Regulated Industry & Residency Requirements
Banking core data in jurisdictions with sovereignty requirements, defense contracts with controlled data classifications, government workloads with FedRAMP or equivalent obligations. On-premise or air-gapped is mandatory; public cloud is non-compliant.
Any Data Where a Regulator Can Ask "Who Touched This?"
If you cannot give a complete answer in the audit log without involving a third-party AI vendor's logs, the workload should not be on public cloud. The Govern layer must own the audit trail end-to-end.
How Public Cloud Orchestrates.
With Every Other Deployment Mode.
The point of having five deployment modes is not to pick one. The point is to route each workload to the mode that fits its data class — automatically, by policy, with one governance layer enforcing the routing.
A real BrainPack deployment looks like this:
Same user. Same conversational interface. Same agent library. Same governance policies. Five different inference paths — selected automatically by the Govern layer based on data classification, regulatory framework, and policy.
The user never picks the deployment mode. The mode picks itself.
Public Cloud Inside The Layer.
What BrainPack Adds On Top Of A Raw API Call.
When public cloud is the right mode, BrainPack does several things on top of a raw API call that change the operational posture meaningfully.
Multi-Vendor Model Routing
The orchestrator does not lock you to one provider. Reasoning to Claude. Vision to GPT. Multilingual to Gemini. High-volume cheap tasks to Llama. The user sees one interface; the orchestrator picks the right model per task.
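The pairings above can be sketched as a task-type routing table. The provider and model names are examples mirroring the text, not a statement of BrainPack's actual model matrix:

```python
# Illustrative task-type routing; names are placeholders, not an
# endorsement of any specific provider-model pairing in production.
MODEL_FOR_TASK = {
    "reasoning":    ("anthropic", "claude"),
    "vision":       ("openai", "gpt"),
    "multilingual": ("google", "gemini"),
    "bulk":         ("meta", "llama"),
}

def pick_model(task_type: str) -> tuple[str, str]:
    # Fall back to the reasoning model when the task type is unknown.
    return MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["reasoning"])
```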
Identity-Aware Retrieval
Data sent to the model is filtered by the requesting user's permissions before it leaves your environment. No global service account. No prompt-level bypass of access control.
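A minimal sketch of that filter, assuming documents carry ACL group lists (a simplification of any real retrieval pipeline): the intersection check runs before any text is sent to the model.

```python
# Sketch: filter retrieved documents against the requesting user's
# group memberships *before* prompt assembly. ACL-as-group-list is
# an assumption for illustration, not BrainPack's actual model.
def visible_docs(user_groups: set[str], docs: list[dict]) -> list[dict]:
    """Keep only documents whose ACL intersects the user's groups."""
    return [d for d in docs if user_groups & set(d["acl"])]

docs = [
    {"id": "handbook", "acl": ["all_staff"]},
    {"id": "payroll",  "acl": ["hr_payroll"]},
]
# An engineer sees the handbook but never the payroll document.
print([d["id"] for d in visible_docs({"all_staff", "eng"}, docs)])  # ['handbook']
```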
Contractual Posture
Enterprise contracts with the major providers. ZDR-eligible endpoints activated where you need them. BAAs in place where HIPAA workloads might land. SOC 2 reports mapped to the workloads that depend on them.
Cost Controls
The orchestrator routes to the cheapest model that meets task requirements, catches runaway agents, and produces cost attribution that maps to your existing chargeback or showback model.
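Cheapest-capable routing can be sketched as a filter-then-minimize over a candidate list. Model names, capability tiers, and prices here are placeholders, not real quotes:

```python
# Sketch: pick the cheapest model that meets the task's capability
# requirement. Tiers and per-million-token prices are made up.
CANDIDATES = [
    {"model": "frontier-large", "tier": 3, "usd_per_mtok": 15.0},
    {"model": "frontier-small", "tier": 2, "usd_per_mtok": 3.0},
    {"model": "open-weights",   "tier": 1, "usd_per_mtok": 0.2},
]

def cheapest_capable(required_tier: int) -> str:
    capable = [c for c in CANDIDATES if c["tier"] >= required_tier]
    if not capable:
        raise ValueError("no model meets the capability requirement")
    return min(capable, key=lambda c: c["usd_per_mtok"])["model"]

print(cheapest_capable(1))  # open-weights wins on price
print(cheapest_capable(3))  # only the large frontier model qualifies
```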
Full Audit Trail
Every public cloud inference call is logged in your audit system: which user, which query, which model, which data sent, how long, how much. Provider audit logs are supplemental, not primary.
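A primary-side audit record covering those fields might look like the following. The field names are illustrative, not BrainPack's actual log schema; note the query is hashed, so the audit trail itself does not become a second copy of sensitive text:

```python
# Sketch of one audit-log line per inference call: which user, which
# query (hashed), which model, which data, how long, how much.
import json
import time

def audit_record(user, query_hash, model, bytes_sent, latency_ms, cost_usd):
    return json.dumps({
        "ts": time.time(),           # when the call happened
        "user": user,                # which user
        "query_sha256": query_hash,  # which query, hashed, not raw text
        "model": model,              # which model served it
        "bytes_sent": bytes_sent,    # which data left the environment
        "latency_ms": latency_ms,    # how long
        "cost_usd": cost_usd,        # how much
    })
```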
Failover & Redundancy
If a provider has an outage, the orchestrator routes to a different provider. Your AI does not depend on one vendor's uptime.
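The failover behavior reduces to trying providers in preference order and treating any call failure as a signal to move on. A minimal sketch, with the provider callables standing in for real SDK clients:

```python
# Sketch: walk an ordered provider list; an outage (any exception
# from the call) triggers failover to the next provider.
def call_with_failover(providers, prompt):
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice: timeouts, 5xx, rate limits
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")

def flaky(prompt):
    raise TimeoutError("provider outage")

providers = [("primary", flaky), ("secondary", lambda p: "ok:" + p)]
print(call_with_failover(providers, "hello"))  # falls through to secondary
```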
Calling an AI API directly is the easiest 5% of enterprise AI. The other 95% is everything BrainPack adds on top. Public cloud is where AI starts for most enterprises — and the BrainPack layer is what makes it production-grade rather than a security incident waiting to happen.
Costs And Speed.
What You Actually Get.
Public cloud is the fastest deployment mode and, for most workloads, the cheapest unit cost. Both statements come with caveats.
Time to first capability: days, not quarters. API integration only — no GPU procurement, no infrastructure standup.
Latency per call: frontier models on public cloud are the fastest inference available.
Cost: no upfront commitment. Light workloads cost near-zero; heavy reasoning still beats self-hosted unless utilization is extreme.
Break-even: there is a tokens-per-day threshold above which self-hosted GPU becomes cheaper. BrainPack models this and routes accordingly.
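The tokens-per-day threshold is back-of-envelope arithmetic: a dedicated GPU undercuts the API once its fixed monthly cost is spread over enough tokens. The prices below are placeholders, not quotes from any provider:

```python
# Back-of-envelope break-even between a fixed-cost GPU and
# pay-per-token API pricing. All dollar figures are illustrative.
def breakeven_tokens_per_day(monthly_gpu_usd: float,
                             api_usd_per_mtok: float) -> float:
    # Tokens per month at which GPU cost equals API cost,
    # then averaged over a 30-day month.
    tokens_per_month = monthly_gpu_usd / api_usd_per_mtok * 1_000_000
    return tokens_per_month / 30

# e.g. a hypothetical $2,000/month GPU vs a $2-per-million-token API:
# break-even lands near 33M tokens per day.
print(f"{breakeven_tokens_per_day(2000, 2.0):,.0f} tokens/day")
```

Below the threshold, public cloud is the cheaper mode; above it, the fixed GPU cost wins — which is exactly the comparison the orchestrator is described as making.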
The real expense of public cloud AI is not the inference bill — it is the cost of a workload going to the wrong mode and creating a compliance, IP, or audit problem. The Govern layer makes this misclassification structurally impossible.
Public Cloud, Running Now.
Alongside Every Other Mode, Per Data Class.
Public cloud is part of every BrainPack deployment we have built — running alongside other modes per data class.
Public cloud handles general HR queries and recruitment screening; ZDR handles employee-specific data; on-premise handles payroll source data. One unified interface.
Public cloud powers merchandising analytics and marketing copy; self-hosted open source handles financial analysis on un-announced numbers. Same agent library, two paths.
Public cloud runs inventory queries and customer service summaries; ZDR handles individual customer interactions. Cost-optimized routing across both.
Public Cloud Is Where You Start. The Layer Is What Makes It Safe.
Public cloud AI is fast, capable, and right for most enterprise workloads — when it sits inside an operating layer that knows which workloads belong there and which do not. Talk to an architect about how the deployment-mode mix should look in your environment.