When buyers evaluate AI chat integrations, they typically get one number from the vendor: a monthly platform fee. What they don't get — until they're six months in and running real volume — is a clear explanation of where the other costs land and why the pricing model they chose either rewards or penalizes their AI's performance.

This guide covers the actual cost structure of connecting an AI agent to a chat platform: the two-bill reality, why per-conversation pricing is structurally different from per-token or per-resolution pricing, and the specific cost math of KB ingestion versus real-time tool calls. We'll also cover Velaro's conversation pricing tiers and when BYOK makes economic sense.

The Two-Bill Reality

When you integrate an existing AI agent — a Copilot Studio agent, an Azure AI Foundry deployment, or your own model running behind an API — with a chat platform, you are looking at two separate cost centers that most buyers initially treat as one.

Bill 1: Your AI agent's infrastructure

Your agent runs on your infrastructure. Whether that's Azure OpenAI with a PTU commitment, OpenAI API in pay-per-token mode, or a self-hosted model on compute you own — that cost goes directly to you. The chat platform doesn't see it, doesn't bill it, and can't control it. Every token your agent processes while handling a Velaro tool call is billed to your Azure subscription or OpenAI account.

This bill scales with how often the agent is called and how large its inputs and outputs are. A tool call that passes 500 tokens of context and receives a 200-token response is cheap. A tool call that passes the entire conversation history (2,000 tokens) and asks the agent to generate a detailed summary (1,500 tokens of output) uses 5x as many tokens. Because providers typically charge more for output tokens than for input tokens, it costs more than 5x as much: about 6x when output tokens are priced at 4x the input rate. The design of what you send and what you ask for back directly controls this bill.
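The arithmetic behind that comparison can be sketched as follows. The prices are assumptions chosen only to illustrate an output rate that is 4x the input rate; substitute your provider's current per-token rates.

```python
# Assumed list prices in USD per 1M tokens; replace with your provider's current rates.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one model call, pricing input and output tokens separately."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


lean = call_cost(500, 200)       # trimmed context in, short answer out
heavy = call_cost(2_000, 1_500)  # full history in, detailed summary out

print(f"lean call:  ${lean:.5f}")
print(f"heavy call: ${heavy:.5f}")
print(f"token ratio: {(2_000 + 1_500) / (500 + 200):.1f}x, cost ratio: {heavy / lean:.1f}x")
```

The cost ratio comes out higher than the token ratio because the heavy call shifts a larger share of its tokens to the more expensive output side.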

Bill 2: Velaro's conversation charge

Velaro charges per AI-powered conversation — a conversation where Velaro's AI is actively doing work: reading your agent's responses, forming answers, maintaining context across turns, and orchestrating tool calls. This includes Velaro's model inference, vector search, and workflow orchestration. It does not include the cost of your agent running on your infrastructure.

The practical implication: if you have 2,000 AI conversations per month, you know what Velaro's bill is before the month starts. Your agent's infrastructure bill varies with the complexity and volume of those conversations, but the Velaro charge is predictable.

The structural difference between per-conversation and per-resolution pricing: Per-resolution pricing (Intercom Fin at $0.99, Zendesk at $1.50) means your bill rises as your AI gets better at resolving contacts. Velaro's per-conversation model means a higher deflection rate lowers your cost per resolved contact — because you're resolving more in the same number of conversations, not paying more per resolution.

Why Per-Conversation Is the Right Pricing Model

Per-token pricing is technically accurate but operationally useless for support teams. You can't budget for it because token count varies wildly by conversation type: a simple order status lookup uses 200 tokens, while a complex multi-turn troubleshooting conversation might use 4,000. That is a 20x spread per conversation, so at the same monthly volume your bill can swing several-fold depending on which contact types dominate.
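A small simulation makes the budgeting problem concrete. It holds monthly volume fixed and varies only the share of complex conversations; the blended token price and the 5,000-conversation volume are illustrative assumptions, not published rates.

```python
PRICE_PER_1K_TOKENS = 0.005  # assumed blended USD price; real bills price input and output separately

SIMPLE_TOKENS = 200     # order status lookup
COMPLEX_TOKENS = 4_000  # multi-turn troubleshooting


def monthly_token_bill(conversations: int, complex_share: float) -> float:
    """USD per month under per-token pricing for a given mix of simple and complex conversations."""
    simple_tokens = conversations * (1 - complex_share) * SIMPLE_TOKENS
    complex_tokens = conversations * complex_share * COMPLEX_TOKENS
    return (simple_tokens + complex_tokens) / 1_000 * PRICE_PER_1K_TOKENS


for share in (0.1, 0.5, 0.9):
    print(f"{share:.0%} complex: ${monthly_token_bill(5_000, share):,.2f}")
```

Same volume, same vendor, same price per token: the monthly bill still moves by roughly 6x between a month dominated by simple lookups and one dominated by troubleshooting.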

Per-resolution pricing is the worst of all models for buyers. It penalizes you for AI success. A team that goes from 30% deflection to 60% deflection — which is genuinely hard, valuable work — sees their AI bill double. The vendor's revenue grows in direct proportion to your team's improvement, which creates a perverse incentive for vendors to define "resolution" as broadly as possible.

Per-conversation pricing aligns vendor and customer: Velaro's revenue is predictable based on your contact volume, not your deflection rate. Your bill is predictable based on conversations, not token variance. And as your AI improves, you resolve more contacts within the same conversation allotment — meaning your cost per resolution drops as your AI gets better, instead of rising.
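The two incentive structures can be compared side by side with a short sketch. It assumes a constant 3,000 AI conversations per month, treats the deflection rate as the share of those conversations resolved without a human, and uses an illustrative flat per-conversation fee of $0.06 (an assumption; the plan table later in this guide includes allotments and overage rates).

```python
AI_CONVERSATIONS = 3_000     # assumed monthly AI conversations, held constant
PER_RESOLUTION_FEE = 0.99    # per-resolution rate cited above (Intercom Fin)
PER_CONVERSATION_FEE = 0.06  # illustrative flat per-conversation fee (assumption)


def monthly_bills(deflection_rate: float) -> tuple[float, float, float]:
    """Return (per-resolution bill, per-conversation bill, per-conversation cost per resolved contact)."""
    resolved = AI_CONVERSATIONS * deflection_rate
    per_resolution_bill = resolved * PER_RESOLUTION_FEE
    per_conversation_bill = AI_CONVERSATIONS * PER_CONVERSATION_FEE
    return per_resolution_bill, per_conversation_bill, per_conversation_bill / resolved


for rate in (0.30, 0.60):
    by_resolution, by_conversation, per_resolved = monthly_bills(rate)
    print(f"{rate:.0%} deflection: per-resolution ${by_resolution:,.2f}, "
          f"per-conversation ${by_conversation:,.2f} (${per_resolved:.2f} per resolved contact)")
```

Doubling deflection doubles the per-resolution bill, while the per-conversation bill stays flat and its cost per resolved contact halves.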

The right pricing model rewards your AI for performing well. If your bill goes up every time your deflection rate improves, you have the wrong vendor.

KB Ingestion vs. Real-Time Tool Calls: The Cost Comparison

This is where most buyers have the least visibility before they've run a real integration. The cost difference between these two patterns is not marginal — it's an order of magnitude at scale.

KB ingestion costs

KB ingestion runs in two phases. The indexing phase happens when you ingest content: documents are chunked, embeddings are generated (typically via a small embedding model), and vectors are stored in a search index. For a 1,000-page knowledge base, this typically costs $0.01–$0.05 in embedding model compute. You pay this once per reindex, not per conversation.
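A rough estimate of the indexing-phase cost can be sketched as below. The page size, chunk size, overlap, and embedding price are all assumptions (roughly in line with small, inexpensive embedding models); your chunking settings and model rate will change the result.

```python
import math


def indexing_cost(pages: int, tokens_per_page: int = 500, chunk_tokens: int = 200,
                  overlap_tokens: int = 20, price_per_m: float = 0.02) -> tuple[int, float]:
    """Return (chunk_count, USD cost) for embedding a knowledge base once.

    All defaults are assumptions: ~500 tokens per page, 200-token chunks with a
    20-token overlap, and $0.02 per 1M tokens for the embedding model.
    """
    total_tokens = pages * tokens_per_page
    stride = chunk_tokens - overlap_tokens
    # n chunks cover chunk_tokens + (n - 1) * stride tokens; take the smallest n that covers everything.
    chunks = max(1, math.ceil((total_tokens - overlap_tokens) / stride))
    # Upper bound: counts every chunk as full-length, so overlapping tokens are embedded twice.
    embedded_tokens = chunks * chunk_tokens
    return chunks, embedded_tokens / 1_000_000 * price_per_m


chunks, cost = indexing_cost(1_000)
print(f"{chunks} chunks, about ${cost:.3f} per full reindex")
```

Because this cost is paid per reindex rather than per conversation, even a daily reindex of a knowledge base this size stays well under a dollar a month under these assumptions.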

At query time, a vector search costs fractions of a cent. Velaro runs the query, retrieves the top matching chunks (typically 3–5), and passes them as context to the AI response layer. The total model compute for a KB-grounded response — embedding the query, retrieving chunks, generating a response — is roughly 300–600 tokens per exchange. At current Azure OpenAI pricing, that's under $0.002 per response.
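The query-time step can be illustrated with a brute-force top-k search over toy vectors. This is a minimal sketch, not Velaro's implementation: the 3-dimensional "embeddings" and chunk texts are hypothetical, and production systems use real embedding vectors with hundreds or thousands of dimensions and an approximate-nearest-neighbor index instead of a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunk texts most similar to the query vector (brute-force scan)."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Pass only the retrieved chunks to the response model as grounding context."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical chunks paired with toy 3-dimensional vectors.
index = [
    ("Returns are accepted within 30 days of delivery.", [0.9, 0.1, 0.0]),
    ("Shipping is free on orders over $50.", [0.1, 0.9, 0.1]),
    ("Refunds post to the original payment method.", [0.8, 0.2, 0.1]),
    ("Our office is closed on public holidays.", [0.0, 0.1, 0.9]),
]
query_vec = [0.85, 0.15, 0.05]  # stand-in for the embedding of "What is your return policy?"

print(build_prompt("What is your return policy?", top_k(query_vec, index, k=2)))
```

Only the handful of retrieved chunks reaches the response model, which is why the context for a KB-grounded answer stays small.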

Real-time tool call costs

A real-time tool call has a different cost structure. The call passes context to your agent (the customer's question, conversation history, extracted variables), your agent runs its logic, and returns a response that Velaro incorporates into the final answer.

The 40–60x difference in context size

A KB chunk that answers "what is your return policy?" is typically 150–200 tokens. A tool call to a CRM agent that returns the customer's full account record — recent orders, tickets, entitlements, contact history — might return 3,000–5,000 tokens. That context has to be processed by the AI to form the final response.

A tool call returning 3,000 tokens gives the model 60x as much context to process as a 50-token vector search result, and the input-token cost of that context grows in proportion. At low volumes this is irrelevant. At 50,000 conversations per month with a tool call on every exchange, the difference can reach thousands of dollars per month on Bill 1 alone.
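The monthly gap can be estimated with a sketch like the one below. The input-token price and the four exchanges per conversation are assumptions; the sketch counts input tokens only, so output tokens or a larger model would push the totals higher.

```python
PRICE_PER_M_TOKENS = 2.50  # assumed USD per 1M input tokens; use your model's real rate


def monthly_context_cost(conversations: int, exchanges: int, context_tokens: int) -> float:
    """USD per month to process `context_tokens` of retrieved context on every exchange."""
    tokens = conversations * exchanges * context_tokens
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS


CONVERSATIONS = 50_000
EXCHANGES = 4  # assumed exchanges per conversation, each triggering a lookup

kb = monthly_context_cost(CONVERSATIONS, EXCHANGES, 50)
for tool_tokens in (3_000, 5_000):
    tool = monthly_context_cost(CONVERSATIONS, EXCHANGES, tool_tokens)
    print(f"{tool_tokens}-token tool output: ${tool:,.0f}/month vs ${kb:,.0f} for KB snippets "
          f"(difference ${tool - kb:,.0f})")
```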

This doesn't mean tool calls are wrong — they're essential for customer-specific data that can't be indexed. But it does mean you should be deliberate about which tool calls run on every conversation versus only when triggered by a specific customer need.
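One way to make that decision explicit is to gate the live call on intent. This is a minimal sketch under assumptions: the intent labels are hypothetical and presumed to come from an upstream classifier, and the payload shape is illustrative rather than any specific Velaro or agent API.

```python
from dataclasses import dataclass, field

# Hypothetical intents that genuinely need live, customer-specific data.
LIVE_DATA_INTENTS = {"order_status", "billing_dispute", "entitlement_check"}


@dataclass
class Turn:
    question: str
    intent: str  # assumed output of an upstream intent classifier
    variables: dict = field(default_factory=dict)  # extracted values such as an order ID


def plan_turn(turn: Turn) -> dict:
    """Call the live agent only for intents that need current data; otherwise use the KB."""
    if turn.intent in LIVE_DATA_INTENTS:
        return {"source": "tool_call", "payload": {"question": turn.question, **turn.variables}}
    return {"source": "kb_search", "query": turn.question}


print(plan_turn(Turn("What is your return policy?", "policy_question")))
print(plan_turn(Turn("Where is my order?", "order_status", {"order_id": "1042"})))
```

Keeping the live-data intents in one explicit set makes it easy to audit which conversations incur the expensive path.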

The optimization pattern is: use KB ingestion for everything that can be indexed, use tool calls only for live lookups that genuinely require current data, and design your tool calls to return only the fields you need rather than a full entity dump.
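Trimming a tool response to the fields the answer needs can be sketched as a simple projection. The CRM record below is hypothetical, and the token count uses a rough heuristic of about 4 characters per token; use your model's actual tokenizer for billing estimates.

```python
import json

# Hypothetical full CRM record, as an agent might return it by default.
full_record = {
    "customer_id": "C-1001",
    "name": "Ada Example",
    "plan": "Pro",
    "orders": [{"id": f"O-{i}", "total": 40 + i, "status": "delivered"} for i in range(25)],
    "tickets": [{"id": f"T-{i}", "subject": "Login issue", "status": "closed"} for i in range(15)],
    "entitlements": ["priority_support", "api_access"],
}


def project(record: dict, fields: list[str]) -> dict:
    """Keep only the requested top-level fields that exist in the record."""
    return {key: record[key] for key in fields if key in record}


def rough_tokens(obj) -> int:
    """Rough heuristic: about 4 characters of serialized JSON per token."""
    return len(json.dumps(obj)) // 4


# An entitlement question needs the plan and entitlements, not the order and ticket history.
slim = project(full_record, ["customer_id", "plan", "entitlements"])
print(rough_tokens(full_record), "->", rough_tokens(slim), "approximate tokens")
```

The same projection can run on the agent side, so the trimmed payload is also cheaper to generate and transmit.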

Pros and Cons: Real-Time Tool Call vs. KB Ingestion

Dimension | Real-Time Tool Call | KB Ingestion
Data freshness | Always current; reflects live system state | Only as fresh as the last reindex
Cost per query | High; full model inference on tool output | Very low; vector search plus a small context
Latency | Adds an external API round trip | Sub-100 ms vector retrieval
Reliability | Depends on your agent's uptime | Index always available; no external dependency
Customer-specific data | Essential; the only option for account-level data | Not possible; indexed content is shared
Setup complexity | Requires API, auth, and response mapping | URL or file upload with automatic chunking
Cost at volume | Scales linearly with conversation volume | Reindex cost is fixed; query cost is trivial
Right use case | Orders, accounts, tickets, entitlements | Docs, FAQs, policies, product content

Velaro's AI Conversation Pricing

Velaro's conversation pricing is structured by monthly volume with a flat overage rate — no per-resolution surprise charges.

Plan | Included AI Conversations | Overage Rate | Best For
Basic | 500 / month | $0.08 / conversation | Small teams getting started with AI deflection
Pro | 2,000 / month | $0.06 / conversation | Mid-size teams with established support volume
Enterprise | 10,000 / month | $0.04 / conversation | High-volume operations with custom SLA requirements

These rates cover Velaro's AI inference, vector search at query time, and workflow orchestration. They do not cover the cost of your own AI agent running on your infrastructure — that appears on Bill 1, as described above.

At 2,000 conversations per month on the Pro plan: zero overage. At 3,000 conversations: 1,000 overage conversations at $0.06 = $60. Compare this to a per-resolution model at $0.99 per resolution — 2,000 resolved conversations would cost $1,980 in AI fees alone before any platform or seat costs.
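The overage arithmetic can be reproduced from the plan table above with a few lines. Base platform fees and seat costs are not in the table, so this sketch covers AI conversation overage only.

```python
# (included conversations per month, overage rate in USD) from the plan table above.
PLANS = {
    "Basic": (500, 0.08),
    "Pro": (2_000, 0.06),
    "Enterprise": (10_000, 0.04),
}


def overage_charge(plan: str, conversations: int) -> float:
    """USD overage for a month; conversations within the allotment cost nothing extra."""
    included, rate = PLANS[plan]
    return max(0, conversations - included) * rate


for volume in (2_000, 3_000):
    print(f"Pro at {volume:,} conversations: ${overage_charge('Pro', volume):,.2f} overage")
print(f"Per-resolution at $0.99 for 2,000 resolutions: ${2_000 * 0.99:,.2f}")
```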

Want to model your specific volume against Velaro's pricing? The pricing page has a calculator.


BYOK: When It Makes Sense

BYOK (Bring Your Own Key) means routing Velaro's AI inference through your own Azure OpenAI deployment rather than Velaro's shared inference pool. You pay Azure directly for the tokens Velaro uses, and the per-conversation rate adjusts to reflect that Velaro is no longer providing model inference.

BYOK is the right choice if any of these apply:

You have Azure OpenAI committed capacity. If you're already paying for PTU (Provisioned Throughput Units) that have spare capacity, routing Velaro through your deployment turns idle committed spend into productive compute. The incremental cost per conversation is close to zero if you're not otherwise saturating your PTU.

Data governance requires tenant isolation. If your compliance posture requires that model inference never leave your Azure tenant — common in financial services, healthcare, and government — BYOK is the only path. Velaro's orchestration runs in Velaro's infrastructure, but the actual model inference happens in your Azure environment.

You're at volumes where your negotiated rate beats Velaro's included rate. Enterprise Azure agreements often include AI credits or discounted rates on GPT-4o class models. If your negotiated token rate is below $0.003 per 1K tokens and your conversation complexity is high, BYOK may be cheaper than Velaro's per-conversation rate at high volumes.
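The break-even logic can be sketched as below. The reduction in Velaro's per-conversation rate under BYOK is not published in this guide, so the $0.025 figure is purely hypothetical, as are the token counts and volumes; plug in your actual quote and measured tokens per conversation.

```python
def break_even_rate_per_1k(rate_discount_per_conv: float, tokens_per_conv: int) -> float:
    """Your token price (USD per 1K) at which BYOK inference costs exactly what it saves."""
    return rate_discount_per_conv / (tokens_per_conv / 1_000)


def byok_monthly_savings(conversations: int, tokens_per_conv: int,
                         your_rate_per_1k: float, rate_discount_per_conv: float) -> float:
    """Positive means BYOK is cheaper. Ignores setup and ongoing operational overhead."""
    your_inference = conversations * tokens_per_conv / 1_000 * your_rate_per_1k
    return conversations * rate_discount_per_conv - your_inference


DISCOUNT = 0.025  # hypothetical per-conversation rate reduction under BYOK
TOKENS = 8_000    # hypothetical tokens per conversation for a complex workload

print(f"break-even: ${break_even_rate_per_1k(DISCOUNT, TOKENS):.6f} per 1K tokens")
print(f"negotiated rate: ${byok_monthly_savings(20_000, TOKENS, 0.0025, DISCOUNT):,.2f}/month saved")
print(f"spare PTU (marginal ~0): ${byok_monthly_savings(20_000, TOKENS, 0.0, DISCOUNT):,.2f}/month saved")
```

Setting the marginal rate to zero approximates the idle-PTU case described earlier, where committed capacity is already paid for.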

For most teams under 10,000 AI conversations per month, BYOK is not worth the setup overhead. Velaro's shared inference pool is GPT-4o backed, and the per-conversation pricing includes inference, so you're not leaving money on the table. BYOK is an enterprise optimization for teams with existing Azure commitments or compliance constraints — not a default configuration.

Practical Cost Modeling: An Example

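Consider a mid-size SaaS company with 50 support agents and roughly 8,000 customer conversations per month, of which about 40% are suitable for AI deflection: 3,200 AI conversations per month. On the Pro plan, that is 2,000 included conversations plus 1,200 overage conversations at $0.06, or $72 in overage. Under per-resolution pricing at $0.99, resolving even half of those 3,200 conversations (1,600 resolutions) would cost $1,584 in AI fees. Neither figure includes base platform or seat fees.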

The math is not subtle. Per-conversation pricing at meaningful deflection volumes is structurally cheaper than per-resolution pricing, and it stays cheaper as your deflection rate improves.

What Controls Your Cost Most

In order of impact on your total AI bill:

  1. Pricing model choice — per-conversation vs. per-resolution is a 10–40x difference at meaningful deflection rates
  2. KB ingestion coverage — more content in the KB index means fewer tool calls per conversation, lowering Bill 1
  3. Tool call design — returning only the fields you need vs. full entity dumps reduces token usage by 3–10x per call
  4. Conversation scope definition — a "conversation" that includes 12 back-and-forth turns uses more tokens than one that resolves in 3
  5. Model tier selection — GPT-4o-mini for simple lookups, GPT-4o for complex reasoning; routing calls appropriately reduces Bill 1
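Model-tier routing can be as simple as a lookup on intent plus a guard for long conversations. This is a minimal sketch: the intent names and the three-turn threshold are hypothetical, and on Azure OpenAI you would return your own deployment names rather than these model identifiers.

```python
# Hypothetical intents that a smaller model handles reliably.
SIMPLE_INTENTS = {"order_status", "store_hours", "password_reset_link"}


def pick_model(intent: str, turns_so_far: int) -> str:
    """Send short, simple lookups to the smaller model and everything else to the larger one."""
    if intent in SIMPLE_INTENTS and turns_so_far <= 3:
        return "gpt-4o-mini"
    # Unknown intents and conversations that drag on escalate to the stronger model.
    return "gpt-4o"


print(pick_model("order_status", 1))
print(pick_model("refund_exception", 1))
print(pick_model("order_status", 6))
```

Defaulting to the larger model keeps the failure mode on the side of answer quality rather than cost.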

Most teams focus on #5 first (model selection) when #1 and #2 have far greater impact on total cost. Picking a smaller model saves fractions of a cent per conversation. Choosing per-conversation pricing instead of per-resolution pricing saves $0.93 per resolution at Intercom's rates (the $0.99 Fin fee minus Velaro's $0.06 Pro overage rate).

Frequently Asked Questions

Who pays the AI token bill when my agent runs inside Velaro chat?

There are two separate bills. Your existing AI agent runs on your own infrastructure — that bill goes directly to you. Velaro's AI reads the agent's response and uses it to form the final answer shown to the customer. Velaro's AI token costs are included in Velaro's conversation pricing, unless you're on a BYOK plan where you route inference through your own Azure deployment.

Why does Velaro charge per conversation instead of per token?

Per-token pricing is unpredictable for support teams because token count varies dramatically by conversation type. A simple order lookup uses 200 tokens; a complex multi-turn conversation might use 4,000. Per-conversation pricing gives you a predictable cost regardless of how complex individual conversations get, and it means your bill doesn't go up when your AI resolves things more efficiently.

Is KB ingestion cheaper than real-time tool calls?

Yes, significantly. KB ingestion costs pennies per reindex. At conversation time, a vector search costs fractions of a cent. A real-time tool call that returns 3,000 tokens of context costs 40–60x more in model compute than a 50-token vector search result. For content that doesn't change frequently, KB ingestion is the correct choice for both cost and latency.

What is BYOK and when does it make sense?

BYOK (Bring Your Own Key) means routing Velaro's AI inference through your own Azure OpenAI deployment. It makes sense when you have Azure OpenAI committed capacity with spare headroom, when compliance requires inference within your own Azure tenant, or when your negotiated Azure rate is better than Velaro's included per-conversation rate at high volumes.

What are Velaro's AI conversation prices?

Basic includes 500 AI conversations/month with $0.08 overage; Pro includes 2,000/month at $0.06 overage; Enterprise includes 10,000/month at $0.04 overage. These rates include Velaro's AI inference, vector search, and orchestration — not the cost of your own AI agent running on your infrastructure.