The AI Chatbot Buyer's Guide for SMBs
A vendor-neutral framework for evaluating AI chatbot platforms. Written by the team behind Wengrow, but most of what's below applies regardless of which platform you end up buying. If you're reading this to evaluate Wengrow, the honest answers to every question are on our pricing page and in the relevant feature docs. If you're evaluating someone else, print this and use it as a checklist.
Why buying an AI chatbot is weirder than buying other SaaS
Most SaaS purchases fit a familiar pattern. You know what the category does. You know what "good" looks like. You compare features, get a quote, negotiate terms, sign. The vendors you're comparing agree on what the product is, even if they disagree on who does it best.
AI chatbots are different right now, for three reasons.
The category is fragmented. "Chatbot" covers tools doing fundamentally different jobs. Support-deflection platforms (Intercom Fin, Ada, many others) exist to close tickets without a human. Lead-capture platforms (Wengrow, Drift, many others) exist to turn conversations into pipeline. FAQ-content bots (Chatbase, Chatsimple, SiteGPT, many others) exist to answer questions from your website content. They share UI patterns — a widget in the corner, an LLM behind it — but the features that matter and the pricing that makes sense are not the same across the three. Starting with "we need a chatbot" is like starting with "we need a vehicle." You need to know if you're moving pallets, commuters, or dirt.
Billing units are opaque on purpose. Per-resolution, per-conversation, per-session, per-user, per-message, per-token, per-MAU — every platform has its own definition of what counts as "one of the things you're paying for." The definitions are not standardized and often change quietly. You cannot compare $0.99/resolution to $299/month to $0.005/message without a translator. Vendors know this and the ambiguity benefits them at contract renewal time. We wrote a separate essay on why we picked engaged-only billing to fix our corner of this — but the broader lesson is: insist on a clear unit, a clear definition, and a clear forecast of your expected bill.
The buyer is often not the user. Marketing signs the contract; sales lives in the CRM; IT owns the site; support uses the handoff inbox; customer success gets paged at 11pm when the bot says something weird. If your evaluation only consults one of those, you'll miss something material. The best RFPs we've seen put all four in a room for an hour before writing the questions.
This guide walks through what we think matters: the 12 questions every buyer should ask vendors, the 8 features that actually move the needle, and the 3 things that tend to get hidden on a sales call. We'll also point to relevant Wengrow pages where you can see how we'd answer each question — not because this is a pitch, but because a guide that doesn't show its own answers is a guide that's hiding something.
What a chatbot actually is in 2026
Before the 12 questions, a short primer on architecture. Skip this section if you already know it cold.
A modern AI chatbot is four things glued together:
- A retrieval layer — typically a hybrid of keyword search (BM25) and vector search over your content, with a re-ranker to pick the top chunks to feed to the LLM. This is where "answer quality" largely comes from; a great LLM with mediocre retrieval still sounds confidently wrong.
- An LLM — usually a frontier model (Anthropic's Claude, OpenAI's GPT-4 class, or similar) running with a system prompt. This is where "voice" and "reasoning" come from. Most platforms abstract the model choice; some let you swap it.
- A widget or hosted page — the interface the visitor actually uses. Embedded widgets live on your existing site; hosted pages are standalone URLs where chat is the page. Deployment choice affects conversion rates more than almost any other variable.
- The glue — CRM sync, webhooks, analytics, lead scoring, email sequences, human handoff, admin UI, audit logs. This is where tools differentiate most from each other and where the difference between "demo works" and "production works" lives.
When people say "your chatbot hallucinated," they usually mean the retrieval layer missed the right chunk and the LLM answered from parametric memory instead. When they say "your chatbot is slow," they usually mean one of the four layers has a latency problem. When they say "your chatbot is expensive," they usually mean the billing unit doesn't match the value metric. The 12 questions below cover all four layers.
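To make the retrieval layer concrete, here is a toy sketch of hybrid retrieval: a keyword ranking and a vector ranking merged with reciprocal rank fusion. Everything here is deliberately simplified (a real platform uses BM25, learned embeddings, and a trained re-ranker, not word-overlap counts), but the shape of the pipeline is the same.

```python
import math
from collections import Counter

# Toy corpus: in production these would be chunks of your site content.
CHUNKS = [
    "Pricing starts at $99/month for the starter tier.",
    "The widget embeds with a one-line JavaScript snippet.",
    "Handoff routes conversations to a human inbox.",
]

def keyword_score(query, doc):
    """Crude stand-in for BM25: count occurrences of query terms in the doc."""
    terms, counts = set(query.lower().split()), Counter(doc.lower().split())
    return sum(counts[t] for t in terms)

def vector_score(query, doc):
    """Crude stand-in for embedding similarity: cosine over bag-of-words."""
    qv, dv = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qv[t] * dv[t] for t in qv)
    norm = math.sqrt(sum(v * v for v in qv.values())) * \
           math.sqrt(sum(v * v for v in dv.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query, chunks, k=2):
    """Rank by each method separately, then merge with reciprocal rank fusion."""
    by_keyword = sorted(chunks, key=lambda d: -keyword_score(query, d))
    by_vector = sorted(chunks, key=lambda d: -vector_score(query, d))
    fused = {d: 0.0 for d in chunks}
    for ranking in (by_keyword, by_vector):
        for rank, d in enumerate(ranking):
            fused[d] += 1.0 / (60 + rank)  # 60 is the conventional RRF constant
    return sorted(fused, key=fused.get, reverse=True)[:k]

top = hybrid_retrieve("how much does pricing cost", CHUNKS)
```

The point of the fusion step is that a chunk which ranks well on both keyword and vector signals beats a chunk that spikes on only one — which is why hybrid retrieval tends to be more robust on messy real-world content.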
The 12 questions every buyer should ask
Ask every vendor you're evaluating — Wengrow included. Expect honest, specific answers. If a vendor dodges or says "it depends" without following up with what it depends on, that's a signal.
1. What is the billing unit, and what counts as one?
The single most important question. Get a precise definition, in writing, of what counts as a billable event. "A conversation" is not specific enough. Ask: does a widget-open count? A one-message exchange? A bot-to-bot conversation? A scraper? A user who types "test"?
What a good answer sounds like: "A billed conversation is defined as X. It does not include Y. You can see every billable event in your invoice detail."
Why this matters: mismatched billing definitions are the #1 cause of contract regret. We published ours and stand behind it; other vendors define things differently on purpose.
2. What happens on a traffic spike?
You run an ad campaign, a blog post goes viral, a partner sends a burst of traffic. Does your bill 10× with the traffic? Cap automatically? Require a phone call to add capacity?
What a good answer sounds like: "Your tier caps at X conversations; above that, overages are billed at Y per unit up to Z, then the bot pauses / escalates / keeps running with a notice to you."
Why this matters: lead-gen traffic is spiky by nature. Support traffic is not. A platform priced for stable support volume is a terrible fit for campaign-driven marketing unless they explicitly engineer for the spikes.
3. How do you handle scrapers, bots, and adversarial traffic?
Not all widget-opens are humans. Some are scrapers indexing your content, some are competitors running curiosity loads, some are LLM training bots. If those count as billed events, you're subsidizing someone else's research budget.
What a good answer sounds like: "We detect and exclude X categories of non-human traffic from your billing. Here's our detection approach. You can audit which events were excluded."
Why this matters: invisible overbilling. You will not notice a 10% scraper tax on your invoice unless you go looking for it.
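To see why a scraper tax is easy to miss, here is a toy billing filter. The heuristics below (user-agent substrings, one-message "test" sessions) are deliberately naive stand-ins for real bot detection; the point is only that your billable count depends entirely on rules like these, which is why you want them in writing and auditable.

```python
# Illustrative substrings only; real detection uses far richer signals.
KNOWN_BOT_AGENTS = ("bot", "crawler", "spider", "scraper")

def is_billable(event):
    """Exclude obvious non-human traffic and trivial test sessions."""
    ua = event.get("user_agent", "").lower()
    if any(tag in ua for tag in KNOWN_BOT_AGENTS):
        return False  # scraper or crawler: should not hit your invoice
    if event.get("messages", 0) <= 1 and \
            event.get("first_message", "").strip().lower() == "test":
        return False  # someone typing "test" once is not a conversation
    return True

events = [
    {"user_agent": "Mozilla/5.0", "messages": 6,
     "first_message": "What does pricing look like?"},
    {"user_agent": "GPTBot/1.0 crawler", "messages": 3, "first_message": "hi"},
    {"user_agent": "Mozilla/5.0", "messages": 1, "first_message": "test"},
]
billed = [e for e in events if is_billable(e)]
```

Here three widget sessions become one billable event. A vendor that bills all three is charging you for traffic you never wanted.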
4. How often does the knowledge base refresh?
If you update a product page, a pricing page, or an FAQ, how long until the bot knows? Hourly? Daily? Manual re-crawl only?
What a good answer sounds like: "On-change re-index for uploaded docs; hourly/daily crawl schedule for URLs; manual re-crawl available; audit trail of which KB chunks were used in each answer so you can tell when stale content is still in rotation."
Why this matters: stale knowledge bases are the most common cause of production embarrassment. A bot answering from three-month-old pricing is worse than no bot.
5. What's the human handoff cost and UX?
When the bot should hand off to a human — because the question is too nuanced, the user asked, or the lead is high-value — what happens? Does handoff require a seat license? Is it a chat inbox, an email, a Slack ping, a CRM task?
What a good answer sounds like: "Handoff is $X/seat/month or included through tier Y. Handoff UX is A, B, or C. Transcript travels to the human automatically; the human can reply in the same thread."
Why this matters: handoff gating is a common hidden-cost trap. The sales motion lands you at $X/month for the AI, but a full deployment needs handoff seats at $Y/agent, and suddenly your all-in cost doubled.
6. How deep are the native integrations?
"Integrates with Salesforce" can mean anything from "Zapier has a recipe" to "native bidirectional sync with custom field mapping and duplicate detection." That gap is the difference between a weekend of glue code and nothing to build at all.
What a good answer sounds like: "Native integrations with A, B, C — list them specifically. For others, we support webhooks with HMAC signing and documented payload schemas. Here's our integration doc."
Why this matters: "integrated" is the most overloaded word in SaaS sales. Ask for specifics. Ask for field-mapping docs. Ask to see it working during a demo.
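One concrete way to test the "webhooks with HMAC signing" claim during a trial is to verify a signature yourself. The sketch below assumes the vendor sends a hex-encoded HMAC-SHA256 of the raw request body in a header; the header name, secret format, and signing scheme all vary by vendor, so treat every name here as a placeholder and check their docs.

```python
import hmac
import hashlib

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest prevents timing attacks on the string comparison
    return hmac.compare_digest(expected, signature_header)

# Hypothetical secret and payload, for illustration only.
secret = "whsec_example"
body = b'{"event":"lead.captured","email":"a@example.com"}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

ok = verify_webhook(body, sig, secret)            # genuine payload verifies
tampered = verify_webhook(b'{"event":"x"}', sig, secret)  # altered body fails
```

If a vendor's webhook docs don't let you write something this simple against them, "webhooks with HMAC signing" may be aspirational.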
7. Where is the data stored and processed?
EU customer data. HIPAA-adjacent workflows. Enterprise security review. If any of these apply to you, data residency matters. If none apply, you still want to know — because they might apply later and re-platforming is painful.
What a good answer sounds like: "Data processed in region X, stored in region Y. We offer options for Z. Our SOC 2 / ISO / HIPAA / GDPR posture is at [link]. Subprocessor list at [link]."
Why this matters: some vendors have answers to this that don't match their marketing. Ask your security team to look at the real posture, not the marketing page.
8. Is there a custom domain / white-label option?
If you want the chatbot to be chat.yourdomain.com rather than yourcompany.competitor-domain.com, or if you resell to end-clients, white-label matters. On many platforms, white-label lives behind a sales-assisted enterprise tier.
What a good answer sounds like: "Custom domain is available starting at tier X at $Y/month. Full white-label (your logo, no vendor branding) is available from tier Z. Here's how to set up your domain — our features/hosted-page doc walks through it."
Why this matters: the difference between "your bot lives at your domain" and "your bot lives at theirs" is brand consistency in a place users are forming trust.
9. What's the SLA?
If the platform goes down during your peak traffic hour, what's the contractual commitment? 99%? 99.9%? 99.95%? Remedy if breached?
What a good answer sounds like: "We commit to X% uptime with Y credit structure for breaches. Status page at [link]. Incident history at [link]."
Why this matters: lower-tier plans often have no SLA. If your chatbot is a meaningful revenue-driver, no-SLA is not okay. If it's a convenience feature, no-SLA might be fine. Know which bucket you're in.
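Those decimals translate into very different amounts of permitted downtime. A quick conversion, assuming a 30-day month:

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Downtime a given uptime commitment permits per 30-day month."""
    return MINUTES_PER_MONTH * (1 - uptime_pct / 100)

# 99%    -> 432 minutes  (about 7.2 hours) per month
# 99.9%  -> 43.2 minutes per month
# 99.95% -> 21.6 minutes per month
```

The jump from 99% to 99.9% is the difference between "down for most of an afternoon" and "down for a coffee break." Decide which one your traffic can tolerate before negotiating.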
10. How long is the free trial, and what are its limits?
14 days with no credit card is the modern default; many tools offer longer. Some offer a free tier indefinitely. Some require a credit card up front and auto-convert. Some gate features during trial.
What a good answer sounds like: "14 days, no credit card, full-feature access on [tier]. Auto-cancel at day 14 if you don't add payment info. Or: free tier exists with these limits indefinitely."
Why this matters: bad trials are a leading indicator of bad post-sale experience. If the trial is short, gated, or requires a sales call to unlock, expect the rest of the relationship to feel the same.
11. What happens on overage?
Related to but distinct from traffic spikes (Q2). If you exceed your tier's quota, what is the specific behavior? Bot disables? Overages billed at a defined rate? Rate-limited? Upsell email?
What a good answer sounds like: "Overages bill at $X per conversation up to a cap of Y. Above the cap, the bot [specific behavior]. You receive an email at 80% and 100%."
Why this matters: this is the detail where surprise bills live. "$0.99 per overage conversation" can turn into a five-figure surprise after a good ad week.
12. What are renewal terms and price escalators?
Annual contracts often have automatic renewal and built-in price escalators ("5% per year" or "at then-current list price"). These are negotiable. They are also where multi-year TCO gets ugly fast.
What a good answer sounds like: "Annual contracts auto-renew unless cancelled 30/60/90 days before renewal. Price escalator is X% or tied to index Y. Or: we don't escalate — your price is locked as long as you're on the plan."
Why this matters: year one sounds great. Year three gets expensive in ways nobody talked about on the sales call. Ask now.
The 8 features that actually matter
You'll see 50-feature comparison tables on every vendor's site. Most of those features don't change outcomes. These eight usually do.
1. Retrieval quality
What retrieval strategy does the platform use? Hybrid retrieval (BM25 + vector + re-ranking) consistently beats single-method approaches on messy content. If the platform can only describe itself as "semantic search," the retrieval layer is probably thinner than you want.
Ask for a concrete demo with adversarial questions your site should answer — acronyms, jargon, rarely-indexed content. If retrieval fails the demo, it will fail production.
2. The billing model
This is a feature, even though it lives in your contract. A billing model aligned with how you measure value (captured leads, resolved tickets, revenue) makes every other conversation easier. A billing model misaligned with your value metric creates years of friction. See Q1 above and our engaged-conversation essay for what alignment looks like.
3. Lead-capture UX
For lead-generation use cases, the bot's ability to qualify through conversation — extract fields naturally, handle "I'd rather not say," revisit declined fields near the end, stop asking after a ceiling — is the difference between a chat widget and a real lead engine. Most FAQ-style bots can be prompted to ask for an email. Purpose-built lead-capture platforms have scaffolding for the whole motion.
4. CRM integrations
Native integrations to HubSpot and Salesforce (or whatever your system of record is), with documented field mapping and duplicate detection. Bonus points for: webhooks with HMAC signing, email-on-capture for teams without a CRM, and the ability to attach the full conversation audit to the lead record.
5. Custom-domain hosted pages
If you're running paid-ad campaigns, partner campaigns, or landing pages where chat is the one thing a visitor does, a standalone hosted page at your domain consistently outperforms a widget bubble. Not every use case needs this — but if it's on your roadmap, it's much easier to pick a platform that has it from day one.
6. Analytics depth
"Number of conversations" is the start of analytics, not the end. You want: cost per lead, expected pipeline, won revenue attribution, per-conversation KB-chunk audit, drop-off analysis at each qualification field, and the ability to export everything. Analytics is where you tune the bot after it's live; weak analytics = weak tuning = stagnant performance.
7. Widget embed simplicity
One-line JS embed is table stakes. What matters more: can you set up per-page targeting without re-uploading the widget? Can you A/B test greetings? Does the widget inherit your site's typography or require a style override? These aren't launch-blockers, but they are the things you'll wish you could do in month two.
8. Knowledge refresh cadence
See Q4 above. A tool that can re-index within minutes of a content change is a different product from one that crawls weekly.
The 3 things vendors hide on sales calls
These are the details that rarely come up voluntarily. Ask directly.
1. True unit cost at scale
Sales calls often emphasize a headline number — "$0.99 per resolution" — that sounds cheap. The math gets worse as traffic scales. At 1,000 resolutions/month, that's $990. At 10,000, it's $9,900. At 100,000, it's $99,000. Most support-platform pricing was designed for stable, known volumes — not the variable, campaign-driven traffic of marketing.
What to ask: "Walk me through what a 10× spike month would cost versus my baseline, assuming the same per-resolution mix."
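That spike math is worth running yourself before the call. Below is a hypothetical bill model with made-up numbers; substitute the vendor's actual base fee, included quota, and overage rate from your quote.

```python
def monthly_bill(units, included, base_fee, overage_rate, overage_cap=None):
    """Base fee plus per-unit overage above the included quota, optionally capped."""
    overage = max(0, units - included)
    if overage_cap is not None:
        overage = min(overage, overage_cap)
    return base_fee + overage * overage_rate

# Hypothetical plan: $99 base, 500 units included, $0.99 per overage unit.
baseline = monthly_bill(1_000, included=500, base_fee=99.0, overage_rate=0.99)
spike = monthly_bill(10_000, included=500, base_fee=99.0, overage_rate=0.99)
```

With these made-up numbers, the baseline month is $594 and the 10x month is $9,504 — the spike bill grows faster than 10x the baseline because the included quota stops mattering. That nonlinearity is exactly what the question above is designed to surface.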
2. What "resolution" or "conversation" actually counts
Directly related to Q1 above. A "resolution" in one platform is "the bot responded with a non-empty message." In another, it's "the ticket was closed without escalation." The same traffic can produce radically different resolution counts depending on the definition.
What to ask: "Show me an invoice from a comparable customer. Walk through what each billed item is and what the alternative interpretations would have been."
3. Renewal-year price escalators
Year one is the lead-generation year for the vendor; they're often flexible on price. Year two and three are where they make margin. Most enterprise-style contracts have either an explicit annual escalator or an "at then-current list price" clause that functions as one.
What to ask: "Is the price locked for the duration of the contract, or does it escalate? What's the formula? What do you charge at renewal?"
A procurement checklist
Print this. Walk it into the conversation with the vendor.
Cross-references
- Wengrow vs Intercom — support-deflection platform vs lead-gen platform. Different jobs, different pricing. Read the comparison.
- Wengrow vs Tidio — e-commerce live chat + basic AI vs purpose-built B2B lead capture. Read the comparison.
- Wengrow vs Chatbase — SMB FAQ bot vs lead-capture platform. When to pick cheaper; when to level up. Read the comparison.
- How we think about billing — why we picked engaged-only conversations and how that compares to per-resolution and per-conversation models. Read the essay.
- ROI calculator — estimate your own leads, CPL, and ROI with three inputs. Open the calculator.
- Migration guide — per-vendor walkthroughs for leaving Tidio, Intercom, or Chatbase. Read the migration guide.
- Live pricing — the tiers, the engaged-conversation limits, and what's included at each level. See pricing.
Evaluate Wengrow directly.
The fastest way to answer most of the 12 questions above for Wengrow is to run the trial. 14 days, no credit card, full-feature access.