Chat

Every conversation is a chance to convert. The Chat interface is where Turing ES's reasoning, search, and brand voice meet your customer's question.

The Chat interface is the front door of every AI capability in Turing ES. It's organized into three modes, each tuned for a different shape of conversation:

| Mode | When it shines | What grounds the answer |
| --- | --- | --- |
| Chat (direct LLM) | The user wants raw assistance: "summarize this PDF", "draft this email" | The LLM's parametric knowledge + any tools the user enables |
| Semantic Navigation | The user is looking for something inside your indexed content | Strict: only your indexed sites and documents |
| AI Agent (one tab per agent) | The user is having a purpose-specific conversation (sales, onboarding, support) | The agent's tools + persona + your content |

Every mode streams responses token by token, supports rich content (Markdown, code, charts, diagrams, HTML previews), and stores history in the user's browser — not on your servers.

LLM required

The Chat interface is only available when at least one LLM Instance is configured and enabled. See LLM Instances to set one up.


The Big Picture

The chat surface looks the same across modes, but each mode reaches a different backend and uses a different prompt strategy.

Chat Interface — Layout Overview

Header controls:

| Control | Description |
| --- | --- |
| Tab navigation | Switch between Chat, Semantic Navigation, and the dynamic AI Agent tabs |
| LLM model selector | Pick the LLM Instance to run this conversation against |
| New Chat | Start a fresh session (the current one is saved automatically) |
| Dark mode toggle | Switch between light and dark themes; code highlighting follows |
| Session history | Browse, restore, or delete previous conversations from this browser |

Context Bar — sits below the message area:

| Indicator | Behaviour |
| --- | --- |
| Token counter | Shows current/max (e.g., 2.5k/128k); estimated client-side at ~4 chars per token |
| Progress bar | Visual fill: blue under 60%, yellow at 60–79%, red at 80%+ |
| Compact | Compresses the conversation via the LLM to free context space |

Mode 1 — Chat (Direct LLM)

A general-purpose conversation with the selected LLM. Use this when the question doesn't depend on your enterprise content — the user wants the model itself, sometimes augmented with a few opt-in tools.

Real conversations this mode handles well:

  • "Rewrite this paragraph in a more formal tone."
  • "Plot the rolling 30-day average of these numbers."
  • "What's the weather in Lisbon next Tuesday?"
  • "Find me an image of a Saturn V launch."

File Attachments

Drag-and-drop, or click the paperclip. Two paths, depending on the file:

| File type | How it's handled |
| --- | --- |
| Documents (PDF, DOCX, XLSX, PPTX, HTML, TXT, …) | Text extracted via Apache Tika and added to the prompt as context |
| Images (PNG, JPEG, WebP, GIF, …) | Sent directly as media to vision-capable models (Claude Sonnet, GPT-4o, Gemini) |

Attached files appear as badges on the message they're sent with. Multiple files per message are supported.
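The two paths above can be pictured as a simple partition step. The sketch below is illustrative only: the `Attachment` shape, the `partitionAttachments` name, and the decision to split on the MIME type prefix are assumptions, not the product's actual front-end code.

```typescript
// Hypothetical shapes: the real front-end types are not shown on this page.
interface Attachment {
  name: string;
  mimeType: string; // e.g. "application/pdf", "image/png"
}

interface RoutedAttachments {
  documents: Attachment[]; // text is extracted and added to the prompt
  images: Attachment[];    // sent as media to a vision-capable model
}

// Assumption: the split is made on the MIME type prefix.
function partitionAttachments(files: Attachment[]): RoutedAttachments {
  const routed: RoutedAttachments = { documents: [], images: [] };
  for (const file of files) {
    const isImage = file.mimeType.toLowerCase().startsWith("image/");
    (isImage ? routed.images : routed.documents).push(file);
  }
  return routed;
}
```

A message with one PDF and one PNG would end up with one entry in each list, so both paths can run for the same message.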

Streaming

Every response streams in real time over Server-Sent Events (SSE). The user sees tokens arrive as the model produces them — no spinner, no wait. Perceived latency drops by roughly half compared to wait-then-show.
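For readers consuming the stream outside the bundled front-end, the following is a minimal sketch of splitting raw SSE text into event payloads. It follows the general SSE framing (events separated by a blank line, payload carried on `data:` lines); the function and type names are hypothetical, and the shape of each payload that Turing ES sends is not specified here, so payloads are returned as plain strings.

```typescript
interface SseParseResult {
  events: string[]; // data payload of each complete event, in arrival order
  rest: string;     // trailing partial event; prepend it to the next chunk
}

// Splits buffered SSE text into complete events.
// Handles LF and CRLF line endings; lone CR line endings are not handled.
function parseSse(buffer: string): SseParseResult {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last block is incomplete until a blank line arrives
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) {
        const value = line.slice(5);
        // A single leading space after the colon is part of the framing, not the payload.
        dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

Carrying `rest` forward matters because network chunks do not align with event boundaries; a token can arrive split across two reads.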

Tools You Can Enable

In direct LLM mode, the user toggles tools per conversation. The LLM decides when to invoke them.

| Tool | What it does | Typical latency |
| --- | --- | --- |
| Code Interpreter | Runs Python in a sandbox (Matplotlib supported). 30s timeout. Files generated (charts, CSVs) come back as download links inline. | 1–10s |
| Web Crawler | Fetches and extracts a public web page. Up to 12,000 chars of body text and 30 links. | 1–3s |
| Image Search | Searches images via DuckDuckGo / Bing. Up to 8 results. | 1–2s |
| Weather | 1–7 day forecast for any location (Open-Meteo). | <1s |
| Finance | Stock quotes and historical prices via Yahoo Finance. | 1–2s |
| Date / Time | Current date/time in any timezone. | <100ms |
| RAG Search | Searches the Knowledge Base by semantic similarity; lists files; reads file contents. | 100–500ms |

The model picks which tool to call based on the question. The user enables the menu of options; the LLM picks within it.


Mode 2 — Semantic Navigation

This is where the chat becomes a grounded conversation: the LLM's job is no longer to answer from its training data, but to search your indexed sites and explain what it finds.

Use this mode when:

  • The question is about your products, internal documentation, or published content.
  • You need answers that always trace back to a real document.
  • Hallucinations are unacceptable — the LLM is constrained to answer from indexed content.

The system prompt for this mode is built per request and includes:

  • The list of available SN Sites and their locales,
  • The facets configured on each site (so the model knows which filters it can use),
  • An instruction to ground every answer in search results.

Tools available:

| Tool | Purpose |
| --- | --- |
| list_sites | Enumerates available SN Sites and their locales |
| get_site_fields | Returns the indexed fields and facets for a specific site |
| get_valid_filter_values | Lists valid values for a given facet field, which prevents the model from inventing filters |
| search_site | Performs a semantic search and returns results with snippets and metadata |

Any MCP Servers configured globally are also available here, extending the tool set with external capabilities (e.g., a CRM lookup, a translation service).
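To illustrate why a tool like get_valid_filter_values helps, here is a small sketch of checking model-proposed filters against a known set of facet values before searching. This is not the product's implementation: the `FacetFilter` shape, the `splitFilters` name, and the idea that the valid values are held in a `Map` are all assumptions made for the example.

```typescript
// Hypothetical filter shape: one facet field and one requested value.
interface FacetFilter {
  field: string;
  value: string;
}

// Separates filters whose value is known for the field from those that are not.
function splitFilters(
  requested: FacetFilter[],
  validValues: Map<string, Set<string>>,
): { accepted: FacetFilter[]; rejected: FacetFilter[] } {
  const accepted: FacetFilter[] = [];
  const rejected: FacetFilter[] = [];
  for (const filter of requested) {
    const allowed = validValues.get(filter.field);
    const ok = allowed !== undefined && allowed.has(filter.value);
    (ok ? accepted : rejected).push(filter);
  }
  return { accepted, rejected };
}
```

Rejected filters could be dropped or fed back to the model so it can pick a valid value; which of the two the product does is not stated on this page.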

When to use SN mode vs. an Agent

SN mode is the right answer when the user is looking for something inside your content but doesn't need a specific persona or workflow. It's the AI version of "search the site, but smart". Agent mode is the right answer when the conversation has a purpose — booking a demo, qualifying a lead, getting onboarded. Both can search the same content; the agent layers persona + workflow on top.


Mode 3 — AI Agents

Each AI Agent configured and enabled in Administration → AI Agents appears as its own tab in Chat. The visitor picks the specialist that fits their need.

| What's per-agent | Set in |
| --- | --- |
| Name & Avatar | Agent → Settings tab |
| System prompt | Agent → Settings tab |
| LLM Instance | Agent → LLM tab |
| Native tool selection | Agent → Tools tab |
| MCP servers | Agent → MCP Servers tab |
| Persona (brand voice) | Agent → Settings (Persona dropdown) |

The Persona is the voice layer — see Personas for full coverage. With a persona attached, every conversation in this tab speaks in your brand's tone, uses your mandatory vocabulary, avoids your forbidden vocabulary, and (if configured) draws on a few-shot store of curated Q/A pairs plus live brand context from an MCP server.

Use agents when you want consistency across thousands of conversations — "every visitor who lands on the discovery agent should hear the same voice, regardless of LLM or time of day".
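As a rough picture of what "avoids your forbidden vocabulary" can mean mechanically, the sketch below masks forbidden terms in a finished response. It is a standalone illustration with hypothetical names (`redactForbidden`, the `[redacted]` mask); the product's own validator runs server-side and its exact matching rules are not documented on this page.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces whole-word, case-insensitive matches of any forbidden term with a mask.
// Assumption: terms start and end with word characters, so \b boundaries apply.
function redactForbidden(text: string, forbidden: string[], mask = "[redacted]"): string {
  const terms = forbidden.filter((t) => t.length > 0).map(escapeRegExp);
  if (terms.length === 0) return text;
  const pattern = new RegExp(`\\b(?:${terms.join("|")})\\b`, "gi");
  return text.replace(pattern, mask);
}
```

Whole-word matching keeps a term such as "cheap" from also masking "cheaper"; whether that is the desired behaviour depends on the brand rules.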


Picking a Mode for the Job

If you take one thing away from this page, take this:

| The user is trying to… | Use this mode |
| --- | --- |
| Generate, summarize, or transform content (no enterprise grounding needed) | Chat (direct LLM) |
| Find or understand something inside your indexed content | Semantic Navigation |
| Have a purposeful conversation (sell, support, onboard, qualify) | AI Agent with a persona |
| Investigate a specific past conversation and what went well or wrong | Open the Chat Analytics page (separate, not in Chat) |

You don't have to pick one mode and stick with it. Most deployments use all three — direct chat for power users, SN chat for content exploration, and agents for customer-facing flows.


How a Message Travels End-to-End

What actually happens between the user pressing Enter and the response appearing?

  1. Front-end captures the message + any attachments and POSTs to the appropriate endpoint:
    • Direct LLM → /api/v2/llm/{instanceId}/chat
    • SN chat → /api/v2/llm/{instanceId}/semantic-chat
    • Agent → /api/v2/ai-agent/{agentId}/chat
  2. Spring AI assembles the prompt: agent system prompt → persona overlays (if any) → tool definitions → conversation history → current message + media.
  3. Tool calling — if the LLM requests a tool, Spring AI executes it (native or MCP), returns the result, and the loop continues until the model decides to answer.
  4. Streaming — the response flows back through Flux<ChatResponse>. The front-end consumes it as SSE, rendering each token as it arrives.
  5. Persona post-validation: TurPersonaToneValidator redacts any forbidden terms before the response is finalized.
  6. Analytics emission: TurChatAnalyticsService.recordSessionStart / recordTurn / recordSessionEnd record turn count, token usage, and outcome to the analytics store. See Chat Analytics.
  7. Browser stores the session in IndexedDB, attaches an auto-generated title (LLM-summarized from the first exchange), and updates the session sidebar.
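Step 1 above amounts to choosing one of three endpoints based on the active tab. A minimal sketch of that choice follows; the `ChatTarget` type and `chatEndpoint` function are hypothetical names, while the paths are the ones listed on this page.

```typescript
// Hypothetical description of which tab the message is being sent from.
type ChatTarget =
  | { mode: "direct"; instanceId: string }
  | { mode: "semantic"; instanceId: string }
  | { mode: "agent"; agentId: string };

// Returns the POST path for the given target; IDs are URL-encoded defensively.
function chatEndpoint(target: ChatTarget): string {
  switch (target.mode) {
    case "direct":
      return `/api/v2/llm/${encodeURIComponent(target.instanceId)}/chat`;
    case "semantic":
      return `/api/v2/llm/${encodeURIComponent(target.instanceId)}/semantic-chat`;
    case "agent":
      return `/api/v2/ai-agent/${encodeURIComponent(target.agentId)}/chat`;
  }
}
```

Modelling the target as a discriminated union means the compiler flags a missing case if a fourth mode is ever added.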

If observability is enabled (Prometheus scraping /actuator/prometheus), every step also emits metrics — see Observability.


Rich Content Rendering

Chat responses are rendered with full media-type awareness:

| Content type | Rendering |
| --- | --- |
| Markdown | GitHub Flavored: tables, strikethrough, task lists, inline code, blockquotes |
| Code blocks | Syntax highlighting via highlight.js with automatic light/dark theme switching |
| D2 diagrams | Rendered to SVG via WASM; falls back to a dev server in development |
| HTML | Sandboxed preview in an iframe; toggle between rendered view and source, with fullscreen |
| Generated files | Files from the Code Interpreter (charts, CSVs, processed data) appear as inline download links |

Code blocks pick up the chat's dark/light theme automatically — no flash on switch.


Session History

Sessions are stored in the browser's IndexedDB. They never leave the user's machine.

What that means in practice:

  • Sessions are per browser and per device — clearing browser data removes them.
  • No authentication is required to access past sessions.
  • No server cost for session storage.

Session sidebar features:

| Feature | Description |
| --- | --- |
| Auto-title | A short title is generated by the LLM from the first exchange; falls back to the first message text if generation fails |
| Model badge | Which LLM model ran the session |
| Message count | Number of messages |
| Timestamp | Date/time of the last message |
| Restore | Click to resume a previous session |
| Delete | Remove a session from local history |

Sessions are saved automatically after each complete response.

Browser-local vs. Chat Analytics

The IndexedDB session history is a user convenience — it's how a single user finds their own past conversations. The Chat Analytics store is operator analytics — anonymized session metadata + transcripts that ship to MongoDB or Redis for dashboarding. They serve different audiences and don't depend on each other.


Context Window Management

A context bar at the bottom of the input shows token usage in real time:

2.5k / 128k ████████░░░░░░░░░░░░

Tokens are estimated client-side at ~4 characters per token (Math.ceil(text.length / 4)), counting the full message history. The bar's fill and color reflect how close you are to the model's context window.
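The estimate is simple enough to reproduce. The sketch below applies the ~4 characters per token rule to a message history and formats the result the way the counter displays it. The `ChatMessage` shape and function names are hypothetical, and summing per message (rather than over one concatenated string) is an assumption.

```typescript
// Hypothetical message shape; only the text content matters for the estimate.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// The rule quoted above: roughly four characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Assumption: the history total is the sum of per-message estimates.
function estimateHistoryTokens(history: ChatMessage[]): number {
  return history.reduce((sum, message) => sum + estimateTokens(message.content), 0);
}

// Formats 2500 as "2.5k" and 128000 as "128k"; values under 1000 are shown as-is.
function formatCount(n: number): string {
  return n >= 1000 ? `${(n / 1000).toFixed(1).replace(/\.0$/, "")}k` : String(n);
}
```

Because this is a character heuristic, the real token count reported by a provider can differ noticeably, especially for code or non-Latin text.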

Resolving the Context Window Size

When the front-end first loads a session, it figures out the model's window via a three-tier fallback:

  1. Backend API: GET /api/v2/llm/{instanceId}/chat/context-info returns the configured limit. The front-end caches it.
  2. LLM Instance configuration — the contextWindow field set on the instance.
  3. Default — 128,000 tokens, used when neither of the above is available.
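The three tiers above reduce to a short fallback chain. This sketch assumes the first two values have already been fetched or read and may be missing; the function and parameter names are hypothetical, and treating zero, negative, or non-finite values as "not available" is an assumption.

```typescript
const DEFAULT_CONTEXT_WINDOW = 128_000;

// A limit is usable only if it is a finite, positive number.
function isUsable(n: number | null | undefined): n is number {
  return typeof n === "number" && Number.isFinite(n) && n > 0;
}

// Tier 1: value from the context-info endpoint; tier 2: the instance's
// contextWindow field; tier 3: the 128,000-token default.
function resolveContextWindow(
  apiLimit?: number | null,
  instanceContextWindow?: number | null,
): number {
  if (isUsable(apiLimit)) return apiLimit;
  if (isUsable(instanceContextWindow)) return instanceContextWindow;
  return DEFAULT_CONTEXT_WINDOW;
}
```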

Progress Bar Colours

| Usage | Colour | Meaning |
| --- | --- | --- |
| Below 60% | Blue | Plenty of room |
| 60–79% | Yellow | Consider compacting |
| 80%+ | Red | Compact before the limit |
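The thresholds in the table above can be expressed as a small function. Names are hypothetical, and two details are assumptions: usage between 79% and 80% is treated as yellow, and a missing or zero limit is shown as red.

```typescript
type BarColour = "blue" | "yellow" | "red";

// Compares used * 100 against max * threshold to avoid floating-point
// rounding right at the 60% and 80% boundaries.
function barColour(usedTokens: number, maxTokens: number): BarColour {
  if (maxTokens <= 0) return "red"; // assumption: no known limit is treated as full
  if (usedTokens * 100 >= maxTokens * 80) return "red";
  if (usedTokens * 100 >= maxTokens * 60) return "yellow";
  return "blue";
}
```

With a 128k window, the bar would turn yellow at 76,800 estimated tokens and red at 102,400.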

Compact

The Compact button (lightning bolt) becomes available when the conversation has at least 4 messages and the model isn't currently generating. Compacting:

  1. Sends the full history to the same LLM with a summarisation prompt.
  2. The LLM produces a concise summary preserving facts, decisions, and context.
  3. The conversation in IndexedDB is replaced with a single **[Context compacted]** block + the summary.
  4. The next user turn continues from the compacted state, with significantly reduced token usage.

Compacting works in all three modes.
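A minimal sketch of the two client-side pieces of compaction follows: the availability rule for the button and the shape of the history after compaction. The summarisation call itself is left out because it goes through the LLM. All names are hypothetical, and storing the compacted block with the assistant role is an assumption.

```typescript
// Hypothetical message shape used by the sketch.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

const MIN_MESSAGES_TO_COMPACT = 4;

// The button is available with at least 4 messages and no generation in progress.
function canCompact(history: ChatMessage[], isGenerating: boolean): boolean {
  return history.length >= MIN_MESSAGES_TO_COMPACT && !isGenerating;
}

// Builds the replacement history: a single marker block followed by the summary.
// Assumption: the block is stored as an assistant message.
function compactedHistory(summary: string): ChatMessage[] {
  return [{ role: "assistant", content: `**[Context compacted]**\n\n${summary.trim()}` }];
}
```

Since the original turns are replaced rather than archived, anything the summary omits is no longer available to the model in later turns.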

When to compact

When the bar turns yellow, compact proactively. By the time it's red, the model may have already started losing context from the earliest messages. The Compact tooltip shows how much context remains.


Files & Attachments Reference

| Capability | Detail |
| --- | --- |
| Upload method | Drag-and-drop onto the chat window, or click the attachment button |
| Transfer format | Multipart form (files sent together with the message) |
| Document processing | Apache Tika extracts text from PDF, DOCX, XLSX, PPTX, HTML, TXT, and more |
| Image processing | Passed directly as media bytes to vision-capable models |
| Display | Shown as file badges on the sent message bubble |

API Endpoints

| Method | Endpoint | Mode |
| --- | --- | --- |
| POST | /api/v2/llm/{instanceId}/chat | Direct LLM (SSE stream) |
| POST | /api/v2/llm/{instanceId}/semantic-chat | Semantic Navigation (SSE stream, tool-calling enabled) |
| POST | /api/v2/ai-agent/{agentId}/chat | Agent (SSE stream, full agent execution loop) |
| GET | /api/v2/llm/{instanceId}/chat/context-info | Context window size resolution |

See REST API Reference → GenAI API for request/response shapes and examples.


| Page | Description |
| --- | --- |
| AI Agents | The specialists that power the per-tab agent conversations |
| Personas | The voice layer applied to every customer-facing agent |
| LLM Instances | Configure and connect the model providers |
| Tool Calling | The 27 native tools and how they appear in chat |
| MCP Servers | Connect external tool servers to chat |
| Chat Analytics | Investigate, classify, and learn from past conversations |
| Observability | Prometheus + Grafana dashboards over chat traffic |