Chat
Every conversation is a chance to convert. The Chat interface is where Turing ES's reasoning, search, and brand voice meet your customer's question.
It's the front door to every AI capability in the platform, organized into three modes, each tuned for a different shape of conversation:
| Mode | When it shines | What grounds the answer |
|---|---|---|
| Chat (direct LLM) | The user wants raw assistance — "summarize this PDF", "draft this email" | The LLM's parametric knowledge + any tools the user enables |
| Semantic Navigation | The user is looking for something inside your indexed content | Strict — only your indexed sites and documents |
| AI Agent (one tab per agent) | The user is having a purpose-specific conversation (sales, onboarding, support) | The agent's tools + persona + your content |
Every mode streams responses token by token, supports rich content (Markdown, code, charts, diagrams, HTML previews), and stores history in the user's browser — not on your servers.
The Chat interface is only available when at least one LLM Instance is configured and enabled. See LLM Instances to set one up.
The Big Picture
The chat surface looks the same across modes, but each mode reaches a different backend and uses a different prompt strategy.
Header controls:
| Control | Description |
|---|---|
| Tab navigation | Switch between Chat, Semantic Navigation, and the dynamic AI Agent tabs |
| LLM model selector | Pick the LLM Instance to run this conversation against |
| New Chat | Start a fresh session (the current one is saved automatically) |
| Dark mode toggle | Switch between light and dark themes; code highlighting follows |
| Session history | Browse, restore, or delete previous conversations from this browser |
Context Bar — sits below the message area:
| Indicator | Behaviour |
|---|---|
| Token counter | Shows current/max (e.g., 2.5k/128k); estimated client-side at ~4 chars per token |
| Progress bar | Visual fill — blue under 60%, yellow at 60–79%, red at 80%+ |
| Compact | Compresses the conversation via the LLM to free context space |
Mode 1 — Chat (Direct LLM)
A general-purpose conversation with the selected LLM. Use this when the question doesn't depend on your enterprise content — the user wants the model itself, sometimes augmented with a few opt-in tools.
Real conversations this mode handles well:
- "Rewrite this paragraph in a more formal tone."
- "Plot the rolling 30-day average of these numbers."
- "What's the weather in Lisbon next Tuesday?"
- "Find me an image of a Saturn V launch."
File Attachments
Drag-and-drop, or click the paperclip. Two paths, depending on the file:
| File type | How it's handled |
|---|---|
| Documents (PDF, DOCX, XLSX, PPTX, HTML, TXT, …) | Text extracted via Apache Tika and added to the prompt as context |
| Images (PNG, JPEG, WebP, GIF, …) | Sent directly as media to vision-capable models (Claude Sonnet, GPT-4o, Gemini) |
Attached files appear as badges on the message they're sent with. Multiple files per message are supported.
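As a sketch of the transfer itself, here's roughly what the front-end sends. The field names (`message`, `files`) are illustrative assumptions, not the documented contract; see the GenAI API reference for the real shapes.

```typescript
// Hypothetical upload helper: multipart form, files sent with the message.
// Field names ("message", "files") are assumptions for illustration.
async function sendWithAttachments(instanceId: string, message: string, files: File[]) {
  const form = new FormData();
  form.append("message", message);
  for (const file of files) {
    form.append("files", file, file.name); // documents and images travel together
  }
  return fetch(`/api/v2/llm/${instanceId}/chat`, { method: "POST", body: form });
}
```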
Streaming
Every response streams in real time over Server-Sent Events (SSE). The user sees tokens arrive as the model produces them — no spinner, no wait. Perceived latency drops by roughly half compared to wait-then-show.
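To make the consuming side concrete, here's a minimal SSE reader over `fetch`. It assumes each `data:` line carries a token; the actual event payload may be richer.

```typescript
// Minimal SSE consumption sketch: render tokens as they arrive.
// Assumes a plain "data: <token>" stream; the real payload shape may differ.
async function streamChat(url: string, body: FormData, onToken: (t: string) => void) {
  const response = await fetch(url, { method: "POST", body });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith("data:")) onToken(line.slice(5).trim());
    }
  }
}
```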
Tools You Can Enable
In direct LLM mode, the user toggles tools per conversation. The LLM decides when to invoke them.
| Tool | What it does | Typical latency |
|---|---|---|
| Code Interpreter | Runs Python in a sandbox (Matplotlib supported). 30s timeout. Files generated (charts, CSVs) come back as download links inline. | 1–10s |
| Web Crawler | Fetches and extracts a public web page. Up to 12,000 chars of body text and 30 links. | 1–3s |
| Image Search | Searches images via DuckDuckGo / Bing. Up to 8 results. | 1–2s |
| Weather | 1–7 day forecast for any location (Open-Meteo). | <1s |
| Finance | Stock quotes and historical prices via Yahoo Finance. | 1–2s |
| Date / Time | Current date/time in any timezone. | <100ms |
| RAG Search | Searches the Knowledge Base by semantic similarity; lists files; reads file contents. | 100–500ms |
The user enables the menu of options; the model decides which tool to invoke, and when, based on the question.
Mode 2 — Semantic Navigation
This is where the chat becomes a grounded conversation: the LLM's job is no longer to answer from its training data, but to search your indexed sites and explain what it finds.
Use this mode when:
- The question is about your products, internal documentation, or published content.
- You need answers that always trace back to a real document.
- Hallucinations are unacceptable — the LLM is constrained to answer from indexed content.
The system prompt for this mode is built per request and includes:
- The list of available SN Sites and their locales,
- The facets configured on each site (so the model knows which filters it can use),
- An instruction to ground every answer in search results.
Tools available:
| Tool | Purpose |
|---|---|
| `list_sites` | Enumerates available SN Sites and their locales |
| `get_site_fields` | Returns the indexed fields and facets for a specific site |
| `get_valid_filter_values` | Lists valid values for a given facet field — prevents the model from inventing filters |
| `search_site` | Performs a semantic search and returns results with snippets and metadata |
Any MCP Servers configured globally are also available here, extending the tool set with external capabilities (e.g., a CRM lookup, a translation service).
SN mode is the right answer when the user is looking for something inside your content but doesn't need a specific persona or workflow. It's the AI version of "search the site, but smart". Agent mode is the right answer when the conversation has a purpose — booking a demo, qualifying a lead, getting onboarded. Both can search the same content; the agent layers persona + workflow on top.
Mode 3 — AI Agents
Each AI Agent configured and enabled in Administration → AI Agents appears as its own tab in Chat. The visitor picks the specialist that fits their need.
| What's per-agent | Set in |
|---|---|
| Name & Avatar | Agent → Settings tab |
| System prompt | Agent → Settings tab |
| LLM Instance | Agent → LLM tab |
| Native tool selection | Agent → Tools tab |
| MCP servers | Agent → MCP Servers tab |
| Persona (brand voice) | Agent → Settings (Persona dropdown) |
The Persona is the voice layer — see Personas for full coverage. With a persona attached, every conversation in this tab speaks in your brand's tone, uses your mandatory vocabulary, avoids your forbidden vocabulary, and (if configured) draws on a few-shot store of curated Q/A pairs plus live brand context from an MCP server.
Use agents when you want consistency across thousands of conversations — "every visitor who lands on the discovery agent should hear the same voice, regardless of LLM or time of day".
Picking a Mode for the Job
If you take one thing away from this page, take this:
| The user is trying to… | Use this mode |
|---|---|
| Generate, summarize, or transform content (no enterprise grounding needed) | Chat (direct LLM) |
| Find or understand something inside your indexed content | Semantic Navigation |
| Hold a purposeful conversation (sell, support, onboard, qualify) | AI Agent with a persona |
| Investigate a specific past conversation and what went well or wrong | Open the Chat Analytics page (separate, not in Chat) |
You don't have to pick one mode and stick with it. Most deployments use all three — direct chat for power users, SN chat for content exploration, and agents for customer-facing flows.
How a Message Travels End-to-End
What actually happens between the user pressing Enter and the response appearing?
- Front-end captures the message + any attachments and POSTs to the appropriate endpoint:
  - Direct LLM → `/api/v2/llm/{instanceId}/chat`
  - SN chat → `/api/v2/llm/{instanceId}/semantic-chat`
  - Agent → `/api/v2/ai-agent/{agentId}/chat`
- Spring AI assembles the prompt: agent system prompt → persona overlays (if any) → tool definitions → conversation history → current message + media.
- Tool calling — if the LLM requests a tool, Spring AI executes it (native or MCP), returns the result, and the loop continues until the model decides to answer.
- Streaming — the response flows back through `Flux<ChatResponse>`. The front-end consumes it as SSE, rendering each token as it arrives.
- Persona post-validation — `TurPersonaToneValidator` redacts any forbidden terms before the response is finalized.
- Analytics emission — `TurChatAnalyticsService.recordSessionStart/recordTurn/recordSessionEnd` record turn count, token usage, and outcome to the analytics store. See Chat Analytics.
- Browser stores the session in IndexedDB, attaches an auto-generated title (LLM-summarized from the first exchange), and updates the session sidebar.
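The endpoint routing in the first step is mechanical. As a tiny illustration (the mode names are this page's labels, not API constants):

```typescript
type ChatMode = "direct" | "semantic" | "agent";

// Maps a chat mode to its streaming endpoint (paths from the API table below).
function endpointFor(mode: ChatMode, id: string): string {
  switch (mode) {
    case "direct":   return `/api/v2/llm/${id}/chat`;
    case "semantic": return `/api/v2/llm/${id}/semantic-chat`;
    case "agent":    return `/api/v2/ai-agent/${id}/chat`;
  }
}
```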
If observability is enabled (Prometheus scraping `/actuator/prometheus`), every step also emits metrics — see Observability.
Rich Content Rendering
Chat responses are rendered with full media-type awareness:
| Content type | Rendering |
|---|---|
| Markdown | GitHub Flavored — tables, strikethrough, task lists, inline code, blockquotes |
| Code blocks | Syntax highlighting via highlight.js with automatic light/dark theme switching |
| D2 diagrams | Rendered to SVG via WASM; falls back to a dev server in development |
| HTML | Sandboxed preview in an iframe; toggle between rendered view and source, with fullscreen |
| Generated files | Files from the Code Interpreter (charts, CSVs, processed data) appear as inline download links |
Code blocks pick up the chat's dark/light theme automatically — no flash on switch.
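One common way to achieve flash-free switching is to keep both highlight.js stylesheets loaded and flip their `disabled` flags. This is a generic pattern, not necessarily this UI's exact mechanism, and the stylesheet names below are assumptions.

```typescript
// Generic flicker-free theme toggle: both stylesheets stay loaded, so no
// network round-trip (and no flash) happens at switch time.
// The href filters ("github.css", "github-dark.css") are assumed names.
function setCodeTheme(dark: boolean): void {
  const light = document.querySelector<HTMLLinkElement>('link[href*="github.css"]');
  const darkSheet = document.querySelector<HTMLLinkElement>('link[href*="github-dark.css"]');
  if (light && darkSheet) {
    light.disabled = dark;
    darkSheet.disabled = !dark;
  }
}
```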
Session History
Sessions are stored in the browser's IndexedDB. They never leave the user's machine.
What that means in practice:
- Sessions are per browser and per device — clearing browser data removes them.
- No authentication is required to access past sessions.
- No server cost for session storage.
Session sidebar features:
| Feature | Description |
|---|---|
| Auto-title | A short title is generated by the LLM from the first exchange; falls back to the first message text if generation fails |
| Model badge | Which LLM model ran the session |
| Message count | Number of messages |
| Timestamp | Date/time of the last message |
| Restore | Click to resume a previous session |
| Delete | Remove a session from local history |
Sessions are saved automatically after each complete response.
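A minimal sketch of this persistence, assuming an object store keyed by session id (the database name, store name, and record shape here are illustrative, not the actual schema):

```typescript
// Illustrative IndexedDB persistence; names and record shape are assumptions.
interface ChatSession {
  id: string;
  title: string;       // auto-generated by the LLM from the first exchange
  model: string;       // shown as the model badge in the sidebar
  messages: unknown[];
  updatedAt: number;   // drives the sidebar timestamp
}

function openSessionDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("chat-sessions", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("sessions", { keyPath: "id" });
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Called after each complete response; put() inserts or overwrites.
async function saveSession(session: ChatSession): Promise<void> {
  const db = await openSessionDb();
  db.transaction("sessions", "readwrite").objectStore("sessions").put(session);
}
```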
The IndexedDB session history is a user convenience — it's how a single user finds their own past conversations. The Chat Analytics store is operator analytics — anonymized session metadata + transcripts that ship to MongoDB or Redis for dashboarding. They serve different audiences and don't depend on each other.
Context Window Management
A context bar at the bottom of the input shows token usage in real time:
2.5k / 128k ████████░░░░░░░░░░░░
Tokens are estimated client-side at ~4 characters per token (Math.ceil(text.length / 4)), counting the full message history. The bar's fill and color reflect how close you are to the model's context window.
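In code, the estimate is just the stated formula applied to the concatenated history (a sketch, not the product's source):

```typescript
// ~4 characters per token, over the full message history.
function estimateTokens(messages: string[]): number {
  const totalChars = messages.reduce((sum, m) => sum + m.length, 0);
  return Math.ceil(totalChars / 4);
}

// e.g. usage fraction for the progress bar:
// const usage = estimateTokens(history) / contextWindow;
```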
Resolving the Context Window Size
When the front-end first loads a session, it figures out the model's window via a three-tier fallback:
- Backend API — `GET /api/v2/llm/{instanceId}/chat/context-info` returns the configured limit. The front-end caches it.
- LLM Instance configuration — the `contextWindow` field set on the instance.
- Default — 128,000 tokens, used when neither of the above is available.
Progress Bar Colours
| Usage | Colour | Meaning |
|---|---|---|
| Below 60% | Blue | Plenty of room |
| 60–79% | Yellow | Consider compacting |
| 80%+ | Red | Compact before the limit |
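The thresholds, as a trivial mapping:

```typescript
// Maps usage (0–1) to the bar colours documented above.
function barColor(usage: number): "blue" | "yellow" | "red" {
  if (usage >= 0.8) return "red";    // compact before the limit
  if (usage >= 0.6) return "yellow"; // consider compacting
  return "blue";                     // plenty of room
}
```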
Compact
The Compact button (lightning bolt) becomes available when the conversation has at least 4 messages and the model isn't currently generating. Compacting:
- Sends the full history to the same LLM with a summarisation prompt.
- The LLM produces a concise summary preserving facts, decisions, and context.
- The conversation in IndexedDB is replaced with a single **[Context compacted]** block + the summary.
- The next user turn continues from the compacted state, with significantly reduced token usage.
Compacting works in all three modes.
When the bar turns yellow, compact proactively. By the time it's red, the model may have already started losing context from the earliest messages. The Compact tooltip shows how much context remains.
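From the client's perspective, compacting reduces to "summarize, then replace". Here's a sketch reusing the `ChatSession`/`saveSession` types from the session-history example; the request shape and summarisation prompt wording are assumptions:

```typescript
// Illustrative compact flow: summarise history via the same LLM, then
// replace the stored conversation with a single compacted block.
async function compactSession(session: ChatSession, instanceId: string): Promise<void> {
  const history = session.messages.map(m => JSON.stringify(m)).join("\n");
  const form = new FormData();
  form.append(
    "message",
    `Summarize this conversation, preserving facts, decisions, and context:\n${history}`,
  );
  const res = await fetch(`/api/v2/llm/${instanceId}/chat`, { method: "POST", body: form });
  const summary = await res.text(); // real responses stream over SSE; simplified here
  session.messages = [{ role: "assistant", content: `**[Context compacted]**\n${summary}` }];
  await saveSession(session);       // the next turn continues from this state
}
```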
Files & Attachments Reference
| Capability | Detail |
|---|---|
| Upload method | Drag-and-drop onto the chat window, or click the attachment button |
| Transfer format | Multipart form (files sent together with the message) |
| Document processing | Apache Tika extracts text from PDF, DOCX, XLSX, PPTX, HTML, TXT, and more |
| Image processing | Passed directly as media bytes to vision-capable models |
| Display | Shown as file badges on the sent message bubble |
API Endpoints
| Method | Endpoint | Mode |
|---|---|---|
| POST | `/api/v2/llm/{instanceId}/chat` | Direct LLM (SSE stream) |
| POST | `/api/v2/llm/{instanceId}/semantic-chat` | Semantic Navigation (SSE stream, tool-calling enabled) |
| POST | `/api/v2/ai-agent/{agentId}/chat` | Agent (SSE stream, full agent execution loop) |
| GET | `/api/v2/llm/{instanceId}/chat/context-info` | Context window size resolution |
See REST API Reference → GenAI API for request/response shapes and examples.
Related Pages
| Page | Description |
|---|---|
| AI Agents | The specialists that power the per-tab agent conversations |
| Personas | The voice layer applied to every customer-facing agent |
| LLM Instances | Configure and connect the model providers |
| Tool Calling | The 27 native tools and how they appear in chat |
| MCP Servers | Connect external tool servers to chat |
| Chat Analytics | Investigate, classify, and learn from past conversations |
| Observability | Prometheus + Grafana dashboards over chat traffic |