A RAG-powered document chatbot with a Vaadin web interface, built on the Embabel Agent Framework.
Upload documents, ask questions, and get intelligent answers grounded in your content -- powered by agentic Retrieval-Augmented Generation with Apache Lucene vector search.
```mermaid
graph TB
    subgraph UI["Vaadin Web UI"]
        CV[Chat View]
        DD[Documents Drawer]
    end
    subgraph App["Spring Boot Application"]
        subgraph Agent["Embabel Agent Platform"]
            CA[ChatActions<br/>Agentic RAG]
        end
        subgraph Docs["Document Service"]
            TP[Tika Parser]
            CH[Chunking + Metadata]
        end
    end
    subgraph Store["Lucene Vector Store"]
        EMB[Embeddings + Semantic Search]
    end
    LLM[(LLM Provider<br/>OpenAI / Anthropic)]

    CV --> CA
    DD --> TP
    TP --> CH
    CH --> EMB
    CA --> EMB
    CA --> LLM
```
Unlike traditional RAG pipelines where retrieval is a fixed preprocessing step, Stashbot uses the Embabel Agent Framework's Utility AI pattern to make retrieval agentic. The LLM autonomously decides when and how to search your documents.
```mermaid
sequenceDiagram
    participant U as User
    participant C as ChatActions
    participant L as LLM
    participant T as ToolishRag
    participant S as Lucene Store

    U->>C: Ask a question
    C->>L: Send message + tools + system prompt
    Note over L: LLM reasons about approach
    L->>T: Call vectorSearch("relevant query")
    T->>S: Embed query + similarity search
    S-->>T: Matching chunks with metadata
    T-->>L: Retrieved context
    Note over L: May search again to refine
    L->>T: Call vectorSearch("follow-up query")
    T->>S: Embed + search
    S-->>T: More chunks
    T-->>L: Additional context
    L-->>C: Synthesized answer grounded in documents
    C-->>U: Display response with markdown
```
Key aspects of the agentic approach:
- Autonomous tool use -- The LLM decides whether to search and what to search for
- Iterative retrieval -- Multiple searches can refine results before answering
- Context-aware filtering -- Results are scoped to the user's current workspace context
- Template-driven prompts -- Jinja2 templates separate persona, objective, and guardrails
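To make the tool-driven retrieval concrete, here is a minimal, self-contained sketch of what an LLM-callable vector-search tool can look like. The types below (`EmbeddingModel`, `VectorStore`, `Chunk`, `VectorSearchTool`) are illustrative stand-ins, not the Embabel `ToolishRag` API:

```java
import java.util.List;

// Minimal, self-contained sketch of "retrieval as a tool".
// EmbeddingModel, VectorStore, and Chunk are illustrative stand-ins defined here;
// the real project exposes this capability through Embabel's ToolishRag instead.
interface EmbeddingModel { float[] embed(String text); }
interface VectorStore { List<Chunk> similaritySearch(float[] queryVector, int topK); }
record Chunk(String text, String source) {}

class VectorSearchTool {
    private final EmbeddingModel embeddings;
    private final VectorStore store;

    VectorSearchTool(EmbeddingModel embeddings, VectorStore store) {
        this.embeddings = embeddings;
        this.store = store;
    }

    // Registered with the LLM as a callable tool: the model supplies the query
    // string and decides when, and how often, to invoke it mid-conversation.
    String vectorSearch(String query) {
        float[] queryVector = embeddings.embed(query);
        List<Chunk> hits = store.similaritySearch(queryVector, 5);
        StringBuilder context = new StringBuilder();
        for (Chunk hit : hits) {
            context.append('[').append(hit.source()).append("] ")
                   .append(hit.text()).append('\n');
        }
        return context.toString();
    }
}
```

The key property is that the model, not a fixed pipeline, chooses the query string and how many times to call the tool before it answers.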
| Layer | Technology | Role |
|---|---|---|
| UI | Vaadin 24 | Server-side Java web framework with real-time push updates |
| Backend | Spring Boot 3 | Application framework, dependency injection, security |
| Agent Framework | Embabel Agent | Agentic AI orchestration with Utility AI pattern |
| Vector Search | Apache Lucene | Disk-persisted vector embeddings and semantic search |
| Document Parsing | Apache Tika | Text extraction from PDF, DOCX, HTML, and 1,000+ other formats |
| LLM | OpenAI / Anthropic | Chat completion and text embedding models |
| Auth | Spring Security | Form-based authentication with role-based access |
Stashbot is built on the Embabel Agent Framework, which provides:
- `AgentProcessChatbot` -- Wires actions into a conversational agent using the Utility AI pattern, where the LLM autonomously selects which `@Action` methods to invoke
- `ToolishRag` -- Exposes vector search as an LLM-callable tool, enabling agentic retrieval
- `LuceneSearchOperations` -- Lucene-backed implementation of the pluggable RAG backend (pgvector and Neo4j backends are also available)
- Jinja2 prompt templates -- Composable system prompts with persona/objective/guardrails separation
The frontend is built entirely in server-side Java using Vaadin Flow:
- ChatView -- Main chat interface with message bubbles, markdown rendering, and real-time tool call progress indicators
- DocumentsDrawer -- Slide-out panel for uploading files, ingesting URLs, and managing documents
- Dark theme -- Custom Lumo theme with responsive design
- Push updates -- Async responses stream to the browser via long polling
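As a rough illustration of the push mechanism (the class below is hypothetical, not the project's `ChatView`), a Vaadin Flow view can run the agent call off the request thread and push the reply to the browser with `UI.access`:

```java
import com.vaadin.flow.component.Key;
import com.vaadin.flow.component.UI;
import com.vaadin.flow.component.html.Paragraph;
import com.vaadin.flow.component.orderedlayout.VerticalLayout;
import com.vaadin.flow.component.textfield.TextField;
import com.vaadin.flow.router.Route;

import java.util.concurrent.CompletableFuture;

// Hypothetical push-updated chat view (not the real ChatView). Server push must
// also be enabled on the application shell (e.g. with @Push) for UI.access()
// updates to reach the browser without a page refresh.
@Route("chat-sketch")
public class ChatViewSketch extends VerticalLayout {

    public ChatViewSketch() {
        TextField input = new TextField("Ask a question");
        input.addKeyPressListener(Key.ENTER, event -> {
            String question = input.getValue();
            add(new Paragraph("You: " + question));
            UI ui = UI.getCurrent();
            // Placeholder for the agent invocation, run off the request thread
            CompletableFuture.supplyAsync(() -> askAgent(question))
                    .thenAccept(answer ->
                            ui.access(() -> add(new Paragraph("Stashbot: " + answer))));
        });
        add(input);
    }

    private String askAgent(String question) {
        return "(answer synthesized from your documents)"; // stub
    }
}
```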
Documents are chunked, embedded, and indexed in a local Lucene store:
- Chunking -- 800-character chunks with 100-character overlap for context continuity (see the sketch after this list)
- Embeddings -- Generated via OpenAI `text-embedding-3-small` (configurable)
- Metadata filtering -- Chunks tagged with user/context metadata for scoped search
- Persistent index -- Stored at `./.lucene-index/`, survives restarts
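For illustration only, a naive version of the overlap scheme might look like the sketch below; the project's actual chunker in `DocumentService` may differ (for example, by respecting sentence boundaries):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative overlapping chunker: each window repeats the tail of the previous
// one. The real chunking in DocumentService may differ (e.g. sentence-aware splits).
class OverlapChunker {

    static List<String> chunk(String text, int maxChunkSize, int overlapSize) {
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlapSize;              // 700 with the defaults
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;                // reached the last chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 2000 characters -> 3 chunks: [0,800), [700,1500), [1400,2000)
        System.out.println(chunk("x".repeat(2000), 800, 100).size());
    }
}
```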
- Document upload -- PDF, DOCX, XLSX, TXT, MD, HTML, ODT, RTF (up to 10MB)
- URL ingestion -- Fetch and index web pages directly
- Multi-context workspaces -- Organize documents into separate searchable contexts
- Markdown chat -- Responses render with full markdown and code highlighting
- Tool call visibility -- See real-time progress as the agent searches your documents
- Session persistence -- Conversation history preserved across page reloads
- Configurable persona -- Switch voice and objective via configuration
```
src/main/java/com/embabel/stashbot/
├── StashbotApplication.java        # Spring Boot entry point
├── ChatActions.java                # @Action methods for agentic RAG chat
├── ChatConfiguration.java          # Utility AI chatbot wiring
├── RagConfiguration.java           # Lucene vector store setup
├── DocumentService.java            # Document ingestion and management
├── StashbotProperties.java         # Externalized configuration
├── security/
│   ├── SecurityConfiguration.java  # Spring Security setup
│   └── LoginView.java              # Login page
├── user/
│   ├── StashbotUser.java           # User model with context
│   └── StashbotUserService.java    # User service interface
└── vaadin/
    ├── ChatView.java               # Main chat interface
    ├── ChatMessageBubble.java      # User/assistant message rendering
    ├── DocumentsDrawer.java        # Document management panel
    ├── DocumentListSection.java    # Document list component
    ├── FileUploadSection.java      # File upload component
    ├── UrlIngestSection.java       # URL ingestion component
    ├── UserSection.java            # User profile and context selector
    └── Footer.java                 # Document/chunk statistics
```
```
src/main/resources/
├── application.yml                 # Server, LLM, and chunking config
└── prompts/
    ├── stashbot.jinja              # Main prompt template
    ├── elements/
    │   ├── guardrails.jinja        # Safety guidelines
    │   └── personalization.jinja   # Dynamic persona/objective loader
    ├── personas/
    │   └── assistant.jinja         # Default assistant persona
    └── objectives/
        └── general.jinja           # General knowledge base objective
```
- Java 21+
- Maven 3.9+
- An OpenAI or Anthropic API key
```bash
export OPENAI_API_KEY=sk-...   # or ANTHROPIC_API_KEY for Claude
mvn spring-boot:run
```

Open http://localhost:9000 and log in:
| Username | Password | Roles |
|---|---|---|
| `admin` | `admin` | ADMIN, USER |
| `user` | `user` | USER |
- Click the documents icon to open the side panel
- Upload files or paste a URL to ingest
- Ask questions -- the agent will search your documents and synthesize answers
All settings are in `src/main/resources/application.yml`:

```yaml
stashbot:
  chunker-config:
    max-chunk-size: 800        # Characters per chunk
    overlap-size: 100          # Overlap between chunks
    embedding-batch-size: 800
  chat-llm:
    model: gpt-4.1-mini        # LLM for chat responses
    temperature: 0.0           # Deterministic responses
  voice:
    persona: assistant         # Prompt persona template
    max-words: 250             # Target response length
    objective: general         # Prompt objective template

embabel:
  models:
    default-llm:
      model: gpt-4.1-mini
    default-embedding-model:
      model: text-embedding-3-small
```

The LLM provider is selected automatically based on which API key is set:

- `OPENAI_API_KEY` activates OpenAI models
- `ANTHROPIC_API_KEY` activates Anthropic Claude models
Stashbot is one of several example applications built on the Embabel Agent Framework:
| Project | Description |
|---|---|
| Ragbot | CLI + web RAG chatbot demonstrating the core agentic RAG pattern with multiple personas and pluggable vector stores |
| Impromptu | Classical music discovery chatbot with Spotify/YouTube integration, Matryoshka tools, and DICE semantic memory |
Apache 2.0 -- Copyright 2024-2025 Embabel Software, Inc.
