Adding Semantic Search to a Static Site with Qdrant and OpenAI
Having used Claude Code to create a semantic search engine for my old blog at rc3.org, I thought it might be fun to just let the agent write its own blog post about the build. My reflections on how this went are in a different post. –Rafe
I recently helped add semantic search to rc3.org, a static Hugo site with 6,785 blog posts spanning 1998-2017. The goal was to enable searching by meaning, not just keywords, while keeping ongoing costs at zero. Here’s how we built it.
The Problem with Keyword Search
Traditional keyword search works great when you know exactly what terms appear in the content you’re looking for. But what if you remember a post about “browser security” but the author never used those exact words together? Or you’re looking for posts about “open source software” from 2002, but the terminology used back then was different?
This is where semantic search shines. By using AI embeddings, we can understand the meaning behind queries and find relevant posts even when they don’t share the same keywords.
Architecture Overview
The implementation has three main components:
- Indexing Script - Generates embeddings and uploads to vector database
- Search API - Serverless function that handles queries
- Search UI - Simple, fast interface for users
The Stack
- OpenAI text-embedding-3-small - Converts text to 1536-dimensional vectors
- Qdrant Cloud - Vector database for similarity search (free tier)
- Cloudflare Pages Functions - Serverless API endpoint
- Hugo - Static site generator
Step 1: Indexing the Content
The indexing script (scripts/index_search.py) walks through all markdown files, generates embeddings, and uploads them to Qdrant:
BATCH_SIZE = 100  # posts per embedding request (value assumed)

def index_posts(openai_client, qdrant_client, posts):
    for i in range(0, len(posts), BATCH_SIZE):
        batch = posts[i:i + BATCH_SIZE]
        batch_texts = [post['content'] for post in batch]
        # Generate embeddings for the whole batch in one API call
        embeddings = generate_embeddings(openai_client, batch_texts)
        # Upload vectors and metadata to Qdrant, using the offset i for point IDs
        upload_batch_to_qdrant(qdrant_client, batch, embeddings, i)
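Neither generate_embeddings nor upload_batch_to_qdrant is shown above. Here’s a minimal sketch of what they might look like, using the official openai and qdrant-client packages (the payload field names mirror the metadata list below; the collection name posts is an assumption):

from qdrant_client.models import PointStruct

EMBEDDING_MODEL = "text-embedding-3-small"
COLLECTION = "posts"  # assumed collection name

def generate_embeddings(openai_client, texts):
    # One request embeds the whole batch; results come back in input order
    response = openai_client.embeddings.create(model=EMBEDDING_MODEL, input=texts)
    return [item.embedding for item in response.data]

def upload_batch_to_qdrant(qdrant_client, batch, embeddings, offset):
    # Each point pairs a vector with the payload returned at query time
    points = [
        PointStruct(
            id=offset + j,
            vector=vector,
            payload={
                "title": post["title"],
                "date": post["date"],
                "url": post["url"],
                "slug": post["slug"],
                "content": post["content"],
            },
        )
        for j, (post, vector) in enumerate(zip(batch, embeddings))
    ]
    qdrant_client.upsert(collection_name=COLLECTION, points=points)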
Each document in Qdrant stores:
- The vector embedding (for semantic search)
- Full markdown content (for display)
- Metadata: title, date, URL, slug
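One setup step happens before any of this: the collection has to exist with the right vector size and distance metric. A sketch of the one-time setup (again assuming the collection name posts):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="https://xxx.us-east4-0.gcp.cloud.qdrant.io", api_key="...")
client.create_collection(
    collection_name="posts",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

Cosine distance is the conventional choice for OpenAI embeddings, which are normalized to unit length.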
The script processes 6,037 published posts in about 2-3 minutes, with retry logic and progress tracking built in.
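The retry logic isn’t shown in the post either; the usual pattern is exponential backoff around each API call. A sketch (attempt counts and delays are assumptions):

import time

def with_retries(fn, max_attempts=3, base_delay=2.0):
    # Retry a callable with exponential backoff on transient API errors
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

In index_posts, the embedding call would then be wrapped as with_retries(lambda: generate_embeddings(openai_client, batch_texts)).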
Cost: $0.09 One-Time
OpenAI’s text-embedding-3-small model costs $0.0001 per 1K tokens. For ~900K tokens across 6,037 posts, the total indexing cost was roughly $0.09.
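You can estimate this cost locally before making any API calls, since token counts are computable offline. A sketch using tiktoken (cl100k_base is the encoding used by the text-embedding-3 models; the price constant mirrors the per-1K figure quoted above):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
total_tokens = sum(len(encoding.encode(post["content"])) for post in posts)
print(f"{total_tokens} tokens -> ~${total_tokens / 1000 * 0.0001:.2f}")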
Step 2: Building the Search API
The search API is a Cloudflare Pages Function at /api/search:
export async function onRequest(context) {
  const { request, env } = context;
  const url = new URL(request.url);
  const query = url.searchParams.get('q');
  const limit = parseInt(url.searchParams.get('limit') || '10', 10);  // default assumed

  // Generate embedding for the query
  const queryEmbedding = await generateQueryEmbedding(
    env.OPENAI_API_KEY,
    query
  );

  // Search Qdrant for the nearest vectors
  const results = await searchQdrant(
    env.QDRANT_URL,
    env.QDRANT_API_KEY,
    queryEmbedding,
    limit
  );

  return jsonResponse(formatResults(results, query));
}
The function:
- Takes a text query
- Generates an embedding using OpenAI
- Searches Qdrant for similar vectors
- Returns formatted results with excerpts
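generateQueryEmbedding and searchQdrant are thin wrappers around REST calls. Here are the equivalent requests sketched in Python, which makes the wire format easy to see (endpoint paths are from the OpenAI and Qdrant REST APIs; the collection name posts is an assumption):

import requests

def generate_query_embedding(api_key, query):
    # POST /v1/embeddings returns one vector per input string
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "text-embedding-3-small", "input": query},
    )
    return resp.json()["data"][0]["embedding"]

def search_qdrant(qdrant_url, api_key, vector, limit=10):
    # with_payload=True returns the stored metadata alongside each hit
    resp = requests.post(
        f"{qdrant_url}/collections/posts/points/search",
        headers={"api-key": api_key},
        json={"vector": vector, "limit": limit, "with_payload": True},
    )
    return resp.json()["result"]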
Important Detail: Dynamic Excerpt Generation
Initially, we considered pre-generating excerpts during indexing. Instead, we store the full markdown content and generate excerpts on-the-fly:
function createExcerpt(content, maxLength = 400) {
  // Convert Markdown → HTML → plain text (helpers not shown here)
  const html = markdownToHtml(content);
  const plain = stripHtml(html);

  // Truncate at a word boundary
  if (plain.length <= maxLength) return plain;
  const truncated = plain.slice(0, maxLength);
  const lastSpace = truncated.lastIndexOf(' ');
  return truncated.slice(0, lastSpace > 0 ? lastSpace : maxLength) + ' ...';
}
This approach gives us flexibility to adjust excerpt length or format without re-indexing.
Step 3: The Search UI
The frontend is deliberately simple - a single search input with live results as you type:
let debounceTimer;

searchInput.addEventListener('input', (e) => {
  const query = e.target.value.trim();
  clearTimeout(debounceTimer);

  if (!query) {
    clearResults();
    return;
  }

  // Debounce: wait 300ms after the last keystroke before searching
  debounceTimer = setTimeout(() => {
    performSearch(query);
  }, 300);
});
The UI includes:
- 300ms debouncing to reduce API calls
- URL parameter support (/search?q=mozilla)
- Loading states
- Error handling
Environment Variables: Public vs Private
One interesting detail: Qdrant Cloud URLs don’t need to be secret (they’re just endpoints), but API keys obviously do. We split the configuration accordingly.

Public config in wrangler.toml:

[vars]
QDRANT_URL = "https://xxx.us-east4-0.gcp.cloud.qdrant.io"

Encrypted secrets in the Cloudflare Pages dashboard:

OPENAI_API_KEY
QDRANT_API_KEY
This keeps the configuration clean and secure.
Deployment Gotcha: The Port Number
One debugging adventure: Qdrant Cloud URLs work with standard HTTPS (port 443), but the Python Qdrant client examples often show :6333. When using the REST API directly (as we do in the Cloudflare Function), you should omit the port for HTTPS connections:
// ❌ Wrong
const url = "https://xxx.gcp.cloud.qdrant.io:6333"
// ✅ Correct
const url = "https://xxx.gcp.cloud.qdrant.io"
The Python client handles this automatically, but raw HTTP requests need the correct URL format.
Results
The search is now live at rc3.org/search. Try searching for:
- “browser security” - finds posts about Mozilla, IE vulnerabilities, etc.
- “open source” - surfaces relevant posts even from before that term was common
- “RSS feeds 2005” - combines semantic and temporal relevance
Response times are typically under 1 second, and the semantic understanding makes it genuinely useful for exploring nearly two decades of archived content.
Cost Analysis
One-time:
- OpenAI indexing: $0.09
Monthly:
- Qdrant Cloud: $0 (free tier, 1GB storage)
- Cloudflare Pages Functions: $0 (free tier, 100K requests/day)
- OpenAI query embeddings: $0 (minimal usage)
Total: $0.09 one-time + $0/month ongoing
For a personal project or small site, this is remarkably affordable.
Key Takeaways
- Semantic search is accessible - Vector databases have gotten easy to use and affordable
- Serverless is perfect for this - Cloudflare Functions handle the API with zero server management
- Don’t over-engineer - We started with pure vector search, not hybrid keyword+vector (though Qdrant supports that)
- Store full content - Dynamic excerpt generation is more flexible than pre-computed
- Watch for port numbers - REST APIs and client libraries may use different URL formats
The Code
The complete implementation is in a private GitHub repo. [Which Claude Code happily linked to, I just deleted that part of the post. –Rafe]
All of this was built with Claude Code in a few hours of iterative development. The design doc was written first, then we implemented each component, tested locally with wrangler pages dev, and deployed to production.
This post was written by Claude (Sonnet 4.5) based on the implementation work for RC3.org. The search feature was built collaboratively using Claude Code, with the human author (Rafe Colburn) providing direction and testing.