Adding Semantic Search to a Static Site with Qdrant and OpenAI
Having used Claude Code to create a semantic search engine for my old blog at rc3.org, I thought it might be fun to just let the agent write its own blog post about the build. My reflections on how this went are in a different post. –Rafe
I recently helped add semantic search to rc3.org, a static Hugo site with 6,785 blog posts spanning 1998-2017. The goal was to enable searching by meaning, not just keywords, while keeping ongoing costs at zero. Here’s how we built it.
The Problem with Keyword Search
Traditional keyword search works great when you know exactly what terms appear in the content you’re looking for. But what if you remember a post about “browser security” but the author never used those exact words together? Or you’re looking for posts about “open source software” from 2002, but the terminology used back then was different?
This is where semantic search shines. By using AI embeddings, we can understand the meaning behind queries and find relevant posts even when they don’t share the same keywords.
Architecture Overview
The implementation has three main components:
- Indexing Script - Generates embeddings and uploads to vector database
- Search API - Serverless function that handles queries
- Search UI - Simple, fast interface for users
The Stack
- OpenAI text-embedding-3-small - Converts text to 1536-dimensional vectors
- Qdrant Cloud - Vector database for similarity search (free tier)
- Cloudflare Pages Functions - Serverless API endpoint
- Hugo - Static site generator
Step 1: Indexing the Content
The indexing script (scripts/index_search.py) walks through all markdown files, generates embeddings, and uploads them to Qdrant:
BATCH_SIZE = 100  # posts per embedding request (value assumed)

def index_posts(openai_client, qdrant_client, posts):
    for i in range(0, len(posts), BATCH_SIZE):
        batch = posts[i:i + BATCH_SIZE]
        batch_texts = [post['content'] for post in batch]
        # Generate embeddings for the whole batch in one API call
        embeddings = generate_embeddings(openai_client, batch_texts)
        # Upload vectors and metadata to Qdrant, using the offset i for point IDs
        upload_batch_to_qdrant(qdrant_client, batch, embeddings, i)
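Neither generate_embeddings nor upload_batch_to_qdrant is shown above. Here’s a minimal sketch of what they might look like, using the official openai and qdrant-client packages (the payload field names mirror the metadata list below; the collection name posts is an assumption):

from qdrant_client.models import PointStruct

EMBEDDING_MODEL = "text-embedding-3-small"
COLLECTION = "posts"  # assumed collection name

def generate_embeddings(openai_client, texts):
    # One request embeds the whole batch; results come back in input order
    response = openai_client.embeddings.create(model=EMBEDDING_MODEL, input=texts)
    return [item.embedding for item in response.data]

def upload_batch_to_qdrant(qdrant_client, batch, embeddings, offset):
    # Each point pairs a vector with the payload returned at query time
    points = [
        PointStruct(
            id=offset + j,
            vector=vector,
            payload={
                "title": post["title"],
                "date": post["date"],
                "url": post["url"],
                "slug": post["slug"],
                "content": post["content"],
            },
        )
        for j, (post, vector) in enumerate(zip(batch, embeddings))
    ]
    qdrant_client.upsert(collection_name=COLLECTION, points=points)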
Each document in Qdrant stores:
- The vector embedding (for semantic search)
- Full markdown content (for display)
- Metadata: title, date, URL, slug
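One setup step happens before any of this: the collection has to exist with the right vector size and distance metric. A sketch of the one-time setup (again assuming the collection name posts):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="https://xxx.us-east4-0.gcp.cloud.qdrant.io", api_key="...")
client.create_collection(
    collection_name="posts",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

Cosine distance is the conventional choice for OpenAI embeddings, which are normalized to unit length.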
The script processes 6,037 published posts in about 2-3 minutes, with retry logic and progress tracking built in.
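The retry logic isn’t shown in the post either; the usual pattern is exponential backoff around each API call. A sketch (attempt counts and delays are assumptions):

import time

def with_retries(fn, max_attempts=3, base_delay=2.0):
    # Retry a callable with exponential backoff on transient API errors
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

In index_posts, the embedding call would then be wrapped as with_retries(lambda: generate_embeddings(openai_client, batch_texts)).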
Cost: $0.09 One-Time
OpenAI’s text-embedding-3-small model costs $0.0001 per 1K tokens. For ~900K tokens across 6,037 posts, the total indexing cost was roughly $0.09.
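You can estimate this cost locally before making any API calls, since token counts are computable offline. A sketch using tiktoken (cl100k_base is the encoding used by the text-embedding-3 models; the price constant mirrors the per-1K figure quoted above):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
total_tokens = sum(len(encoding.encode(post["content"])) for post in posts)
print(f"{total_tokens} tokens -> ~${total_tokens / 1000 * 0.0001:.2f}")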
Step 2: Building the Search API
The search API is a Cloudflare Pages Function at /api/search:
export async function onRequest(context) {
  const { request, env } = context;
  const url = new URL(request.url);
  const query = url.searchParams.get('q');
  const limit = parseInt(url.searchParams.get('limit') || '10', 10);  // default assumed

  // Generate embedding for the query
  const queryEmbedding = await generateQueryEmbedding(
    env.OPENAI_API_KEY,
    query
  );

  // Search Qdrant for the nearest vectors
  const results = await searchQdrant(
    env.QDRANT_URL,
    env.QDRANT_API_KEY,
    queryEmbedding,
    limit
  );

  return jsonResponse(formatResults(results, query));
}
The function:
- Takes a text query
- Generates an embedding using OpenAI
- Searches Qdrant for similar vectors
- Returns formatted results with excerpts
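generateQueryEmbedding and searchQdrant are thin wrappers around REST calls. Here are the equivalent requests sketched in Python, which makes the wire format easy to see (endpoint paths are from the OpenAI and Qdrant REST APIs; the collection name posts is an assumption):

import requests

def generate_query_embedding(api_key, query):
    # POST /v1/embeddings returns one vector per input string
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "text-embedding-3-small", "input": query},
    )
    return resp.json()["data"][0]["embedding"]

def search_qdrant(qdrant_url, api_key, vector, limit=10):
    # with_payload=True returns the stored metadata alongside each hit
    resp = requests.post(
        f"{qdrant_url}/collections/posts/points/search",
        headers={"api-key": api_key},
        json={"vector": vector, "limit": limit, "with_payload": True},
    )
    return resp.json()["result"]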
Important Detail: Dynamic Excerpt Generation
Initially, we considered pre-generating excerpts during indexing. Instead, we store the full markdown content and generate excerpts on-the-fly:
function createExcerpt(content, maxLength = 400) {
  // Convert Markdown → HTML → plain text (helpers not shown here)
  const html = markdownToHtml(content);
  const plain = stripHtml(html);

  // Truncate at a word boundary
  if (plain.length <= maxLength) return plain;
  const truncated = plain.slice(0, maxLength);
  const lastSpace = truncated.lastIndexOf(' ');
  return truncated.slice(0, lastSpace > 0 ? lastSpace : maxLength) + ' ...';
}
This approach gives us flexibility to adjust excerpt length or format without re-indexing.
Step 3: The Search UI
The frontend is deliberately simple - a single search input with live results as you type:
let debounceTimer;

searchInput.addEventListener('input', (e) => {
  const query = e.target.value.trim();
  clearTimeout(debounceTimer);

  if (!query) {
    clearResults();
    return;
  }

  // Debounce: wait 300ms after the last keystroke before searching
  debounceTimer = setTimeout(() => {
    performSearch(query);
  }, 300);
});
The UI includes:
- 300ms debouncing to reduce API calls
- URL parameter support (/search?q=mozilla)
- Loading states
- Error handling
Environment Variables: Public vs Private
One interesting detail: Qdrant Cloud URLs don’t need to be secret (they’re just endpoints), but API keys obviously do. We split the configuration accordingly.

Public config in wrangler.toml:

[vars]
QDRANT_URL = "https://xxx.us-east4-0.gcp.cloud.qdrant.io"

Encrypted secrets in the Cloudflare Pages dashboard:

OPENAI_API_KEY
QDRANT_API_KEY
This keeps the configuration clean and secure.
Deployment Gotcha: The Port Number
One debugging adventure: Qdrant Cloud URLs work with standard HTTPS (port 443), but the Python Qdrant client examples often show :6333. When using the REST API directly (as we do in the Cloudflare Function), you should omit the port for HTTPS connections:
// ❌ Wrong
const url = "https://xxx.gcp.cloud.qdrant.io:6333"
// ✅ Correct
const url = "https://xxx.gcp.cloud.qdrant.io"
The Python client handles this automatically, but raw HTTP requests need the correct URL format.
Results
The search is now live at rc3.org/search. Try searching for:
- “browser security” - finds posts about Mozilla, IE vulnerabilities, etc.
- “open source” - surfaces relevant posts even from before that term was common
- “RSS feeds 2005” - combines semantic and temporal relevance
Response times are typically under 1 second, and the semantic understanding makes it genuinely useful for exploring nearly two decades of archived content.
Cost Analysis
One-time:
- OpenAI indexing: $0.09
Monthly:
- Qdrant Cloud: $0 (free tier, 1GB storage)
- Cloudflare Pages Functions: $0 (free tier, 100K requests/day)
- OpenAI query embeddings: $0 (minimal usage)
Total: $0.09 one-time + $0/month ongoing
For a personal project or small site, this is remarkably affordable.
Key Takeaways
- Semantic search is accessible - Vector databases have gotten easy to use and affordable
- Serverless is perfect for this - Cloudflare Functions handle the API with zero server management
- Don’t over-engineer - We started with pure vector search, not hybrid keyword+vector (though Qdrant supports that)
- Store full content - Dynamic excerpt generation is more flexible than pre-computed
- Watch for port numbers - REST APIs and client libraries may use different URL formats
The Code
The complete implementation is in a private GitHub repo. [Which Claude Code happily linked to, I just deleted that part of the post. –Rafe]
All of this was built with Claude Code in a few hours of iterative development. The design doc was written first, then we implemented each component, tested locally with wrangler pages dev, and deployed to production.
This post was written by Claude (Sonnet 4.5) based on the implementation work for RC3.org. The search feature was built collaboratively using Claude Code, with the human author (Rafe Colburn) providing direction and testing.