What is Albert Folch's current role?

Albert Folch is currently an AI Product Manager at CELO, where he pioneered the AI function from scratch, built the department, and leads end-to-end AI strategy. He has trained 100+ employees and delivered multiple 0-to-1 AI products.

What is Albert Folch's background?

Albert Folch has a unique journey: Materials Engineering (Bachelor's & Master's) → Application Engineer at GPAINNOVA/DLyte → Product Engineer at HP 3D Printing → Technical Product Manager at Swatch Group → AI Product Manager at CELO.

What does Albert Folch specialize in?

Albert specializes in Technical Product Management, 0-to-1 AI Product Development, Enterprise AI Strategy, Multi-Agent Systems (CrewAI), and building transformative products where AI delivers real business value.

Where is Albert Folch based?

Albert Folch is based in Barcelona, Spain.

How I Built My 'Ask AI' Web Assistant

Overview

The "Ask AI" feature on my personal website lets visitors ask questions about my career, projects, and tools using natural language. Behind the scenes, it's powered by a RAG (Retrieval-Augmented Generation) pipeline that combines semantic search with LLM reasoning. Here's how I built it.

1. RAG Pipeline

Content Processing

The RAG pipeline starts with processing various content types from my website:

Projects: MDX files from src/content/projects/ plus metadata from src/lib/data.ts
Journey: Career milestones from src/content/journey/ combined into a single timeline
Tools: AI tools documentation from src/content/tools/tools.mdx
Blog: Blog posts from src/content/blog/
CV & For-Agents: Additional structured content

Each content type is processed by dedicated functions (processProjects(), processJourney(), etc.) that:

Read MDX files and extract frontmatter
Combine metadata with content
Generate clean URLs for each piece of content
Create ContentChunk objects with structured metadata

Chunking Strategy

Content is split into semantic chunks using a sentence-based approach:

function generateChunks(text: string, maxLength: number = 1000): string[] {
  // If text is short enough, return as single chunk
  if (text.length <= maxLength) {
    return text.length >= 20 ? [text.trim()] : [];
  }

  // Split by sentences and group into chunks
  const sentences = text.split(/[.!?]\s+/);
  const chunks: string[] = [];
  let currentChunk = "";

  for (const sentence of sentences) {
    if ((currentChunk + sentence).length > maxLength && currentChunk) {
      chunks.push(currentChunk.trim());
      currentChunk = sentence;
    } else {
      currentChunk += (currentChunk ? ". " : "") + sentence;
    }
  }

  if (currentChunk) {
    chunks.push(currentChunk.trim());
  }

  return chunks.filter((chunk) => chunk.length >= 20);
}

This ensures chunks are semantically coherent (complete sentences) while staying within token limits.

Embedding Generation

Chunks are embedded using OpenAI's text-embedding-3-small (1536 dimensions) or via OpenRouter for flexibility. The embedding script:

Generates embeddings for all sub-chunks in batches
Validates embedding dimensions match expected size (1536)
Stores embeddings in PostgreSQL using pgvector

Vector Database Schema

I use Neon (serverless Postgres) with the pgvector extension. The schema includes:

docs.page: Stores page metadata (path, type, meta JSON, last_refresh)
docs.page_section: Stores content chunks with their embeddings (vector(1536))

The similarity search function uses cosine similarity (dot product on normalized vectors):

create or replace function "docs"."match_page_sections"(
  embedding vector(1536), 
  match_threshold float, 
  match_count int, 
  min_content_length int
)
returns table (...)
language plpgsql
as $$
begin
  return query
  select
    ps.id,
    ps.content,
    (ps.embedding <#> embedding) * -1 as similarity,
    ps.url,
    ps.content_type
  from docs.page_section ps
  where length(ps.content) >= min_content_length
    and (ps.embedding <#> embedding) * -1 > match_threshold
  order by ps.embedding <#> embedding
  limit match_count;
end;
$$;

Retrieval Logic

When a user asks a question, the system:

Generates query embedding: Converts the user's question into a 1536-dimensional vector
Semantic search: Finds top-k most similar chunks using cosine similarity
Content-type prioritization: Re-ranks results based on query intent (career questions prioritize journey content, contact questions prioritize for-agents content)
Deduplication: Removes duplicate URLs while keeping multiple sections for journey/CV content
Context assembly: Formats retrieved chunks with headings and URLs for the LLM

2. LLM Prompting and Model Configuration

Model Selection

The system supports both OpenAI and OpenRouter providers, configurable via environment variables:

function getLLMProvider() {
  const provider = process.env.LLM_PROVIDER || "openrouter";
  const model = process.env.LLM_MODEL || "google/gemini-2.5-flash";

  if (provider === "openrouter") {
    const openrouter = createOpenRouter({ 
      apiKey: process.env.OPENROUTER_API_KEY,
      headers: {
        'HTTP-Referer': 'https://folch.ai',
        'X-Title': 'folch.ai',
      },
    });
    return { model: openrouter.chat(model) };
  }

  return { model: openai(model) };
}

Default model is google/gemini-2.5-flash via OpenRouter for cost-effectiveness while maintaining quality.

System Prompt Architecture

The system prompt is structured with clear sections:

Role Definition: Establishes the assistant's purpose (helping users learn about Albert's background)

Security Boundaries: Prevents prompt injection, code execution, and information leakage

Retrieval Strategy: Forces the LLM to always call getInformation tool before answering

Use Cases: Defines primary scenarios (career questions, tools, projects, contact info) with specific guidance

Response Guidelines:

Concise, non-repetitive answers
Structured with bullet points
Numbered source references [1], [2]
Sources section at the end with descriptive links

Query Construction: Guides the LLM on how to construct effective semantic search queries (2-4 key concepts, 10-50 words, natural language)

Tool-Based RAG

The LLM uses a getInformation tool that:

Takes a natural language query (constructed by the LLM from user's question)
Calls findRelevantContent() to retrieve top-k chunks
Formats context with headings and URLs
Returns formatted context for the LLM to synthesize

The tool description includes query construction examples to guide the LLM:

EXAMPLES:
- User: "what's albert's background?" → Query: "Albert background career journey professional history"
- User: "journey?" → Query: "Albert journey career timeline professional path"
- User: "contact?" → Query: "Albert contact information email LinkedIn GitHub"

Query Rewriting

Before retrieval, queries go through a rewriting step that:

Classifies intent: Detects career, contact, project, or tool queries
Adds context: Incorporates conversation history for follow-up questions
Expands abbreviations: Converts "journey?" to "Albert journey career timeline professional path"

This improves retrieval quality, especially for short or ambiguous queries.

Response Formatting

The LLM formats responses with:

Inline references: [1], [2] throughout the answer
Sources section: Bullet list with descriptive links like 🔗 1. [Professional Journey](https://folch.ai/journey)
URL deduplication: Same URL reused with same number

3. GitHub Workflows

Automated Embedding Generation

A GitHub Actions workflow automatically regenerates embeddings when content changes:

name: 'generate_embeddings'

on:
  workflow_dispatch:
  push:
    branches:
      - main
      - development
    paths:
      - 'src/lib/data.ts'
      - 'src/content/projects/**'
      - 'src/content/journey/**'
      - 'src/content/tools/**'
      - 'src/content/blog/**'
      - 'src/content/cv.mdx'
      - 'src/content/for-agents.mdx'
      - 'src/app/sitemap.ts'
      - 'scripts/generate-embeddings.ts'

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Generate embeddings
        env:
          NEON_DATABASE_URL: ${{ secrets.NEON_DATABASE_URL }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
        run: |
          npx tsx scripts/generate-embeddings.ts

Workflow Behavior

Triggers: Runs on push to main/development when content files change, or manually via workflow_dispatch
Cleanup: Deletes all existing embeddings before regenerating (ensures consistency)
Processing: Processes all content types sequentially
Validation: Validates embedding dimensions and logs statistics

This ensures the knowledge base stays in sync with website content without manual intervention.

Key Learnings

Chunking matters: Sentence-based chunking preserves semantic coherence better than fixed-size windows
Query rewriting improves retrieval: Even simple intent classification and context expansion significantly improves results
Content-type prioritization: Re-ranking by content type based on query intent improves answer quality
Tool-based RAG: Using tools forces the LLM to retrieve before answering, preventing hallucinations
Automation is essential: GitHub workflows eliminate manual embedding updates and keep the system current

Future Improvements

Hybrid search: Combine semantic search with keyword matching for better precision
Query expansion: Use LLM to expand queries before retrieval
Conversation memory: Store conversation context in database for better follow-ups
Analytics: Track query patterns and retrieval quality to optimize thresholds

The Ask AI feature demonstrates how RAG can make personal websites more interactive and informative. By combining semantic search with structured prompting, it provides accurate, contextual answers while maintaining security boundaries.