Complete Tutorial: Docker Desktop + n8n + Qdrant + Embedding Auto-Ingestion Pipeline


A fully local AI knowledge base system you can run right away: Webhook → Embedding → n8n processing → Qdrant vector database → searchable knowledge base.

It covers all the pitfalls I personally encountered (Webhook parsing, vector format, HTTP JSON errors, payload structure, etc.); this is the “bug-proof” edition.


🧱 1. Overall Architecture

User Request (Webhook)
        ↓
n8n Workflow
        ↓
Embedding Model (Ollama / OpenAI)
        ↓
Set Node (standardize structure)
        ↓
HTTP Request (write to Qdrant)
        ↓
Qdrant Vector DB
        ↓
Subsequent retrieval / RAG

🐳 2. Launch Qdrant + n8n with Docker Desktop

1️⃣ docker-compose.yml

version: "3.9"

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_data:/qdrant/storage

  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - NODE_ENV=production
    volumes:
      - ./n8n_data:/home/node/.n8n

Start up:

docker compose up -d

Access:

  • n8n: http://localhost:5678
  • Qdrant: http://localhost:6333

🧠 3. Create a Qdrant Collection

curl -X PUT http://localhost:6333/collections/test \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    }
  }'
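
The "size" in the collection must equal the length of the vectors the embedding model produces (768 matches the nomic-embed-text model used later in this tutorial). As a rough sketch of that constraint, the Python below builds the same request body as the curl call and checks whether a vector would fit. The function names are made up for illustration and are not part of any Qdrant client.

```python
import json

def collection_config(size: int, distance: str = "Cosine") -> dict:
    """Build the body for PUT /collections/<name>, mirroring the curl call above."""
    if size <= 0:
        raise ValueError("vector size must be positive")
    return {"vectors": {"size": size, "distance": distance}}

def fits_collection(vector: list, config: dict) -> bool:
    """Return True when the vector length equals the collection's configured size."""
    return len(vector) == config["vectors"]["size"]

config = collection_config(768)
print(json.dumps(config))
```

If the embedding model is swapped for one with a different output dimension, the collection has to be created with the matching size.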

🔌 4. n8n Workflow Design (Core)


🧩 Step 1: Webhook (Entry Point)

Node: Webhook

  • Method: POST
  • Path: /qdrant-ingest
  • Body Content Type: JSON (required; the caller must also send the header Content-Type: application/json)

Test payload:

{
  "text": "Artificial intelligence is the technology that simulates human intelligence"
}
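
To send this test payload from a script, a minimal sketch using only the Python standard library might look like the following. The URL is an assumption: it combines the port from docker-compose.yml with n8n's /webhook-test/ prefix for test executions (an activated workflow listens under /webhook/ instead). The helper name is hypothetical.

```python
import json
import urllib.request

# Assumption: test URL of the Webhook node; use /webhook/ once the workflow is active.
WEBHOOK_URL = "http://localhost:5678/webhook-test/qdrant-ingest"

def build_webhook_request(text: str, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Prepare a POST request whose body is JSON and is declared as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_webhook_request(
    "Artificial intelligence is the technology that simulates human intelligence"
)
# urllib.request.urlopen(req)  # uncomment to actually send it to a running n8n
```

The request is only built here, not sent, so the snippet runs without n8n being up.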

⚠️ Common Pitfall

If the webhook output shows:

body: "{ \"text\": \"xxx\" }"

❌ This means the JSON body was not parsed and arrived as a plain string. 👉 You must enable JSON mode.
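
The difference between the two shapes is easy to see outside n8n. The sketch below is an illustration in Python, not n8n code: it treats a string body as unparsed JSON and decodes it, so that the text field becomes reachable. The helper name is hypothetical.

```python
import json

def normalize_body(body):
    """Return the body as a dict, decoding it first if it arrived as a JSON string."""
    if isinstance(body, str):
        return json.loads(body)
    return body

raw = "{ \"text\": \"xxx\" }"   # what the unparsed webhook output looks like
parsed = normalize_body(raw)     # what the later nodes expect
print(type(raw).__name__, "->", type(parsed).__name__)
```

Fixing the request at the source is still preferable; a fallback like this only hides the misconfiguration.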


🧩 Step 2: Embedding (Ollama example)

HTTP Request Node:

POST http://host.docker.internal:11434/api/embeddings

Body:

{
  "model": "nomic-embed-text",
  "prompt": "{{$json.body.text}}"
}

✔ The correct output should be:

{
  "embedding": [0.1, 0.2, 0.3, ...]
}
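
One thing to watch for in the request body above is text containing quotes or line breaks, which can break a hand-written JSON template. A minimal Python sketch of the same request and response handling, assuming the request and response shapes shown above, lets a JSON serializer do the escaping and checks that the result really is an array. Both function names are hypothetical.

```python
import json

def embedding_request_body(text: str, model: str = "nomic-embed-text") -> str:
    """Serialize the request body; json.dumps escapes quotes and newlines in text."""
    return json.dumps({"model": model, "prompt": text})

def extract_embedding(response: dict) -> list:
    """Return the embedding as a list of floats, or raise if it is missing or not an array."""
    embedding = response.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        raise ValueError("response has no non-empty 'embedding' array")
    return [float(x) for x in embedding]

body = embedding_request_body('He said "hi"\nand left')
vector = extract_embedding({"embedding": [0.1, 0.2, 0.3]})
```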

🧩 Step 3: Set Node (Critical structure cleanup)

👉 This is the step that is easiest to get stuck on


➤ Add Node: Set

Mode: Manual Mapping


✔ Set these fields:

① id (Number)

{{ Date.now() }}

② vector (Array)

{{ $json.embedding }}

③ payload (Object) 🔥 Critical

{
  "text": "{{$node["Webhook"].json.body.text}}"
}

✔ Correct Set output structure:

{
  "id": 1700000000000,
  "vector": [0.1, 0.2, ...],
  "payload": {
    "text": "Artificial intelligence..."
  }
}
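
For readers who prefer to see the mapping as code, here is a small Python equivalent of what the Set node produces. It is only a sketch of the structure above; build_point is a hypothetical name, and the millisecond timestamp mirrors Date.now().

```python
import time

def build_point(text: str, embedding: list) -> dict:
    """Assemble the id / vector / payload structure the Set node outputs."""
    return {
        "id": int(time.time() * 1000),   # milliseconds since epoch, like Date.now()
        "vector": list(embedding),       # stays an array, never a string
        "payload": {"text": text},       # stays an object, never a string
    }

point = build_point("Artificial intelligence...", [0.1, 0.2])
print(sorted(point))
```

A timestamp id is convenient for testing, but two items created in the same millisecond would share an id, and Qdrant's upsert would then overwrite the first with the second.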

🧩 Step 4: HTTP Request (Write to Qdrant)


URL

http://qdrant:6333/collections/test/points

Method

PUT

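Body (⚠️ Correct format)

{
  "points": [
    {
      "id": {{ $json.id }},
      "vector": {{ JSON.stringify($json.vector) }},
      "payload": {{ JSON.stringify($json.payload) }}
    }
  ]
}

JSON.stringify makes sure the array and the object are inserted as valid JSON rather than as their plain string form.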
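
Why hand-assembled JSON bodies fail can be shown in any language. The Python sketch below is an analogy, not n8n's actual behavior: pasting values into a text template yields invalid JSON as soon as a value is an object, whereas serializing the whole structure in one step always yields valid JSON. All names are hypothetical.

```python
import json

def upsert_body(point: dict) -> str:
    """Serialize the complete upsert body in one step."""
    return json.dumps({"points": [point]})

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

point = {"id": 1700000000000, "vector": [0.1, 0.2], "payload": {"text": 'AI "basics"'}}

# Naive approach: paste each value into a template as text.
naive = '{"points": [{"id": %s, "vector": %s, "payload": %s}]}' % (
    point["id"], point["vector"], point["payload"]
)

print(is_valid_json(naive))               # the pasted dict is not JSON
print(is_valid_json(upsert_body(point)))
```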

❌ Summary of common errors and their causes

① JSON Body errors

Reason: vector / payload were not properly JSON-serialized

② payload undefined

Reason: the Webhook body wasn't parsed as JSON

③ vector is not an array

Reason: the embedding was treated as a string

🧪 5. Verify data was ingested successfully

curl http://localhost:6333/collections/test/points/scroll \
  -H "Content-Type: application/json" \
  -d '{
    "limit": 10,
    "with_payload": true,
    "with_vectors": true
  }'
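
The scroll response nests the points under result.points. To check the output at a glance, a short Python sketch like the one below could summarize it; the sample response is invented for illustration, and summarize_scroll is a hypothetical helper.

```python
def summarize_scroll(response: dict) -> list:
    """Return (id, text, vector length) for each point in a scroll response."""
    points = response.get("result", {}).get("points", [])
    return [
        (p["id"], (p.get("payload") or {}).get("text"), len(p.get("vector") or []))
        for p in points
    ]

# Invented sample in the shape of a scroll response, for illustration only.
sample = {
    "result": {
        "points": [
            {"id": 1700000000000, "payload": {"text": "Artificial intelligence..."},
             "vector": [0.1, 0.2, 0.3]},
        ],
        "next_page_offset": None,
    },
    "status": "ok",
}

summary = summarize_scroll(sample)
print(summary)
```

A vector length that differs from the collection size, or a text of None, points back to the Set node or the webhook parsing.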

🎯 6. Final stable workflow (recommended version)

Webhook
  ↓
Embedding Node (Ollama / OpenAI)
  ↓
Set Node (standardize structure)
  ↓
HTTP Request → Qdrant

🧠 7. Key lessons learned (very important)

✔ 1. n8n JSON rules

Mistake                      | Correct
Concatenate JSON with {{}}   | Use expressions properly
vector as string             | vector as array
payload as string            | payload as object

✔ 2. Qdrant requirements

id: unsigned integer / UUID string
vector: float[] (length must equal the collection size, 768 here)
payload: object
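
These three requirements can be checked before a point is sent. The validator below is a sketch written for this tutorial, not part of Qdrant, and validate_point is a hypothetical name; it follows this tutorial's rule that payload is an object.

```python
import uuid

def validate_point(point: dict) -> list:
    """Return a list of problems; an empty list means the point looks well-formed."""
    problems = []

    pid = point.get("id")
    if isinstance(pid, bool) or not isinstance(pid, (int, str)):
        problems.append("id must be an integer or a UUID string")
    elif isinstance(pid, int) and pid < 0:
        problems.append("id must be non-negative")
    elif isinstance(pid, str):
        try:
            uuid.UUID(pid)
        except ValueError:
            problems.append("id string is not a valid UUID")

    vec = point.get("vector")
    is_number = lambda x: isinstance(x, (int, float)) and not isinstance(x, bool)
    if not isinstance(vec, list) or not all(is_number(x) for x in vec):
        problems.append("vector must be an array of numbers")

    if not isinstance(point.get("payload"), dict):
        problems.append("payload must be an object")

    return problems

good = {"id": 1700000000000, "vector": [0.1, 0.2], "payload": {"text": "a"}}
bad = {"id": "abc", "vector": "0.1,0.2", "payload": "{}"}
print(validate_point(good), len(validate_point(bad)))
```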

✔ 3. Webhook must enable JSON mode

Otherwise:

body = string ❌
body.text = undefined ❌

🚀 8. Next steps / upgrade directions

This setup can be extended toward:

🔥 Enterprise-grade AI Knowledge Base

  • RAG-based Q&A retrieval
  • Multi-collection classification (doc / chat / blog)
  • Automatic chunk splitting
  • Deduplication on write
  • Vector update strategies
  • Switching between multiple Embedding models
  • LangChain / Flowise integration
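
Two of these items, chunk splitting and deduplication on write, can be sketched briefly. The Python below is one possible approach under stated assumptions, not something the workflow above already does: it splits text by word count and derives a deterministic UUID from the normalized text, so that re-ingesting the same chunk overwrites the existing point instead of adding a duplicate. Both function names and the 50-word default are hypothetical.

```python
import uuid

def split_chunks(text: str, max_words: int = 50) -> list:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def content_id(text: str) -> str:
    """Derive a stable UUID from the text (whitespace and case are normalized first)."""
    normalized = " ".join(text.split()).lower()
    return str(uuid.uuid5(uuid.NAMESPACE_URL, normalized))

chunks = split_chunks("a b c d e", max_words=2)
ids = [content_id(chunk) for chunk in chunks]
print(chunks, len(set(ids)))
```

Using such an id in place of Date.now() relies on upsert replacing a point that already has the same id.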

A natural follow-up project 👇

👉 n8n + Qdrant + RAG Q&A system (production-ready)

Including:

  • Chat UI (Web)
  • Vector retrieval
  • LLM responses (Ollama / GPT)
  • Multi-turn conversation memory
  • Knowledge base tiering

This could serve as the basis for an AI SaaS MVP.