Complete Tutorial: Docker Desktop + n8n + Qdrant + Embedding Auto-Ingestion Pipeline

A fully local AI knowledge base system you can run right away (when embeddings come from Ollama): Webhook → Embedding → n8n processing → Qdrant vector database → searchable knowledge base. It covers every pitfall I personally ran into (Webhook parsing, vector format, HTTP JSON errors, payload structure, etc.), which is why this is the “bug-proof” edition.

🧱 1. Overall Architecture

User Request (Webhook)
        ↓
n8n Workflow
        ↓
Embedding Model (Ollama / OpenAI)
        ↓
Set Node (standardize structure)
        ↓
HTTP Request (write to Qdrant)
        ↓
Qdrant Vector DB
        ↓
Subsequent retrieval / RAG

🐳 2. Launch Qdrant + n8n with Docker Desktop

1️⃣ docker-compose.yml

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_data:/qdrant/storage

  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - NODE_ENV=production
    volumes:
      - ./n8n_data:/home/node/.n8n

Start both containers:

docker compose up -d

Access:

n8n: http://localhost:5678
Qdrant: http://localhost:6333 (REST API; the web dashboard is at http://localhost:6333/dashboard)
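Before wiring up the workflow, it can help to confirm that both containers answer from the host. The following is a minimal standard-library sketch, not part of the original setup; it assumes that Qdrant's root URL and n8n's /healthz endpoint return HTTP 200 once the services are ready, which is worth checking against the versions you pull.

```python
import time
import urllib.request

# Assumption: both endpoints return HTTP 200 when the service is ready.
CHECKS = {
    "qdrant": "http://localhost:6333/",
    "n8n": "http://localhost:5678/healthz",
}


def service_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts all subclass OSError
        return False


def wait_for_services(checks: dict, retries: int = 10, delay: float = 3.0) -> dict:
    """Poll each service until it is up or the retries run out; return name -> bool."""
    status = {}
    for name, url in checks.items():
        for _ in range(retries):
            if service_is_up(url):
                status[name] = True
                break
            time.sleep(delay)
        else:  # no break: every attempt failed
            status[name] = False
    return status
```

Run wait_for_services(CHECKS) right after docker compose up -d; it should report True for both services once they have finished starting.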

🧠 3. Create a Qdrant Collection

curl -X PUT http://localhost:6333/collections/test \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    }
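  }'

The size must match the output dimension of the embedding model: nomic-embed-text (used below) returns 768-dimensional vectors, while OpenAI's text-embedding-3-small returns 1536, so switching models means creating a new collection.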
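The same collection can also be created from Python. This sketch (the function names are my own) builds the request body and sends it with urllib. It assumes you pass `size` as the length of a real embedding from your model, so the collection and the model cannot silently disagree.

```python
import json
import urllib.request

# From the host; inside the Compose network the same API is http://qdrant:6333.
QDRANT_URL = "http://localhost:6333"


def collection_body(size: int, distance: str = "Cosine") -> dict:
    """Body for PUT /collections/{name}; `size` must equal the embedding length."""
    if size <= 0:
        raise ValueError("vector size must be a positive integer")
    return {"vectors": {"size": size, "distance": distance}}


def create_collection(name: str, size: int, base_url: str = QDRANT_URL) -> dict:
    """Send the PUT request and return Qdrant's JSON reply.

    urlopen raises urllib.error.HTTPError if Qdrant rejects the request.
    """
    req = urllib.request.Request(
        f"{base_url}/collections/{name}",
        data=json.dumps(collection_body(size)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, create_collection("test", 768) mirrors the curl call above.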

🔌 4. n8n Workflow Design (Core)

🧩 Step 1: Webhook (Entry Point)

Node: Webhook

Method: POST
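Path: /qdrant-ingest (while the node is listening for a test event the URL is http://localhost:5678/webhook-test/qdrant-ingest; once the workflow is active it is http://localhost:5678/webhook/qdrant-ingest)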
Body Content Type: JSON (required)

Test payload:

{
  "text": "Artificial intelligence is the technology that simulates human intelligence"
}

⚠️ Common Pitfall

If the webhook output shows:

body: "{ \"text\": \"xxx\" }"

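❌ This means the JSON was not parsed and body is a plain string. 👉 Enable JSON mode, and make sure the caller sends the header Content-Type: application/json, since n8n decides how to parse the body from that header.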
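To send the test payload from a script, the detail that matters is the Content-Type header. A minimal sketch, assuming n8n's default port and the test URL it shows while the Webhook node is listening (adjust both to your instance); build_ingest_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: test URL shown by n8n while the Webhook node listens;
# an activated workflow uses /webhook/ instead of /webhook-test/.
WEBHOOK_TEST_URL = "http://localhost:5678/webhook-test/qdrant-ingest"


def build_ingest_request(text: str, url: str = WEBHOOK_TEST_URL) -> urllib.request.Request:
    """Build a POST whose JSON body n8n can parse, thanks to the Content-Type header."""
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),  # json.dumps escapes quotes/newlines
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then urllib.request.urlopen(build_ingest_request("Artificial intelligence is ...")), and the Webhook output should show body.text as a real field rather than a string.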

🧩 Step 2: Embedding (Ollama example)

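HTTP Request Node (Ollama runs on the host, outside Docker; pull the model first with ollama pull nomic-embed-text):

POST http://host.docker.internal:11434/api/embeddings

host.docker.internal is how a container reaches the host machine on Docker Desktop; localhost inside the n8n container would point at the container itself.

Body:

{
  "model": "nomic-embed-text",
  "prompt": {{ JSON.stringify($json.body.text) }}
}

JSON.stringify adds the surrounding quotes and escapes any quotes or line breaks in the text; a bare "{{$json.body.text}}" produces invalid JSON as soon as the text contains a double quote.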

✔ Correct output should be:

{
  "embedding": [0.1, 0.2, 0.3, ...]
}
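Calling Ollama directly is a quick way to check the model before involving n8n. This sketch assumes Ollama listens on its default port 11434 on the host and that nomic-embed-text is already pulled; parse_embedding is a hypothetical helper that also rejects the "embedding is a string" failure described later.

```python
import json
import urllib.request

# From the host; n8n reaches the same server as http://host.docker.internal:11434.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def embedding_request_body(text: str, model: str = "nomic-embed-text") -> bytes:
    """JSON body for /api/embeddings; json.dumps escapes quotes and newlines in `text`."""
    return json.dumps({"model": model, "prompt": text}).encode("utf-8")


def parse_embedding(response: dict) -> list:
    """Extract the vector and check it is a non-empty list of numbers, not a string."""
    vector = response.get("embedding")
    if not isinstance(vector, list) or not vector:
        raise ValueError(f"expected a non-empty list, got {type(vector).__name__}")
    if not all(isinstance(x, (int, float)) for x in vector):
        raise ValueError("embedding contains non-numeric values")
    return [float(x) for x in vector]


def embed(text: str) -> list:
    """Request an embedding from Ollama and return it as a list of floats."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=embedding_request_body(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_embedding(json.load(resp))
```

len(embed("probe")) should equal the size configured for the collection.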

🧩 Step 3: Set Node (Critical structure cleanup)

👉 This is where I got stuck the longest

➤ Add Node: Edit Fields (Set)

Mode: Manual Mapping

✔ Set these fields:

① id (Number)

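{{ Date.now() }}

Date.now() is a millisecond timestamp, so two items processed in the same millisecond get the same id, and because Qdrant's PUT upserts by id, the second write overwrites the first.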

② vector (Array)

{{ $json.embedding }}

③ payload (Object) 🔥Critical

{
  "text": {{ JSON.stringify($node["Webhook"].json.body.text) }}
}

✔ Correct Set output structure:

{
  "id": 1700000000000,
  "vector": [0.1, 0.2, ...],
  "payload": {
    "text": "Artificial intelligence..."
  }
}
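An alternative id scheme, swapped in here and not part of the original workflow, is a content-derived UUID: uuid5 maps the same text to the same id, so re-sending a document overwrites its point instead of duplicating it. A minimal Python sketch of the Set step's output under that assumption (KB_NAMESPACE and build_point are hypothetical names):

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
KB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://localhost/qdrant-ingest")


def build_point(text: str, vector: list) -> dict:
    """Assemble a Qdrant point: UUID string id, list-of-float vector, object payload."""
    return {
        # Deterministic: identical text always yields the identical id.
        "id": str(uuid.uuid5(KB_NAMESPACE, text)),
        "vector": [float(x) for x in vector],
        "payload": {"text": text},
    }
```

The trade-off is that two documents with identical text collapse into one point, which is usually what deduplication on write is after.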

🧩 Step 4: HTTP Request (Write to Qdrant)

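URL (the hostname qdrant resolves only inside the Compose network; from your host the same API is at http://localhost:6333)

http://qdrant:6333/collections/test/points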

Method

PUT

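Body (⚠️Correct format: in the HTTP Request node turn on Send Body, set Body Content Type to JSON and Specify Body to Using JSON)

{
  "points": [
    {
      "id": {{ JSON.stringify($json.id) }},
      "vector": {{ JSON.stringify($json.vector) }},
      "payload": {{ JSON.stringify($json.payload) }}
    }
  ]
}

JSON.stringify writes the vector as a real array and the payload as a real object, and it works for both numeric and UUID ids.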
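The equivalent write from Python shows why serializing the whole body at once avoids the format errors: json.dumps keeps the vector an array and the payload an object. This sketch adds ?wait=true, a Qdrant query parameter that makes the call return only after the write has been applied; it was not in the original workflow and is optional.

```python
import json
import urllib.request

# From the host; inside the Compose network this is http://qdrant:6333.
QDRANT_URL = "http://localhost:6333"


def upsert_body(points: list) -> bytes:
    """Serialize all points in one json.dumps call, so nested types stay intact."""
    return json.dumps({"points": points}).encode("utf-8")


def upsert_points(collection: str, points: list, base_url: str = QDRANT_URL) -> dict:
    """PUT the points (insert or overwrite by id) and return Qdrant's JSON reply."""
    # Optional assumption: wait=true delays the reply until the write is applied,
    # so an immediate scroll or search sees the new points.
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points?wait=true",
        data=upsert_body(points),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With a point shaped like the Set output, upsert_points("test", [point]) should come back with "status": "ok".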

❌ Why the earlier attempts failed

① JSON Body errors

Reason:

vector / payload were not properly JSON-serialized

② payload undefined

Reason:

Webhook body wasn't parsed as JSON

③ vector is not an array

Reason:

embedding was treated as a string
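When you do not control the caller, a defensive step can undo errors ② and ③ before the Set node. This is a Python sketch of that logic (in n8n it would sit in a Code node); as_json and normalize_item are hypothetical names.

```python
import json


def as_json(value):
    """Parse `value` if it arrived as a JSON-encoded string; return anything else unchanged."""
    if isinstance(value, str):
        return json.loads(value)
    return value


def normalize_item(webhook_body, embedding_response) -> dict:
    """Repair a string webhook body (error 2) and a string embedding (error 3)."""
    body = as_json(webhook_body)
    embedding = as_json(embedding_response)
    if isinstance(embedding, dict):
        embedding = as_json(embedding.get("embedding"))
    if not isinstance(body, dict) or "text" not in body:
        raise ValueError("webhook body has no 'text' field")
    if not isinstance(embedding, list):
        raise ValueError("embedding is not an array")
    return {"text": body["text"], "vector": embedding}
```

Fixing the caller is still the better cure; this only keeps bad input from reaching Qdrant.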

🧪 5. Verify data was ingested successfully

curl http://localhost:6333/collections/test/points/scroll \
  -H "Content-Type: application/json" \
  -d '{
    "limit": 10,
    "with_payload": true,
    "with_vectors": true
  }'
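Beyond scrolling, a similarity search is the real test of a searchable knowledge base. A sketch against Qdrant's POST /collections/{name}/points/search endpoint; it assumes the query vector comes from the same embedding model used at ingestion, and the helper names are my own.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # host-side address of the Qdrant container


def scroll_texts(response: dict) -> list:
    """Collect payload['text'] from a /points/scroll response (None where missing)."""
    return [(p.get("payload") or {}).get("text") for p in response["result"]["points"]]


def hits_to_results(response: dict) -> list:
    """Turn a /points/search response into (score, text) pairs in Qdrant's order."""
    return [(h["score"], (h.get("payload") or {}).get("text")) for h in response["result"]]


def search(collection: str, query_vector: list, limit: int = 3, base_url: str = QDRANT_URL) -> list:
    """Nearest-neighbour search; Qdrant returns the best matches first."""
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return hits_to_results(json.load(resp))
```

With Cosine distance a higher score means a closer match, so the first pair is the most relevant stored text.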

🎯 6. Final stable workflow (recommended version)

Webhook
  ↓
Embedding Node (Ollama / OpenAI)
  ↓
Set Node (standardize structure)
  ↓
HTTP Request → Qdrant

🧠 7. Key lessons learned (very important)

✔ 1. n8n JSON rules

Mistake	Correct
Pasting {{ }} values into a JSON body as raw text	Serialize each value with {{ JSON.stringify(...) }}
vector as string	vector as array of numbers
payload as string	payload as object
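The first rule is easy to demonstrate outside n8n. The Python analogue below, with json.dumps standing in for JSON.stringify, pastes a text containing a quote and a newline into a JSON template, once naively and once serialized:

```python
import json

text = 'He said "hello"\nand left'
vector = [0.1, 0.2]

# Naive template, like "prompt": "{{$json.body.text}}": quote and newline stay unescaped.
naive = '{"prompt": "%s"}' % text

# Serializing each value first keeps the body valid JSON.
safe = '{"prompt": %s, "vector": %s}' % (json.dumps(text), json.dumps(vector))


def is_valid_json(s: str) -> bool:
    """True if `s` parses as JSON."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(naive), is_valid_json(safe))  # False True
```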

✔ 2. Qdrant requirements

id: unsigned integer or UUID string
vector: array of floats whose length equals the collection's size
payload: JSON object (optional)
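These rules can be checked before anything is sent. A small validator sketch (point_errors is a hypothetical helper); `dim` is the size the collection was created with.

```python
import uuid


def point_errors(point: dict, dim: int) -> list:
    """Return the ways `point` breaks Qdrant's rules; an empty list means it looks valid."""
    errors = []

    pid = point.get("id")
    if isinstance(pid, bool) or not isinstance(pid, (int, str)):
        errors.append("id must be an unsigned integer or a UUID string")
    elif isinstance(pid, int) and pid < 0:
        errors.append("integer id must not be negative")
    elif isinstance(pid, str):
        try:
            uuid.UUID(pid)
        except ValueError:
            errors.append("string id is not a valid UUID")

    vec = point.get("vector")
    if not isinstance(vec, list):
        errors.append(f"vector must be an array, got {type(vec).__name__}")
    elif len(vec) != dim:
        errors.append(f"vector has {len(vec)} values, collection expects {dim}")
    elif not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec):
        errors.append("vector must contain only numbers")

    payload = point.get("payload")  # optional in Qdrant
    if payload is not None and not isinstance(payload, dict):
        errors.append(f"payload must be an object, got {type(payload).__name__}")

    return errors
```

Running it on the Set node's output during debugging names the broken field directly instead of leaving you to decode Qdrant's error response.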

✔ 3. Webhook must enable JSON mode

Otherwise:

body = string ❌
body.text = undefined ❌

🚀 8. Next steps / upgrade directions

From here, the same stack can be extended into:

🔥 Enterprise-grade AI Knowledge Base

RAG-based Q&A retrieval
Multi-collection classification (doc / chat / blog)
Automatic chunk splitting
Deduplication on write
Vector update strategies
Switching between multiple Embedding models
LangChain / Flowise integration

A natural next build is 👇

👉 n8n + Qdrant + RAG Q&A system (production-ready)

Including:

Chat UI (Web)
Vector retrieval
LLM responses (Ollama / GPT)
Multi-turn conversation memory
Knowledge base tiering

Together, these pieces could serve as the basis of an AI SaaS MVP.