---
title: LobeHub Knowledge Base / File Upload
description: >-
  Explore LobeHub's file upload and knowledge base management features with core
  components.
tags:
  - LobeHub
  - File Upload
  - Knowledge Base
  - PostgreSQL
  - OpenAI Embedding
---

# Knowledge Base / File Upload

LobeHub supports file upload and knowledge base management. The feature depends on the components described below; understanding each one will help you deploy and maintain the knowledge base system.

## Core Components

### 1. PostgreSQL and PGVector

PostgreSQL is an open-source relational database, and pgvector is its extension for storing and searching vectors.

- **Purpose**: Store structured data and vector indexes
- **Deployment Tip**: Use the ParadeDB Docker image for a quick deployment; it includes the pgvector and pg\_search extensions

Example deployment command:

```
docker run -p 5432:5432 -d --name pg -e POSTGRES_PASSWORD=mysecretpassword paradedb/paradedb:latest-pg17
```

- **Note**: Vector indexing and search can be memory- and CPU-intensive, so give the database enough resources for your data volume
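
To make "vector operations" concrete, the sketch below computes cosine distance by hand and ranks a few chunks against a query with a brute-force scan. pgvector exposes cosine distance through its `<=>` operator and uses indexes to avoid scanning every row; whether LobeHub queries with this particular metric is an assumption here, and the chunk names and 3-dimensional vectors are invented for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, chunks, k=2):
    """Return the ids of the k chunks closest to the query (brute-force scan)."""
    ranked = sorted(chunks, key=lambda cid: cosine_distance(query, chunks[cid]))
    return ranked[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real embeddings are much larger.
chunks = {
    "install-guide": [0.9, 0.1, 0.0],
    "billing-faq": [0.0, 0.2, 0.9],
    "docker-setup": [0.8, 0.3, 0.1],
}

print(nearest([1.0, 0.2, 0.0], chunks))
```

A scan like this touches every stored vector per query, which is why resource planning matters as the number of stored chunks grows.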

### 2. S3-compatible Object Storage

S3 (or S3-compatible storage services) is used for storing uploaded files.

- **Purpose**: Store raw files
- **Options**: AWS S3, RustFS, Ceph, or other S3-compatible services
- **Note**: Configure appropriate access permissions and security policies

### 3. OpenAI Embedding

OpenAI's Embedding service is used to convert text into vector representations.

<Callout type={'info'}>
  LobeHub currently uses OpenAI's `text-embedding-3-small` model by default. Ensure your API Key
  has access to this model.
</Callout>

- **Purpose**: Generate vector representations for semantic search
- **Notes**:
  - Requires a valid OpenAI API key
  - Rate-limit API calls and handle errors such as timeouts and rate-limit responses
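
One common way to handle transient API failures is to retry with exponential backoff. The sketch below is a generic pattern, not LobeHub's implementation: `TransientError`, `with_retries`, and the `flaky_embed` stand-in are hypothetical names, and a real integration would catch the specific rate-limit and timeout errors raised by its HTTP client.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error."""


def with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage with a fake embedding call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_embed():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return [0.1, 0.2, 0.3]


recorded_delays = []
vector = with_retries(flaky_embed, sleep=recorded_delays.append)
print(vector, attempts["count"])
```

Passing `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.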

### 4. Unstructured.io (Optional)

Unstructured.io is a powerful document processing tool.

- **Purpose**: Process complex document formats, extract structured information
- **Use Case**: Handle formats other than plain text, such as PDF and Word
- **Note**: Whether you need it depends on how complex your documents are

Correctly configured and integrated, these components form the knowledge base system for LobeHub. Each one plays a distinct role in the architecture, and together they support document management and semantic retrieval.

### 5. Custom Embedding

- **Purpose**: Use a different embedding model to generate vector representations for semantic search
- **Options**: Supported model providers: zhipu, github, openai, bedrock, ollama
- **Deployment Tip**: Set the `DEFAULT_FILES_CONFIG` environment variable to configure the default embedding model

```
environment: DEFAULT_FILES_CONFIG=embedding_model=openai/text-embedding-3-small
```
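
The value follows the shape `embedding_model=<provider>/<model>`. The sketch below shows how such a string splits into its two parts; `parse_embedding_model` is a hypothetical helper written for illustration, not LobeHub's actual parser, and it assumes the value holds only the `embedding_model` key.

```python
def parse_embedding_model(config):
    """Split 'embedding_model=<provider>/<model>' into (provider, model)."""
    key, sep, value = config.partition("=")
    if not sep or key.strip() != "embedding_model":
        raise ValueError("expected 'embedding_model=<provider>/<model>'")
    # Split on the first slash only, so a model id may itself contain slashes.
    provider, slash, model = value.strip().partition("/")
    if not slash or not provider or not model:
        raise ValueError("expected '<provider>/<model>'")
    return provider, model


provider, model = parse_embedding_model(
    "embedding_model=openai/text-embedding-3-small"
)
print(provider, model)
```

The provider part should be one of the supported providers listed above, and the model part should be a model id that provider actually serves.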