mirror of
https://github.com/lobehub/lobehub
synced 2026-04-21 17:47:27 +00:00
---
title: LobeHub Knowledge Base / File Upload
description: >-
  Explore LobeHub's file upload and knowledge base management features with core
  components.
tags:
  - LobeHub
  - File Upload
  - Knowledge Base
  - PostgreSQL
  - OpenAI Embedding
---

# Knowledge Base / File Upload

LobeHub supports file upload and knowledge base management. These features rely on the core technical components described below. Understanding them will help you deploy and maintain the knowledge base system.

## Core Components

### 1. PostgreSQL and PGVector

PostgreSQL is a powerful open-source relational database system, and pgvector is its extension for storing vectors and running similarity searches over them.

- **Purpose**: Store structured data and vector indexes
- **Deployment Tip**: Use the ParadeDB Docker image for quick deployment; it ships with the pgvector and pg\_search extensions

Deployment script example:

```
docker run -p 5432:5432 -d --name pg -e POSTGRES_PASSWORD=mysecretpassword paradedb/paradedb:latest-pg17
```

- **Note**: Ensure the database host has sufficient CPU and memory for vector operations

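To make "vector operations" concrete, the following minimal sketch ranks stored chunks by cosine similarity to a query vector in plain Python. It is an illustration only, not LobeHub's implementation: the `chunks` dictionary, its ids, and the toy 3-dimensional vectors are hypothetical. In a real deployment, pgvector performs this kind of ranking inside PostgreSQL (its `<=>` operator computes cosine distance).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy data: real embedding vectors have far more dimensions.
chunks = {
    "intro": [1.0, 0.0, 0.0],
    "setup": [0.9, 0.1, 0.0],
    "faq": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], chunks))
```

Computing similarity against every stored vector, as this sketch does, scales linearly with the number of chunks. That is one reason a database-side vector index is used instead.
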
### 2. S3-compatible Object Storage

S3 (or an S3-compatible storage service) is used to store uploaded files.

- **Purpose**: Store raw files
- **Options**: AWS S3, RustFS, Ceph, or other S3-compatible services
- **Note**: Configure appropriate access permissions and security policies

### 3. OpenAI Embedding

OpenAI's Embedding service is used to convert text into vector representations.

<Callout type={'info'}>
  LobeHub currently uses OpenAI's `text-embedding-3-small` model by default. Ensure your API Key
  has access to this model.
</Callout>

- **Purpose**: Generate vector representations for semantic search
- **Notes**:
  - Requires a valid OpenAI API key
  - Implement proper API rate limiting and error handling

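The note on rate limiting and error handling can be sketched as a retry wrapper with exponential backoff. This is a minimal illustration under stated assumptions, not LobeHub's or OpenAI's code: `TransientEmbeddingError`, `embed_with_retry`, and the `fake_embed` stand-in are hypothetical names, and a real integration would map the client library's rate-limit and timeout errors onto the retry path.

```python
import random
import time


class TransientEmbeddingError(Exception):
    """Hypothetical error standing in for rate-limit or timeout failures."""


def embed_with_retry(embed_fn, text, max_attempts=4, base_delay=0.5,
                     sleep=time.sleep):
    """Call embed_fn(text), retrying transient failures with backoff."""
    for attempt in range(max_attempts):
        try:
            return embed_fn(text)
        except TransientEmbeddingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the wait each attempt and add jitter so that many
            # clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Hypothetical stand-in for a real embedding call: fails once, then succeeds.
_state = {"calls": 0}


def fake_embed(text):
    _state["calls"] += 1
    if _state["calls"] == 1:
        raise TransientEmbeddingError("rate limited")
    return [0.1, 0.2, 0.3]


# Pass a no-op sleep so the demo does not actually wait.
print(embed_with_retry(fake_embed, "hello", sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the wrapper easy to test. Non-transient failures, such as an invalid API key, are deliberately not retried because retrying cannot fix them.
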
### 4. Unstructured.io (Optional)

Unstructured.io is a document processing tool.

- **Purpose**: Process complex document formats and extract structured information
- **Use Case**: Handle non-plain-text formats such as PDF and Word
- **Note**: Evaluate processing needs based on document complexity

By correctly configuring and integrating these core components, you can build an efficient knowledge base system for LobeHub. Each component plays a distinct role in the overall architecture, supporting document management and semantic retrieval.

### 5. Custom Embedding

- **Purpose**: Use a different embedding model to generate vector representations for semantic search
- **Options**: Supported model providers: zhipu, github, openai, bedrock, ollama
- **Deployment Tip**: Set the `DEFAULT_FILES_CONFIG` environment variable to configure the default embedding model

```
DEFAULT_FILES_CONFIG=embedding_model=openai/text-embedding-3-small
```
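
As an illustration of how a value in this `key=provider/model` shape can be split into its parts, here is a minimal sketch. It is not LobeHub's parser: `parse_files_config` is a hypothetical helper, and it assumes that multiple settings, if present, are comma-separated and that every value has the form `provider/model`.

```python
def parse_files_config(value):
    """Parse 'key=provider/model[,key=provider/model...]' into a dict."""
    config = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, eq, target = item.partition("=")
        if not eq or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        # Split on the first '/' only, so model ids may contain slashes.
        provider, slash, model = target.partition("/")
        if not slash or not provider or not model:
            raise ValueError(f"expected provider/model, got {target!r}")
        config[key] = {"provider": provider, "model": model}
    return config


print(parse_files_config("embedding_model=openai/text-embedding-3-small"))
```

Validating the value at startup, as this sketch does by raising `ValueError`, surfaces a mistyped provider or model string before any file is embedded.
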