lobehub/docs/self-hosting/advanced/online-search.mdx

---
title: >-
  Configuring Online Search Functionality - Enhancing AI's Ability to Access Web
  Information
description: >-
  Learn how to configure the SearXNG online search functionality for LobeHub,
  enabling AI to access the latest web information.
tags:
  - Online Search
  - SearXNG
  - Web Information
  - AI Enhancement
---

# Configuring Online Search Functionality

LobeHub supports configuring **web search functionality** for AI, enabling it to retrieve real-time information from the internet to provide more accurate and up-to-date responses. Web search supports multiple search engine providers, including [SearXNG](https://github.com/searxng/searxng), [Search1API](https://www.search1api.com), [Google](https://programmablesearchengine.google.com), and [Brave](https://brave.com/search/api), among others.

<Callout type="info">
  Web search allows AI to access time-sensitive content, such as the latest news, technology trends,
  or product information. You can deploy the open-source SearXNG yourself, or choose to integrate
  mainstream search services like Search1API, Google, Brave, etc., combining them freely based on
  your use case.
</Callout>

By setting the search service environment variable `SEARCH_PROVIDERS` and the corresponding API Keys, LobeHub will query multiple sources and return the results. You can also configure crawler service environment variables such as `CRAWLER_IMPLS` (e.g., `browserless`, `firecrawl`, `tavily`, etc.) to extract webpage content, enhancing the capability of search + reading.

# Core Environment Variables

## `CRAWLER_IMPLS`

Configure available web crawlers for structured extraction of webpage content.

```env
CRAWLER_IMPLS="naive,search1api"
```

Supported crawler types are listed below:

| Value         | Description                                                                                                         | Environment Variable                                                        |
| ------------- | ------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `browserless` | Headless browser crawler based on [Browserless](https://www.browserless.io/), suitable for rendering complex pages. | `BROWSERLESS_TOKEN`                                                         |
| `exa`         | Crawler capabilities provided by [Exa](https://exa.ai/), API required.                                              | `EXA_API_KEY`                                                               |
| `firecrawl`   | [Firecrawl](https://firecrawl.dev/) headless browser API, ideal for modern websites.                                | `FIRECRAWL_API_KEY`                                                         |
| `jina`        | Crawler service from [Jina AI](https://jina.ai/), supports fast content summarization.                              | `JINA_READER_API_KEY`                                                       |
| `naive`       | Built-in general-purpose crawler for standard web structures.                                                       |                                                                             |
| `search1api`  | Page crawling capabilities from [Search1API](https://www.search1api.com), great for structured content extraction.  | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
| `tavily`      | Web scraping and summarization API from [Tavily](https://www.tavily.com/).                                          | `TAVILY_API_KEY`                                                            |

> 💡 Setting multiple crawlers increases success rate; the system will try different ones based on priority.

---

## `CRAWL_CONCURRENCY`

Controls crawler concurrency per crawl task. The default is `3`. On low-resource servers, use `1` to reduce CPU spikes.

```env
CRAWL_CONCURRENCY=3
```

## `CRAWLER_RETRY`

Controls retry attempts per URL on crawl failures. The default is `1` (up to 2 attempts total).

```env
CRAWLER_RETRY=1
```

---

## `SEARCH_PROVIDERS`

Configure which search engine providers to use for web search.

```env
SEARCH_PROVIDERS="searxng"
```

Supported search engines include:

| Value        | Description                                                                             | Environment Variable                                                        |
| ------------ | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| `anspire`    | Search service provided by [Anspire](https://anspire.ai/).                              | `ANSPIRE_API_KEY`                                                           |
| `bocha`      | Search service from [Bocha](https://open.bochaai.com/).                                 | `BOCHA_API_KEY`                                                             |
| `brave`      | [Brave](https://search.brave.com/help/api), a privacy-friendly search source.           | `BRAVE_API_KEY`                                                             |
| `exa`        | [Exa](https://exa.ai/), a search API designed for AI.                                   | `EXA_API_KEY`                                                               |
| `firecrawl`  | Search capabilities via [Firecrawl](https://firecrawl.dev/).                            | `FIRECRAWL_API_KEY`                                                         |
| `google`     | Uses [Google Programmable Search Engine](https://programmablesearchengine.google.com/). | `GOOGLE_PSE_API_KEY` `GOOGLE_PSE_ENGINE_ID`                                 |
| `jina`       | Semantic search provided by [Jina AI](https://jina.ai/).                                | `JINA_READER_API_KEY`                                                       |
| `kagi`       | Premium search API by [Kagi](https://kagi.com/), requires a subscription key.           | `KAGI_API_KEY`                                                              |
| `search1api` | Aggregated search capabilities from [Search1API](https://www.search1api.com).           | `SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY` |
| `searxng`    | Use a self-hosted or public [SearXNG](https://searx.space/) instance.                   | `SEARXNG_URL`                                                               |
| `tavily`     | [Tavily](https://www.tavily.com/), offers fast web summaries and answers.               | `TAVILY_API_KEY`                                                            |

> ⚠️ Some search providers require you to apply for an API Key and configure it in your `.env` file.

---

## `BROWSERLESS_URL`

Specifies the API endpoint for [Browserless](https://www.browserless.io/), used for web crawling tasks. Browserless is a browser automation platform based on Headless Chrome, ideal for rendering dynamic pages.

```env
BROWSERLESS_URL=https://chrome.browserless.io
```

> 📌 Usually used together with `CRAWLER_IMPLS=browserless`.

---

## `BROWSERLESS_BLOCK_ADS`

Enables ad blocking functionality. When using [Browserless](https://www.browserless.io/) for web scraping, it automatically blocks common ad resources (such as scripts, images, trackers, etc.), improving scraping speed and page clarity.

```env
BROWSERLESS_BLOCK_ADS=1
```

> 📌 Supported values:
>
> - `1`: Enable ad blocking (recommended);
> - `0`: Disable ad blocking (default).

> ✅ It is recommended to use with `BROWSERLESS_STEALTH_MODE=1` to enhance stealth and scraping success rate.

---

## `BROWSERLESS_STEALTH_MODE`

Enables stealth mode. When using [Browserless](https://www.browserless.io/) for web scraping, it applies various anti-detection techniques (such as modifying the user agent, removing webdriver traits, simulating user interactions) to bypass anti-bot mechanisms.

```env
BROWSERLESS_STEALTH_MODE=1
```

> 📌 Supported values:
>
> - `1`: Enable stealth mode (recommended);
> - `0`: Disable stealth mode (default).

> ⚠️ Some websites use advanced anti-scraping techniques. Enabling stealth mode can significantly improve scraping success rate.

---

## `GOOGLE_PSE_ENGINE_ID`

Configure the Search Engine ID for Google Programmable Search Engine (Google PSE), used to restrict the search scope. Must be used alongside `GOOGLE_PSE_API_KEY`.

```env
GOOGLE_PSE_ENGINE_ID=your-google-cx-id
```

> 🔑 How to get it: Visit [programmablesearchengine.google.com](https://programmablesearchengine.google.com/), create a search engine, and obtain the `cx` parameter.

---

## `FIRECRAWL_URL`

Sets the access URL for the [Firecrawl](https://firecrawl.dev/) API, used for web content scraping. Default value:

```env
FIRECRAWL_URL=https://api.firecrawl.dev/v2
```

> ⚙️ Usually does not need to be changed unless you’re using a self-hosted version or a proxy service.

---

## `TAVILY_SEARCH_DEPTH`

Configure the result depth for [Tavily](https://www.tavily.com/) searches.

```env
TAVILY_SEARCH_DEPTH=basic
```

Supported values:

- `basic`: Fast search, returns brief results;
- `advanced`: Deep search, returns more context and web page details.

---

## `TAVILY_EXTRACT_DEPTH`

Configure how deeply Tavily extracts content from web pages.

```env
TAVILY_EXTRACT_DEPTH=basic
```

Supported values:

- `basic`: Extracts basic info like title and content summary;
- `advanced`: Extracts structured data, lists, charts, and more from web pages.

---

## `SEARXNG_URL`

The URL of the SearXNG instance, which is a necessary configuration to enable the online search functionality. For example:

```shell
SEARXNG_URL=https://searxng-instance.com
```

This URL should point to a functional SearXNG instance. You can choose to self-host SearXNG or use a publicly available SearXNG instance.

You can find publicly available SearXNG instances in the [SearXNG instance list](https://searx.space/). Choose an instance that is fast and reliable, and then configure its URL in LobeHub.

> Note that the `searxng` you use must have `json` output enabled; otherwise, the `lobehub` call will result in an error. If self-hosting, find the `searxng` configuration file and add `json` as shown below.

```bash
$ vi searxng/settings.yml
...
search:
formats:
- html
- json
```

# Verifying Online Search Functionality

After configuration, you can verify whether the online search functionality is working correctly by following these steps:

1. Restart the LobeHub service.
2. Start a new chat session, enable smart online search, and then ask AI a question that requires the latest information, such as "What is the current gold price today?" or "What are the latest major news stories?"
3. Observe whether AI can return the latest information based on internet searches.

If AI can answer these time-sensitive questions, it indicates that the online search functionality has been successfully configured.

## References

- [LobeHub Online Search RFC Discussion](https://github.com/lobehub/lobehub/discussions/6447)
- [SearXNG GitHub Repository](https://github.com/searxng/searxng)
- [Discussion on Enabling JSON Output for SearXNG](https://github.com/searxng/searxng/discussions/3542)