mirror of
https://github.com/khoj-ai/khoj
synced 2026-04-21 15:57:17 +00:00
Fix extract_from_webpage discarding pre-fetched content (#1269)
## Summary
In `extract_from_webpage()`, the `content` parameter is unconditionally
overwritten to `None` on the line before the `is_none_or_empty(content)`
check. This means any pre-fetched content (e.g. text content already
retrieved by the Exa search engine) is always discarded, forcing an
unnecessary re-scrape of the webpage.
## Bug
```python
async def extract_from_webpage(
    url: str,
    subqueries: set[str] = None,
    content: str = None,  # <-- caller passes pre-fetched content
    ...
) -> Tuple[set[str], str, Union[None, str]]:
    content = None  # <-- BUG: immediately overwrites it
    if is_none_or_empty(content):  # always True
        content = await scrape_webpage_with_fallback(url)
```
## Fix
Remove the `content = None` assignment so the passed-in content is used
when available, falling back to scraping only when needed.
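A minimal sketch of the fixed control flow. The `is_none_or_empty` and `scrape_webpage_with_fallback` stand-ins below are simplified placeholders for illustration, not the real khoj helpers:

```python
import asyncio
from typing import Optional


def is_none_or_empty(value: Optional[str]) -> bool:
    # Simplified stand-in for the helper used in the real code.
    return value is None or value == ""


async def scrape_webpage_with_fallback(url: str) -> str:
    # Placeholder scraper; the real implementation fetches over the network.
    return f"<scraped content of {url}>"


async def extract_from_webpage(url: str, content: Optional[str] = None) -> str:
    # Fixed: the stray `content = None` is gone, so pre-fetched content
    # survives and scraping happens only when no content was passed in.
    if is_none_or_empty(content):
        content = await scrape_webpage_with_fallback(url)
    return content
```

With the fix, callers such as the Exa search path can hand in content they already hold and skip the redundant scrape entirely.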
This bug was introduced in a refactor and causes:
- Wasted API calls to web scrapers for pages whose content is already
available
- Increased latency for search results that include inline content (e.g.
Exa)
Signed-off-by: JiangNan <1394485448@qq.com>
This commit is contained in:
parent 6735d33af2
commit 678549c6b0
1 changed file with 0 additions and 1 deletions
```diff
@@ -556,7 +556,6 @@ async def extract_from_webpage(
     tracer: dict = {},
 ) -> Tuple[set[str], str, Union[None, str]]:
     # Read the web page
-    content = None
     if is_none_or_empty(content):
         content = await scrape_webpage_with_fallback(url)
```