Fix extract_from_webpage discarding pre-fetched content (#1269)

## Summary

In `extract_from_webpage()`, the `content` parameter is unconditionally
reset to `None` immediately before the `is_none_or_empty(content)` check.
As a result, any pre-fetched content (e.g. page text already returned by
the Exa search engine) is discarded, and the webpage is always re-scraped
unnecessarily.

## Bug

```python
async def extract_from_webpage(
    url: str,
    subqueries: set[str] = None,
    content: str = None,     # <-- caller passes pre-fetched content
    ...
) -> Tuple[set[str], str, Union[None, str]]:
    content = None            # <-- BUG: immediately overwrites it
    if is_none_or_empty(content):  # always True
        content = await scrape_webpage_with_fallback(url)
```

## Fix

Remove the `content = None` assignment so the passed-in content is used
when available, falling back to scraping only when needed.

This bug was introduced in a refactor and causes:
- Wasted API calls to web scrapers for pages whose content is already
available
- Increased latency for search results that include inline content (e.g.
Exa)
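
The corrected control flow can be sketched as below. This is a minimal, self-contained illustration, not the project's actual code: `is_none_or_empty` and `scrape_webpage_with_fallback` are simplified stand-ins for the real helpers, and the signature is trimmed to the parameters relevant to the bug.

```python
import asyncio
from typing import Optional, Tuple


def is_none_or_empty(value: Optional[str]) -> bool:
    # Treat None and whitespace-only strings as "no content".
    return value is None or value.strip() == ""


async def scrape_webpage_with_fallback(url: str) -> str:
    # Stand-in for the real scraper; should only run when no
    # pre-fetched content was supplied.
    return f"<scraped body of {url}>"


async def extract_from_webpage(
    url: str,
    content: Optional[str] = None,
) -> Tuple[str, bool]:
    # Fixed: no `content = None` here, so a caller-supplied value
    # survives and the scraper is only a fallback.
    scraped = False
    if is_none_or_empty(content):
        content = await scrape_webpage_with_fallback(url)
        scraped = True
    return content, scraped


# Pre-fetched content is used as-is; omitting it triggers a scrape.
used, scraped = asyncio.run(
    extract_from_webpage("https://example.com", "cached text")
)
fallback, scraped2 = asyncio.run(extract_from_webpage("https://example.com"))
```

With the stray assignment removed, the first call returns `"cached text"` without touching the scraper, while the second call falls back to scraping.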

Signed-off-by: JiangNan <1394485448@qq.com>
Author: jnMetaCode, 2026-03-17 13:03:52 +08:00 (committed by GitHub)
Parent: 6735d33af2
Commit: 678549c6b0
GPG key ID: B5690EEEBB952194 (no known key found for this signature in database)


```diff
@@ -556,7 +556,6 @@ async def extract_from_webpage(
     tracer: dict = {},
 ) -> Tuple[set[str], str, Union[None, str]]:
     # Read the web page
-    content = None
     if is_none_or_empty(content):
         content = await scrape_webpage_with_fallback(url)
```