Fix UnboundLocalError in PdfToEntries.extract_text when PDF processing fails (#1292)

When PyMuPDFLoader fails to process an invalid PDF file, the exception
is caught, but the return statement then references pdf_entry_by_pages,
which was never assigned, causing an UnboundLocalError.

Initialize pdf_entry_by_pages to an empty list before the try block so
the return statement always has a valid value, even when an exception
occurs.

Verified with both invalid input (returns []) and valid PDFs (returns
extracted text).
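
The failure mode and the fix can be reproduced in isolation. The sketch
below is illustrative only: load_pages is a hypothetical stand-in for the
real loader (it is not PyMuPDFLoader and does not parse PDFs), and the two
extract_text_* functions are simplified assumptions about the shape of the
method, not the project's actual code.

```python
def load_pages(data: bytes) -> list[str]:
    """Hypothetical stand-in for the PDF loader; raises on invalid input."""
    if not data.startswith(b"%PDF"):
        raise ValueError("not a PDF")
    return ["page text"]


def extract_text_buggy(data: bytes) -> list[str]:
    try:
        # If load_pages raises, this assignment never happens.
        pages = load_pages(data)
    except Exception as e:
        print(f"Unable to process file: {e}")
    # Raises UnboundLocalError when the loader failed above.
    return pages


def extract_text_fixed(data: bytes) -> list[str]:
    # Bind the name before the try block so it always has a value.
    pages = []
    try:
        pages = load_pages(data)
    except Exception as e:
        print(f"Unable to process file: {e}")
    return pages
```

With invalid input, extract_text_fixed returns an empty list, while
extract_text_buggy raises UnboundLocalError at its return statement.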

Fixes #1289

Co-authored-by: BillionClaw <267901332+BillionClaw@users.noreply.github.com>
BillionToken 2026-03-25 20:17:50 +08:00 committed by GitHub
parent e863126140
commit 530443a4f6

@@ -94,6 +94,7 @@ class PdfToEntries(TextToEntries):
     @staticmethod
     def extract_text(pdf_file):
         """Extract text from specified PDF files"""
+        pdf_entry_by_pages = []
         try:
             # Create temp file with .pdf extension that gets auto-deleted
             with tempfile.NamedTemporaryFile(suffix=".pdf", delete=True) as tmpf: