Mirror of https://github.com/khoj-ai/khoj (synced 2026-04-21 15:57:17 +00:00)
Fix UnboundLocalError in PdfToEntries.extract_text when PDF processing fails (#1292)
When PyMuPDFLoader fails to process an invalid PDF file, the exception is caught but pdf_entry_by_pages is referenced before assignment, causing an UnboundLocalError.

Initialized pdf_entry_by_pages to an empty list before the try block so the return statement always has a valid value, even when an exception occurs.

Verified with both invalid input (returns []) and valid PDFs (returns extracted text).

Fixes #1289

Co-authored-by: BillionClaw <267901332+BillionClaw@users.noreply.github.com>
parent e863126140
commit 530443a4f6
1 changed file with 1 addition and 0 deletions
@@ -94,6 +94,7 @@ class PdfToEntries(TextToEntries):
     @staticmethod
     def extract_text(pdf_file):
         """Extract text from specified PDF files"""
+        pdf_entry_by_pages = []
         try:
             # Create temp file with .pdf extension that gets auto-deleted
             with tempfile.NamedTemporaryFile(suffix=".pdf", delete=True) as tmpf: