Fix UnboundLocalError in PdfToEntries.extract_text when PDF processing fails (#1292)

When PyMuPDFLoader fails to process an invalid PDF file, the exception is caught but pdf_entry_by_pages is referenced before assignment, causing an UnboundLocalError.

Initialized pdf_entry_by_pages to an empty list before the try block so the return statement always has a valid value, even when an exception occurs.

Verified with both invalid input (returns []) and valid PDFs (returns extracted text).

Fixes #1289

Co-authored-by: BillionClaw <267901332+BillionClaw@users.noreply.github.com>
2026-04-21 15:57:17 +00:00 · 2026-03-25 20:17:50 +08:00 · 2026-03-25 20:17:50 +08:00 · 530443a4f6
commit 530443a4f6
parent e863126140
1 changed file with 1 addition and 0 deletions
--- a/src/khoj/processor/content/pdf/pdf_to_entries.py
+++ b/src/khoj/processor/content/pdf/pdf_to_entries.py
@@ -94,6 +94,7 @@ class PdfToEntries(TextToEntries):
    @staticmethod
    def extract_text(pdf_file):
        """Extract text from specified PDF files"""
+        pdf_entry_by_pages = []
        try:
            # Create temp file with .pdf extension that gets auto-deleted
            with tempfile.NamedTemporaryFile(suffix=".pdf", delete=True) as tmpf:
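
The failure mode addressed by this hunk can be reproduced without any PDF library. The sketch below is illustrative only: `extract_unsafe`, `extract_safe`, and `failing_loader` are hypothetical names, not part of khoj, and the loader is a stand-in for whatever raises inside the real try block. It assumes, as the commit message describes, that the exception handler catches the error and execution then falls through to the return statement.

```python
def failing_loader():
    """Hypothetical stand-in for a loader that rejects an invalid PDF."""
    raise ValueError("not a valid PDF")


def extract_unsafe(loader):
    """Pre-fix shape: the result name is bound only inside the try block."""
    try:
        pages = loader()
    except Exception as e:
        print(f"Unable to process file: {e}")
    # If loader() raised, `pages` was never assigned, so this line
    # raises UnboundLocalError instead of returning a value.
    return pages


def extract_safe(loader):
    """Post-fix shape: the result name is bound before the try block."""
    pages = []
    try:
        pages = loader()
    except Exception as e:
        print(f"Unable to process file: {e}")
    # `pages` is always bound here: either the loader's result or [].
    return pages


if __name__ == "__main__":
    print(extract_safe(failing_loader))
    try:
        extract_unsafe(failing_loader)
    except UnboundLocalError as err:
        print(f"UnboundLocalError: {err}")
```

Binding the default before the try block keeps the single return statement at the end of the function, so the success and failure paths share one exit point, which appears to be the shape the one-line fix preserves.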