unsloth/studio
Wasim Yousef Said dd283b0605
feat(studio): multi-file unstructured seed upload with better backend extraction (#4468)
* fix(recipe-studio): prevent fitView from zooming to wrong location on recipe load

* feat: add pymupdf/python-docx deps and unstructured uploads storage root

* feat: add POST /seed/upload-unstructured-file endpoint

* feat: add multi-file chunking with source_file column
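
A minimal sketch of what per-file chunking with a `source_file` column could look like. Only the `source_file` column name comes from the commit; the function names, the `chunk_index` and `text` fields, and the fixed-size character window with overlap are assumptions for illustration, not the strategy Studio actually uses.

```python
from typing import Dict, Iterator, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> Iterator[str]:
    """Yield fixed-size character windows with a small overlap (hypothetical strategy)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only windows
            yield piece
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text


def build_rows(files: Dict[str, str], chunk_size: int = 200, overlap: int = 20) -> List[dict]:
    """Chunk every file and tag each chunk with the file it came from."""
    rows = []
    for name, text in files.items():
        for i, piece in enumerate(chunk_text(text, chunk_size, overlap)):
            rows.append({"source_file": name, "chunk_index": i, "text": piece})
    return rows
```

Tagging each row at chunking time keeps provenance available to later steps, such as a preview that samples across files.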

* feat: update frontend types and API layer for multi-file upload

* feat: round-robin preview rows across source files

Ensures every uploaded file is represented in the preview table
by cycling through sources instead of just taking the first N rows.
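
The cycling described above can be sketched as follows. This is an illustrative Python version under assumed names (`round_robin_preview`, a `source_file` key on each row); the real implementation may live elsewhere and differ in detail.

```python
from collections import deque
from typing import List


def round_robin_preview(rows: List[dict], limit: int, key: str = "source_file") -> List[dict]:
    """Pick up to `limit` rows, taking one row per source in turn."""
    # Group rows by source, preserving first-seen order of the sources.
    groups = {}
    for row in rows:
        groups.setdefault(row[key], deque()).append(row)

    queues = deque(groups.values())
    preview = []
    while queues and len(preview) < limit:
        current = queues.popleft()
        preview.append(current.popleft())
        if current:  # source still has rows: send it to the back of the line
            queues.append(current)
    return preview
```

With three rows from `a.txt` and one each from `b.txt` and `c.txt`, a limit of 4 yields rows from `a.txt`, `b.txt`, `c.txt`, `a.txt`, whereas taking the first four rows would show `a.txt` three times and never show `c.txt`.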

* fix: disable OCR, fix auto-load timing, fix persistence on reload

- Disable pymupdf4llm OCR with write_images=False, show_progress=False
- Replace onAllUploaded callback with useEffect that detects uploading→done
  transition (avoids stale closure reading empty file IDs)
- Fix importer to preserve file IDs from saved recipes instead of clearing them
  (clearing happens only at share time, via sanitizeSeedForShare)

* fix: harden unstructured upload with input validation and state fixes

Validate block_id/file_id with alphanumeric regex to prevent path
traversal, use exact stem match for file deletion, add error handling
for metadata writes and empty files, fix React stale closures and
object mutations in upload loop, and correct validation logic for
unstructured seed resolved_paths.
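
A hedged sketch of the two hardening ideas named above: an alphanumeric-only ID check and an exact stem match when locating a file to delete. The helper names and the `root/<block_id>/<file_id>.<ext>` directory layout are assumptions, not the actual backend code.

```python
import re
from pathlib import Path
from typing import List

# Only ASCII letters and digits: no dots, slashes, or separators can get through.
_SAFE_ID = re.compile(r"[A-Za-z0-9]+")


def validate_id(value: str, field: str) -> str:
    """Return the value unchanged if it is a safe ID, else raise ValueError."""
    if not isinstance(value, str) or not _SAFE_ID.fullmatch(value):
        raise ValueError(f"invalid {field}")
    return value


def find_upload(root: Path, block_id: str, file_id: str) -> List[Path]:
    """List files in the block directory whose stem equals file_id exactly."""
    block_dir = root / validate_id(block_id, "block_id")
    validate_id(file_id, "file_id")
    if not block_dir.is_dir():
        return []
    # Exact stem comparison: deleting "abc" must not also match "abc1.txt".
    return sorted(p for p in block_dir.iterdir() if p.is_file() and p.stem == file_id)
```

`fullmatch` is used rather than `match` with a trailing `$`, since `$` would also accept a value that ends in a newline.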

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: address PR review - legacy path import, share sanitizer, sync effect

Promote legacy source.path into resolved_paths for old unstructured
recipes, clear source.paths in share sanitizer to prevent leaking local
filesystem paths, and gate file sync effect to dialog open transition
so users can actually delete all uploaded files.

* fix: clean CSV column names (BOM, whitespace, unnamed index on re-save) for #4470
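
One way the three CSV header problems listed above could be handled, sketched with the standard library only. The commit does not show the fix itself, so this is an interpretation: "unnamed index" is read here as the empty or `Unnamed: N` header that appears when a DataFrame index is written out and the file is saved again. `clean_csv` is a hypothetical helper.

```python
import csv
import io
import re
from typing import List, Tuple

# Header pandas assigns to a nameless column on read, e.g. "Unnamed: 0".
_UNNAMED = re.compile(r"^Unnamed: \d+$")


def clean_csv(raw: bytes) -> Tuple[List[str], List[List[str]]]:
    """Return (column names, rows) with BOM, padding, and index columns removed."""
    text = raw.decode("utf-8-sig")  # utf-8-sig drops a leading BOM if present
    reader = csv.reader(io.StringIO(text, newline=""))
    try:
        header = next(reader)
    except StopIteration:
        return [], []  # empty file
    names = [h.strip() for h in header]
    # Keep columns that have a real name; drop empty and "Unnamed: N" headers.
    keep = [i for i, n in enumerate(names) if n and not _UNNAMED.match(n)]
    rows = [[row[i] if i < len(row) else "" for i in keep] for row in reader]
    return [names[i] for i in keep], rows
```

Decoding with `utf-8-sig` matters for the first column in particular: without it, the BOM stays glued to the first header and lookups by column name fail.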

* fix: harden unstructured upload flow and polish dialog UX

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-03-20 13:22:42 -07:00
Name | Last commit | Date
backend | feat(studio): multi-file unstructured seed upload with better backend extraction (#4468) | 2026-03-20 13:22:42 -07:00
frontend | feat(studio): multi-file unstructured seed upload with better backend extraction (#4468) | 2026-03-20 13:22:42 -07:00
__init__.py | Final cleanup | 2026-03-12 18:28:04 +00:00
install_python_stack.py | Combine studio setup fixes: frontend caching, venv isolation, Windows CPU support (#4413) | 2026-03-18 03:52:25 -07:00
LICENSE.AGPL-3.0 | Add AGPL-3.0 license to studio folder | 2026-03-09 19:36:25 +00:00
setup.bat | Final cleanup | 2026-03-12 18:28:04 +00:00
setup.ps1 | Fix Install commands for Windows + 1 line installs (#4447) | 2026-03-19 02:09:09 -07:00
setup.sh | Fix Install commands for Windows + 1 line installs (#4447) | 2026-03-19 02:09:09 -07:00
Unsloth_Studio_Colab.ipynb | Update Unsloth_Studio_Colab.ipynb | 2026-03-17 15:42:38 -07:00