feat(vllm): add structured output support via guided decoding

Update the vLLM backend to support structured output:
- Import GuidedDecodingParams from vllm.sampling_params
- Handle JSONSchema: parse and pass as GuidedDecodingParams(json_schema=...)
- Handle json_object response format: GuidedDecodingParams(json_object=True)
- Fall back to Grammar (GBNF) via GuidedDecodingParams(grammar=...)
- Remove phantom GuidedDecoding mapping (field doesn't exist in proto)
- Add the missing 'import time' (used by load_video) and 'import json' (used for schema parsing)

Priority: JSONSchema > json_object > Grammar (GBNF fallback)
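
The priority order above can be sketched as a small selection function. This is an illustrative sketch only, not the code from this commit: the `Request` dataclass, its field names, and `pick_guided_decoding` are hypothetical stand-ins, and the returned dictionary keys simply mirror the keyword names written in this commit message. Check them against the `GuidedDecodingParams` signature of the installed vLLM version before passing them on.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Request:
    """Hypothetical stand-in for the backend's request message."""
    json_schema: str = ""       # JSON Schema as a string, empty if unset
    response_format: str = ""   # e.g. "json_object", empty if unset
    grammar: str = ""           # GBNF grammar text, empty if unset


def pick_guided_decoding(request: Request) -> Optional[Dict[str, Any]]:
    """Return keyword arguments for guided decoding, or None if unconstrained.

    Priority: JSONSchema > json_object > Grammar (GBNF fallback).
    The keys follow the names used in the commit message (assumption).
    """
    if request.json_schema:
        # Parse the schema string so malformed JSON fails early.
        return {"json_schema": json.loads(request.json_schema)}
    if request.response_format == "json_object":
        return {"json_object": True}
    if request.grammar:
        return {"grammar": request.grammar}
    return None


if __name__ == "__main__":
    req = Request(json_schema='{"type": "object"}', grammar='root ::= "yes"')
    # The schema wins even though a grammar is also set.
    print(pick_guided_decoding(req))
```

Returning plain keyword arguments keeps the sketch runnable without vLLM installed; in the backend, the result would be used to build the guided decoding parameters attached to `SamplingParams`.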

Ref: #6857
Signed-off-by: eureka928 <meobius123@gmail.com>
eureka928 2026-03-06 02:28:27 +01:00
parent bbb32ac244
commit 3617e2aba4
GPG key ID: B0C6F02BD1A12D8E

@@ -2,6 +2,7 @@
 import asyncio
 from concurrent import futures
 import argparse
+import json
 import signal
 import sys
 import os
@@ -21,7 +22,7 @@ from grpc_auth import get_auth_interceptors
 from vllm.engine.arg_utils import AsyncEngineArgs
 from vllm.engine.async_llm_engine import AsyncLLMEngine
-from vllm.sampling_params import SamplingParams
+from vllm.sampling_params import SamplingParams, GuidedDecodingParams
 from vllm.utils import random_uuid
 from vllm.transformers_utils.tokenizer import get_tokenizer
 from vllm.multimodal.utils import fetch_image