
Spring AI Study Notes 19: OpenAI Chat Model, Using Extra Parameters with OpenAI-Compatible Servers

2026-02-03 12:49:08

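OpenAI-compatible inference servers (such as vLLM and Ollama) often support parameters beyond those defined in the standard OpenAI API. For example, these servers may accept top_k, repetition_penalty, or other sampling controls that the official OpenAI API does not recognize.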

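Configuration Properties

You can configure extra parameters with Spring Boot properties. Each property under spring.ai.openai.chat.options.extra-body becomes a top-level parameter in the request: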

spring.ai.openai.base-url=http://localhost:8000/v1
spring.ai.openai.chat.options.model=meta-llama/Llama-3-8B-Instruct
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.extra-body.top_k=50
spring.ai.openai.chat.options.extra-body.repetition_penalty=1.1

This configuration produces a JSON request such as:

{
  "model": "meta-llama/Llama-3-8B-Instruct",
  "temperature": 0.7,
  "top_k": 50,
  "repetition_penalty": 1.1,
  "messages": [...]
}
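The flattening described above can be illustrated with a minimal plain-Java sketch. It is not Spring AI's actual binding code: the class ExtraBodyFlattenSketch and its methods are hypothetical, and values are kept as strings for simplicity, whereas a real property binder would convert them to numbers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Spring AI.
class ExtraBodyFlattenSketch {

    // Prefix from the article; whatever follows it is treated as a raw parameter name.
    static final String PREFIX = "spring.ai.openai.chat.options.extra-body.";

    /** Collects the extra-body.* entries of a properties text into a sorted map. */
    static Map<String, String> extraBody(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> extra = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                extra.put(key.substring(PREFIX.length()), props.getProperty(key));
            }
        }
        return extra;
    }

    /** Copies the standard fields, then adds each extra entry at the top level. */
    static Map<String, Object> toRequestBody(Map<String, Object> standard,
                                             Map<String, String> extra) {
        Map<String, Object> body = new LinkedHashMap<>(standard);
        body.putAll(extra);
        return body;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
            "spring.ai.openai.chat.options.temperature=0.7",
            "spring.ai.openai.chat.options.extra-body.top_k=50",
            "spring.ai.openai.chat.options.extra-body.repetition_penalty=1.1");
        Map<String, Object> standard = new LinkedHashMap<>();
        standard.put("model", "meta-llama/Llama-3-8B-Instruct");
        standard.put("temperature", 0.7);
        // Extra entries appear next to model and temperature, not in a nested object.
        System.out.println(toRequestBody(standard, extraBody(config)));
    }
}
```

The point of the sketch is that only keys under the extra-body prefix are lifted into the request, and that they sit beside the standard fields rather than under a nested object.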

Configuration with the Builder

You can also specify extra parameters at runtime with the options builder:

ChatResponse response = chatModel.call(
    new Prompt(
        "Tell me a creative story",
        OpenAiChatOptions.builder()
            .model("meta-llama/Llama-3-8B-Instruct")
            .temperature(0.7)
            .extraBody(Map.of(
                "top_k", 50,
                "repetition_penalty", 1.1,
                "frequency_penalty", 0.5
            ))
            .build()
    ));

Example: vLLM Server

When running vLLM with a Llama model, you may want to use vLLM-specific sampling parameters:

spring.ai.openai.base-url=http://localhost:8000/v1
spring.ai.openai.chat.options.model=meta-llama/Llama-3-70B-Instruct
spring.ai.openai.chat.options.extra-body.top_k=40
spring.ai.openai.chat.options.extra-body.top_p=0.95
spring.ai.openai.chat.options.extra-body.repetition_penalty=1.05
spring.ai.openai.chat.options.extra-body.min_p=0.05

Example: Ollama Server

When using Ollama through its OpenAI-compatible endpoint, you can pass Ollama-specific parameters:

OpenAiChatOptions options = OpenAiChatOptions.builder()
    .model("llama3.2")
    .extraBody(Map.of(
        "num_predict", 100,
        "top_k", 40,
        "repeat_penalty", 1.1
    ))
    .build();

ChatResponse response = chatModel.call(new Prompt("Generate text", options));
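The two examples use different names for a similar control: repetition_penalty for vLLM and repeat_penalty for Ollama. If one application targets both servers, a small helper can keep that difference in one place. The sketch below is only an illustration: the class SamplingParamsSketch and the ServerKind enum are hypothetical names, and the key names are taken solely from the examples in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Spring AI.
class SamplingParamsSketch {

    /** Hypothetical server kinds, limited to the two covered in this article. */
    enum ServerKind { VLLM, OLLAMA }

    /** Builds an extra-body map using the penalty key each server expects. */
    static Map<String, Object> extraBody(ServerKind kind, int topK, double penalty) {
        Map<String, Object> extra = new LinkedHashMap<>();
        extra.put("top_k", topK);
        // Key names follow the article's examples:
        // vLLM uses repetition_penalty, Ollama uses repeat_penalty.
        String penaltyKey = (kind == ServerKind.VLLM) ? "repetition_penalty" : "repeat_penalty";
        extra.put(penaltyKey, penalty);
        return extra;
    }

    public static void main(String[] args) {
        System.out.println(extraBody(ServerKind.VLLM, 40, 1.05));
        System.out.println(extraBody(ServerKind.OLLAMA, 40, 1.1));
    }
}
```

The returned map has the shape expected by the extraBody(...) call on the options builder, so it could be passed there directly.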

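Reasoning Content from Reasoning Models

Some OpenAI-compatible servers that support reasoning models (such as DeepSeek R1, or vLLM with a reasoning parser) expose the model's internal chain of thought through a reasoning_content field in the API response. This field contains the step-by-step reasoning the model used to arrive at its final answer.

Spring AI maps this field from the JSON response to the reasoningContent key in the AssistantMessage metadata.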

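An important distinction in reasoning_content availability:

OpenAI-compatible servers (DeepSeek, vLLM): expose reasoning_content in Chat Completions API responses
Official OpenAI models (GPT-5, o1, o3): do not expose the reasoning text in Chat Completions API responses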

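Official OpenAI reasoning models hide the chain-of-thought content when used through the Chat Completions API. They expose only a reasoning_tokens count in the usage statistics. To access the actual reasoning text from official OpenAI models, you must use OpenAI's Responses API, a separate endpoint that this client does not currently support.

Fallback behavior: when the server does not provide reasoning_content (for example, official OpenAI Chat Completions), the reasoningContent metadata field is an empty string.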
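Because the field can be an empty string, and a defensive caller may also want to tolerate a missing key, it can help to normalize all "no reasoning" cases into one. The following is a minimal sketch using a plain Map in place of the real message metadata; the class ReasoningMetadataSketch and its reasoning(...) method are hypothetical and not part of Spring AI.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of Spring AI.
class ReasoningMetadataSketch {

    // Metadata key named in the article.
    static final String KEY = "reasoningContent";

    /** Returns the reasoning text, or empty if it is missing, blank, or not a String. */
    static Optional<String> reasoning(Map<String, Object> metadata) {
        Object value = metadata.get(KEY);
        if (value instanceof String) {
            String text = (String) value;
            if (!text.trim().isEmpty()) {
                return Optional.of(text);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put(KEY, ""); // fallback case: server sent no reasoning_content
        System.out.println(reasoning(metadata).isPresent());

        metadata.put(KEY, "9.8 equals 9.80, and 9.80 is greater than 9.11.");
        System.out.println(reasoning(metadata).orElse("(none)"));
    }
}
```

With a real response, the map passed in would be the one returned by getMetadata() on the assistant message.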

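Accessing Reasoning Content

When using a compatible server, you can access the reasoning content from the response metadata.

Using ChatModel directly: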

// Configure the client to use DeepSeek R1 or vLLM with a reasoning model
ChatResponse response = chatModel.call(
    new Prompt("Which number is larger: 9.11 or 9.8?")
);

// Get the assistant message
AssistantMessage message = response.getResult().getOutput();

// Access the reasoning content from metadata.
// getMetadata() returns Map<String, Object>, so the value is cast to String.
String reasoning = (String) message.getMetadata().get("reasoningContent");
if (reasoning != null && !reasoning.isEmpty()) {
    System.out.println("Model's reasoning process:");
    System.out.println(reasoning);
}

// The final answer is in the regular message text.
// Assumes Spring AI 1.x, where getText() replaced the older getContent().
System.out.println("\nFinal answer:");
System.out.println(message.getText());

Using ChatClient:

ChatClient chatClient = ChatClient.create(chatModel);

// Assumes Spring AI 1.x, where getText() replaced the older getContent().
String result = chatClient.prompt()
    .user("Which number is larger: 9.11 or 9.8?")
    .call()
    .chatResponse()
    .getResult()
    .getOutput()
    .getText();

// To access reasoning content with ChatClient, retrieve the full response
ChatResponse response = chatClient.prompt()
    .user("Which number is larger: 9.11 or 9.8?")
    .call()
    .chatResponse();

AssistantMessage message = response.getResult().getOutput();
// getMetadata() returns Map<String, Object>, so the value is cast to String.
String reasoning = (String) message.getMetadata().get("reasoningContent");

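Streaming Reasoning Content

When using streaming responses, the reasoning content arrives in pieces across chunks, just like regular message content, so you accumulate it yourself: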

Flux<ChatResponse> responseFlux = chatModel.stream(
    new Prompt("Solve this logic puzzle...")
);

StringBuilder reasoning = new StringBuilder();
StringBuilder answer = new StringBuilder();

responseFlux.subscribe(chunk -> {
    // Some chunks may carry no generation; skip them.
    if (chunk.getResult() == null) {
        return;
    }
    AssistantMessage message = chunk.getResult().getOutput();

    // Accumulate reasoning if present.
    // getMetadata() returns Map<String, Object>, so the value is cast to String.
    String reasoningChunk = (String) message.getMetadata().get("reasoningContent");
    if (reasoningChunk != null) {
        reasoning.append(reasoningChunk);
    }

    // Accumulate the final answer.
    // Assumes Spring AI 1.x, where getText() replaced the older getContent().
    if (message.getText() != null) {
        answer.append(message.getText());
    }
});
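The accumulation logic can be tried without a running server by pulling it into a small class of its own. The sketch below is an assumption-laden illustration: StreamAccumulatorSketch and its Chunk type are hypothetical stand-ins for the streamed response objects, and the chunk sequence in main is made up. The methods are synchronized because a reactive subscriber may run on a different thread from the code that later reads the result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of Spring AI.
class StreamAccumulatorSketch {

    /** Hypothetical stand-in for one streamed chunk; either field may be null. */
    static final class Chunk {
        final String reasoning;
        final String content;

        Chunk(String reasoning, String content) {
            this.reasoning = reasoning;
            this.content = content;
        }
    }

    private final StringBuilder reasoning = new StringBuilder();
    private final StringBuilder answer = new StringBuilder();

    /** Appends whichever parts of the chunk are present. */
    synchronized void accept(Chunk chunk) {
        if (chunk.reasoning != null) {
            reasoning.append(chunk.reasoning);
        }
        if (chunk.content != null) {
            answer.append(chunk.content);
        }
    }

    synchronized String reasoning() {
        return reasoning.toString();
    }

    synchronized String answer() {
        return answer.toString();
    }

    public static void main(String[] args) {
        // Made-up chunk sequence for demonstration only.
        List<Chunk> chunks = Arrays.asList(
            new Chunk("Compare 9.11 ", null),
            new Chunk("with 9.80.", null),
            new Chunk(null, "9.8 "),
            new Chunk(null, "is larger."));

        StreamAccumulatorSketch accumulator = new StreamAccumulatorSketch();
        chunks.forEach(accumulator::accept);

        System.out.println("Reasoning: " + accumulator.reasoning());
        System.out.println("Answer: " + accumulator.answer());
    }
}
```

In a real subscriber, each ChatResponse chunk would be converted to the two strings and passed to accept(...).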

Example: DeepSeek R1

DeepSeek R1 is a reasoning model that exposes its internal reasoning process:

spring.ai.openai.api-key=${DEEPSEEK_API_KEY}
spring.ai.openai.base-url=https://api.deepseek.com
spring.ai.openai.chat.options.model=deepseek-reasoner

When you send a request to DeepSeek R1, the response includes both the reasoning content (the model's thought process) and the final answer.

Example: vLLM with a Reasoning Parser

vLLM supports reasoning models when it is configured with a reasoning parser:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning \
    --reasoning-parser deepseek_r1

spring.ai.openai.base-url=http://localhost:8000/v1
spring.ai.openai.chat.options.model=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

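The availability of reasoning_content depends entirely on the inference server you use. Not every OpenAI-compatible server exposes reasoning content, even when it serves a model with reasoning capabilities.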
