OPENAI Compatible Target Implementation
Purpose
Add a separate execution path for OpenAI-compatible backends such as llama.cpp, while keeping the current official OpenAI path unchanged.
Checklist
- Add explicit OpenAI backend mode in config
- Route OpenAI requests to separate official and compatible runners
- Keep official OpenAI on responses.create(...)
- Add compatible chat.completions.create(...) runner
- Add compatible tool-call extractors
- Add backend selection tests
- Add basic memory/config regression coverage
- Normalize compatible streaming tool-call assembly
- Preserve file upload behavior in compatible backend
- Guard unsupported OpenAI-only tools for compatible backend
- Add environment docs and example config entries
- Add real-server integration coverage for compatible backend
- Revisit shared orchestration extraction for further deduplication
Non-Goals
- Do not change the current official OpenAI responses.create(...) behavior.
- Do not auto-switch behavior only because OPENAI_BASE_URL is set.
- Do not merge compatible backend quirks into the official OpenAI runner.
- Do not remove or weaken existing tool ranking, memory, RAG, logging, or upload behavior.
Current State
- src/ai/unified-ai-runner.openai.ts currently uses the official responses API.
- src/ai/provider-adapters.ts already has provider-specific adapters and tool/result mapping.
- src/ai/provider-adapter-contract.ts already contains responses-style extractors.
- src/ai/openai-chat-message.ts currently models responses-style messages, not chat.completions tool messages.
- src/ai/unified-ai-request-pipeline.ts prepares chat context and runtime state before the model call.
- src/ai/ai-runtime-target.ts resolves provider targets, base URLs, models, and keys.
- src/ai/unified-ai-runner.tool-ranker.ts already uses a chat.completions-style call path, which is closer to compatible backends.
Target Architecture
- Official OpenAI backend stays on responses.create(...).
- Compatible OpenAI backend uses chat.completions.create(...).
- Backend selection is explicit through config, for example OPENAI_BACKEND=official|compatible.
- Shared preparation logic remains common.
- Transport-specific request formatting and response parsing are split.
Configuration Design
- Add a new config value OPENAI_BACKEND.
- Allowed values should be official and compatible.
- Default must be official.
- Keep OPENAI_BASE_URL as a transport setting only.
- OPENAI_BASE_URL must not imply compatible mode by itself.
- Extend environment schema and runtime config to expose this value.
- Update env docs and example env files.
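The rules above can be sketched as a small parser. This is only an illustration: the names OpenAiBackend and parseOpenAiBackend are hypothetical, and the environment is passed in as a plain object rather than read through the real schema in src/common/environment.ts. Rejecting unknown values with an error is an assumption, chosen so that a typo cannot silently select a backend.

```typescript
// Hypothetical names; the real environment schema may look different.
type OpenAiBackend = "official" | "compatible";

const OPENAI_BACKENDS: readonly OpenAiBackend[] = ["official", "compatible"];

function parseOpenAiBackend(env: Record<string, string | undefined>): OpenAiBackend {
  const raw = env["OPENAI_BACKEND"]?.trim();
  // Unset or empty falls back to the default.
  // OPENAI_BASE_URL is deliberately never consulted here.
  if (raw === undefined || raw === "") return "official";
  const match = OPENAI_BACKENDS.find((backend) => backend === raw);
  if (match === undefined) {
    throw new Error(`OPENAI_BACKEND must be "official" or "compatible", got "${raw}"`);
  }
  return match;
}
```

A custom OPENAI_BASE_URL with no OPENAI_BACKEND still resolves to official, which matches the rule that the base URL is a transport setting only.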
Step 1: Config and Target Selection
- Update src/common/environment.ts.
- Add a new environment field for backend mode.
- Add setters if the codebase uses runtime env mutation in tests.
- Update the startup schema and runtime snapshot.
- Add tests for default value and explicit compatible selection.
Expected result:
- Official OpenAI stays unchanged by default.
- Explicit OPENAI_BACKEND=compatible selects the new execution path.
Step 2: Split Runner Selection
- Update the unified AI execution entry point.
- Add a small backend selector for OpenAI targets.
- Route official mode to the current runner.
- Route compatible mode to a new compatible runner.
- Keep other providers untouched.
Expected result:
- One codepath for official OpenAI.
- One codepath for OpenAI-compatible servers.
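One possible shape for the backend selector is shown below. It is a sketch under assumptions: selectOpenAiRunner and the Runner type are hypothetical names, and the real runners take richer arguments than a single request value.

```typescript
// Hypothetical selector; the real entry point and runner signatures may differ.
type OpenAiBackend = "official" | "compatible";
type Runner<Req, Res> = (request: Req) => Promise<Res>;

function selectOpenAiRunner<Req, Res>(
  backend: OpenAiBackend,
  runners: Record<OpenAiBackend, Runner<Req, Res>>,
): Runner<Req, Res> {
  switch (backend) {
    case "official":
      return runners.official; // existing responses.create(...) runner
    case "compatible":
      return runners.compatible; // new chat.completions.create(...) runner
    default: {
      // Exhaustiveness check: adding a backend without a case fails to compile.
      const unreachable: never = backend;
      throw new Error(`Unsupported OpenAI backend: ${String(unreachable)}`);
    }
  }
}
```

The selector only picks a runner and never inspects the base URL, so the two codepaths stay independent.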
Step 3: Shared Orchestration Extraction
- Identify logic that is identical for both OpenAI branches.
- Extract common orchestration into a shared helper where possible.
- Keep these pieces shared:
  - memory prompt injection
  - tool ranking
  - tool loop control
  - logging and timing
  - cancellation handling
  - file upload post-processing
  - document RAG preparation and cleanup
- Keep transport-specific pieces separate:
  - request shape
  - response parsing
  - tool result message shape
  - streaming event parsing
Expected result:
- Less duplicate logic.
- Cleaner separation between official and compatible behavior.
Step 4: Compatible Message Model
- Update src/ai/openai-chat-message.ts or create a sibling type file for compatible chat messages.
- Model system, user, assistant, and tool roles explicitly.
- Support tool_calls on assistant messages.
- Support tool_call_id on tool result messages.
- Preserve support for text and multimodal user content where the backend supports it.
- Avoid forcing responses output types into chat.completions.
Expected result:
- Compatible runner can build valid chat.completions message arrays.
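A minimal sketch of such a message model follows. The field names (role, content, tool_calls, tool_call_id) follow the Chat Completions message format; the type names and the findOrphanToolCallIds helper are hypothetical and only illustrate a cheap consistency check on a message array.

```typescript
// Hypothetical type names; field names follow the chat.completions message format.
type CompatToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is JSON text
};
type CompatContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };
type CompatMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: string | CompatContentPart[] }
  | { role: "assistant"; content: string | null; tool_calls?: CompatToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Returns the tool_call_id of every tool message that no earlier assistant message announced.
function findOrphanToolCallIds(messages: CompatMessage[]): string[] {
  const announced = new Set<string>();
  const orphans: string[] = [];
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const call of message.tool_calls ?? []) announced.add(call.id);
    } else if (message.role === "tool" && !announced.has(message.tool_call_id)) {
      orphans.push(message.tool_call_id);
    }
  }
  return orphans;
}

const exampleHistory: CompatMessage[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the weather in Oslo?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      { id: "call_1", type: "function", function: { name: "get_weather", arguments: "{\"city\":\"Oslo\"}" } },
    ],
  },
  { role: "tool", tool_call_id: "call_1", content: "{\"tempC\":4}" },
];
```

Modelling the roles as a discriminated union lets the compiler reject a tool message without tool_call_id, which is one of the mistakes that could otherwise break a tool loop silently.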
Step 5: Compatible Contract Extractors
- Extend src/ai/provider-adapter-contract.ts.
- Add extractors for chat.completions tool calls.
- Add extractors for chat.completions streaming tool call deltas.
- Keep existing responses extractors intact.
- Normalize tool call IDs, names, and argument text the same way as existing extractors.
- Ensure arguments are always represented as JSON text for the tool loop.
Expected result:
- Compatible runner can parse tool calls from both normal and streaming responses.
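A non-streaming extractor might look like the sketch below. The names NormalizedToolCall and extractChatCompletionToolCalls are hypothetical, and the normalization rules (a synthesized ID when the server omits one, object arguments re-serialized to JSON text, empty arguments mapped to "{}") are assumptions about what the existing extractors do, not a description of them.

```typescript
// Hypothetical names; the existing contract file may normalize differently.
type NormalizedToolCall = { id: string; name: string; argumentsJson: string };

type RawToolCall = {
  id?: string | null;
  function?: { name?: string | null; arguments?: unknown } | null;
};
type RawResponse = {
  choices?: Array<{ message?: { tool_calls?: RawToolCall[] | null } | null }>;
};

function extractChatCompletionToolCalls(response: RawResponse): NormalizedToolCall[] {
  const rawCalls = response.choices?.[0]?.message?.tool_calls ?? [];
  const calls: NormalizedToolCall[] = [];
  rawCalls.forEach((raw, index) => {
    const name = raw.function?.name?.trim();
    if (!name) return; // a call without a name cannot be dispatched
    const args = raw.function?.arguments;
    // Always hand JSON text to the tool loop, even if the server returned an object.
    const argumentsJson =
      typeof args === "string"
        ? (args.trim() === "" ? "{}" : args)
        : JSON.stringify(args ?? {});
    // Assumption: synthesize a positional ID when the server omits one.
    calls.push({ id: raw.id || `call_${index}`, name, argumentsJson });
  });
  return calls;
}
```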
Step 6: Compatible Provider Adapter
- Update src/ai/provider-adapters.ts.
- Add a separate adapter or branch for OpenAI-compatible chat.completions behavior.
- Reuse existing tool ranking where safe.
- Make appendToolResults(...) emit role: "tool" messages with tool_call_id.
- Keep official OpenAI adapter outputting function_call_output.
- Keep Mistral and Ollama unchanged.
Expected result:
- Each backend uses the tool result shape it expects.
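The difference between the two tool result shapes can be illustrated as follows. The ToolResult type and both function names are hypothetical; the real appendToolResults(...) signature is not shown in this plan, so this only sketches the mapping from one internal result to each wire format.

```typescript
// Hypothetical internal result type and mapper names.
type ToolResult = { callId: string; output: string };

type CompatibleToolMessage = { role: "tool"; tool_call_id: string; content: string };
type OfficialToolOutput = { type: "function_call_output"; call_id: string; output: string };

// chat.completions: one role "tool" message per result.
function toCompatibleToolMessages(results: ToolResult[]): CompatibleToolMessage[] {
  return results.map((result) => ({
    role: "tool" as const,
    tool_call_id: result.callId,
    content: result.output,
  }));
}

// responses: one function_call_output input item per result.
function toOfficialToolOutputs(results: ToolResult[]): OfficialToolOutput[] {
  return results.map((result) => ({
    type: "function_call_output" as const,
    call_id: result.callId,
    output: result.output,
  }));
}
```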
Step 7: Compatible Runner Implementation
- Create a new file such as src/ai/unified-ai-runner.openai-compatible.ts.
- Use openai.chat.completions.create(...).
- Pass messages, tools, model, stream, and signal.
- Map system prompt and memory prompt into the messages array correctly.
- Keep the tool loop structure from the current runner.
- Append assistant tool-call messages and tool result messages between rounds.
- Continue until no tool calls remain or max rounds is reached.
Expected result:
- Compatible backends can complete multi-round tool flows.
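The loop structure can be sketched without the SDK by injecting the model call. Everything here is illustrative: runCompatibleToolLoop, Complete, and ExecuteTool are hypothetical names, the MAX_TOOL_ROUNDS value is a placeholder, and ranking, streaming, logging, and upload hooks are left out. In a real runner, the complete callback would wrap openai.chat.completions.create(...); in the official Node SDK the abort signal is passed as a request option in the second argument rather than in the request body.

```typescript
// Hypothetical loop skeleton; the model call and tool execution are injected.
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

type Complete = (messages: Message[]) => Promise<{ content: string | null; tool_calls?: ToolCall[] }>;
type ExecuteTool = (name: string, argumentsJson: string) => Promise<string>;

const MAX_TOOL_ROUNDS = 4; // placeholder; the real constant lives in the existing runner

async function runCompatibleToolLoop(
  initial: Message[],
  complete: Complete,
  executeTool: ExecuteTool,
): Promise<{ text: string; rounds: number; hitMaxRounds: boolean }> {
  const messages = [...initial];
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    const reply = await complete(messages);
    const toolCalls = reply.tool_calls ?? [];
    if (toolCalls.length === 0) {
      return { text: reply.content ?? "", rounds: round, hitMaxRounds: false };
    }
    // The assistant message carrying tool_calls must precede its tool results.
    messages.push({ role: "assistant", content: reply.content, tool_calls: toolCalls });
    for (const call of toolCalls) {
      const output = await executeTool(call.function.name, call.function.arguments);
      messages.push({ role: "tool", tool_call_id: call.id, content: output });
    }
  }
  return { text: "", rounds: MAX_TOOL_ROUNDS, hitMaxRounds: true };
}
```

Returning hitMaxRounds separately leaves the max-rounds warning and continuation decision to the caller, as in the current runner.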
Step 8: Tool Call Loop Semantics
- Preserve MAX_TOOL_ROUNDS.
- Preserve tool ranking before each round.
- Preserve memory tool selection.
- Preserve file search injection when document RAG is active.
- Preserve file upload post-processing.
- Preserve max-rounds warnings and continuation decisions.
- Keep the final text visible in the stream message exactly as today.
Expected result:
- Compatible backend behaves like the current runner from the user’s perspective.
Step 9: Streaming Behavior
- Implement streaming event handling for chat.completions.
- Parse text deltas and append them to TelegramStreamMessage.
- Parse delta.tool_calls and keep incremental tool-call state.
- Update status text when tool usage starts and ends.
- Keep image generation and file-search status handling if the backend emits compatible signals.
- Finalize the stream only after the terminal completion event.
Expected result:
- Streaming works without losing tool call state.
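Incremental tool-call state can be kept in a small assembler keyed by the index field of each delta. This is a sketch, not the planned implementation: ToolCallAssembler is a hypothetical name, and it assumes the function name arrives whole in a single chunk while the arguments arrive as text fragments to concatenate. Servers that split or repeat the name would need different handling.

```typescript
// Hypothetical assembler for delta.tool_calls chunks from a chat.completions stream.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type AssembledToolCall = { id: string; name: string; argumentsJson: string };

class ToolCallAssembler {
  private readonly byIndex = new Map<number, AssembledToolCall>();

  // Feed the tool_calls array of one streamed chunk (may be absent on text-only chunks).
  push(deltas: ToolCallDelta[] | undefined): void {
    for (const delta of deltas ?? []) {
      const entry = this.byIndex.get(delta.index) ?? { id: "", name: "", argumentsJson: "" };
      if (delta.id) entry.id = delta.id;
      // Assumption: the name is sent whole, so the latest non-empty value wins.
      if (delta.function?.name) entry.name = delta.function.name;
      // Argument text is fragmented across chunks and must be concatenated.
      if (delta.function?.arguments) entry.argumentsJson += delta.function.arguments;
      this.byIndex.set(delta.index, entry);
    }
  }

  // Call once after the terminal chunk; returns calls ordered by index.
  finish(): AssembledToolCall[] {
    return Array.from(this.byIndex.entries())
      .sort(([a], [b]) => a - b)
      .map(([index, call]) => ({
        id: call.id || `call_${index}`,
        name: call.name,
        argumentsJson: call.argumentsJson.trim() === "" ? "{}" : call.argumentsJson,
      }));
  }
}
```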
Step 10: Tool Result Handling
- After each tool execution round, append tool results using the compatible message format.
- Ensure each tool result keeps the correct tool_call_id.
- Preserve the existing file upload hook.
- If upload fails, convert the failure into a tool result error string.
- Preserve the same tool memory map behavior.
Expected result:
- The backend receives a valid message history for the next round.
Step 11: Prompt and Memory Injection
- Keep buildSystemInstruction(...) as the source of system prompt assembly.
- Keep buildUserMemoryPrompt(...) injected as a separate block.
- Preserve the explicit separation between assistant memory and user memory.
- Preserve the user.md and system.md memory layout.
- Ensure compatible backend receives the same semantic prompt content.
Expected result:
- Memory behavior stays identical across official and compatible backends.
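One way to map the two prompt blocks into a chat.completions message array is sketched below. The function name buildSystemMessages and the mergeIntoOne flag are hypothetical, and the inputs stand in for the outputs of buildSystemInstruction(...) and buildUserMemoryPrompt(...). The merge option is an assumption: some compatible servers may use chat templates that accept only one leading system message, in which case joining the blocks keeps the same content in a single message.

```typescript
// Hypothetical helper; inputs are the already-built prompt strings.
type SystemMessage = { role: "system"; content: string };

function buildSystemMessages(
  systemInstruction: string,
  userMemoryPrompt: string,
  mergeIntoOne = false,
): SystemMessage[] {
  // Drop empty blocks so no blank system message is sent.
  const blocks = [systemInstruction, userMemoryPrompt]
    .map((block) => block.trim())
    .filter((block) => block !== "");
  if (blocks.length === 0) return [];
  if (mergeIntoOne) {
    // Same semantic content, one message, blocks still visibly separated.
    return [{ role: "system", content: blocks.join("\n\n") }];
  }
  return blocks.map((content) => ({ role: "system" as const, content }));
}
```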
Step 12: Tool Ranking Compatibility
- Review src/ai/unified-ai-runner.tool-ranker.ts.
- Verify whether the current JSON response handling is safe for compatible backends.
- If a backend cannot guarantee strict JSON mode, add a fallback parser.
- Keep ranking inputs and outputs consistent across both branches.
- Do not weaken tool selection heuristics.
Expected result:
- Tool ranking remains deterministic enough for both branches.
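A fallback parser for backends that wrap JSON in extra text could be as small as the sketch below. The name parseJsonLenient is hypothetical, and the strategy (strict parse first, then the span from the first opening bracket to the last matching closer) is one simple heuristic, not what the ranker currently does. It returns undefined rather than guessing when no valid JSON is found.

```typescript
// Hypothetical lenient parser for ranker output from servers without strict JSON mode.
function parseJsonLenient(text: string): unknown {
  const trimmed = text.trim();
  try {
    return JSON.parse(trimmed); // strict path: the output is already valid JSON
  } catch {
    // fall through to the heuristic below
  }
  // Heuristic: cut from the first "{" or "[" to the last matching closing bracket.
  const start = trimmed.search(/[\[{]/);
  if (start === -1) return undefined;
  const closer = trimmed[start] === "{" ? "}" : "]";
  const end = trimmed.lastIndexOf(closer);
  if (end <= start) return undefined;
  try {
    return JSON.parse(trimmed.slice(start, end + 1));
  } catch {
    return undefined; // caller decides how to handle an unparseable ranking
  }
}
```

Because the strict parse runs first, output that is already valid JSON is unaffected, which keeps ranking results consistent across both branches.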
Step 13: File Search and RAG
- Keep document RAG preparation in the request pipeline.
- Keep vector store preparation for official OpenAI.
- Decide whether compatible backend supports file search or needs a no-op fallback.
- If unsupported, guard the tool list so the compatible backend never receives unsupported tools.
- Keep cleanup behavior for temporary artifacts.
Expected result:
- Compatible backend does not receive tools it cannot execute.
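A guard for the tool list might look like this sketch. The name guardToolsForCompatible is hypothetical, and the allowlist rule (only tools of type function pass) is an assumption; whether a given compatible server supports anything beyond function tools still has to be decided per this step. Dropped types are returned so they can be logged instead of vanishing silently.

```typescript
// Hypothetical guard: keep only tool types the compatible backend is assumed to execute.
type ToolDefinition = { type: string; name?: string };

function guardToolsForCompatible<T extends ToolDefinition>(
  tools: T[],
): { allowed: T[]; droppedTypes: string[] } {
  const allowed: T[] = [];
  const droppedTypes: string[] = [];
  for (const tool of tools) {
    if (tool.type === "function") {
      allowed.push(tool);
    } else if (droppedTypes.indexOf(tool.type) === -1) {
      // Hosted tools such as file_search or image_generation are filtered out.
      droppedTypes.push(tool.type);
    }
  }
  return { allowed, droppedTypes };
}
```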
Step 14: Error Handling
- Preserve abort handling.
- Preserve response failure handling.
- Preserve stream error handling.
- Surface backend-specific incompatibilities as explicit errors.
- Do not silently fall back from compatible to official mode.
- Keep logs actionable.
Expected result:
- Failures are obvious and debuggable.
Step 15: Logging and Observability
- Keep the current AI logs and duration tracking.
- Add backend mode to log metadata.
- Log tool calls, tool outputs, and round transitions in both branches.
- Preserve existing observability hooks.
- Add explicit labels for official vs compatible runs.
Expected result:
- Debugging remains easy after the split.
Step 16: Tests
- Add unit tests for backend selection.
- Add unit tests for compatible message conversion.
- Add unit tests for compatible tool call extraction.
- Add integration tests for a tool-call round trip using mocked chat.completions.
- Add tests proving the official responses path is unchanged.
- Add tests for streaming tool call parsing if the backend supports it.
- Add tests for fallback behavior in the tool ranker if needed.
Expected result:
- Both branches are covered and regressions are visible quickly.
Step 17: Suggested File Changes
- src/common/environment.ts
- src/ai/ai-runtime-target.ts
- src/ai/unified-ai-request-pipeline.ts
- src/ai/unified-ai-runner.openai.ts
- src/ai/unified-ai-runner.openai-compatible.ts
- src/ai/provider-adapter-contract.ts
- src/ai/provider-adapters.ts
- src/ai/openai-chat-message.ts
- src/ai/unified-ai-runner.tool-ranker.ts
- test/*.test.mjs
- .env.example
- Documentation files for backend selection
Implementation Order
- Add config flag and wire it through environment parsing.
- Add backend selection logic.
- Add compatible message and extractor support.
- Create the compatible runner.
- Reuse shared orchestration where possible.
- Wire tests.
- Verify official behavior is unchanged.
- Verify compatible backend works with a real OpenAI-compatible server.
Verification Plan
- Run unit tests.
- Run integration tests.
- Verify official OpenAI path still uses responses.create(...).
- Verify compatible path uses chat.completions.create(...).
- Verify a llama.cpp-style server can complete a tool loop.
- Verify memory tools still work.
- Verify document RAG and file upload behavior do not regress.
Risks
- Some OpenAI-compatible servers do not support every official OpenAI feature.
- Streaming tool call deltas may differ across providers.
- JSON-mode assumptions in the ranker may not hold for all compatible servers.
- Tool schema filtering may need backend-specific allowlists.
- Message conversion mistakes can break tool loops silently if not tested.
Acceptance Criteria
- Official OpenAI behavior is unchanged.
- Compatible backend can run a full chat loop with tools.
- Tool calls are correctly extracted and executed.
- Tool results are appended in the correct format.
- Memory injection still works.
- Document RAG and file upload behavior remain functional or fail explicitly.
- Tests cover both branches.
Final Note
The key design rule is simple: keep official OpenAI responses behavior intact, and introduce OpenAI-compatible chat.completions behavior as a separate backend mode with its own parsing and message shape.