melod1n/tg-chat-bot

melod1n 46a99605e6 Add OpenAI compatible chat backend

2026-05-22 20:52:35 +03:00

OpenAI-Compatible Target Implementation

Purpose

Add a separate execution path for OpenAI-compatible backends such as llama.cpp, while keeping the current official OpenAI path unchanged.

Checklist

Add explicit OpenAI backend mode in config
Route OpenAI requests to separate official and compatible runners
Keep official OpenAI on responses.create(...)
Add compatible chat.completions.create(...) runner
Add compatible tool-call extractors
Add backend selection tests
Add basic memory/config regression coverage
Normalize compatible streaming tool-call assembly
Preserve file upload behavior in compatible backend
Guard unsupported OpenAI-only tools for compatible backend
Add environment docs and example config entries
Add real-server integration coverage for compatible backend
Revisit shared orchestration extraction for further deduplication

Non-Goals

Do not change the current official OpenAI responses.create(...) behavior.
Do not switch backends automatically just because OPENAI_BASE_URL is set.
Do not merge compatible backend quirks into the official OpenAI runner.
Do not remove or weaken existing tool ranking, memory, RAG, logging, or upload behavior.

Current State

src/ai/unified-ai-runner.openai.ts currently uses the official responses API.
src/ai/provider-adapters.ts already has provider-specific adapters and tool/result mapping.
src/ai/provider-adapter-contract.ts already contains responses-style extractors.
src/ai/openai-chat-message.ts currently models responses-style messages, not chat.completions tool messages.
src/ai/unified-ai-request-pipeline.ts prepares chat context and runtime state before the model call.
src/ai/ai-runtime-target.ts resolves provider targets, base URLs, models, and keys.
src/ai/unified-ai-runner.tool-ranker.ts already uses a chat.completions-style call path, which is closer to compatible backends.

Target Architecture

Official OpenAI backend stays on responses.create(...).
Compatible OpenAI backend uses chat.completions.create(...).
Backend selection is explicit through config, for example OPENAI_BACKEND=official|compatible.
Shared preparation logic remains common.
Transport-specific request formatting and response parsing are split.

Configuration Design

Add a new config value OPENAI_BACKEND.
Allowed values should be official and compatible.
Default must be official.
Keep OPENAI_BASE_URL as a transport setting only.
OPENAI_BASE_URL must not imply compatible mode by itself.
Extend environment schema and runtime config to expose this value.
Update env docs and example env files.
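
The rules above can be sketched as a small parser. This is a minimal illustration, not the project's actual schema code: parseOpenAIBackend is a hypothetical helper, and trimming the raw value and rejecting unknown values with an error are assumptions. The env object is passed in so that the sketch does not depend on process.env.

```typescript
// Hypothetical helper; the real schema in src/common/environment.ts may differ.
type OpenAIBackend = "official" | "compatible";

const OPENAI_BACKENDS: readonly OpenAIBackend[] = ["official", "compatible"];

function parseOpenAIBackend(env: Record<string, string | undefined>): OpenAIBackend {
  // Only OPENAI_BACKEND is consulted; OPENAI_BASE_URL is deliberately ignored here.
  const raw = env["OPENAI_BACKEND"]?.trim();

  // Unset or empty means the default, which must be "official".
  if (raw === undefined || raw === "") return "official";

  const match = OPENAI_BACKENDS.find((mode) => mode === raw);
  if (match === undefined) {
    // Assumption: an unknown value is a configuration error, not a silent default.
    throw new Error(`Invalid OPENAI_BACKEND "${raw}": expected "official" or "compatible"`);
  }
  return match;
}
```

With this shape, an environment that sets only OPENAI_BASE_URL still resolves to official, which matches the rule that the base URL is a transport setting only.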

Step 1: Config and Target Selection

Update src/common/environment.ts.
Add a new environment field for backend mode.
Add setters if the codebase uses runtime env mutation in tests.
Update the startup schema and runtime snapshot.
Add tests for default value and explicit compatible selection.

Expected result:

Official OpenAI stays unchanged by default.
Explicit OPENAI_BACKEND=compatible selects the new execution path.

Step 2: Split Runner Selection

Update the unified AI execution entry point.
Add a small backend selector for OpenAI targets.
Route official mode to the current runner.
Route compatible mode to a new compatible runner.
Keep other providers untouched.

Expected result:

One code path for official OpenAI.
One code path for OpenAI-compatible servers.
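
A minimal sketch of the selector, assuming both runners share one request and result type. The names selectOpenAIRunner, Runner, and OpenAIRunners are hypothetical; the real entry point may be wired differently.

```typescript
// Hypothetical names; the real runner signatures in src/ai may differ.
type OpenAIBackend = "official" | "compatible";
type Runner<Req, Res> = (request: Req) => Promise<Res>;

interface OpenAIRunners<Req, Res> {
  official: Runner<Req, Res>; // existing responses.create(...) path
  compatible: Runner<Req, Res>; // new chat.completions.create(...) path
}

function selectOpenAIRunner<Req, Res>(
  backend: OpenAIBackend,
  runners: OpenAIRunners<Req, Res>,
): Runner<Req, Res> {
  switch (backend) {
    case "official":
      return runners.official;
    case "compatible":
      return runners.compatible;
    default: {
      // Exhaustiveness check: adding a mode without a branch fails to compile.
      const unreachable: never = backend;
      throw new Error(`Unsupported OpenAI backend: ${String(unreachable)}`);
    }
  }
}
```

The selector only picks a function and never inspects the base URL, so the official runner is never reached by accident when compatible mode is configured.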

Step 3: Shared Orchestration Extraction

Identify logic that is identical for both OpenAI branches.
Extract common orchestration into a shared helper where possible.
Keep these pieces shared:
- memory prompt injection
- tool ranking
- tool loop control
- logging and timing
- cancellation handling
- file upload post-processing
- document RAG preparation and cleanup
Keep transport-specific pieces separate:
- request shape
- response parsing
- tool result message shape
- streaming event parsing

Expected result:

Less duplicate logic.
Cleaner separation between official and compatible behavior.

Step 4: Compatible Message Model

Update src/ai/openai-chat-message.ts or create a sibling type file for compatible chat messages.
Model system, user, assistant, and tool roles explicitly.
Support tool_calls on assistant messages.
Support tool_call_id on tool result messages.
Preserve support for text and multimodal user content where the backend supports it.
Avoid forcing responses output types into chat.completions.

Expected result:

Compatible runner can build valid chat.completions message arrays.
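
One way the message model might look, following the chat.completions wire format for roles, tool_calls, and tool_call_id. The type names and the findOrphanToolResults helper are hypothetical and only illustrate the kind of validity check a typed model makes cheap.

```typescript
// Hypothetical type names; field names follow the chat.completions message format.
interface CompatToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is JSON text
}

type CompatUserPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type CompatChatMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: string | CompatUserPart[] }
  | { role: "assistant"; content: string | null; tool_calls?: CompatToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Returns the tool_call_id of every tool message that no earlier
// assistant message announced through tool_calls.
function findOrphanToolResults(messages: readonly CompatChatMessage[]): string[] {
  const announced = new Set<string>();
  const orphans: string[] = [];
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const call of message.tool_calls ?? []) announced.add(call.id);
    } else if (message.role === "tool" && !announced.has(message.tool_call_id)) {
      orphans.push(message.tool_call_id);
    }
  }
  return orphans;
}
```

A check like this could run in tests or before each request, since a tool message without a matching assistant tool call is a typical conversion mistake.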

Step 5: Compatible Contract Extractors

Extend src/ai/provider-adapter-contract.ts.
Add extractors for chat.completions tool calls.
Add extractors for chat.completions streaming tool call deltas.
Keep existing responses extractors intact.
Normalize tool call IDs, names, and argument text the same way as existing extractors.
Ensure arguments are always represented as JSON text for the tool loop.

Expected result:

Compatible runner can parse tool calls from both normal and streaming responses.
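
A sketch of a non-streaming extractor that reads choices[0].message.tool_calls and normalizes each entry. The function and type names are hypothetical. Two behaviors are assumptions made for lenient servers rather than documented guarantees: arguments delivered as an object are serialized to JSON text, and a missing id falls back to a positional one.

```typescript
// Hypothetical names; the existing extractors in provider-adapter-contract.ts may differ.
interface NormalizedToolCall {
  id: string;
  name: string;
  argumentsJson: string; // always JSON text, never a parsed object
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function extractChatCompletionToolCalls(response: unknown): NormalizedToolCall[] {
  if (!isRecord(response)) return [];
  const choices = response["choices"];
  if (!Array.isArray(choices)) return [];
  const first: unknown = choices[0];
  if (!isRecord(first)) return [];
  const message = first["message"];
  if (!isRecord(message)) return [];
  const rawCalls = message["tool_calls"];
  if (!Array.isArray(rawCalls)) return [];

  const calls: NormalizedToolCall[] = [];
  rawCalls.forEach((raw: unknown, index: number) => {
    if (!isRecord(raw)) return;
    const fn = raw["function"];
    if (!isRecord(fn)) return;

    const rawName = fn["name"];
    const name = typeof rawName === "string" ? rawName.trim() : "";
    if (name === "") return; // a call without a name cannot be dispatched

    // Assumption: some servers send an object or an empty string here.
    const rawArgs = fn["arguments"];
    const argumentsJson =
      typeof rawArgs === "string"
        ? rawArgs.trim() === "" ? "{}" : rawArgs
        : JSON.stringify(rawArgs ?? {});

    // Assumption: fall back to a positional id when the server omits one.
    const rawId = raw["id"];
    const id = typeof rawId === "string" && rawId !== "" ? rawId : `call_${index}`;

    calls.push({ id, name, argumentsJson });
  });
  return calls;
}
```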

Step 6: Compatible Provider Adapter

Update src/ai/provider-adapters.ts.
Add a separate adapter or branch for OpenAI-compatible chat.completions behavior.
Reuse existing tool ranking where safe.
Make appendToolResults(...) emit role: "tool" messages with tool_call_id.
Keep official OpenAI adapter outputting function_call_output.
Keep Mistral and Ollama unchanged.

Expected result:

Each backend uses the tool result shape it expects.
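
The two tool result shapes side by side, as a minimal sketch. ToolExecutionResult and both function names are hypothetical; the emitted field names follow the chat.completions tool message and the responses function_call_output item.

```typescript
// Hypothetical input type; the real appendToolResults(...) signature may differ.
interface ToolExecutionResult {
  toolCallId: string;
  output: string;
}

type CompatibleToolMessage = { role: "tool"; tool_call_id: string; content: string };
type OfficialToolOutput = { type: "function_call_output"; call_id: string; output: string };

// chat.completions: one role "tool" message per executed call.
function toCompatibleToolMessages(results: readonly ToolExecutionResult[]): CompatibleToolMessage[] {
  return results.map((result) => ({
    role: "tool",
    tool_call_id: result.toolCallId,
    content: result.output,
  }));
}

// responses: one function_call_output input item per executed call.
function toOfficialToolOutputs(results: readonly ToolExecutionResult[]): OfficialToolOutput[] {
  return results.map((result) => ({
    type: "function_call_output",
    call_id: result.toolCallId,
    output: result.output,
  }));
}
```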

Step 7: Compatible Runner Implementation

Create a new file such as src/ai/unified-ai-runner.openai-compatible.ts.
Use openai.chat.completions.create(...).
Pass messages, tools, model, stream, and signal.
Map system prompt and memory prompt into the messages array correctly.
Keep the tool loop structure from the current runner.
Append assistant tool-call messages and tool result messages between rounds.
Continue until no tool calls remain or max rounds is reached.

Expected result:

Compatible backends can complete multi-round tool flows.
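
The loop structure can be sketched with the model call injected as a function, which keeps it testable without a server. Everything here is hypothetical naming: create stands in for openai.chat.completions.create(...), executeTool stands in for the existing tool dispatcher, and tools, model, stream, and signal are left out for brevity.

```typescript
// Hypothetical names; only the message shapes follow chat.completions.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}
type Message =
  | { role: "system" | "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

// Stand-in for the chat.completions call, reduced to "history in, assistant message out".
type CreateCompletion = (messages: readonly Message[]) => Promise<AssistantMessage>;
type ExecuteTool = (name: string, argumentsJson: string) => Promise<string>;

interface LoopResult {
  text: string;
  rounds: number; // number of tool rounds that were executed
  hitMaxRounds: boolean;
}

async function runCompatibleToolLoop(
  initial: readonly Message[],
  create: CreateCompletion,
  executeTool: ExecuteTool,
  maxToolRounds: number,
): Promise<LoopResult> {
  const history: Message[] = [...initial];
  for (let round = 0; round < maxToolRounds; round++) {
    const assistant = await create(history);
    const calls = assistant.tool_calls ?? [];
    if (calls.length === 0) {
      return { text: assistant.content ?? "", rounds: round, hitMaxRounds: false };
    }
    // The assistant message carrying tool_calls must precede its tool results.
    history.push(assistant);
    for (const call of calls) {
      const output = await executeTool(call.function.name, call.function.arguments);
      history.push({ role: "tool", tool_call_id: call.id, content: output });
    }
  }
  // Assumption: the caller decides how to warn or continue after the limit.
  return { text: "", rounds: maxToolRounds, hitMaxRounds: true };
}
```

Injecting create also makes the mocked round-trip test from Step 16 straightforward: the mock returns a scripted sequence of assistant messages and the test inspects the history it received.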

Step 8: Tool Call Loop Semantics

Preserve MAX_TOOL_ROUNDS.
Preserve tool ranking before each round.
Preserve memory tool selection.
Preserve file search injection when document RAG is active.
Preserve file upload post-processing.
Preserve max-rounds warnings and continuation decisions.
Keep the final text visible in the stream message exactly as today.

Expected result:

Compatible backend behaves like the current runner from the user’s perspective.

Step 9: Streaming Behavior

Implement streaming event handling for chat.completions.
Parse text deltas and append them to TelegramStreamMessage.
Parse delta.tool_calls and keep incremental tool-call state.
Update status text when tool usage starts and ends.
Keep image generation and file-search status handling if the backend emits compatible signals.
Finalize the stream only after the terminal completion event.

Expected result:

Streaming works without losing tool call state.
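
Incremental tool-call state can be kept in a small accumulator keyed by the index field of each delta.tool_calls entry. ToolCallAccumulator is a hypothetical name. The sketch assumes the usual streaming pattern where id and name arrive whole in the first fragment and arguments arrive as string pieces to concatenate; servers that deviate from this would need extra handling.

```typescript
// Hypothetical names; the delta shape follows chat.completions streaming chunks.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  argumentsJson: string;
}

class ToolCallAccumulator {
  private readonly byIndex = new Map<number, AssembledToolCall>();

  // Call once per chunk with chunk.choices[0].delta.tool_calls (may be undefined).
  push(deltas: readonly ToolCallDelta[] | undefined): void {
    for (const delta of deltas ?? []) {
      const entry = this.byIndex.get(delta.index) ?? { id: "", name: "", argumentsJson: "" };
      if (delta.id !== undefined && delta.id !== "") entry.id = delta.id;
      // Assumption: the name arrives whole, so a later value replaces the earlier one.
      const name = delta.function?.name;
      if (name !== undefined && name !== "") entry.name = name;
      // Argument text arrives in fragments and is concatenated in order.
      entry.argumentsJson += delta.function?.arguments ?? "";
      this.byIndex.set(delta.index, entry);
    }
  }

  // Call after the terminal chunk; returns calls ordered by index.
  finish(): AssembledToolCall[] {
    return Array.from(this.byIndex.entries())
      .sort(([a], [b]) => a - b)
      .map(([index, call]) => ({
        id: call.id !== "" ? call.id : `call_${index}`,
        name: call.name,
        argumentsJson: call.argumentsJson.trim() === "" ? "{}" : call.argumentsJson,
      }));
  }
}
```

Parsing the argument text only in finish() avoids calling JSON.parse on a partial fragment while the stream is still open.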

Step 10: Tool Result Handling

After each tool execution round, append tool results using the compatible message format.
Ensure each tool result keeps the correct tool_call_id.
Preserve the existing file upload hook.
If upload fails, convert the failure into a tool result error string.
Preserve the same tool memory map behavior.

Expected result:

The backend receives a valid message history for the next round.

Step 11: Prompt and Memory Injection

Keep buildSystemInstruction(...) as the source of system prompt assembly.
Keep buildUserMemoryPrompt(...) injected as a separate block.
Preserve the explicit separation between assistant memory and user memory.
Preserve the user.md and system.md memory layout.
Ensure compatible backend receives the same semantic prompt content.

Expected result:

Memory behavior stays identical across official and compatible backends.
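
A sketch of how the two prompt blocks might map into a chat.completions message array. buildCompatibleMessages is a hypothetical helper that takes the outputs of buildSystemInstruction(...) and buildUserMemoryPrompt(...) as plain strings. Sending the memory prompt as a second system message is an assumption; the plan only requires that it stay a separate block.

```typescript
// Hypothetical helper; inputs are the already-built prompt strings.
type PromptMessage = { role: "system" | "user" | "assistant"; content: string };

function buildCompatibleMessages(
  systemInstruction: string,
  userMemoryPrompt: string | undefined,
  conversation: readonly PromptMessage[],
): PromptMessage[] {
  const messages: PromptMessage[] = [{ role: "system", content: systemInstruction }];

  // Assumption: user memory travels as its own system message so the
  // two blocks stay separate; an empty prompt is skipped entirely.
  if (userMemoryPrompt !== undefined && userMemoryPrompt.trim() !== "") {
    messages.push({ role: "system", content: userMemoryPrompt });
  }

  return [...messages, ...conversation];
}
```

Some chat templates accept only a single leading system message. For such servers the two blocks could instead be joined into one system message with a clear separator, which keeps the same semantic content.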

Step 12: Tool Ranking Compatibility

Review src/ai/unified-ai-runner.tool-ranker.ts.
Verify whether the current JSON response handling is safe for compatible backends.
If a backend cannot guarantee strict JSON mode, add a fallback parser.
Keep ranking inputs and outputs consistent across both branches.
Do not weaken tool selection heuristics.

Expected result:

Tool ranking stays consistent across both branches, even when a backend does not enforce strict JSON output.
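
A fallback parser could try progressively looser readings of the ranker response before giving up. This is a sketch under assumptions: parseJsonLenient is a hypothetical name, and the three strategies (raw text, fenced block, outermost bracket span) are one possible choice, not the ranker's current behavior. The backtick character is written as \u0060 in the pattern.

```typescript
// Hypothetical helper for responses from backends without strict JSON mode.
type ParseAttempt = { ok: true; value: unknown } | { ok: false };

function tryParseJson(text: string): ParseAttempt {
  try {
    return { ok: true, value: JSON.parse(text) as unknown };
  } catch {
    return { ok: false };
  }
}

// Matches a Markdown code fence, optionally tagged "json".
const FENCED_BLOCK = /\u0060{3}(?:json)?\s*([\s\S]*?)\u0060{3}/i;

function parseJsonLenient(raw: string): unknown {
  // 1. The whole response, as strict JSON mode would deliver it.
  const candidates: string[] = [raw.trim()];

  // 2. The body of the first fenced block, if the model wrapped its answer.
  const inner = FENCED_BLOCK.exec(raw)?.[1];
  if (inner !== undefined) candidates.push(inner.trim());

  // 3. The span from the first opening bracket to the last closing one.
  const start = raw.search(/[\[{]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const candidate of candidates) {
    const attempt = tryParseJson(candidate);
    if (attempt.ok) return attempt.value;
  }
  // Fail loudly so the caller can log the incompatibility.
  throw new Error("Tool ranker response did not contain parseable JSON");
}
```

The parsed value still needs the same shape validation the ranker applies today; the fallback only recovers the JSON text.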

Step 13: File Search and RAG

Keep document RAG preparation in the request pipeline.
Keep vector store preparation for official OpenAI.
Decide whether compatible backend supports file search or needs a no-op fallback.
If unsupported, guard the tool list so the compatible backend never receives unsupported tools.
Keep cleanup behavior for temporary artifacts.

Expected result:

Compatible backend does not receive tools it cannot execute.
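
The guard can be sketched as a partition over the tool list. It assumes that a compatible server only understands tools of type function and that hosted tool types such as file_search are OpenAI-only; partitionToolsForCompatible is a hypothetical name, and a real allowlist may need to be backend-specific.

```typescript
// Hypothetical helper; works on any tool definition that carries a type field.
interface ToolPartition<T> {
  supported: T[]; // safe to send to the compatible backend
  dropped: T[]; // withheld, returned so the caller can log them
}

function partitionToolsForCompatible<T extends { type: string }>(
  tools: readonly T[],
): ToolPartition<T> {
  const supported: T[] = [];
  const dropped: T[] = [];
  for (const tool of tools) {
    // Assumption: only function tools are executable on compatible servers.
    if (tool.type === "function") {
      supported.push(tool);
    } else {
      dropped.push(tool);
    }
  }
  return { supported, dropped };
}
```

Returning the dropped tools instead of discarding them lets the runner log what was withheld, or raise an explicit error when a request depends on one of them.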

Step 14: Error Handling

Preserve abort handling.
Preserve response failure handling.
Preserve stream error handling.
Surface backend-specific incompatibilities as explicit errors.
Do not silently fall back from compatible to official mode.
Keep logs actionable.

Expected result:

Failures are obvious and debuggable.

Step 15: Logging and Observability

Keep the current AI logs and duration tracking.
Add backend mode to log metadata.
Log tool calls, tool outputs, and round transitions in both branches.
Preserve existing observability hooks.
Add explicit labels for official vs compatible runs.

Expected result:

Every log entry identifies the backend that produced it, so debugging stays straightforward after the split.

Step 16: Tests

Add unit tests for backend selection.
Add unit tests for compatible message conversion.
Add unit tests for compatible tool call extraction.
Add integration tests for a tool-call round trip using mocked chat.completions.
Add tests proving the official responses path is unchanged.
Add tests for streaming tool call parsing if the backend supports it.
Add tests for fallback behavior in the tool ranker if needed.

Expected result:

Both branches are covered and regressions are visible quickly.

Step 17: Suggested File Changes

src/common/environment.ts
src/ai/ai-runtime-target.ts
src/ai/unified-ai-request-pipeline.ts
src/ai/unified-ai-runner.openai.ts
src/ai/unified-ai-runner.openai-compatible.ts
src/ai/provider-adapter-contract.ts
src/ai/provider-adapters.ts
src/ai/openai-chat-message.ts
src/ai/unified-ai-runner.tool-ranker.ts
test/*.test.mjs
.env.example
Documentation files for backend selection

Implementation Order

Add config flag and wire it through environment parsing.
Add backend selection logic.
Add compatible message and extractor support.
Create the compatible runner.
Reuse shared orchestration where possible.
Wire tests.
Verify official behavior is unchanged.
Verify compatible backend works with a real OpenAI-compatible server.

Verification Plan

Run unit tests.
Run integration tests.
Verify official OpenAI path still uses responses.create(...).
Verify compatible path uses chat.completions.create(...).
Verify a llama.cpp-style server can complete a tool loop.
Verify memory tools still work.
Verify document RAG and file upload behavior do not regress.

Risks

Some OpenAI-compatible servers do not support every official OpenAI feature.
Streaming tool call deltas may differ across providers.
JSON-mode assumptions in the ranker may not hold for all compatible servers.
Tool schema filtering may need backend-specific allowlists.
Message conversion mistakes can break tool loops silently if not tested.

Acceptance Criteria

Official OpenAI behavior is unchanged.
Compatible backend can run a full chat loop with tools.
Tool calls are correctly extracted and executed.
Tool results are appended in the correct format.
Memory injection still works.
Document RAG and file upload behavior remain functional or fail explicitly.
Tests cover both branches.

Final Note

The key design rule is simple: keep official OpenAI responses behavior intact, and introduce OpenAI-compatible chat.completions behavior as a separate backend mode with its own parsing and message shape.
