fix: support local models with unknown max_tokens and context window (#2554)
Two fixes for local/custom model compatibility (LM Studio, Ollama, llama.cpp):
1. Don't send MaxOutputTokens when it's 0. Custom models that aren't in the
   catwalk providers list have DefaultMaxTokens=0, which gets serialized as
   max_tokens:0 in the API request. LM Studio rejects this with
   "maxPredictedTokens does not satisfy the schema". Fix: only send the
   field when the value is positive (see the first sketch after this list).
2. Skip auto-summarize when ContextWindow is 0. Custom models have
   ContextWindow=0, so the remaining-token count goes negative and
   summarization triggers immediately after the first response. The session
   then resets with "previous session was interrupted because it got too
   long" even for short conversations. Fix: skip the check when the context
   window is unknown (see the second sketch after this list).
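A minimal sketch of the first fix, assuming an OpenAI-compatible JSON body; `chatRequest` and `buildRequest` are illustrative names, not the actual crush types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical OpenAI-compatible request body; the real
// request type lives in the provider client.
type chatRequest struct {
	Model     string `json:"model"`
	MaxTokens int64  `json:"max_tokens,omitempty"` // omitempty drops the field at 0
}

// buildRequest only sets max_tokens when the configured value is positive.
// Custom models missing from the catwalk providers list carry
// DefaultMaxTokens == 0, and serializing max_tokens:0 makes LM Studio
// reject the request with a schema error.
func buildRequest(model string, defaultMaxTokens int64) chatRequest {
	req := chatRequest{Model: model}
	if defaultMaxTokens > 0 {
		req.MaxTokens = defaultMaxTokens
	}
	return req
}

func main() {
	b, _ := json.Marshal(buildRequest("qwen2.5-7b-instruct", 0))
	fmt.Println(string(b)) // {"model":"qwen2.5-7b-instruct"} — no max_tokens sent
}
```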
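And a sketch of the second fix's guard; `shouldAutoSummarize` and the ~10% threshold are assumptions for illustration, not the exact crush logic:

```go
package session

// shouldAutoSummarize reports whether the conversation should be compacted
// before the next request.
func shouldAutoSummarize(contextWindow, usedTokens int64) bool {
	// ContextWindow == 0 means the window is unknown (custom/local model
	// not in the catwalk catalog). Skip the check entirely: otherwise
	// remaining goes negative and summarize fires after the first response.
	if contextWindow <= 0 {
		return false
	}
	remaining := contextWindow - usedTokens
	return remaining < contextWindow/10 // compact when under ~10% remains
}
```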
Fixes #1218 (regression), relates to #1583, #1591