deepseek: Fix for max output tokens blocking completions (#45236) (cherry-pick to preview) (#45250)
zed-zippy[bot]
and
Ben Brandt
created
Cherry-pick of #45236 to preview
----
DeepSeek's API counts the requested max_output_tokens against the total
context window. This looks like a bug on their end, since most other
providers don't do this. For now we default to None for the main models
and let the API use its default behavior, which works fine.
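To see why this matters, here is a minimal sketch (names and the 64K window are illustrative, not Zed's or DeepSeek's actual API): when a provider subtracts the requested max_output_tokens reservation from the context window, the usable prompt budget shrinks, while passing None leaves the full window available and defers the output limit to the API's own default.

```rust
// Hypothetical helper: compute the prompt budget left after an optional
// up-front output-token reservation, the way DeepSeek's API behaves.
fn prompt_budget(context_window: u64, max_output_tokens: Option<u64>) -> u64 {
    match max_output_tokens {
        // Reserving output tokens up front eats into the prompt budget.
        Some(reserved) => context_window.saturating_sub(reserved),
        // With None, the whole window stays usable for the prompt; the
        // API picks its own output default at generation time.
        None => context_window,
    }
}

fn main() {
    let window = 64_000; // illustrative context window size
    // Old behavior: Some(8_192) reserved, so the prompt budget shrinks.
    println!("{}", prompt_budget(window, Some(8_192))); // 55808
    // New behavior: None, full window usable for the prompt.
    println!("{}", prompt_budget(window, None)); // 64000
}
```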
Closes: #45134
Release Notes:
- deepseek: Fix issue with the DeepSeek API that was causing the token
limit to be reached sooner than necessary
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
@@ -103,8 +103,9 @@ impl Model {
pub fn max_output_tokens(&self) -> Option<u64> {
match self {
- Self::Chat => Some(8_192),
- Self::Reasoner => Some(64_000),
+ // Their API treats this max against the context window, which means we hit the limit a lot
+ // Using the default value of None in the API instead
+ Self::Chat | Self::Reasoner => None,
Self::Custom {
max_output_tokens, ..
} => *max_output_tokens,