Skip to content

Generation Parameters

Control how models generate responses — from context length to creativity settings.

Context Window

Max Context Messages sets how many recent messages are sent to the model as context. Default: 20.

  • 5–20 — Shorter context, faster responses, less token usage
  • 20–50 — Longer context for complex, multi-turn conversations
  • 50–100 — Maximal context for very long discussions (may hit token limits)

This applies to all models. The actual context window in tokens depends on your model and message length.


Temperature

Controls randomness in model output. Range: 0.0 – 2.0.

  • 0.0 – 0.3 — More deterministic, consistent, factual
  • 0.5 – 0.8 — Balanced creativity (recommended default)
  • 1.0 – 2.0 — More random, creative, unpredictable

Higher temperature means the model is more likely to choose less probable words. Lower temperature produces more focused, repetitive outputs.

When to Adjust

  • Code / Facts: Use low temperature (0.0 – 0.3)
  • Creative Writing: Use high temperature (0.8 – 1.2)
  • General Chat: Use medium temperature (0.5 – 0.7)

Top P (Nucleus Sampling)

Controls the diversity of token selection. Range: 0.0 – 1.0.

The model considers only the smallest set of tokens whose cumulative probability exceeds top_p.

  • 0.1 — Very focused, only the most likely tokens
  • 0.5 — Moderate diversity
  • 0.9 – 1.0 — Full diversity (recommended default)

Usually you adjust either temperature or top P — not both.


Default Max Tokens

Sets a maximum token limit for model responses. When set, the model will not generate more than this many tokens in a single response. When not set (default), the model's own maximum applies.

Available presets:

256   512   1024   2048
4096  8192  16384  32768

Leave Unset for Flexibility

For most use cases, leave this unset. Set a limit only when you need consistent response lengths (e.g., short summaries) or want to cap costs.


Frequency Penalty

Reduces the model's tendency to repeat the same words. Range: -2.0 – 2.0.

  • Positive values (0.1 – 1.0) — Discourage repetition
  • Zero (0.0) — No penalty (default)
  • Negative values (-1.0 – -0.1) — Encourage repetition

Presence Penalty

Encourages the model to talk about new topics. Range: -2.0 – 2.0.

  • Positive values (0.1 – 1.0) — Encourage topic diversity
  • Zero (0.0) — No penalty (default)
  • Negative values — Stay on current topic

Thinking / Reasoning

Enables chain-of-thought reasoning for supported models (e.g., DeepSeek R1, Qwen3, Claude).

When enabled, the model generates internal reasoning before producing the final response. This improves accuracy for complex tasks but takes longer and uses more tokens.

Thinking Level

  • Low — Minimal reasoning, faster
  • Medium — Balanced (default)
  • High — Maximum reasoning for complex problems

Not All Models Support Thinking

Thinking mode requires a model that supports reasoning tokens. If your model doesn't support it, this setting has no effect.


Visualize Context Rollout

When enabled, Agora visually indicates which messages are included in the current context window vs. which have been rolled out (excluded due to the context window limit). This helps you understand:

  • How much of your conversation the model can "see"
  • When older messages drop out of context
  • Whether you need to increase the context window

The visualization appears as a subtle marker in the conversation view.


How Parameters Work

All generation parameters are nullable — when not explicitly set, they are not sent to the model, and the model uses its own defaults. Each parameter has a reset option to clear the value back to "not set."


Per-Conversation Overrides

You can override generation parameters for individual conversations using the Advanced Settings dialog in the chat screen (long-press the send button or use the ⋮ menu).