vision.describe

Generates a detailed text description of visual content in images.

Input

Image file (JPG, PNG, or WEBP; max 20 MB)

Prompt (optional)

Language (optional)

Detail level
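
The upload limits listed under Input can be checked on the client before a file is sent. The following is a minimal sketch, not part of the tool itself: the function name `upload_problems` is hypothetical, `.jpeg` is assumed to be accepted as an alias for JPG, and "20MB" is assumed to mean 20 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Formats taken from the Input section; ".jpeg" is an assumed alias for JPG.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: the 20 MB limit is read as 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return the reasons a file would be rejected (empty list if none)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Passing the name and size separately keeps the check independent of where the file lives; for a file on disk, `Path(p).stat().st_size` would supply the size.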

Output

Example Output

{
  "description": "A sunlit outdoor café scene with several patrons seated under yellow parasols along a cobblestone street.",
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 1024, "output_tokens": 128 }
}
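
A response of this shape might be consumed as in the sketch below. It assumes the response arrives as a JSON string with exactly the three fields shown in the example (`description`, `model`, `usage`); the helper name `summarize` is hypothetical, and the description is kept on one line because JSON strings cannot contain raw line breaks.

```python
import json

# The example output from above, as a single JSON string.
RAW = """
{
  "description": "A sunlit outdoor café scene with several patrons seated under yellow parasols along a cobblestone street.",
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 1024, "output_tokens": 128 }
}
"""


def summarize(raw: str) -> tuple[str, str, int]:
    """Return (description, model, total token count) from a response string."""
    data = json.loads(raw)
    usage = data["usage"]
    total = usage["input_tokens"] + usage["output_tokens"]
    return data["description"], data["model"], total


description, model, total_tokens = summarize(RAW)
print(f"{model} used {total_tokens} tokens")
print(description)
```

Summing the two `usage` counters gives a per-call token total, which is convenient for logging or cost tracking.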