vision.ocr

Extracts text from images and returns it with per-block bounding boxes and confidence scores.

Input

Image file: JPG, PNG, or WEBP, max 20MB

Language (optional)
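
The input limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the tool itself: the function name check_upload is hypothetical, treating 20MB as 20 * 1024 * 1024 bytes is an assumption, and so is accepting .jpeg alongside .jpg.

```python
from pathlib import PurePath

# Assumption: "20MB" means 20 * 1024 * 1024 bytes; the page does not say.
MAX_BYTES = 20 * 1024 * 1024

# Assumption: ".jpeg" is accepted as an alias of ".jpg".
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def check_upload(filename: str, size_bytes: int) -> bool:
    """Return True if the file name and size fit the documented input limits."""
    suffix = PurePath(filename).suffix.lower()
    return suffix in ALLOWED_SUFFIXES and 0 < size_bytes <= MAX_BYTES
```

For example, check_upload("scan.PNG", 1_000_000) passes, while a GIF or a 25MB file is rejected.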

Output

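A JSON object containing the full extracted text, a list of text blocks (each with a bounding box and a confidence score), the model name, and token usage.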

Example Output

{
  "text": "Hello World\nLine 2",
  "blocks": [
    { "text": "Hello World", "bbox": [10, 20, 200, 50], "confidence": 0.98 }
  ],
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 512, "output_tokens": 64 }
}
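
One way to consume a response shaped like the example output above is sketched below in Python. It relies only on the fields shown in the example; the helper name confident_blocks and the 0.9 threshold are hypothetical, and the layout of bbox is left uninterpreted because the page does not define it.

```python
import json

# Example response copied from the section above (raw string keeps \n as a JSON escape).
RAW = r'''
{
  "text": "Hello World\nLine 2",
  "blocks": [
    { "text": "Hello World", "bbox": [10, 20, 200, 50], "confidence": 0.98 }
  ],
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 512, "output_tokens": 64 }
}
'''


def confident_blocks(response: dict, min_confidence: float = 0.9) -> list[str]:
    """Return the text of every block whose confidence meets the threshold."""
    return [
        block["text"]
        for block in response.get("blocks", [])
        if block.get("confidence", 0.0) >= min_confidence
    ]


response = json.loads(RAW)

# Split the full text into lines.
lines = response["text"].splitlines()

# Sum input and output tokens reported for the call.
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
```

Filtering on confidence is a common way to discard unreliable blocks before further processing; the right threshold depends on the images involved.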