vision.detect
Detects objects in images and returns each object's class, confidence score, bounding box, and attributes.
Input
Accepted formats: JPG, PNG, WEBP (max 20 MB)
Example Output
{
  "objects": [
    {
      "class": "person",
      "confidence": 0.95,
      "bbox": [100, 50, 300, 400],
      "attributes": {}
    }
  ],
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 1024, "output_tokens": 96 }
}
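As a rough illustration of how a response shaped like the example above might be consumed, the Python sketch below parses the JSON, drops low-confidence detections, and derives each box's size. The page does not document the `bbox` layout, so the code assumes `[x_min, y_min, x_max, y_max]` in pixels; if the tool actually returns `[x, y, width, height]`, the size calculation would need to change. The function name `filter_detections` and the `min_confidence` threshold are hypothetical and not part of the tool.

```python
import json

# Example response copied from the "Example Output" section.
EXAMPLE_RESPONSE = """
{
  "objects": [
    { "class": "person", "confidence": 0.95,
      "bbox": [100, 50, 300, 400], "attributes": {} }
  ],
  "model": "qwen-vl-max",
  "usage": { "input_tokens": 1024, "output_tokens": 96 }
}
"""


def filter_detections(response_text, min_confidence=0.5):
    """Return detections whose confidence is at least min_confidence.

    Hypothetical helper for illustration; not provided by the tool.
    """
    response = json.loads(response_text)
    results = []
    for obj in response.get("objects", []):
        if obj["confidence"] < min_confidence:
            continue
        # ASSUMPTION: bbox is [x_min, y_min, x_max, y_max] in pixels.
        x_min, y_min, x_max, y_max = obj["bbox"]
        results.append({
            "class": obj["class"],
            "confidence": obj["confidence"],
            "width": x_max - x_min,
            "height": y_max - y_min,
        })
    return results


detections = filter_detections(EXAMPLE_RESPONSE)
for det in detections:
    print(f'{det["class"]}: {det["confidence"]:.2f}, '
          f'{det["width"]}x{det["height"]} px')
```

Raising `min_confidence` is a simple way to trade recall for precision when the detector returns many marginal objects.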