# Quickstart

A comprehensive guide to using the Qwen2-VL API on Kloudihub.
The Qwen2-VL API provides advanced vision-language capabilities for complex document understanding and OCR tasks. This guide covers connection details, API usage, and code examples.
## Introduction
Qwen2-VL is a series of large-scale vision-language models from the Qwen team at Alibaba Cloud. Unlike traditional OCR, Qwen2-VL can understand images in context, allowing for not just text extraction but also question-answering about document content, layout parsing, and multimodal reasoning.
It is particularly powerful for complex document analysis where context and relationships between elements are important. Key features include:
- Multimodal Understanding: Goes beyond OCR to understand the semantic meaning of images.
- Complex Layout Parsing: Excels at interpreting tables, charts, and complex document structures.
- High Resolution Support: Capable of processing high-resolution images for fine-grained detail extraction.
- Powerful Reasoning: Can answer specific questions based on the visual input.
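Because the model handles both extraction and question answering, the same endpoint can serve either task; only the optional `prompt` field changes. The helper below is purely illustrative (it is not part of the Kloudihub API) and just assembles the form fields described later in this guide:

```python
# Illustrative helper: the /api/v1/qwen-vl endpoint serves both plain text
# extraction and question answering, selected entirely via the `prompt` field.

def build_form_fields(task, question=None):
    """Return the multipart form fields for an extraction or QA request."""
    if task == "extract":
        return {"prompt": "Extract all text from this image"}
    if task == "qa":
        if not question:
            raise ValueError("QA requests need a question")
        return {"prompt": question}
    raise ValueError(f"Unknown task: {task}")

print(build_form_fields("extract"))
print(build_form_fields("qa", "What is the invoice total?"))
```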
## Properties
| Property | Value |
|---|---|
| Base URL | https://kloudihub.com |
| Endpoint | /api/v1/qwen-vl |
| Method | POST |
| Content-Type | multipart/form-data |
## Authentication
Authentication is handled via a Bearer Token. Ensure you include your API key in the Authorization header.
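In practice the key is usually read from the environment rather than hard-coded. A minimal sketch of building the header this way (the `KLOUDIHUB_API_KEY` variable name is an assumption, not something Kloudihub mandates):

```python
import os

# Assumed environment variable name for illustration; use whatever your
# deployment defines. Falls back to a placeholder so the snippet runs as-is.
api_key = os.environ.get("KLOUDIHUB_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
print(headers["Authorization"])
```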
```http
Authorization: Bearer <YOUR_API_KEY>
```

## Request Parameters
The API accepts a file upload via multipart/form-data.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The image or document file to be processed. |
| prompt | String | No | Optional text prompt to guide extraction or question answering. |
## Response Format
The API returns a JSON response containing the model's output.
```json
{
  "status": "success",
  "data": {
    "output": "Parsed document content or answer to prompt...",
    "confidence": 0.96
  }
}
```

## Code Examples
Using cURL:

```bash
curl -X POST https://kloudihub.com/api/v1/qwen-vl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/document.jpg" \
  -F "prompt=Extract all text from this image"
```

Using the axios library in Node.js:
```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const form = new FormData();
form.append('file', fs.createReadStream('/path/to/document.jpg'));
form.append('prompt', 'Extract all text from this image');

axios.post('https://kloudihub.com/api/v1/qwen-vl', form, {
  headers: {
    ...form.getHeaders(),
    'Authorization': 'Bearer YOUR_API_KEY'
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
```

Using axios with TypeScript:
```typescript
import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

interface VLResponse {
  status: string;
  data: {
    output: string;
    confidence: number;
  };
}

const analyzeDocument = async () => {
  const form = new FormData();
  form.append('file', fs.createReadStream('/path/to/document.jpg'));
  form.append('prompt', 'Extract all text from this image');

  try {
    const { data } = await axios.post<VLResponse>(
      'https://kloudihub.com/api/v1/qwen-vl',
      form,
      {
        headers: {
          ...form.getHeaders(),
          'Authorization': 'Bearer YOUR_API_KEY',
        },
      }
    );
    console.log('Model Output:', data.data.output);
  } catch (error) {
    console.error('VL Analysis Error:', error);
  }
};

analyzeDocument();
```

Using the requests library in Python:
```python
import requests

url = "https://kloudihub.com/api/v1/qwen-vl"
file_path = "/path/to/document.jpg"
api_key = "YOUR_API_KEY"

# Keep the file handle open for the duration of the upload.
with open(file_path, "rb") as file:
    files = {"file": file}
    data = {"prompt": "Extract all text from this image"}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, headers=headers, files=files, data=data)

if response.status_code == 200:
    print("Response:", response.json())
else:
    print(f"Error {response.status_code}: {response.text}")
```
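The `confidence` field in the response can be used to gate downstream processing. A minimal sketch, assuming the JSON shape shown under Response Format (the 0.9 threshold is an arbitrary example, not an API recommendation):

```python
def accept_output(payload, threshold=0.9):
    """Return the model output if the call succeeded and confidence meets the
    threshold; otherwise return None so the caller can fall back or retry."""
    if payload.get("status") != "success":
        return None
    data = payload.get("data", {})
    if data.get("confidence", 0.0) < threshold:
        return None
    return data.get("output")

# Example with the sample response from this guide:
sample = {"status": "success", "data": {"output": "Invoice #123", "confidence": 0.96}}
print(accept_output(sample))        # passes the default 0.9 threshold
print(accept_output(sample, 0.99))  # 0.96 < 0.99, so the result is rejected -> None
```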