# Quickstart

A comprehensive guide to using the Qwen2-VL API on Kloudihub.
The Qwen2-VL API provides advanced vision-language capabilities for complex document understanding and OCR tasks. This guide covers connection details, API usage, and code examples.
## Introduction
Qwen2-VL is a series of large-scale vision-language models from the Qwen team at Alibaba Cloud. Unlike traditional OCR, Qwen2-VL can understand images in context, allowing for not just text extraction but also question-answering about document content, layout parsing, and multimodal reasoning.
It is particularly powerful for complex document analysis where context and relationships between elements are important. Key features include:
- Multimodal Understanding: Goes beyond OCR to understand the semantic meaning of images.
- Complex Layout Parsing: Excels at interpreting tables, charts, and complex document structures.
- High Resolution Support: Capable of processing high-resolution images for fine-grained detail extraction.
- Powerful Reasoning: Can answer specific questions based on the visual input.
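Because the model handles both extraction and question answering, the same endpoint can serve either task; only the optional `prompt` field changes. The helper below is purely illustrative (it is not part of the Kloudihub API) and just assembles the form fields described later in this guide:

```python
# Illustrative helper: the /api/v1/qwen-vl endpoint serves both plain text
# extraction and question answering, selected entirely via the `prompt` field.

def build_form_fields(task, question=None):
    """Return the multipart form fields for an extraction or QA request."""
    if task == "extract":
        return {"prompt": "Extract all text from this image"}
    if task == "qa":
        if not question:
            raise ValueError("QA requests need a question")
        return {"prompt": question}
    raise ValueError(f"Unknown task: {task}")

print(build_form_fields("extract"))
print(build_form_fields("qa", "What is the invoice total?"))
```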
## Properties
| Property | Value |
|---|---|
| Base URL | https://kloudihub.com |
| Endpoint | /api/v1/qwen-vl |
| Method | POST |
| Content-Type | multipart/form-data |
## Authentication
Authentication is handled via a Bearer Token. Ensure you include your API key in the Authorization header.
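In practice the key is usually read from the environment rather than hard-coded. A minimal sketch of building the header this way (the `KLOUDIHUB_API_KEY` variable name is an assumption, not something Kloudihub mandates):

```python
import os

# Assumed environment variable name for illustration; use whatever your
# deployment defines. Falls back to a placeholder so the snippet runs as-is.
api_key = os.environ.get("KLOUDIHUB_API_KEY", "YOUR_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
print(headers["Authorization"])
```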
```http
Authorization: Bearer <YOUR_API_KEY>
```

## Request Parameters
The API accepts a file upload via multipart/form-data.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The image or document file to be processed. |
| prompt | String | No | Optional text prompt to guide extraction or question answering. |
## Response Format
The API returns a JSON response containing the model's output.
```json
{
  "status": "success",
  "data": {
    "output": "Parsed document content or answer to prompt...",
    "confidence": 0.96
  }
}
```

## Code Examples
Using cURL:

```bash
curl -X POST https://kloudihub.com/api/v1/qwen-vl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/document.jpg" \
  -F "prompt=Extract all text from this image"
```

Using the axios library in Node.js:
```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const form = new FormData();
form.append('file', fs.createReadStream('/path/to/document.jpg'));
form.append('prompt', 'Extract all text from this image');

axios.post('https://kloudihub.com/api/v1/qwen-vl', form, {
  headers: {
    ...form.getHeaders(),
    'Authorization': 'Bearer YOUR_API_KEY'
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
```

Using axios with TypeScript:
```typescript
import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

interface VLResponse {
  status: string;
  data: {
    output: string;
    confidence: number;
  };
}

const analyzeDocument = async () => {
  const form = new FormData();
  form.append('file', fs.createReadStream('/path/to/document.jpg'));
  form.append('prompt', 'Extract all text from this image');

  try {
    const { data } = await axios.post<VLResponse>(
      'https://kloudihub.com/api/v1/qwen-vl',
      form,
      {
        headers: {
          ...form.getHeaders(),
          'Authorization': 'Bearer YOUR_API_KEY',
        },
      }
    );
    console.log('Model Output:', data.data.output);
  } catch (error) {
    console.error('VL Analysis Error:', error);
  }
};

analyzeDocument();
```

Using the requests library in Python:
```python
import requests

url = "https://kloudihub.com/api/v1/qwen-vl"
file_path = "/path/to/document.jpg"
api_key = "YOUR_API_KEY"

# Keep the file handle open for the duration of the upload.
with open(file_path, "rb") as file:
    files = {"file": file}
    data = {"prompt": "Extract all text from this image"}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, headers=headers, files=files, data=data)

if response.status_code == 200:
    print("Response:", response.json())
else:
    print(f"Error {response.status_code}: {response.text}")
```
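The `confidence` field in the response can be used to gate downstream processing. A minimal sketch, assuming the JSON shape shown under Response Format (the 0.9 threshold is an arbitrary example, not an API recommendation):

```python
def accept_output(payload, threshold=0.9):
    """Return the model output if the call succeeded and confidence meets the
    threshold; otherwise return None so the caller can fall back or retry."""
    if payload.get("status") != "success":
        return None
    data = payload.get("data", {})
    if data.get("confidence", 0.0) < threshold:
        return None
    return data.get("output")

# Example with the sample response from this guide:
sample = {"status": "success", "data": {"output": "Invoice #123", "confidence": 0.96}}
print(accept_output(sample))        # passes the default 0.9 threshold
print(accept_output(sample, 0.99))  # 0.96 < 0.99, so the result is rejected -> None
```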