imae

Finetuned Vision Model: unsloth/llama-3.2-11b-vision-instruct

Overview

This model is a finetuned version of unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit, optimized for vision-based instruction tasks.
It was trained 2x faster using Unsloth and Hugging Face's TRL library, enabling efficient large model adaptation while maintaining precision and accuracy.

Unsloth Logo

Key Features

  • Model Type: Multimodal LLama-based Vision Instruction Model
  • License: Apache-2.0
  • Base Model: unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit
  • Developed by: Daemontatox
  • Language: English

Training Details

  • Framework: Hugging Face Transformers + TRL
  • Optimization: Unsloth methodology for accelerated finetuning
  • Quantization: 4-bit model, enabling deployment on resource-constrained devices
  • Dataset: Vision-specific instruction tasks (details to be added by user if public)

Performance Metrics

  • Inference Speed: Optimized for low-latency environments
  • Accuracy: Improved on vision-related benchmarks (details TBD based on evaluation)
  • Model Size: Lightweight due to quantization

Applications

  • Vision-based interactive AI
  • Instruction-following tasks with multimodal input
  • Resource-constrained deployment (e.g., edge devices)

How to Use

To load and use the model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your_model_repository_name"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True)

# Example usage
input_text = "Describe the image in detail:"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downloads last month
207
Safetensors
Model size
10.7B params
Tensor type
BF16
·
Inference Examples
Inference API (serverless) does not yet support transformers models for this pipeline type.