clip-vit-large-patch14-ko

A Korean CLIP model trained with the knowledge-distillation method from "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation".

Training code: https://github.com/Bing-su/KoCLIP_training_code

Training data: all Korean-English parallel data available on AIHUB
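
In the cited paper, a student encoder is trained so that its embeddings of an English sentence and of its translation both match a frozen teacher's embedding of the English sentence, using a mean-squared-error loss. The sketch below illustrates that objective only. It assumes toy linear layers in place of real text encoders and random tensors in place of tokenized sentence pairs. The names `teacher`, `student`, and `distillation_loss` are hypothetical, and this is not the code from the training repository.

```python
import torch
from torch import nn

torch.manual_seed(0)

FEATURE_DIM = 16  # toy input size (assumption; real encoders take token ids)
EMBED_DIM = 8     # toy embedding size (assumption)

# Hypothetical stand-ins: a frozen English teacher and a trainable student.
teacher = nn.Linear(FEATURE_DIM, EMBED_DIM).requires_grad_(False)
student = nn.Linear(FEATURE_DIM, EMBED_DIM)


def distillation_loss(teacher, student, english_batch, korean_batch):
    """MSE between the teacher's English embedding and the student's
    embeddings of both the English sentence and its Korean translation."""
    with torch.no_grad():
        target = teacher(english_batch)
    mse = nn.functional.mse_loss
    return mse(student(english_batch), target) + mse(student(korean_batch), target)


# Random features standing in for a batch of 32 parallel sentence pairs.
english = torch.randn(32, FEATURE_DIM)
korean = torch.randn(32, FEATURE_DIM)

optimizer = torch.optim.SGD(student.parameters(), lr=0.05)
losses = []
for _ in range(100):
    loss = distillation_loss(teacher, student, english, korean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because only the text side is distilled in this setup, the student's embeddings land in the teacher's embedding space, which is what would let a Korean text encoder be paired with an existing image encoder.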

How to Use

1.

import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "Bingsu/clip-vit-large-patch14-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

# Download a COCO validation image (two cats lying on a sofa).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate captions: "고양이 두 마리" = "two cats", "개 두 마리" = "two dogs".
inputs = processor(text=["고양이 두 마리", "개 두 마리"], images=image, return_tensors="pt", padding=True)
with torch.inference_mode():
    outputs = model(**inputs)

# Image-text similarity logits, one column per caption; softmax turns them into probabilities.
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=1)
print(probs)
# tensor([[0.9974, 0.0026]])
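
For readers wondering where `logits_per_image` comes from: in the standard CLIP formulation it is the cosine similarity between the image embedding and each text embedding, multiplied by a learned scale. The sketch below reproduces that computation on random tensors so it runs without downloading the model. The embedding size and the `logit_scale` value are assumptions chosen for illustration, not values read from this checkpoint.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the projected embeddings the model computes internally.
image_embeds = torch.randn(1, 4)   # one image
text_embeds = torch.randn(2, 4)    # two candidate captions
logit_scale = torch.tensor(100.0)  # assumed value; in the real model this is learned

# L2-normalize so that the dot product becomes a cosine similarity.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

# One row per image, one column per caption.
logits_per_image = logit_scale * (image_embeds @ text_embeds.T)
probs = logits_per_image.softmax(dim=1)
print(probs)
```

Each row of `probs` sums to 1, so the values are relative to the captions supplied and are not absolute confidences.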

2.

from transformers import pipeline

repo = "Bingsu/clip-vit-large-patch14-ko"
pipe = pipeline("zero-shot-image-classification", model=repo)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# Labels, in order: "one cat", "two cats", "cat friends sprawled on a pink sofa".
# hypothesis_template="{}" passes each label to the model as-is instead of wrapping it in a prompt template.
result = pipe(images=url, candidate_labels=["고양이 한 마리", "고양이 두 마리", "분홍색 소파에 드러누운 고양이 친구들"], hypothesis_template="{}")
print(result)
# [{'score': 0.9907576441764832, 'label': '분홍색 소파에 드러누운 고양이 친구들'},
#  {'score': 0.009206341579556465, 'label': '고양이 두 마리'},
#  {'score': 3.606083555496298e-05, 'label': '고양이 한 마리'}]
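
The pipeline returns a list of dictionaries with `score` and `label` keys. A small helper can pick the winning label and reject low-confidence predictions. This is a sketch: `top_label` and its `min_score` threshold are hypothetical and not part of the transformers API, and the `result` list is copied from the output shown above so that the snippet runs without the model.

```python
# Output copied from the pipeline example above.
result = [
    {"score": 0.9907576441764832, "label": "분홍색 소파에 드러누운 고양이 친구들"},
    {"score": 0.009206341579556465, "label": "고양이 두 마리"},
    {"score": 3.606083555496298e-05, "label": "고양이 한 마리"},
]


def top_label(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if no score reaches min_score."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


print(top_label(result))
print(top_label(result, min_score=0.999))
```

The scores are a softmax over the supplied labels only, so a threshold like this guards against near-ties between candidates and does not detect images that match none of them.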