Default token limit?
Hi,
Great model here. Been looking for something to replace my current e5-base-v2 and this looks like it!
Was wondering if the model has a token limit of 512 by default when I load it via
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
angle.set_prompt(prompt=Prompts.C)
or do I have to explicitly set the token limit?
For example, in Sentence Transformers I do
model = SentenceTransformer('intfloat/e5-base-v2')
model.max_seq_length = 512
because the default token limit is 128.
Another thing that confuses me is the Prompts. I'm using the model for Retrieval and Summarizing. Do I have to apply the prompts?
The example on Hugging Face gives this:
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
angle.set_prompt(prompt=Prompts.C)
However, this Colab from your GitHub does not use a prompt in its code, so I'm a bit confused.
Also, do I have to explicitly define which prompt to use for summarization or retrieval, or does the model somehow figure out what is needed and apply the right prompt?
For example, in e5 we specifically prepend query: or passage: when embedding the text.
Last question: how do I use the ONNX format? Will the angle-emb package support it directly? Right now it loads the safetensors version from HF. Or do I have to pip install onnx, and in that case how do I configure it for Prompts.C?
Thanks for following our work.
First, our model has a token limit of 512, and you don't need to specify the maximum token length explicitly.
Second, if you use it for retrieval, you need to set a prompt for the query (no need for documents). There are two ways:
- angle.set_prompt(Prompts.C); angle.encode(query)
- manually apply the prompt to the query text when calling the encode() function. For example, angle.encode(Prompts.C.format(text=query)).
For other non-retrieval scenarios, there is no need to set the prompt.
If you use it for retrieval and summarization at the same time, I suggest manually setting the prompt when calling encode(), i.e., the second way.
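The "prompt on the query only" rule can be sketched in plain Python. This is only an illustration: QUERY_PROMPT is an assumed stand-in for Prompts.C (check the value shipped with your angle_emb version), and prepare_retrieval_inputs is a hypothetical helper, not part of the library. The returned strings would then be passed to encode() without calling set_prompt().

```python
# Assumed stand-in for Prompts.C; verify against your installed angle_emb version.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: {text}"


def prepare_retrieval_inputs(queries, documents, prompt=QUERY_PROMPT):
    """Hypothetical helper: apply the prompt to queries only.

    Documents are returned unchanged, since only the query side
    needs the retrieval prompt.
    """
    prompted_queries = [prompt.format(text=q) for q in queries]
    return prompted_queries, list(documents)


prompted, docs = prepare_retrieval_inputs(
    ["what is the token limit?"],
    ["The model truncates inputs longer than 512 tokens."],
)
print(prompted[0])
print(docs[0])
```

Keeping the prompt step explicit like this makes it easy to skip it for non-retrieval inputs in the same program.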
Third, currently, angle_emb does not support ONNX inference. You can use HuggingFace optimum for inference.
Here is the usage: https://huggingface.co/WhereIsAI/UAE-Large-V1/discussions/10#659df9c060736ff2a6dc04e7
Thanks.
Regarding ONNX, I'm getting some unexpected results.
The ONNX model is almost 4x slower than the regular embedding call.
Here's my code. Any idea what might be wrong? I'm running this on a T4 in Colab.
!pip install --upgrade optimum[onnxruntime] onnxruntime-gpu onnx
from optimum.onnxruntime import ORTModelForFeatureExtraction
from optimum.pipelines import pipeline
model = ORTModelForFeatureExtraction.from_pretrained('WhereIsAI/UAE-Large-V1', file_name="onnx/model.onnx")
def onnx_embed(query):
    extractor = pipeline('feature-extraction', model=model)
    output = extractor(query)
    return output
import time
query = " some test"
timh = time.time()
avec = onnx_embed(query)[0][0]
print(time.time()-timh) # prints 0.28627943992614746
print(avec)
!pip install -U angle-emb --quiet
from angle_emb import AnglE, Prompts
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
def uae_embed(query):
    vec = angle.encode(query, to_numpy=True)
    return vec.flatten().tolist()
import time
query = " some test"
timh = time.time()
vec = uae_embed(query)
print(vec)
print(time.time()-timh) # prints 0.07494688034057617
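A note on the measurement itself: in the code above the ONNX pipeline is rebuilt inside the timed function, and each path is timed on a single cold call, so the two numbers may not be directly comparable. A fairer comparison might warm up first and average over several runs, roughly as sketched below. benchmark and fake_embed are hypothetical names; fake_embed is a dummy stand-in for onnx_embed or uae_embed so the snippet runs on its own. It may also be worth checking which execution provider the ONNX session is using, since a CPU provider would be compared against a CUDA model here.

```python
import time
from statistics import mean


def benchmark(embed_fn, text, warmup=3, runs=20):
    """Return the mean wall-clock seconds per call after a few warm-up calls."""
    for _ in range(warmup):
        embed_fn(text)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        embed_fn(text)
        timings.append(time.perf_counter() - start)
    return mean(timings)


calls = []


def fake_embed(text):
    """Dummy stand-in for a real embedding function."""
    calls.append(text)
    return [0.0, 0.0, 0.0]


avg_seconds = benchmark(fake_embed, " some test")
print(f"mean seconds per call: {avg_seconds:.6f}")
```

With the real functions, any pipeline or model construction would go outside embed_fn so that only the embedding call is timed.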
Also, I wanted a bit of advice on one particular use case. I'm trying to do extractive summarization on a passage by finding the sentence in the passage that is most similar to the whole passage.
So for example, take the passage: "The ship set out to sail around the world. It was manned by sailors from around the world. They set out to sail yesterday."
It would choose, let's say, "The ship set out to sail around the world" as the sentence most similar to the whole passage.
In this case, I am essentially comparing 1 passage to 3 individual sentences. Would this be considered retrieval as well? Do I have to add the query prompt to the passage embedding here, but not add them to the individual sentence embeddings?
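To make the setup concrete, the comparison can be sketched as a cosine-similarity ranking. The vectors below are made-up toy values standing in for real embeddings (in practice they would come from the model), and most_similar_sentence is a hypothetical helper name. The sketch does not settle whether the passage embedding should carry the query prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_sentence(passage_vec, sentence_vecs):
    """Return (index, score) of the sentence vector closest to the passage vector."""
    scores = [cosine(passage_vec, s) for s in sentence_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings, invented for illustration only.
passage_vec = [1.0, 1.0, 0.0]
sentence_vecs = [
    [1.0, 0.9, 0.1],  # "The ship set out to sail around the world."
    [0.0, 1.0, 1.0],  # "It was manned by sailors from around the world."
    [1.0, 0.0, 0.0],  # "They set out to sail yesterday."
]

best_index, best_score = most_similar_sentence(passage_vec, sentence_vecs)
print(best_index, round(best_score, 3))
```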