- Base model: teknium/OpenHermes-2.5-Mistral-7B
- Fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.
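For context, DPO trains on triples: a prompt, a preferred ("chosen") answer and a dispreferred ("rejected") answer. The sketch below shows one way an Intel/orca_dpo_pairs record could be turned into such a triple in the ChatML format that appears in the sample outputs below. It assumes each record has `system`, `question`, `chosen` and `rejected` fields. It builds the ChatML strings by hand instead of calling `tokenizer.apply_chat_template`. The helper names are hypothetical, and this is not necessarily the exact preprocessing used for this model.

```python
def chatml_turn(role: str, content: str) -> str:
    """Render one ChatML turn: <|im_start|>role\\ncontent<|im_end|>\\n."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"


def to_dpo_triple(record: dict) -> dict:
    """Hypothetical helper: map an orca_dpo_pairs-style record to DPO fields.

    Assumes the keys 'system', 'question', 'chosen' and 'rejected'.
    """
    prompt = ""
    if record.get("system"):  # the system prompt may be empty
        prompt += chatml_turn("system", record["system"])
    prompt += chatml_turn("user", record["question"])
    # Open the assistant turn; the answers continue from here.
    prompt += "<|im_start|>assistant\n"
    return {
        "prompt": prompt,
        # Close each answer with the ChatML end-of-turn token.
        "chosen": record["chosen"] + "<|im_end|>\n",
        "rejected": record["rejected"] + "<|im_end|>\n",
    }


example = {
    "system": "You are a helpful assistant chatbot.",
    "question": "What is a Large Language Model?",
    "chosen": "A large language model is an AI system trained on text.",
    "rejected": "I don't know.",
}
triple = to_dpo_triple(example)
print(triple["prompt"])
```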
Uses
Direct Use
Way 1 (see Way 2 below for faster inference)
import transformers
from transformers import AutoTokenizer
new_model="abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"
# Format prompt
message = [
{"role": "system", "content": "You are a helpful assistant chatbot."},
{"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
# Create pipeline
pipeline = transformers.pipeline(
"text-generation",
model=new_model,
tokenizer=tokenizer
)
# Generate text
sequences = pipeline(
prompt,
do_sample=True,
temperature=0.7,
top_p=0.9,
num_return_sequences=1,
max_length=200,
)
print(sequences[0]['generated_text'])
Sample Output from abdullahalzubaer/NeuralHermes-2.5-Mistral-7B
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is an artificial intelligence system designed to process and understand large amounts of natural language data.
It's a type of machine learning model, typically built using neural networks,
that is trained on vast datasets of text to learn patterns and relationships within the language.
These models can then generate human-like text, predict the next word in a sequence, perform language translation,
and answer questions, among other tasks. The "large" in the term refers to the size of the model, which includes
the number of parameters, the complexity of the architecture, and the amount of training data it processes.
As a result, large language models are capable of generating more complex and coherent responses compared to smaller models.
Sample Output from mlabonne/NeuralHermes-2.5-Mistral-7B
(as provided in the tutorial)
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data.
These models are designed to understand and generate human language, allowing them to perform various natural
language processing tasks, such as text generation, language translation, and question answering. Large language models
typically use deep learning techniques, like recurrent neural networks (RNNs) or transformers, to learn patterns and
relationships in the data, enabling them to generate coherent and contextually relevant responses.
The size of these models, in terms of the number of parameters and the volume of data they are trained on,
plays a significant role in their ability to comprehend and produce complex language structures.
So the fine-tuning appears to have worked: the output is perhaps not quite as good as that of the original model, but still close to it (possibly because of the reduced max_length in DPOTrainer?).
Way 2 (I am not sure why, but it is significantly faster than Way 1 above, so I recommend it. The code is taken directly from the Mistral model card, with only the model name replaced.)
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import trl  # only needed to print the TRL version below
print(torch.__version__)
print(transformers.__version__)
print(trl.__version__)
'''
1.13.0+cu117
4.38.2
0.7.11
'''
model_tokenizer = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"  # this model
# model_tokenizer = "mistralai/Mistral-7B-Instruct-v0.2"
# model_tokenizer = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_tokenizer)
print(f"Loaded Model = {model.config._name_or_path}")
print(f"Loaded Tokenizer = {tokenizer.name_or_path}")
# Check available GPUs and print their names
gpu_count = torch.cuda.device_count()
print("Available GPUs:", gpu_count)
for i in range(gpu_count):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
# Choose a specific GPU (e.g., GPU 0)
device_id = 0  # Change this to select a different GPU (must be less than gpu_count)
device = f"cuda:{device_id}" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
your_prompt="""What is a Large Language Model?"""
messages = [
{"role": "user", "content": your_prompt},
]
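# add_generation_prompt=True appends the ChatML "<|im_start|>assistant" header so the model replies as the assistant
encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")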
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(f"\nComplete I/O:\n{decoded[0]}")
# The split below only works for Mistral-Instruct's [INST] ... [/INST] format, not for this ChatML model:
# print(f"\nModel Reply:\n{decoded[0].split('[/INST]')[1]}")
'''
Complete I/O:
<|im_start|> user
What is a Large Language Model? Elaborate.
<|im_end|>
A Large Language Model is a type of artificial intelligence algorithm
designed to generate human-like text or respond to natural language input.
It is typically trained on vast amounts of text data, enabling it to
understand and generate language with a high level of complexity.<|im_end|>
'''
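The decoded text printed above contains the prompt as well as the reply, and the commented-out split('[/INST]') line targets Mistral-Instruct's [INST] format rather than ChatML. A minimal sketch of a hypothetical extract_reply helper that returns only the last assistant turn, assuming special tokens were kept during decoding (batch_decode keeps them unless skip_special_tokens=True is passed):

```python
def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a ChatML transcript.

    Hypothetical helper: assumes special tokens are still present, so the
    reply follows the last '<|im_start|>assistant' marker.
    """
    marker = "<|im_start|>assistant"
    start = decoded.rfind(marker)
    if start == -1:
        # No assistant marker found: return the text unchanged.
        return decoded.strip()
    reply = decoded[start + len(marker):]
    # Cut at the end-of-turn token, if the model emitted one.
    end = reply.find("<|im_end|>")
    if end != -1:
        reply = reply[:end]
    # Remove a trailing end-of-sequence token, if present.
    return reply.replace("</s>", "").strip()


text = (
    "<|im_start|>user\nWhat is a Large Language Model?<|im_end|>\n"
    "<|im_start|>assistant\nA model trained on lots of text.<|im_end|>"
)
print(extract_reply(text))
```

Usage: print(extract_reply(decoded[0])). If the prompt is built without add_generation_prompt=True there is no assistant marker, and the helper returns the whole text. An alternative that avoids string parsing is to decode only the newly generated tokens, e.g. tokenizer.batch_decode(generated_ids[:, model_inputs.shape[-1]:], skip_special_tokens=True).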
Loss
Step | Training Loss |
---|---|
1 | 0.693300 |
2 | 0.693200 |
3 | 0.692500 |
4 | 0.691300 |
5 | 0.689400 |
... | ... |
45 | 0.633700 |
46 | 0.629000 |
47 | 0.591300 |
48 | 0.558100 |
49 | 0.585800 |
50 | 0.558900 |
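The first logged loss (about 0.6933) is what the standard sigmoid DPO objective gives at the start of training. At that point the policy still equals the reference model, so the reward margin is zero and the loss is -log(sigmoid(0)) = ln 2 ≈ 0.6931. Because this loss is convex and decreasing in the margin, an average loss below ln 2 (as in later steps) implies that the average margin has moved in favour of the chosen answers. A minimal per-example sketch, assuming summed log-probabilities of each answer under the policy and the reference model, and an illustrative beta of 0.1 (not necessarily the value used for this model); this is not TRL's implementation:

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair (illustrative sketch).

    Inputs are summed log-probabilities of the chosen / rejected answers
    under the policy being trained and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# At the first step the policy equals the reference, so the margin is zero.
initial = dpo_loss(-42.0, -37.5, -42.0, -37.5)
print(round(initial, 4))  # 0.6931, i.e. ln 2
```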
Hyperparameters:
All hyperparameters are the same as in the tutorial (see Reference), except the following:
# for TrainingArguments()
dataloader_num_workers=1, # had to add this #CHANGED_HERE#
dataloader_prefetch_factor=1,
# for DPOTrainer()
# ref_model: not passed. Including a reference model raised an error saying it is not required (not sure why, needs further investigation; likely because the tutorial trains LoRA adapters via peft_config, in which case TRL can get reference log-probabilities by disabling the adapters)
max_prompt_length=256, # had to lower this to 256 #CHANGED_HERE# to avoid CUDA out-of-memory errors
max_length=256, # had to lower this to 256 #CHANGED_HERE# to avoid CUDA out-of-memory errors
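Setting max_prompt_length equal to max_length may matter more than it seems. Assuming DPOTrainer in TRL 0.7.x truncates as follows (worth checking against the tokenize_row logic of the installed version), it first keeps only the last max_prompt_length prompt tokens when prompt plus response exceeds max_length. If the pair is still too long, it then cuts each response to max_length - max_prompt_length tokens. With both set to 256, that response budget is zero. The function below is a simplified, hypothetical re-implementation of that rule that only computes the resulting token counts; it is not TRL code:

```python
def truncated_lengths(prompt_len: int, response_len: int,
                      max_prompt_length: int, max_length: int) -> tuple[int, int]:
    """Simplified model (assumption) of DPOTrainer's length truncation.

    Returns the (prompt, response) token counts kept for one answer.
    """
    # Step 1: if prompt + response is too long, keep at most
    # max_prompt_length prompt tokens (assumed "keep_end" mode).
    if prompt_len + response_len > max_length:
        prompt_len = min(prompt_len, max_prompt_length)
    # Step 2: if still too long, cut the response to the remaining budget.
    if prompt_len + response_len > max_length:
        response_len = min(response_len, max_length - max_prompt_length)
    return prompt_len, response_len


# Settings used for this model: long answers would be cut to zero tokens.
print(truncated_lengths(120, 300, max_prompt_length=256, max_length=256))  # (120, 0)
# Same memory budget, but leaving room for the answers.
print(truncated_lengths(120, 300, max_prompt_length=128, max_length=256))  # (120, 128)
```

If this reading of the truncation rule is right, keeping max_length at 256 but lowering max_prompt_length (for example to 128) would preserve part of each answer under the same memory budget.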
Reference
Thanks! https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html