
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- axolotl
- dpo
- trl
base_model: mistralai/Mistral-Nemo-Instruct-2407
model-index:
- name: Humanish-Mistral-Nemo-Instruct-2407
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 54.51
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 32.71
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 7.63
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.03
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.4
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 28.01
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407
      name: Open LLM Leaderboard
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
---
	<div align="center">
	<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
	<h1>Enhancing Human-Like Responses in Large Language Models</h1>
	</div>

<p align="center">
&nbsp;&nbsp;| 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>&nbsp;&nbsp;|
&nbsp;&nbsp;📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>&nbsp;&nbsp;|
&nbsp;&nbsp;📄 <a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp;&nbsp;|
</p>

# 🚀 Human-Like-Mistral-Nemo-Instruct-2407

	This model is a fine-tuned version of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407), specifically optimized to generate more human-like and conversational responses.

	The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
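
For reference, DPO (as defined in the paper linked above) trains on pairs made up of a preferred and a dispreferred response. It rewards the policy when it raises the preferred response's likelihood more than the dispreferred one's, with both measured relative to a frozen reference model. The sketch below computes that per-pair loss from summed response log-probabilities. The function names, the example log-probabilities, and `beta=0.1` are illustrative assumptions. This card does not state the β used in training, and the sketch is not axolotl's or TRL's implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed log-probability of a whole response under the
    trainable policy or the frozen reference model. beta=0.1 is illustrative only.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Policy still identical to the reference: margin is 0, loss is log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy has shifted probability toward the chosen response: loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

While the policy is still identical to the reference, the margin is zero and the loss is log 2 ≈ 0.693. The loss falls as the policy shifts probability toward the preferred response.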

The process of creating this model is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).

	# 🛠️ Training Configuration

	- Base Model: Mistral-Nemo-Instruct-2407
	- Framework: Axolotl v0.4.1
	- Hardware: 2x NVIDIA A100 (80 GB) GPUs
	- Training Time: ~3 hours 40 minutes
	- Dataset: Synthetic dataset with ≈11,000 samples across 256 diverse topics
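
The following is a rough sanity check on these numbers. The axolotl config below sets `micro_batch_size: 2`, `gradient_accumulation_steps: 8`, and `val_set_size: 0.05`. The calculation makes two assumptions: that training used plain data parallelism over both GPUs (the config leaves `deepspeed` and `fsdp` unset), and that the full 10,884-sample dataset described in the Dataset section was used. Under those assumptions, the effective batch size and the number of optimizer steps work out roughly as shown. The `training_steps` helper is hypothetical, and axolotl's actual split and step count may differ by a step or so.

```python
import math


def training_steps(num_samples: int, val_set_size: float, micro_batch_size: int,
                   grad_accum_steps: int, num_gpus: int,
                   num_epochs: int = 1) -> tuple[int, int]:
    """Hypothetical helper: (effective batch size, approx. optimizer steps) under data parallelism."""
    # Each optimizer step accumulates micro_batch_size * grad_accum_steps samples on every GPU.
    effective_batch = micro_batch_size * grad_accum_steps * num_gpus
    # val_set_size is the fraction held out for evaluation (rounded up here).
    train_samples = num_samples - math.ceil(num_samples * val_set_size)
    steps = math.ceil(train_samples / effective_batch) * num_epochs
    return effective_batch, steps


print(training_steps(10_884, 0.05, micro_batch_size=2, grad_accum_steps=8, num_gpus=2))
```

Under these assumptions, each optimizer step covers 32 preference pairs and one epoch comes to roughly 320 steps. The config's `warmup_steps: 10` therefore spans about the first 3% of training.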

	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-Nemo-Instruct-2407
model_type: MistralForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: inst
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: chatml.prompt_pairs
    conversation: mistral

dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./humanish-mistral-nemo-instruct-2407

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-Mistral-Nemo-Instruct-2407

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

special_tokens:
  pad_token: </s>

save_safetensors: true
```

	</details><br>
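
The `adapter: lora` settings train small low-rank matrices instead of the full weights. In the LoRA paper's formulation, an adapted linear layer computes h = W x + (lora_alpha / lora_r) · B A x. With `lora_r: 8` and `lora_alpha: 4`, the low-rank update is therefore scaled by 0.5. The pure-Python sketch below illustrates that forward pass on made-up toy matrices. It is not axolotl's or PEFT's implementation, and the shapes and values are purely illustrative.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: list[float],
                 r: int, alpha: float) -> list[float]:
    """h = W x + (alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: trainable r x d_in; B: trainable d_out x r.
    """
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r                # 4 / 8 = 0.5 with this card's config
    return [b + scale * d for b, d in zip(base, delta)]


r, alpha = 8, 4  # lora_r and lora_alpha from the config above
W = [[1.0, 0.0], [0.0, 1.0]]                            # toy 2 x 2 weight
A = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(r - 1)]  # toy 8 x 2
B = [[1.0] + [0.0] * (r - 1), [0.0] * r]                # toy 2 x 8
print(lora_forward(W, A, B, [2.0, 3.0], r, alpha))      # [3.0, 3.0]
```

LoRA initializes B to zeros, so the adapted layer starts out identical to the base layer. Only A and B, which hold r × (d_in + d_out) values per adapted layer, receive gradient updates. `lora_target_linear: true` asks axolotl to attach such adapters to the model's linear layers.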

	# 💬 Prompt Template

You can use the Mistral-Nemo prompt template with this model:

	### Mistral-Nemo

	```
	<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
	```
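
To make the layout above explicit, here is a minimal string builder that reproduces the example exactly from a list of chat messages. `build_inst_prompt` is a hypothetical helper for illustration only. It handles only alternating `user`/`assistant` turns with no system prompt. The tokenizer's bundled Mistral-Nemo chat template may differ in whitespace and in how it places a system message, so prefer `apply_chat_template` in practice.

```python
def build_inst_prompt(messages: list[dict[str, str]]) -> str:
    """Rebuild the [INST] ... [/INST] layout shown above from alternating user/assistant turns."""
    prompt = "<s>"
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            # user turns after the first are separated from the previous </s> by one space
            sep = " " if i > 0 else ""
            prompt += f"{sep}[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            # assistant replies follow [/INST] directly and are closed with </s>
            prompt += f"{msg['content']}</s>"
        else:
            raise ValueError(f"unsupported role: {msg['role']!r}")
    return prompt


example = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(build_inst_prompt(example))
```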

	This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
	`tokenizer.apply_chat_template()` method:

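```python
# Assumes `model` and `tokenizer` are already loaded from this repository
# with AutoModelForCausalLM.from_pretrained / AutoTokenizer.from_pretrained.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
# return_dict=True yields input_ids and attention_mask, so the result can be unpacked into generate()
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(**gen_input, max_new_tokens=256)
```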

	# 🤖 Models

| Model | Download |
|:---:|:---:|
| Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |

# 🔄 Quantized versions

	## GGUF [@bartowski](https://huggingface.co/bartowski)

	- https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF

	- https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF

	- https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF


	# 🎯 Benchmark Results

| Group | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MuSR | MMLU-PRO |
|---|---|---|---|---|---|---|---|---|
| Llama Models | Human-Like-Llama-3-8B-Instruct | 22.37 | 64.97 | 28.01 | 8.45 | 0.78 | 2.00 | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | Difference (Human-Like) | -1.20 | -9.11 | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| Qwen Models | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | Difference (Human-Like) | -0.20 | -3.01 | -0.41 | 0.00 | +1.01 | -0.03 | +1.24 |
| Mistral Models | Human-Like-Mistral-Nemo-Instruct | 22.88 | 54.51 | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | Difference (Human-Like) | -0.65 | -9.29 | +3.02 | +1.73 | -0.34 | +0.91 | +0.03 |
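
This sketch assumes the Average column is the unweighted mean of the six benchmark scores, which is the Open LLM Leaderboard convention. Under that assumption, the Mistral rows can be re-derived as shown. The helper name is illustrative. Small last-digit differences from the table are expected from rounding.

```python
from statistics import fmean

BENCHMARKS = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MuSR", "MMLU-PRO"]
# Scores copied from the Mistral rows of the table above.
human_like = [54.51, 32.70, 7.62, 5.03, 9.39, 28.00]
base = [63.80, 29.68, 5.89, 5.37, 8.48, 27.97]


def summarize(tuned: list[float], reference: list[float]):
    """Return (tuned average, reference average, per-benchmark differences)."""
    diffs = {name: round(t - r, 2) for name, t, r in zip(BENCHMARKS, tuned, reference)}
    return fmean(tuned), fmean(reference), diffs


avg_tuned, avg_base, diffs = summarize(human_like, base)
print(f"average: {avg_tuned:.2f} vs {avg_base:.2f}")
print(diffs)
```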


	# 📊 Dataset

	The dataset used for fine-tuning was generated using LLaMA 3 models. The dataset includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and arts. Each sample consists of:

	- Human-like responses: Natural, conversational answers mimicking human dialogue.
	- Formal responses: Structured and precise answers with a more formal tone.
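
DPO needs each sample expressed as a preference pair. The sketch below assumes the human-like answer serves as the preferred (`chosen`) response and the formal answer as the `rejected` one, which matches the goal of the fine-tune. Under that assumption, a record could be assembled as shown. The `prompt`/`chosen`/`rejected` field names follow the common TRL-style DPO layout and are an assumption here, as are the `make_pair` helper and the invented example text. Check the dataset card for its actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # preferred response: the human-like answer (assumption)
    rejected: str  # dispreferred response: the formal answer (assumption)


def make_pair(prompt: str, human_like: str, formal: str) -> dict[str, str]:
    """Hypothetical helper: build one DPO record; identical responses carry no preference signal."""
    if human_like.strip() == formal.strip():
        raise ValueError("chosen and rejected responses must differ")
    return asdict(PreferencePair(prompt=prompt, chosen=human_like, rejected=formal))


# Invented example text, not taken from the dataset.
record = make_pair(
    "What's your favorite way to spend a weekend?",
    "Honestly? A slow morning with coffee, then maybe a hike if the weather's nice.",
    "As an AI, I do not have personal preferences. Common weekend activities include hiking and reading.",
)
print(record["chosen"])
```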

	The dataset has been open-sourced and is available at:

	- 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)

	More details on the dataset creation process can be found in the accompanying research paper.

	# 📝 Citation

	```
	@misc{çalık2025enhancinghumanlikeresponseslarge,
	title={Enhancing Human-Like Responses in Large Language Models},
	author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
	year={2025},
	eprint={2501.05032},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2501.05032},
	}
	```