MiniMax-AI committed
Commit · e1640d7
1 Parent(s): 45fa731
Initial Commit
This view is limited to 50 files because it contains too many changes.
- .gitattributes +1 -0
- LICENSE +42 -0
- README.md +205 -3
- config.json +126 -0
- configuration_minimax_text_01.py +152 -0
- figures/MiniMaxLogo.png +0 -0
- figures/TextBench.png +0 -0
- figures/VisionBench.png +0 -0
- figures/hailuo.svg +1 -0
- figures/image.jpg +0 -0
- figures/minimax.svg +1 -0
- figures/niah.png +3 -0
- main.py +100 -0
- merges.txt +0 -0
- model-00000-of-00413.safetensors +3 -0
- model-00001-of-00413.safetensors +3 -0
- model-00002-of-00413.safetensors +3 -0
- model-00003-of-00413.safetensors +3 -0
- model-00004-of-00413.safetensors +3 -0
- model-00005-of-00413.safetensors +3 -0
- model-00006-of-00413.safetensors +3 -0
- model-00007-of-00413.safetensors +3 -0
- model-00008-of-00413.safetensors +3 -0
- model-00009-of-00413.safetensors +3 -0
- model-00010-of-00413.safetensors +3 -0
- model-00011-of-00413.safetensors +3 -0
- model-00012-of-00413.safetensors +3 -0
- model-00013-of-00413.safetensors +3 -0
- model-00014-of-00413.safetensors +3 -0
- model-00015-of-00413.safetensors +3 -0
- model-00016-of-00413.safetensors +3 -0
- model-00017-of-00413.safetensors +3 -0
- model-00018-of-00413.safetensors +3 -0
- model-00019-of-00413.safetensors +3 -0
- model-00020-of-00413.safetensors +3 -0
- model-00021-of-00413.safetensors +3 -0
- model-00022-of-00413.safetensors +3 -0
- model-00023-of-00413.safetensors +3 -0
- model-00024-of-00413.safetensors +3 -0
- model-00025-of-00413.safetensors +3 -0
- model-00026-of-00413.safetensors +3 -0
- model-00027-of-00413.safetensors +3 -0
- model-00028-of-00413.safetensors +3 -0
- model-00029-of-00413.safetensors +3 -0
- model-00030-of-00413.safetensors +3 -0
- model-00031-of-00413.safetensors +3 -0
- model-00032-of-00413.safetensors +3 -0
- model-00033-of-00413.safetensors +3 -0
- model-00034-of-00413.safetensors +3 -0
- model-00035-of-00413.safetensors +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+figures/niah.png filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
@@ -0,0 +1,42 @@
+
+MINIMAX MODEL LICENSE AGREEMENT
+
+1. Definitions
+"Agreement" means the terms and conditions for use, reproduction, distribution and modification of the MiniMax Model Materials set forth herein.
+"License" or "you" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+"MiniMax Model" means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by MiniMax at https://huggingface.co/MiniMaxAI/MiniMaxText01, https://huggingface.co/MiniMaxAI/MiniMaxVL01, https://github.com/MiniMax-AI/MiniMax01. In this agreement, MiniMax Model including MiniMaxText01 and MiniMaxVL01.
+"MiniMax Model Materials" means, collectively, MiniMax’s proprietary MiniMax Model and Documentation (and any portion thereof) made available under this Agreement.
+"MiniMax" or "we" means MiniMax AI.
+
+2. License Rights and Redistribution
+a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under MiniMax’s intellectual property or other rights owned by MiniMax embodied in the MiniMax Model Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the MiniMax Model Materials.
+b. Redistribution and Use.
+i. If you distribute or make available the MiniMax Model Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such MiniMax Model Materials; and (B) prominently display “Built with MiniMax AI” on a related website, user interface, blogpost, about page, or product documentation. If you use the MiniMax Model Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “MiniMax” at the beginning of any such AI model name.
+ii. You must retain in all copies of the MiniMax Model Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “MiniMax AI model is licensed under the MiniMax License, Copyright © MiniMax. All Rights Reserved.”
+iii. Your use of the MiniMax Model Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Prohibited Uses Policy for the MiniMax Model Materials, which is hereby incorporated by reference into this Agreement.
+iv. You will not use the MiniMax Model Materials or any output or results of the MiniMax Model Materials to improve any other large language model.
+
+3. Additional Commercial Terms. If, on the MiniMax Model Materials release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 100 million monthly active users in the preceding calendar month, you must request a license from MiniMax, which MiniMax may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until MiniMax otherwise expressly grants you such rights.
+
+4. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE MINIMAX MODEL MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND MINIMAX DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE MINIMAX MODEL MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE MINIMAX MODEL MATERIALS AND ANY OUTPUT AND RESULTS.
+
+5. Limitation of Liability. IN NO EVENT WILL MINIMAX OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF MINIMAX OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+6. Intellectual Property.
+a. No trademark licenses are granted under this Agreement, and in connection with the MiniMax Model Materials, neither MiniMax nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the MiniMax Materials or as set forth in this Section 6(a). MiniMax hereby grants you a license to use "MiniMaxText01" or "MiniMaxVL01" (the "Mark") solely as required to comply with the last sentence of Section 2.b.i. All goodwill arising out of your use of the Mark will inure to the benefit of MiniMax.
+b. Subject to MiniMax’s ownership of MiniMax Model Materials and derivatives made by or for MiniMax, with respect to any derivative works and modifications of the MiniMax Model Materials that are made by you, as between you and MiniMax, you are and will be the owner of such derivative works and modifications.
+c. If you institute litigation or other proceedings against MiniMax or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the MiniMax Model Materials or outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless MiniMax from and against any claim by any third party arising out of or related to your use or distribution of the MiniMax Model Materials.
+7. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access to the MiniMax Model Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. MiniMax may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the MiniMax Model Materials. Sections 2, 3 and 6 shall survive the termination of this Agreement.
+
+8. Governing Law and Jurisdiction. This agreement will be governed and construed under the laws of Singapore without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this agreement. Any dispute arising out of or in connection with this Agreement, including any question regarding its existence, validity or termination, shall be referred to and finally resolved by arbitration administered by the Singapore International Arbitration Centre (“SIAC”) in accordance with the Arbitration Rules of the Singapore International Arbitration Centre (“SIAC Rules”) for the time being in force, which rules are deemed to be incorporated by reference in this clause.
+
+You agree you will not use, or allow others to use,MiniMaxText01 or MiniMaxVL01 to:
+1. Violate any applicable federal, state, local, or international law or regulation, or infringe upon the lawful rights or interests of any third party.
+2. Assist with, engage in or in any way associate with any military purpose.
+3. Exploit, harm, or attempt to exploit or harm minors in any way.
+4. Generate or disseminate false or misleading information with the intent to harm others.
+5. Generate or disseminate content prohibited by applicable laws or regulations.
+6. Generate or disseminate personally identifiable information without proper authorization or for unreasonable or unlawful purposes.
+7. Defame, disparage, harass, or cause harm to any individual or entity.
+8. Carry out fully automated decision-making that adversely affects an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation.
+9. Promote discrimination, hate speech, or harmful behavior towards individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other legally protected characteristics or categories.
README.md
CHANGED
@@ -1,3 +1,205 @@
-
-
-
+<div align="center">
+<img src="figures/MiniMaxLogo.png" width="60%" alt="MiniMax-Text-01" />
+</div>
+<hr>
+
+<div align="center" style="line-height: 1;">
+<a href="https://www.minimaxi.com/en" target="_blank" style="margin: 2px;">
+<img alt="Homepage" src="https://img.shields.io/badge/_Homepage-MiniMax-FF4040?style=flat-square&labelColor=2C3E50&logo=&logoWidth=20" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://huggingface.co/MiniMaxAI" target="_blank" style="margin: 2px;">
+<img alt="Hugging Face" src="https://img.shields.io/badge/🤗_Hugging_Face-MinMax-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+<div align="center" style="line-height: 1;">
+<a href="https://www.hailuo.ai/" target="_blank" style="margin: 2px;">
+<img alt="Chat" src="https://img.shields.io/badge/Chat-_Hailuo AI-FF4040?style=flat-square&labelColor=2C3E50&logo=&logoWidth=16" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://intl.minimaxi.com" style="margin: 2px;">
+<img alt="API" src="https://img.shields.io/badge/⚡_API-Platform-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+<div align="center" style="line-height: 1;">
+<a href="https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE" style="margin: 2px;">
+<img alt="License" src="https://img.shields.io/badge/📜_License-Model_Agreement-FF4040?style=flat-square&labelColor=2C3E50" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+
+
+# MiniMax-Text-01
+
+## 1. Introduction
+
+MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock its long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Advanced parallel strategies and compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), extend the training context length of MiniMax-Text-01 to 1 million tokens, and the model can handle a context of up to 4 million tokens during inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
+
+<p align="center">
+<img width="100%" src="figures/TextBench.png">
+</p>
+
+## 2. Model Architecture
+
+The architecture of MiniMax-Text-01 is briefly described as follows:
+- Total Parameters: 456B
+- Activated Parameters per Token: 45.9B
+- Number of Layers: 80
+- Hybrid Attention: one softmax attention layer is positioned after every 7 lightning attention layers.
+- Number of attention heads: 64
+- Attention head dimension: 128
+- Mixture of Experts:
+  - Number of experts: 32
+  - Expert hidden dimension: 9216
+  - Top-2 routing strategy
+- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
+- Hidden Size: 6144
+- Vocab Size: 200,064
+
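As a rough sanity check on the architecture list above, the sketch below estimates how many parameters sit in the expert feed-forward weights alone. It assumes each expert is a gated MLP with three weight matrices of shape hidden size × expert hidden dimension; the README does not state this, so treat `MATRICES_PER_EXPERT` as an assumption. Attention, embedding, and normalization weights are deliberately left out, so the results should land below the 456B and 45.9B totals rather than match them.

```python
# Values taken from the architecture list in the README.
HIDDEN_SIZE = 6144
EXPERT_HIDDEN = 9216
NUM_EXPERTS = 32
TOP_K = 2            # top-2 routing
NUM_LAYERS = 80

# Assumption (not stated in the README): a gated MLP with three matrices per expert.
MATRICES_PER_EXPERT = 3


def expert_params(experts_counted: int) -> int:
    """Expert feed-forward parameters across all layers for the given expert count."""
    per_expert = MATRICES_PER_EXPERT * HIDDEN_SIZE * EXPERT_HIDDEN
    return per_expert * experts_counted * NUM_LAYERS


total_expert_params = expert_params(NUM_EXPERTS)   # every expert in every layer
active_expert_params = expert_params(TOP_K)        # experts actually used per token

print(f"all experts:    {total_expert_params / 1e9:.1f}B")
print(f"active experts: {active_expert_params / 1e9:.1f}B")
```

Under these assumptions the experts account for most of the total parameter count, which is what a top-2-of-32 MoE design would suggest: only a small fraction of the expert weights is touched for any single token.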
+## 3. Evaluation
+
+### Core Academic Benchmarks
+
+| **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
+|-------------------------------|--------------------|-------------------------------|--------------------------|----------------------------|-----------------------|-----------------|--------------------------|---------------------|
+| **General** | | | | | | | | |
+| MMLU<sup>*</sup> | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 |
+| MMLU-Pro<sup>*</sup> | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
+| SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
+| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** |
+| IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
+| Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
+| **Reasoning** | | | | | | | | |
+| GPQA<sup>*</sup> _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
+| DROP<sup>*</sup> _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 |
+| **Mathematics** | | | | | | | | |
+| GSM8k<sup>*</sup> | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
+| MATH<sup>*</sup> | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 |
+| **Coding** | | | | | | | | |
+| MBPP + | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
+| HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
+
+<sup>*</sup> Evaluated following a _0-shot CoT_ setting.
+
+### Long Benchmarks
+#### 4M Needle In A Haystack Test
+<p align="center">
+<img width="90%" src="figures/niah.png">
+</p>
+
+#### Ruler
+| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
+|-------|----|----|-----|-----|-----|------|------|------|----|
+| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
+| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
+| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
+| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
+| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |
+
+#### LongBench v2
+| **Model** | **overall** | **easy** | **hard** | **short** | **medium** | **long** |
+|----------------------------|-------------|----------|----------|------------|------------|----------|
+| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
+| **w/ CoT** | | | | | | |
+| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
+| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
+| Deepseek-V3 | - | - | - | - | - | - |
+| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
+| **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** |
+| **w/o CoT** | | | | | | |
+| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
+| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
+| Deepseek-V3 | 48.7 | - | - | - | - | - |
+| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** |
+| **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 |
+
+#### MTOB
+| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
+|------------------|----------------|---------------|---------------|------------------|-----------------|
+| **eng → kalam (ChrF)** | | | | | |
+| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
+| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
+| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
+| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
+| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
+| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
+| **kalam → eng (BLEURT)** | | | | | |
+| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
+| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
+| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
+| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
+| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
+| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
+
+
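The MTOB table does not define its Δ columns. The sketch below tests one plausible reading, namely that Δ is the score with context minus the score with no context, against three eng → kalam rows copied from the table. The `delta` helper and the tolerance are illustrative choices, not part of the README.

```python
# Tuple layout: (no context, half book, full book, reported Δ half, reported Δ full).
# A missing cell ("-" in the table) is represented as None.
ROWS = {
    "GPT-4o (11-20)": (9.90, 54.30, None, 44.40, None),
    "Gemini-1.5-Pro (002)": (16.79, 53.68, 57.90, 36.89, 41.11),
    "MiniMax-Text-01": (6.0, 51.74, 51.60, 45.7, 45.6),
}


def delta(with_context, no_context):
    """Assumed definition of Δ: score with context minus score without it."""
    if with_context is None:
        return None
    return round(with_context - no_context, 2)


# Track the largest gap between the recomputed and the reported Δ values.
max_gap = 0.0
for model, (base, half, full, reported_half, reported_full) in ROWS.items():
    pairs = ((delta(half, base), reported_half), (delta(full, base), reported_full))
    for computed, reported in pairs:
        if computed is not None and reported is not None:
            max_gap = max(max_gap, abs(computed - reported))
    print(model, delta(half, base), delta(full, base))

print(f"largest gap vs. reported Δ: {max_gap:.2f}")
```

For these rows the recomputed values agree with the reported ones up to rounding, which supports the assumed definition without proving it for every row.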
+## 4. Quickstart
+Here we provide a simple example of loading the tokenizer and model to generate content.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
+
+# load hf config
+hf_config = AutoConfig.from_pretrained("MiniMax-Text-01", trust_remote_code=True)
+
+# quantization config, int8 is recommended
+quantization_config = QuantoConfig(
+    weights="int8",
+    modules_to_not_convert=[
+        "lm_head",
+        "embed_tokens",
+    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
+    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
+)
+
+# assume 8 GPUs (defined first because the device map below references it)
+world_size = 8
+# set device map
+device_map = {
+    'model.embed_tokens': 'cuda:0',
+    'model.norm': f'cuda:{world_size - 1}',
+    'lm_head': f'cuda:{world_size - 1}'
+}
+layers_per_device = hf_config.num_hidden_layers // world_size
+for i in range(world_size):
+    for j in range(layers_per_device):
+        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
+
+# load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("MiniMax-Text-01")
+prompt = "Hello!"
+messages = [
+    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
+    {"role": "user", "content": [{"type": "text", "text": prompt}]},
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# tokenize and move to device
+model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
+
+# load bfloat16 model, move to device, and apply quantization
+quantized_model = AutoModelForCausalLM.from_pretrained(
+    "MiniMax-Text-01",
+    torch_dtype="bfloat16",
+    device_map=device_map,
+    quantization_config=quantization_config,
+    trust_remote_code=True,
+    offload_buffers=True,
+)
+
+# generate response
+generation_config = GenerationConfig(
+    max_new_tokens=20,
+    eos_token_id=200020,
+    use_cache=True,
+)
+generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
+print(f"generated_ids: {generated_ids}")
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
+
+## 5. Chatbot & API
+For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://intl.minimaxi.com) for developers.
+
+Contact us at [[email protected]](mailto:[email protected]).
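The device-map step of the Quickstart can also be wrapped in a small helper, sketched below. `build_device_map` is a hypothetical name, not something shipped in this commit. It keeps the Quickstart's placement of the embeddings on the first GPU and the final norm and `lm_head` on the last. Unlike the Quickstart, it uses ceiling division, so layers still receive a device when the layer count is not a multiple of the GPU count.

```python
def build_device_map(num_layers: int, world_size: int) -> dict[str, str]:
    """Spread decoder layers over `world_size` GPUs in contiguous blocks.

    Hypothetical helper: module names follow the Quickstart
    ('model.embed_tokens', 'model.layers.N', 'model.norm', 'lm_head').
    """
    if num_layers < 1 or world_size < 1:
        raise ValueError("num_layers and world_size must be positive")

    device_map = {
        "model.embed_tokens": "cuda:0",
        "model.norm": f"cuda:{world_size - 1}",
        "lm_head": f"cuda:{world_size - 1}",
    }
    # Ceiling division: no layer is left without a device when the split is uneven.
    layers_per_device = -(-num_layers // world_size)
    for layer in range(num_layers):
        device_map[f"model.layers.{layer}"] = f"cuda:{layer // layers_per_device}"
    return device_map


device_map = build_device_map(num_layers=80, world_size=8)
print(device_map["model.layers.0"], device_map["model.layers.79"])
```

With 80 layers and 8 GPUs this yields the same 10-layers-per-GPU split as the Quickstart. The function only builds a dictionary, so it can be checked without any GPU present.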
config.json
ADDED
@@ -0,0 +1,126 @@
+{
+  "architectures": [
+    "MiniMaxText01ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "attn_type_list": [
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    0,
+    1
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_minimax_text_01.MiniMaxText01Config",
+    "AutoModelForCausalLM": "modeling_minimax_text_01.MiniMaxText01ForCausalLM"
+  },
+  "bos_token_id": null,
+  "eos_token_id": null,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 6144,
+  "initializer_range": 0.02,
+  "intermediate_size": 9216,
+  "layernorm_full_attention_alpha": 3.5565588200778455,
+  "layernorm_full_attention_beta": 1.0,
+  "layernorm_linear_attention_alpha": 3.5565588200778455,
+  "layernorm_linear_attention_beta": 1.0,
+  "layernorm_mlp_alpha": 3.5565588200778455,
+  "layernorm_mlp_beta": 1.0,
+  "max_position_embeddings": 10240000,
+  "model_type": "minimax_text_01",
+  "num_attention_heads": 64,
+  "num_experts_per_tok": 2,
+  "num_hidden_layers": 80,
+  "num_key_value_heads": 8,
+  "num_local_experts": 32,
+  "output_router_logits": false,
+  "postnorm": true,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 10000000,
+  "rotary_dim": 64,
+  "router_aux_loss_coef": 0.001,
+  "router_jitter_noise": 0.0,
+  "shared_intermediate_size": 0,
+  "shared_moe_mode": "sigmoid",
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.45.2",
+  "use_cache": true,
+  "vocab_size": 200064
+}
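The 80-entry `attn_type_list` in `config.json` follows a regular pattern that can be regenerated in one line. The sketch below assumes that `1` marks a softmax attention layer and `0` a lightning attention layer; that reading is inferred from the README's "softmax after every 7 lightning" description, not documented in the config itself. It also checks two other relationships between config values and the README's architecture list.

```python
# Values copied from config.json.
NUM_HIDDEN_LAYERS = 80
HEAD_DIM = 128
ROTARY_DIM = 64
NUM_ATTENTION_HEADS = 64
NUM_KEY_VALUE_HEADS = 8

# Assumption: 7 lightning layers (0) followed by 1 softmax layer (1), repeated.
BLOCK = 8
attn_type_list = [1 if (i + 1) % BLOCK == 0 else 0 for i in range(NUM_HIDDEN_LAYERS)]

num_softmax_layers = sum(attn_type_list)
# RoPE covers half of each attention head, as the README states.
rotary_fraction = ROTARY_DIM / HEAD_DIM
# Number of query heads sharing each key/value head.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS

print(attn_type_list[:BLOCK], num_softmax_layers, rotary_fraction, queries_per_kv_head)
```

Generating the list this way gives the same 0/1 sequence as the file: ten blocks of seven zeros followed by a one.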
configuration_minimax_text_01.py
ADDED
@@ -0,0 +1,152 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
""" MiniMaxText01 model configuration"""
|
2 |
+
|
3 |
+
from transformers.configuration_utils import PretrainedConfig
|
4 |
+
from transformers.utils import logging
|
5 |
+
|
6 |
+
|
7 |
+
logger = logging.get_logger(__name__)
|
8 |
+
|
9 |
+
|
10 |
+
class MiniMaxText01Config(PretrainedConfig):
|
11 |
+
r"""
|
12 |
+
This is the configuration class to store the configuration of a [`MiniMaxText01Model`]. It is used to instantiate an
|
13 |
+
MiniMaxText01 model according to the specified arguments, defining the model architecture. Instantiating a configuration
|
14 |
+
with the defaults will yield a similar configuration to that of the MiniMaxText01.
|
15 |
+
|
16 |
+
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
|
17 |
+
documentation from [`PretrainedConfig`] for more information.
|
18 |
+
|
19 |
+
|
20 |
+
Args:
|
21 |
+
vocab_size (`int`, *optional*, defaults to 32000):
|
22 |
+
Vocabulary size of the MiniMaxText01 model. Defines the number of different tokens that can be represented by the
|
23 |
+
`inputs_ids` passed when calling [`MiniMaxText01Model`]
|
24 |
+
hidden_size (`int`, *optional*, defaults to 4096):
|
25 |
+
Dimension of the hidden representations.
|
26 |
+
intermediate_size (`int`, *optional*, defaults to 14336):
|
27 |
+
Dimension of the MLP representations.
|
28 |
+
num_hidden_layers (`int`, *optional*, defaults to 32):
|
29 |
+
Number of hidden layers in the Transformer encoder.
|
30 |
+
num_attention_heads (`int`, *optional*, defaults to 32):
|
31 |
+
Number of attention heads for each attention layer in the Transformer encoder.
|
32 |
+
num_key_value_heads (`int`, *optional*, defaults to 8):
|
33 |
+
This is the number of key_value heads that should be used to implement Grouped Query Attention. If
|
34 |
+
`num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
|
35 |
+
`num_key_value_heads=1 the model will use Multi Query Attention (MQA) otherwise GQA is used. When
|
36 |
+
converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
|
37 |
+
by meanpooling all the original heads within that group. For more details checkout [this
|
38 |
+
paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
|
39 |
+
hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
|
40 |
+
The non-linear activation function (function or string) in the decoder.
|
41 |
+
max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
|
42 |
+
The maximum sequence length that this model might ever be used with. MiniMaxText01's sliding window attention
|
43 |
+
allows sequence of up to 4096*32 tokens.
|
44 |
+
initializer_range (`float`, *optional*, defaults to 0.02):
|
45 |
+
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
|
46 |
+
rms_norm_eps (`float`, *optional*, defaults to 1e-05):
|
47 |
+
The epsilon used by the rms normalization layers.
|
48 |
+
use_cache (`bool`, *optional*, defaults to `True`):
|
49 |
+
Whether or not the model should return the last key/values attentions (not used by all models). Only
|
50 |
+
relevant if `config.is_decoder=True`.
|
51 |
+
pad_token_id (`int`, *optional*):
|
52 |
+
The id of the padding token.
|
53 |
+
bos_token_id (`int`, *optional*, defaults to 1):
|
54 |
+
The id of the "beginning-of-sequence" token.
|
55 |
+
eos_token_id (`int`, *optional*, defaults to 2):
|
56 |
+
The id of the "end-of-sequence" token.
|
57 |
+
tie_word_embeddings (`bool`, *optional*, defaults to `False`):
|
58 |
+
Whether the model's input and output word embeddings should be tied.
|
59 |
+
rope_theta (`float`, *optional*, defaults to 1000000.0):
|
60 |
+
The base period of the RoPE embeddings.
|
61 |
+
sliding_window (`int`, *optional*):
|
62 |
+
Sliding window attention window size. The constructor default is `None`.
|
63 |
+
attention_dropout (`float`, *optional*, defaults to 0.0):
|
64 |
+
The dropout ratio for the attention probabilities.
|
65 |
+
num_experts_per_tok (`int`, *optional*, defaults to 2):
|
66 |
+
The number of experts each token is routed to; this can also be interpreted as the `top-k` routing
|
67 |
+
parameter.
|
68 |
+
num_local_experts (`int`, *optional*, defaults to 8):
|
69 |
+
Number of experts per Sparse MLP layer.
|
70 |
+
output_router_logits (`bool`, *optional*, defaults to `False`):
|
71 |
+
Whether or not the router logits should be returned by the model. Enabling this will also
|
72 |
+
allow the model to output the auxiliary loss. See [here]() for more details
|
73 |
+
router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
|
74 |
+
The aux loss factor for the total loss.
|
75 |
+
router_jitter_noise (`float`, *optional*, defaults to 0.0):
|
76 |
+
Amount of noise to add to the router.
|
77 |
+
|
78 |
+
```python
|
79 |
+
>>> from transformers import MiniMaxText01Model, MiniMaxText01Config
|
80 |
+
|
81 |
+
>>> # Initializing a MiniMaxText01 style configuration
|
82 |
+
>>> configuration = MiniMaxText01Config()
|
83 |
+
|
84 |
+
>>> # Initializing a model from the MiniMaxText01 style configuration
|
85 |
+
>>> model = MiniMaxText01Model(configuration)
|
86 |
+
|
87 |
+
>>> # Accessing the model configuration
|
88 |
+
>>> configuration = model.config
|
89 |
+
```"""
|
90 |
+
|
91 |
+
model_type = "MiniMaxText01"
|
92 |
+
keys_to_ignore_at_inference = ["past_key_values"]
|
93 |
+
|
94 |
+
def __init__(
|
95 |
+
self,
|
96 |
+
vocab_size=32000,
|
97 |
+
hidden_size=4096,
|
98 |
+
intermediate_size=14336,
|
99 |
+
num_hidden_layers=32,
|
100 |
+
num_attention_heads=32,
|
101 |
+
num_key_value_heads=8,
|
102 |
+
hidden_act="silu",
|
103 |
+
max_position_embeddings=4096 * 32,
|
104 |
+
initializer_range=0.02,
|
105 |
+
rms_norm_eps=1e-5,
|
106 |
+
use_cache=True,
|
107 |
+
pad_token_id=None,
|
108 |
+
bos_token_id=None,
|
109 |
+
eos_token_id=None,
|
110 |
+
tie_word_embeddings=False,
|
111 |
+
rope_theta=1e6,
|
112 |
+
sliding_window=None,
|
113 |
+
attention_dropout=0.0,
|
114 |
+
num_experts_per_tok=2,
|
115 |
+
num_local_experts=8,
|
116 |
+
output_router_logits=False,
|
117 |
+
router_aux_loss_coef=0.001,
|
118 |
+
router_jitter_noise=0.0,
|
119 |
+
**kwargs,
|
120 |
+
):
|
121 |
+
self.vocab_size = vocab_size
|
122 |
+
self.max_position_embeddings = max_position_embeddings
|
123 |
+
self.hidden_size = hidden_size
|
124 |
+
self.intermediate_size = intermediate_size
|
125 |
+
self.num_hidden_layers = num_hidden_layers
|
126 |
+
self.num_attention_heads = num_attention_heads
|
127 |
+
self.sliding_window = sliding_window
|
128 |
+
|
129 |
+
# for backward compatibility
|
130 |
+
if num_key_value_heads is None:
|
131 |
+
num_key_value_heads = num_attention_heads
|
132 |
+
|
133 |
+
self.num_key_value_heads = num_key_value_heads
|
134 |
+
self.hidden_act = hidden_act
|
135 |
+
self.initializer_range = initializer_range
|
136 |
+
self.rms_norm_eps = rms_norm_eps
|
137 |
+
self.use_cache = use_cache
|
138 |
+
self.rope_theta = rope_theta
|
139 |
+
self.attention_dropout = attention_dropout
|
140 |
+
|
141 |
+
self.num_experts_per_tok = num_experts_per_tok
|
142 |
+
self.num_local_experts = num_local_experts
|
143 |
+
self.output_router_logits = output_router_logits
|
144 |
+
self.router_aux_loss_coef = router_aux_loss_coef
|
145 |
+
self.router_jitter_noise = router_jitter_noise
|
146 |
+
super().__init__(
|
147 |
+
pad_token_id=pad_token_id,
|
148 |
+
bos_token_id=bos_token_id,
|
149 |
+
eos_token_id=eos_token_id,
|
150 |
+
tie_word_embeddings=tie_word_embeddings,
|
151 |
+
**kwargs,
|
152 |
+
)
|
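The `num_key_value_heads` docstring in this file distinguishes MHA, MQA, and GQA, and says that each grouped key/value head should be built by mean-pooling the original heads in its group. The following is a minimal sketch of both ideas in plain Python, not code from this repository: `attention_kind` and `mean_pool_kv_heads` are hypothetical helper names, heads are modeled as flat lists of floats, and grouping consecutive heads together is an assumption made for illustration.

```python
from typing import List


def attention_kind(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention variant implied by the two head counts."""
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    if num_key_value_heads == num_attention_heads:
        return "MHA"  # every query head has its own key/value head
    if num_key_value_heads == 1:
        return "MQA"  # all query heads share a single key/value head
    return "GQA"      # query heads share key/value heads in groups


def mean_pool_kv_heads(heads: List[List[float]], num_key_value_heads: int) -> List[List[float]]:
    """Average groups of heads into num_key_value_heads pooled heads.

    Assumption: each group consists of consecutive heads.
    """
    if len(heads) % num_key_value_heads != 0:
        raise ValueError("number of heads must be divisible by num_key_value_heads")
    group_size = len(heads) // num_key_value_heads
    pooled = []
    for g in range(num_key_value_heads):
        group = heads[g * group_size:(g + 1) * group_size]
        # Element-wise mean across the heads of this group.
        pooled.append([sum(column) / group_size for column in zip(*group)])
    return pooled
```

With the constructor defaults shown in this file (`num_attention_heads=32`, `num_key_value_heads=8`), `attention_kind(32, 8)` falls into the GQA branch, with four query heads per key/value head.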
figures/MiniMaxLogo.png
ADDED
figures/TextBench.png
ADDED
figures/VisionBench.png
ADDED
figures/hailuo.svg
ADDED
figures/image.jpg
ADDED
figures/minimax.svg
ADDED
figures/niah.png
ADDED
main.py
ADDED
@@ -0,0 +1,100 @@
1 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
|
2 |
+
import torch
|
3 |
+
import argparse
|
4 |
+
|
5 |
+
"""
|
6 |
+
usage:
|
7 |
+
export SAFETENSORS_FAST_GPU=1
|
8 |
+
python main.py --quant_type int8 --world_size 8 --model_id <model_path>
|
9 |
+
"""
|
10 |
+
|
11 |
+
def generate_quanto_config(hf_config: AutoConfig, quant_type: str):
|
12 |
+
QUANT_TYPE_MAP = {
|
13 |
+
"default": None,
|
14 |
+
"int8": QuantoConfig(
|
15 |
+
weights="int8",
|
16 |
+
modules_to_not_convert=[
|
17 |
+
"lm_head",
|
18 |
+
"embed_tokens",
|
19 |
+
] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
|
20 |
+
+ [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
|
21 |
+
),
|
22 |
+
}
|
23 |
+
return QUANT_TYPE_MAP[quant_type]
|
24 |
+
|
25 |
+
|
26 |
+
def parse_args():
|
27 |
+
parser = argparse.ArgumentParser()
|
28 |
+
parser.add_argument("--quant_type", type=str, default="default", choices=["default", "int8"])
|
29 |
+
parser.add_argument("--model_id", type=str, required=True)
|
30 |
+
parser.add_argument("--world_size", type=int, required=True)
|
31 |
+
return parser.parse_args()
|
32 |
+
|
33 |
+
|
34 |
+
def check_params(args, hf_config: AutoConfig):
|
35 |
+
if args.quant_type == "int8":
|
36 |
+
assert args.world_size >= 8, "int8 weight-only quantization requires at least 8 GPUs"
|
37 |
+
|
38 |
+
assert hf_config.num_hidden_layers % args.world_size == 0, f"num_hidden_layers({hf_config.num_hidden_layers}) must be divisible by world_size({args.world_size})"
|
39 |
+
|
40 |
+
|
41 |
+
@torch.no_grad()
|
42 |
+
def main():
|
43 |
+
args = parse_args()
|
44 |
+
print("\n=============== Argument ===============")
|
45 |
+
for key in vars(args):
|
46 |
+
print(f"{key}: {vars(args)[key]}")
|
47 |
+
print("========================================")
|
48 |
+
|
49 |
+
model_id = args.model_id
|
50 |
+
|
51 |
+
hf_config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
|
52 |
+
check_params(args, hf_config)
|
53 |
+
quantization_config = generate_quanto_config(hf_config, args.quant_type)
|
54 |
+
|
55 |
+
device_map = {
|
56 |
+
'model.embed_tokens': 'cuda:0',
|
57 |
+
'model.norm': f'cuda:{args.world_size - 1}',
|
58 |
+
'lm_head': f'cuda:{args.world_size - 1}'
|
59 |
+
}
|
60 |
+
layers_per_device = hf_config.num_hidden_layers // args.world_size
|
61 |
+
for i in range(args.world_size):
|
62 |
+
for j in range(layers_per_device):
|
63 |
+
device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
|
64 |
+
|
65 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
66 |
+
prompt = "Hello!"
|
67 |
+
messages = [
|
68 |
+
{"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by Minimax based on MiniMax-Text-01 model."}]},
|
69 |
+
{"role": "user", "content": [{"type": "text", "text": prompt}]},
|
70 |
+
]
|
71 |
+
text = tokenizer.apply_chat_template(
|
72 |
+
messages,
|
73 |
+
tokenize=False,
|
74 |
+
add_generation_prompt=True
|
75 |
+
)
|
76 |
+
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
|
77 |
+
quantized_model = AutoModelForCausalLM.from_pretrained(
|
78 |
+
model_id,
|
79 |
+
torch_dtype="bfloat16",
|
80 |
+
device_map=device_map,
|
81 |
+
quantization_config=quantization_config,
|
82 |
+
trust_remote_code=True,
|
83 |
+
offload_buffers=True,
|
84 |
+
)
|
85 |
+
generation_config = GenerationConfig(
|
86 |
+
max_new_tokens=20,
|
87 |
+
eos_token_id=200020,
|
88 |
+
use_cache=True,
|
89 |
+
)
|
90 |
+
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
|
91 |
+
print(f"generated_ids: {generated_ids}")
|
92 |
+
generated_ids = [
|
93 |
+
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
|
94 |
+
]
|
95 |
+
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
96 |
+
print(response)
|
97 |
+
|
98 |
+
if __name__ == "__main__":
|
99 |
+
main()
|
100 |
+
|
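The `device_map` construction in `main.py` pins the embeddings to the first GPU, the final norm and `lm_head` to the last GPU, and spreads the decoder layers evenly over all GPUs. The sketch below isolates that partitioning as a pure function so it can be inspected without loading a model; `build_device_map` is a hypothetical name that does not appear in the script, and the layer counts in the usage note are toy values, not the model's real configuration.

```python
from typing import Dict


def build_device_map(num_hidden_layers: int, world_size: int) -> Dict[str, str]:
    """Assign contiguous blocks of decoder layers to GPUs, mirroring main.py."""
    if world_size <= 0 or num_hidden_layers % world_size != 0:
        raise ValueError(
            f"num_hidden_layers({num_hidden_layers}) must be divisible by world_size({world_size})"
        )
    device_map = {
        "model.embed_tokens": "cuda:0",
        "model.norm": f"cuda:{world_size - 1}",
        "lm_head": f"cuda:{world_size - 1}",
    }
    layers_per_device = num_hidden_layers // world_size
    for layer in range(num_hidden_layers):
        # Layer k lands on GPU k // layers_per_device, which is the same
        # assignment as the nested i/j loops in main.py.
        device_map[f"model.layers.{layer}"] = f"cuda:{layer // layers_per_device}"
    return device_map
```

For example, `build_device_map(8, 4)` places layers 0 and 1 on `cuda:0` and layers 6 and 7 on `cuda:3`. Raising `ValueError` instead of using `assert` is a design choice of this sketch; the script itself asserts the same divisibility condition in `check_params`.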
merges.txt
ADDED
The diff for this file is too large to render.
model-00000-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:45b85926994c2ee092c2319158471ed9262d146bd5973c442e135dae9e21624d
|
3 |
+
size 4916773000
|
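Each `.safetensors` entry in this commit is a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the weights themselves. As a rough illustration of how such a pointer can be read, here is a small parser; `parse_lfs_pointer` is a hypothetical helper, and it assumes exactly the three space-separated keys shown in this diff.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS pointer with 'version', 'oid' and 'size' lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:45b85926994c2ee092c2319158471ed9262d146bd5973c442e135dae9e21624d
size 4916773000
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field gives the byte size of the actual shard, so summing it over all pointers would estimate the total download without fetching any weights.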
model-00001-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2cdaea944f170b60d206e41a80accbfcf5b9c74744f014a819c30f45cb9a9130
|
3 |
+
size 2191113152
|
model-00002-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0f15b83f1afd5c5da8853b7d9bd2c9814dbcb2de7a2f1a24765e16aa7a310d82
|
3 |
+
size 2330307784
|
model-00003-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:83f66465aca90c07b247e218950c1d03594dc08c7170ad2e1a8aaa82b26612fe
|
3 |
+
size 2254810656
|
model-00004-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:f7ddaa7380c6dc2cc109b209bfb8450f56bd88f03bcba19f9f0735d24d65a1ed
|
3 |
+
size 2116402376
|
model-00005-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:99060d0311ad70c53858396b8fbf2a24e2476b965e809335aac99eb3a536c685
|
3 |
+
size 2103016184
|
model-00006-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4785d81a7df999ea5ffd7688ddc2228c607c6b01173e6e53cbe718fa2da6f8b2
|
3 |
+
size 2254810688
|
model-00007-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:1306d2e054fd20b6900b6d50965ee12257116f5967747a6cfe6dcfd5d8d4f8cd
|
3 |
+
size 2116402392
|
model-00008-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:afb8c7602baf0a815e8ad02cbecfb80a9ae5d29f9ce10a578762544a5de1c0cb
|
3 |
+
size 2202839784
|
model-00009-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:a1d8a6c70f49e8adaf8dafd15c8235be9331ffb975f2dce5ffe986cfca770598
|
3 |
+
size 2151680440
|
model-00010-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:f94ec16a2f79adf7fa7eb9ba583fe5441146a8ee9405eb3a9e4ba62ce58b0016
|
3 |
+
size 2264926800
|
model-00011-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:8f4bd5579bb8314355e60ab945da0d43332914854a6877e1c7e29c0796bdcc45
|
3 |
+
size 2151680448
|
model-00012-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:9ad1363efef21cc1eff543eb1e18cdc35b706925468b31746589dd5520b27c44
|
3 |
+
size 2264926776
|
model-00013-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d4ed9d125a753491a1a52257e4c3313c81e4b23ceec1ea92060f21fda6506d33
|
3 |
+
size 2151680456
|
model-00014-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:3874e30152a24167cffb45a368d644e848b39191a18250c8baa372badf1404fa
|
3 |
+
size 2264926792
|
model-00015-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:33fa7b68a1ca03cd36dc34b43098a4342cb90b6ab29a4ed81bbc5fd22ff3206b
|
3 |
+
size 2151680440
|
model-00016-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:52a2cdc9917920ae2e678dd8c8f5118bd269310dc24e09a9795def0bfc6db289
|
3 |
+
size 2264926792
|
model-00017-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:966bb346a62cb5bcaf692cae407fb53e3ba58a6a7798a5012e7979110d44debb
|
3 |
+
size 2151680456
|
model-00018-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:6acadaecce036887bbf3d9e80bd9765f3f63ed147b0318c47dc151c484ee5dea
|
3 |
+
size 2264926776
|
model-00019-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d45b4bfc0d98ffea2bbfa70672199845c424645139170705d1a615cb4b32bedf
|
3 |
+
size 2151680456
|
model-00020-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:b4d89e6f890817b4eaacbb61d447ea9741a1ae1750bb6b555f49f97eb53eed3a
|
3 |
+
size 2264926792
|
model-00021-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:989d3448acab544b95096da7cb3fa2530c658c54eb9a067ef0f9735d0fed98e9
|
3 |
+
size 2151680440
|
model-00022-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:a4079b9211347c16ab94cf92b3be67b2506f7fd22924feeb886c511878ada34a
|
3 |
+
size 2264926792
|
model-00023-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:fa5667b97d73be5c951a84fd5141d7e5098854e386b08de3ad2d4a26619121d0
|
3 |
+
size 2151680456
|
model-00024-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d1b4cfc292b3496550591242fa27d8853f48f26952a68a62264559e2a8a7026c
|
3 |
+
size 2264926776
|
model-00025-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0084a9a0f19e13b06b7fd6c2ab77212ecabc1237991715a4cc0a8b3760bab059
|
3 |
+
size 2151680456
|
model-00026-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:ec99d07905384e93ff617910268fd614403273bb72a0e9a66e23477609404d39
|
3 |
+
size 2264926800
|
model-00027-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:cf1ae16fe8da6e760ccdcbff16b81047460783c12a97e35572db64b705832b36
|
3 |
+
size 2151680440
|
model-00028-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:ce157751c2a5278886267e41e2ad71a8fdc7bdd5f6fa798c939234992dd11ab2
|
3 |
+
size 2264926792
|
model-00029-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:7683f74793f627034ee64af8bac8fdd15ebc0e50127cc9779668091ca3410398
|
3 |
+
size 2151680456
|
model-00030-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0a713332d8255b32f06f8ca0ff73b6a847c2e788f3086d26e921aa927b1630b2
|
3 |
+
size 2264926776
|
model-00031-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4ddc6b163c1970bce84c2c78a34e426873dadda5b5ef1d574e044f914ffae47d
|
3 |
+
size 2151680448
|
model-00032-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:57f494acdd9fcdb5338f90b471913df350612bba845ef2c8de766b77c95104a2
|
3 |
+
size 2264926800
|
model-00033-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:fc35a4d97876ed9cd70ab29ba5c7b486e7c94d38c6caa944c01c8d3aeccfc841
|
3 |
+
size 2151680440
|
model-00034-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:6a2fd3f245a02e4ac7e398d6431129ab8eb7a1a2650a5ece70415ad03a3d8819
|
3 |
+
size 2264926784
|
model-00035-of-00413.safetensors
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:bc3cd1085b6b2bd1e31e9722248130bfeb7ff9525a48f0bec08f86b62db752d5
|
3 |
+
size 2151680456
|