Unlike our previous uploads, this is the best model we've published. It answers in two steps — reasoning, then a final answer — similar to ChatGPT o1-mini and Gemini 2.0 Flash Thinking Experimental. This model is our new flagship.
🧠 Which quant is right for you? (all tested!)
- Q3: Suitable for most high-end devices such as an RTX 2080 Ti. Responses are very high quality, but it is slightly slower than Q4. (Runs at roughly 1 token per second or less on a Samsung Z Fold 5 smartphone.)
- Q4: Suitable for high-end modern devices such as an RTX 3080, or any GPU, TPU, or CPU that is powerful enough and has at least 15 GB of available memory. (We use it ourselves on servers and high-end computers.) Recommended.
- Q8: Suitable only for very high-end devices that can handle its demands. It is very capable, but Q4 is more well rounded. Not recommended.
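The guidance above can be condensed into a simple heuristic. This is an illustrative sketch, not an official tool; the 15 GB cutoff is the stated minimum for Q4, and the function name is our own.

```python
def pick_quant(available_gb: float) -> str:
    """Pick a quant from available memory, per the list above.

    Q4 is the recommended quant whenever at least 15 GB is free;
    otherwise fall back to Q3. (Q8 works on very high-end hardware
    but Q4 is more well rounded, so it is never auto-selected here.)
    """
    return "Q4" if available_gb >= 15 else "Q3"
```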
🧠 Information
- ⚠️ A low temperature must be used to ensure the model does not fail at reasoning; we use 0.3 - 0.8.
- ⚠️ Due to the current prompt format, the model may sometimes emit <|FinalAnswer|> at the end of a response; you can ignore this or modify the prompt format.
- This is our flagship model, with top-tier reasoning rivaling gemini-2.0-flash-thinking-exp and o1-mini; results are overall similar to both. We are not comparing against QwQ, as its much longer outputs waste tokens.
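The temperature advice above can be applied when calling the model through Ollama's HTTP API, for example. A minimal sketch, assuming a local Ollama server; the model tag "parm-v2" is a placeholder for whatever name you pulled the model under.

```python
import json

# Request payload for Ollama's /api/generate endpoint.
payload = {
    "model": "parm-v2",       # placeholder tag; substitute your local model name
    "prompt": "What is 2 + 2?",
    "stream": False,
    "options": {
        "temperature": 0.3,   # keep within the recommended 0.3-0.8 range
    },
}

body = json.dumps(payload)
# POST `body` to http://localhost:11434/api/generate, e.g. with requests:
#   requests.post("http://localhost:11434/api/generate", data=body)
```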
The model uses this prompt format (a modified Phi-4 prompt):

```
{{ if .System }}<|system|>
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|im_end|>
{{ end }}<|assistant|>{{ .CoT }}<|CoT|>
{{ .Response }}<|FinalAnswer|><|im_end|>
```
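The template above can be rendered and its output parsed with a small helper. This is a hedged sketch: the function names are our own, and it assumes the model emits reasoning, then <|CoT|>, then the final answer terminated by <|FinalAnswer|>, as the template shows.

```python
def render_prompt(system, prompt):
    """Render the chat template above for a single user turn."""
    parts = []
    if system:
        parts.append(f"<|system|>\n{system}<|im_end|>\n")
    parts.append(f"<|user|>\n{prompt}<|im_end|>\n")
    parts.append("<|assistant|>")  # the model continues from here
    return "".join(parts)

def split_output(text):
    """Split a raw completion into (reasoning, final_answer).

    Per the template, the model emits:
        <reasoning><|CoT|><final answer><|FinalAnswer|><|im_end|>
    The trailing special tokens are stripped, which also handles the
    stray <|FinalAnswer|> mentioned in the Information section.
    """
    reasoning, _, rest = text.partition("<|CoT|>")
    answer = rest.replace("<|FinalAnswer|>", "").replace("<|im_end|>", "").strip()
    return reasoning.strip(), answer

raw = "Let me think step by step...<|CoT|>\nThe answer is 4.<|FinalAnswer|><|im_end|>"
reasoning, answer = split_output(raw)
```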
🧠 Examples:
(q4_k_m, 10 GB RTX 3080, 64 GB system memory, running inside MSTY; all examples use "You are a friendly ai assistant." as the system prompt.)
example 1:
example 2:
🧠 Uploaded model
- Developed by: Pinkstack
- License: MIT
- Finetuned from model: Pinkstack/PARM-V1-phi-4-4k-CoT-pytorch
This Phi-4 model was trained with Unsloth and Hugging Face's TRL library.