Unlike our previous uploads, this is the best model we've published so far! It answers in two steps, reasoning followed by a final answer, like ChatGPT o1-mini and Gemini 2.0 Flash Thinking Experimental. This model is our new flagship.

πŸ§€ Which quant is right for you? (all tested!)

  • Q3: Suitable for most high-end devices such as an RTX 2080 Ti. Responses are very high quality, but it's slightly slower than Q4. (Runs at ~1 token per second or less on a Samsung Galaxy Z Fold 5 smartphone.)
  • Q4: Suitable for high-end modern devices such as an RTX 3080, or any GPU, TPU, or CPU that is powerful enough and has at least 15GB of available memory. (This is the quant we personally use on servers and high-end computers.) Recommended.
  • Q8: For very high-end modern devices that can handle its power. It is very powerful, but Q4 is more well-rounded; not recommended.
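
To fetch a single quant without downloading the whole repo, a minimal sketch using huggingface_hub is shown below; the exact GGUF filename is an assumption, so check the repo's file list first.

```python
# Minimal sketch: download one quant from the repo with huggingface_hub.
# The GGUF filename below is hypothetical -- check the repo's file list.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Pinkstack/PARM-V2-phi-4-16k-CoT-o1-gguf",
    filename="PARM-V2-phi-4-16k-CoT-o1.Q4_K_M.gguf",  # hypothetical name
)
print(model_path)  # local path to the cached GGUF file
```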

πŸ§€ Information

  • ⚠️ A low temperature must be used so the model doesn't fail at reasoning; we use 0.3 - 0.8!
  • ⚠️ Due to the current prompt format, the model may sometimes emit <|FinalAnswer|> at the end of its output; you can ignore or strip this token (see the snippet after this list), or modify the prompt format.
  • This is our flagship model, with top-tier reasoning rivaling gemini-flash-exp-2.0-thinking and o1 mini; results are overall similar to both. We are not comparing against QwQ, as its much longer responses waste tokens.
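
As a sketch of the post-processing mentioned above, the snippet below splits a raw completion on the <|CoT|> marker from the prompt format that follows and strips any stray <|FinalAnswer|> token; the helper name and the example text are ours, not part of the model.

```python
# Sketch: separate reasoning from the final answer and drop the stray
# <|FinalAnswer|> marker. Marker names come from the prompt format below.
def split_cot(raw: str) -> tuple[str, str]:
    reasoning, sep, answer = raw.partition("<|CoT|>")
    if not sep:  # no <|CoT|> marker found; treat everything as the answer
        reasoning, answer = "", raw
    answer = answer.replace("<|FinalAnswer|>", "").strip()
    return reasoning.strip(), answer

reasoning, answer = split_cot("2 + 2 equals 4.<|CoT|>The answer is 4.<|FinalAnswer|>")
print(answer)  # -> "The answer is 4."
```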

The model uses this prompt format (a modified Phi-4 prompt):

{{ if .System }}<|system|>
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|im_end|>
{{ end }}<|assistant|>{{ .CoT }}<|CoT|>
{{ .Response }}<|FinalAnswer|><|im_end|>
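
For illustration, here is a sketch of running the model locally with llama-cpp-python; the model path is hypothetical, the prompt string mirrors the template above, and the temperature stays inside the recommended 0.3 - 0.8 range.

```python
# Sketch: run a local GGUF quant with llama-cpp-python (path is hypothetical).
from llama_cpp import Llama

llm = Llama(model_path="parm-v2-phi-4.Q4_K_M.gguf", n_ctx=16384)

system = "You are a friendly ai assistant."
user = "How many r's are in strawberry?"
# Assemble the prompt exactly as the template above lays out a turn.
prompt = (
    f"<|system|>\n{system}<|im_end|>\n"
    f"<|user|>\n{user}<|im_end|>\n"
    "<|assistant|>"
)

out = llm(
    prompt,
    max_tokens=1024,
    temperature=0.4,      # low temperature, per the note above (0.3 - 0.8)
    stop=["<|im_end|>"],  # a trailing <|FinalAnswer|> may still appear
)
print(out["choices"][0]["text"])  # reasoning, <|CoT|>, then the final answer
```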

πŸ§€ Examples:

All examples were generated with the q4_k_m quant on a 10GB RTX 3080 with 64GB of memory, running inside of MSTY, with "You are a friendly ai assistant." as the System prompt.

Example 1: example1part1.png, example1part2.png

Example 2: example2

πŸ§€ Uploaded model

  • Developed by: Pinkstack
  • License: MIT
  • Finetuned from model: Pinkstack/PARM-V1-phi-4-4k-CoT-pytorch

This Phi-4 model was trained with Unsloth and Hugging Face's TRL library.

πŸ§€ GGUF details

  • Model size: 14.7B params
  • Architecture: llama
  • Available quantizations: 3-bit (Q3), 4-bit (Q4), 8-bit (Q8)
