amazon-sagemaker-community (Amazon SageMaker Community)

posted an update about 2 hours ago

Post

94

Microsoft's rStar-Math paper claims that 🤏 ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from Hugging Face !

📏 The paper introduces rStar-Math, which claims to rival OpenAI o1's math reasoning capabilities by integrating Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
🤖 A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
🧪 The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problems—without GPT-4-based data distillation.
💾 While we wait for the release of code and datasets, you can already download the prompts they used from the HF Hub!

Details and links here 👇
Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
Templates on the hub: MoritzLaurer/rstar-math-prompts
Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
Paper: https://arxiv.org/pdf/2501.04519

MoritzLaurer

posted an update 4 days ago

Post

2824

FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

📏 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

🧪 The authors tested different prompt templates on held-out data to ensure their generalization.

📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

💾 You can now download and reuse these prompt templates via the prompt-templates library!

🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let’s make LLM work more transparent and reproducible by sharing more templates like this!

Links 👇
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf

MoritzLaurer

posted an update 6 days ago

Post

1661

The TRL v0.13 release is 🔥! My highlight are the new process reward trainer to train models similar to o1 and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)

MoritzLaurer

posted an update 8 days ago

Post

2029

OpenAI is losing money on the $200/month subscription 🤯. It's crazy how expensive it is to run these largest LLMs:

- ChatGPT Pro costs $200/month ($2,400/year) and is still unprofitable for OpenAI due to higher-than-expected usage.
- OpenAI reportedly expected losses of about $5 billion on revenue of $3.7 billion last year, with ChatGPT alone once costing an estimated $700,000 per day to operate. 💸🔥
- They build strong models and do great research. Whether this business model will work in the long run is one of the biggest questions in the AI economy today.

Source with the numbers 👇
https://techcrunch.com/2025/01/05/openai-is-losing-money-on-its-pricey-chatgpt-pro-plan-ceo-sam-altman-says/

3 replies

·

jeffboudier

posted an update 8 days ago

Post

504

NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release includes Tokenizers nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co/blog/mingyuliutw/nvidia-cosmos

1 reply

·

MoritzLaurer

posted an update 9 days ago

Post

2178

🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:

- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well
- 📉 Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k).
- 💡 What’s next? I’m preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.

Great work by https://huggingface.co/answerdotai !

If you’re looking for a high-speed zeroshot classifier, give it a try!

📄 Resources below: 👇
Base model: MoritzLaurer/ModernBERT-base-zeroshot-v2.0
Large model: MoritzLaurer/ModernBERT-large-zeroshot-v2.0
Updated zeroshot collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
ModernBERT collection with paper: answerdotai/modernbert-67627ad707a4acbf33c41deb

MoritzLaurer

posted an update 26 days ago

Post

2600

Quite excited by the ModernBERT release! 0.15/0.4B small, 2T modern pre-training data and tokenizer with code, 8k context window, great efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D

Congrats @answerdotai , @LightOnIO and collaborators like @tomaarsen !

Paper and models here 👇https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb

3 replies

·

MoritzLaurer

posted an update 29 days ago

Post

1177

"Open-source AI: year in review 2024": amazing Space with lots of data-driven insights into AI in 2024! Check it out 👇
huggingface/open-source-ai-year-in-review-2024

2 replies

·

MoritzLaurer

posted an update about 1 month ago

Post

1286

I've been building a small library for working with prompt templates on the HF hub: pip install prompt-templates. Motivation:

The community currently shares prompt templates in a wide variety of formats: in datasets, in model cards, as strings in .py files, as .txt/.yaml/.json/.jinja2 files etc. This makes sharing and working with prompt templates unnecessarily complicated.

Prompt templates are currently the main hyperparameter that people tune when building complex LLM systems or agents. If we don't have a common standard for sharing them, we cannot systematically test and improve our systems. After comparing different community approaches, I think that working with modular .yaml or .json files is the best approach.

The prompt-templates library :
- proposes a standard for sharing prompts (entirely locally or on the HF hub)
- provides some utilities that are interoperable with the broader ecosystem

Try it:

# !pip install prompt-templates
from prompt_templates import PromptTemplateLoader 
prompt_template = PromptTemplateLoader.from_hub(repo_id="MoritzLaurer/closed_system_prompts", filename="claude-3-5-artifacts-leak-210624.yaml")

The library is in early stages, feedback is welcome!

More details in the docs: https://github.com/MoritzLaurer/prompt_templates/

1 reply

·

DrishtiSharma

authored 2 papers about 1 month ago

1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering

Paper • 2412.06009 • Published Dec 8, 2024

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 26

julien-c

posted an update about 1 month ago

Post

8325

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team

28 replies

·

DrishtiSharma

authored 3 papers about 1 month ago

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Paper • 2411.19799 • Published Nov 29, 2024 • 11

SeQwen at the Financial Misinformation Detection Challenge Task: Sequential Learning for Claim Verification and Explanation Generation in Financial Domains

Paper • 2412.00549 • Published Nov 30, 2024 • 1

1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs

Paper • 2411.06850 • Published Nov 11, 2024 • 3

julien-c

posted an update about 2 months ago

Post

2557

wow 😮

INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code.

PrimeIntellect/INTELLECT-1-Instruct

jeffboudier

posted an update about 2 months ago

Post

1019

New - add your bluesky account to your HF profile:
https://huggingface.co/settings/profile

Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social

By the way, HF people starter pack https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

w11wo

authored a paper 3 months ago

GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems

Paper • 2410.20643 • Published Oct 28, 2024

DrishtiSharma

authored a paper 3 months ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20, 2024 • 11

jeffboudier

posted an update 3 months ago

Post

1091

This week in Inference Endpoints - thx @erikkaum for the update!

👀 https://huggingface.co/blog/erikkaum/endpoints-changelog

1 reply

·

Amazon SageMaker Community

AI & ML interests

Recent Activity

amazon-sagemaker-community's activity

1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering

Maya: An Instruction Finetuned Multilingual Multimodal Model

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

SeQwen at the Financial Misinformation Detection Challenge Task: Sequential Learning for Claim Verification and Explanation Generation in Financial Domains

1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs

GenUP: Generative User Profilers as In-Context Learners for Next POI Recommender Systems

M-RewardBench: Evaluating Reward Models in Multilingual Settings

AI & ML interests

Recent Activity

Team members 108

amazon-sagemaker-community's activity