Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Organizations

Hugging Face, OSS Metrics, ZeroGPU Explorers, LeRobot, Journalists on Hugging Face, Major TOM, MLX Community, Social Post Explorers, Projet Spinoza, Dev Mode Explorers, Hugging Face for Legal, Hugging Face Discord Community, Big Science Social Impact Evaluation for Bias and Stereotypes, Dataset Tools, Hugging Face Science, Data Is Better Together Contributor

fdaudens's activity

reacted to AdinaY's post with 🔥 about 20 hours ago
posted an update 1 day ago
🔥 The AI Agent hype is real! This blog post deep dives into everything you need to know before deploying them: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.

📊 Key insight: A clear table breaking down the 5 levels of AI agents - from simple processors to fully autonomous systems. Essential framework for understanding where your agent stands on the autonomy spectrum.

⚖️ Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment.

🎯 6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution

Read the blog post: https://huggingface.co/blog/ethics-soc-7 Brilliant work by @meg @evijit @sasha @giadap
reacted to MoritzLaurer's post with ❤️ 3 days ago
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

📝 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

🧪 The authors tested different prompt templates on held-out data to ensure their generalization.

📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

💾 You can now download and reuse these prompt templates via the prompt-templates library!

🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let's make LLM work more transparent and reproducible by sharing more templates like this!

Links 👇
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf
posted an update 27 days ago
🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024
posted an update 28 days ago
🀝 Want to share your AI models while protecting your work? Licenses are key!

Fascinating to see that nearly 60% of models on the Hub use Apache & MIT licenses.

Explore the viz here: huggingface/open-source-ai-year-in-review-2024
posted an update 29 days ago
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?

I used natural language processing to cluster and map them, which really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.

Click any dot to explore the original prediction. What themes surprise/interest you the most?

👉 fdaudens/nieman_lab_2025_predictions_visualization

P.S.: I discovered that Nieman Lab's content is under a Creative Commons license!
reacted to lewtun's post with 🔥 30 days ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
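
To make "step-wise reward models with tree search" more concrete, here is a minimal sketch of verifier-guided beam search on a toy problem. It is not the recipe from the blog post and it is not DVTS. The names `verifier_guided_beam_search`, `propose`, `score`, and `TARGET` are hypothetical stand-ins: in a real setup, `propose` would sample next reasoning steps from an LLM and `score` would be a step-wise reward model.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[int, ...]  # a partial solution, one entry per step


def verifier_guided_beam_search(
    propose: Callable[[Path], Sequence[int]],
    score: Callable[[Path], float],
    depth: int,
    beam_width: int,
) -> Path:
    """Expand every kept path by one step, then keep the best-scored few."""
    beams: List[Path] = [()]
    for _ in range(depth):
        # Grow each surviving path with every proposed next step.
        candidates = [path + (step,) for path in beams for step in propose(path)]
        if not candidates:
            break
        # The verifier ranks partial solutions; only the top ones survive.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=score)


# Toy stand-ins (assumptions for illustration only).
TARGET = 10


def propose(path: Path) -> Sequence[int]:
    # Hypothetical "generator": always offers the same four next steps.
    return (1, 2, 3, 4)


def score(path: Path) -> float:
    # Hypothetical "reward model": closer to TARGET is better.
    return -abs(TARGET - sum(path))


best = verifier_guided_beam_search(propose, score, depth=3, beam_width=2)
print(best, sum(best))
```

Spending more test-time compute in this sketch means raising `beam_width` or offering more proposals per step, so the verifier gets more candidates to choose from.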

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
posted an update about 1 month ago
reacted to yjernite's post with ❤️ about 1 month ago
🇪🇺 Policy Thoughts in the EU AI Act Implementation 🇪🇺

There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.

I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.

Full blog here, based on our submitted response with @frimelle and @brunatrevelin :

https://huggingface.co/blog/yjernite/eu-draft-cop-risks#on-the-proposed-taxonomy-of-systemic-risks
reacted to Kseniase's post with 🔥 about 1 month ago
TL;DR: The Story of Attention's Development by @karpathy

Origin: First proposed in 2014 by @Dzmitry Bahdanau, @KyunghyunCho , and Yoshua Bengio in Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473) . Inspired by cognitive processes and later renamed from "RNNSearch."

Key Idea: A data-dependent weighted average for pooling and communication, enabling flexible and powerful neural network connections.
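
As a rough illustration of what "data-dependent weighted average" means, here is a minimal sketch in plain Python. The function names `softmax` and `soft_attention` are hypothetical, and scoring with dot products is an assumption made for brevity (the 2014 paper scores with a small learned network instead).

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Return a weighted average of `values`, with weights set by the data.

    Each weight depends on how well the query matches the corresponding key
    (dot-product scoring here is an illustrative assumption).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return pooled, weights


# Identical keys give equal weights, so the result is the plain mean.
pooled, weights = soft_attention(
    query=[1.0, 0.0],
    keys=[[0.0, 0.0], [0.0, 0.0]],
    values=[[2.0, 4.0], [4.0, 8.0]],
)
print(pooled, weights)
```

When the keys differ, the weights shift toward the values whose keys best match the query, which is what makes the average data-dependent rather than fixed.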

Breakthrough: Bahdanau's "soft search" mechanism (softmax + weighted averaging) solved encoder-decoder bottlenecks in machine translation.
Transformer Revolution: Attention Is All You Need (1706.03762) (2017) by @ashishvaswanigoogle et al. simplified architectures by stacking attention layers, introducing multi-headed attention and positional encodings.
Legacy: Attention replaced RNNs, driving modern AI systems like ChatGPT. It emerged independently but was influenced by contemporaneous work like Alex Graves's Neural Turing Machines (1410.5401) and Jason Weston's Memory Networks (1410.3916).

Attention to history: Jürgen Schmidhuber claims his 1992 Fast Weight Programmers anticipated modern attention mechanisms. While conceptually similar, the term "attention" was absent, and there's no evidence it influenced Bahdanau, Cho, and Bengio's 2014 work. Paying attention (!) to history might have brought us to genAI earlier – but credit for the breakthrough still goes to Montreal.

Referenced Papers:
Attention Origin: Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Transformers: Attention Is All You Need (1706.03762)
Alex Graves' Work: Neural Turing Machines (1410.5401), Generating Sequences With Recurrent Neural Networks (1308.0850)
Jason Weston @spermwhale 's Memory Networks (1410.3916)
Sequence to Sequence Learning with Neural Networks (1409.3215) by Ilya Sutskever ( @ilyasut ), Oriol Vinyals, Quoc V. Le

Who else deserves recognition in this groundbreaking narrative of innovation? Let's ensure every contributor gets the credit they deserve. Leave a comment below 👇🏻🤗
posted an update about 1 month ago
reacted to thomwolf's post with 🚀 about 1 month ago
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions in our chat space: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
posted an update about 1 month ago
The viz of the day for the Year in review: Network graph showing likes similarity between models.

Instructive to see which models serve as the "nodes" of the Hub!

Check it out: huggingface/open-source-ai-year-in-review-2024
posted an update about 1 month ago
🎯 New day, new viz!

This teaser barely captures the heat between Meta 🇺🇸, Stability 🇬🇧 & Black Forest Labs 🇩🇪 racing for HF Hub likes. Want to see the full Fast & Furious AI showdown? Check the link below! 🏎️💨

huggingface/open-source-ai-year-in-review-2024
posted an update about 1 month ago
📈👀 Just dropped: visualization mapping Hugging Face's most liked & downloaded models from 2022 to now. Small models are clearly on the rise - fascinating shift in both likes and download patterns.

Check it out: huggingface/open-source-ai-year-in-review-2024
posted an update about 1 month ago
Keeping up with open-source AI in 2024 = overwhelming.

Here's help: We're launching our Year in Review on what actually matters, starting today!

Fresh content dropping daily until year end. Come along for the ride - first piece out now with @clem 's predictions for 2025.

Think of it as your end-of-year AI chocolate calendar.

Kudos to @BrigitteTousi @clefourrier @Wauplin @thomwolf for making it happen. We teamed up with aiworld.eu for awesome visualizations to make this digestible. It's a charm to work with their team.

Check it out: huggingface/open-source-ai-year-in-review-2024
reacted to clem's post with 🚀 about 1 month ago
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):

- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.

How my predictions for 2024 turned out:

- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
✅ (Inflection, AdeptAI, ...)

- Open-source LLMs will reach the level of the best closed-source LLMs
✅ with QwQ and dozens of others

- Big breakthroughs in AI for video, time-series, biology and chemistry
✅ for video 🔴 for time-series, biology and chemistry

- We will talk much more about the cost (monetary and environmental) of AI
✅ Monetary 🔴 Environmental (😢)

- A popular media will be mostly AI-generated
✅ with NotebookLM by Google

- 10 million AI builders on Hugging Face leading to no increase in unemployment
🔜 currently 7M AI builders on Hugging Face
posted an update about 2 months ago
replied to their post about 2 months ago
I used Descript for the video. How about you?

posted an update about 2 months ago
The rapid progress in small audio models is mind-blowing! 🤯 Just tested OuteTTS v0.2 - cloned my voice from a 10s clip with impressive accuracy and natural prosody.

At 500M parameters, it's efficient enough to run on basic hardware but powerful enough for professional use.

This could transform how we produce audio content for news - think instant translated interviews keeping original voices, or scaled audio article production!

Demo and Model on the Hub: OuteAI/OuteTTS-0.2-500M h/t @reach-vb