Joseph G Flowers
Josephgflowers
18 followers · 54 following
joseph-flowers-5020a1231
AI & ML interests
None yet
Recent Activity
upvoted a paper about 13 hours ago
Transformer^2: Self-adaptive LLMs
reacted to davanstrien's post with 🔥 1 day ago
Introducing scandi-fine-web-cleaner https://huggingface.co/davanstrien/scandi-fine-web-cleaner, the first model trained on FineWeb-C community annotations!

FineWeb2 is a massive multilingual dataset for pre-training language models. Like any web-scale dataset, it contains low-quality content. How can we improve it?

Over the past months, an amazing community of 400+ annotators has been labelling content quality (using Argilla) across 23 languages through the FineWeb-C initiative. Today, I'm happy to share the first classifier trained on this data.

What we've built:
- A lightweight classifier that efficiently removes low-quality content
- 90%+ precision demonstrated on Danish & Swedish
- Can process the 43M+ documents in Danish FineWeb2 with minimal compute

Why this matters: The approach can be reproduced for any of the 23 languages in FineWeb-C (https://huggingface.co/datasets/data-is-better-together/fineweb-c). We can improve training data quality at scale without massive compute resources by starting with community annotations and training small, efficient classifiers.

Want to build a classifier for your language? Check out the full blog post with code examples and implementation details: https://danielvanstrien.xyz/posts/2025/FineWeb-c/scandinavian-content-filtering-fineweb.html
liked a dataset 1 day ago
mlabonne/smoltalk-semhashed
Organizations
Josephgflowers's activity
liked a dataset 1 day ago
mlabonne/smoltalk-semhashed
Viewer • Updated 1 day ago • 861k • 9 • 5
liked a dataset 7 days ago
sequelbox/Tachibana-QVQ
Viewer • Updated 8 days ago • 103k • 68 • 3
liked a dataset 11 days ago
bop-benchmark/datasets
Viewer • Updated Oct 19, 2024 • 2.01M • 7.03k • 17
liked a model 12 days ago
testerMore/textit
Text Classification • Updated 13 days ago • 1
liked a dataset 13 days ago
sujet-ai/Sujet-Finance-Instruct-177k
Viewer • Updated Apr 5, 2024 • 178k • 117 • 72
liked a dataset 17 days ago
Josephgflowers/Phinance
Viewer • Updated 16 days ago • 166k • 28 • 1
liked a dataset 18 days ago
fluently-sets/reasoning-1-1k
Viewer • Updated 25 days ago • 1.15k • 359 • 22
liked a model 18 days ago
Josephgflowers/Tinyllama-1.5B-Cinder-Test-6
Text Generation • Updated Apr 16, 2024 • 13 • 1
liked a model 29 days ago
NexaAIDev/OmniAudio-2.6B
Audio-Text-to-Text • Updated Dec 13, 2024 • 9.58k • 228
liked a model about 1 month ago
meditsolutions/SmolLM2-MedIT-Upscale-2B
Updated Dec 3, 2024 • 128 • 4
liked 3 datasets about 2 months ago
gretelai/gretel-text-to-python-fintech-en-v1
Viewer • Updated Nov 11, 2024 • 27.5k • 58 • 13
arcee-ai/EvolKit-20k
Viewer • Updated Sep 10, 2024 • 20k • 121 • 57
ai2-adapt-dev/test-persona-geometry-10k
Viewer • Updated Oct 12, 2024 • 10k • 37 • 1
liked a model about 2 months ago
nvidia/Hymba-1.5B-Instruct
Text Generation • Updated 13 days ago • 4.83k • 220
liked a dataset about 2 months ago
LDJnr/Pure-Dove
Viewer • Updated Jun 3, 2024 • 3.86k • 127 • 73
liked a model about 2 months ago
Snowflake/snowflake-arctic-instruct
Text Generation • Updated May 21, 2024 • 12.6k • 351
liked a dataset about 2 months ago
layoric/labeled-multiple-choice-explained
Viewer • Updated Jun 26, 2023 • 9.1k • 75 • 3
liked a model about 2 months ago
NexaAIDev/OmniVLM-968M
Updated 30 days ago • 1.36k • 495
liked 2 datasets about 2 months ago
microsoft/orca-agentinstruct-1M-v1
Viewer • Updated Nov 1, 2024 • 1.05M • 3.62k • 413
alexshengzhili/Abstract2Appendix_v1_10k
Viewer • Updated Nov 15, 2024 • 9.82k • 33 • 3