BEEspoke Data

community

AI & ML interests

'an LLM is only as good as the dataset it was trained on' - Sun Tzu

Recent Activity

pszemraj updated a dataset 3 days ago
BEE-spoke-data/LONGCOT-merged-en
pszemraj updated a dataset 3 days ago
BEE-spoke-data/LONGCOT-merged-1M
pszemraj updated a dataset 12 days ago
BEE-spoke-data/govdocs1-by-extension

BEE-spoke-data's activity

pszemraj updated a Space about 1 month ago
qnguyen3 posted an update 7 months ago
qnguyen3 posted an update 9 months ago
🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. 🚀 This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. 📱💻

Model: qnguyen3/nanoLLaVA 🔍
Spaces: qnguyen3/nanoLLaVA (thanks to @merve)

Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. 📊

In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝
huu-ontocord posted an update 10 months ago
We would like to announce our Aurora-M multilingual models, which are based on StarCoderPlus.
Twitter: https://twitter.com/ontocord/status/1772778544051155029
LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7178521998845759488/
Blog post: https://huggingface.co/blog/mayank-mishra/aurora
arXiv: Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order (2404.00399)

Current LLMs are very susceptible to generating toxic, harmful, and even dangerous content. They can also generate outputs with gender or racial biases. The Biden-Harris Executive Order (https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence) sets forth guidelines on what is considered a safe AI system.
Following up on these guidelines, we present the world's first open-source Biden-Harris Executive Order red-teamed multilingual language model: Aurora-M. Inspired by BigScience, the model is trained on 5 languages: English, Hindi, Japanese, Vietnamese, and Finnish.

* Red-teamed model: aurora-m/aurora-m-biden-harris-redteamed (tuned according to the order mentioned above)
* Base model: aurora-m/aurora-m-base (not safety tuned)
* Instruct model: aurora-m/aurora-m-instruct (not safety tuned)

@mayank-mishra @cabbage972 @sted97 @Xa9aX @Taishi-N324 @Muennighoff @vumichien @prateeky2806 @felfri @spyysalo and many many others!