Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Articles

In Honour of This Year's NeurIPS Test of Time Paper Awardees

Dec 10, 2024

• 2

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

Dec 2, 2024

• 5

Journey With Me Into The Mind of Large Language Models: Interesting Findings in AnthropicAI's Scaling Monosemanticity paper.

May 22, 2024

• 2

On Coding Your First Attention

Apr 21, 2024

• 7

Jaward's activity

posted an update 1 day ago

Post

1057

Minimal single-script implementation of knowledge distillation in LLMs. This implementation uses GPT-2 (124M) as the student model and GPT-2 Medium (340M) as the teacher, distilled via reverse Kullback-Leibler (KL) divergence and trained on a small chunk of OpenWebText.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb
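
For readers new to the idea, here is a minimal sketch of what a reverse KL objective looks like for a single token position. It is not taken from the linked notebook: the function names are hypothetical, it uses plain Python lists instead of tensors, and it assumes the loss is computed from raw student and teacher logits. Reverse KL here means KL(student || teacher), i.e. the expectation is taken under the student's distribution.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one token position.

    The expectation is weighted by the student's probabilities, so the
    student is penalized for placing mass where the teacher places little.
    """
    log_q = log_softmax(student_logits)  # student log-probs
    log_p = log_softmax(teacher_logits)  # teacher log-probs
    return sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))


def forward_kl(student_logits, teacher_logits):
    """KL(teacher || student), shown only for contrast with reverse KL."""
    return reverse_kl(teacher_logits, student_logits)


if __name__ == "__main__":
    student = [0.0, 0.0, 5.0]   # hypothetical logits over a 3-token vocab
    teacher = [1.0, 1.0, 1.0]
    print("reverse KL:", reverse_kl(student, teacher))
    print("forward KL:", forward_kl(student, teacher))
```

In a real training loop this quantity would presumably be averaged over all token positions in a batch and backpropagated through the student only, with the teacher frozen.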

liked a model 2 days ago

deepseek-ai/DeepSeek-V3

Updated 16 days ago • 132k • 1.86k

posted an update 6 days ago

Post

1324

Huge AI win in medicine👏
"Large language of life model" just dropped!!
Full paper: https://www.nature.com/articles/s41586-024-08391-z

1 reply

upvoted a collection 8 days ago

Cosmos

Collection

The collection of Cosmos models • 31 items • Updated 4 days ago • 224

posted an update 8 days ago

Post

2281

Damn, I love NVIDIA's bullish stance on taking AI to the edge: from being the overlord of compute to cutting-edge physical AI with SOTA multiverse simulation engines that bring the scaling laws under your control!!

My favorite: Cosmos, a fully open-source, open-weight, physics-based video generation platform. What an incredible way to start off the year ✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf

liked a model 15 days ago

Qwen/QVQ-72B-Preview

Image-Text-to-Text • Updated 3 days ago • 135k • 500

posted an update 18 days ago

Post

2978

nanoBLT: a simplified, lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder), trained on ~1M bytes of tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
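
To make "patch size of 4" concrete, here is a small sketch of grouping raw bytes into fixed-size patches. It is an illustration only, not code from the notebook: the helper name `to_patches` and the zero pad byte are assumptions, and it assumes fixed-size patching rather than any dynamic patching scheme.

```python
def to_patches(text, patch_size=4, pad_byte=0):
    """Encode text as UTF-8 bytes and group them into fixed-size patches.

    The last patch is right-padded with pad_byte (an assumed convention)
    so that every patch has exactly patch_size entries.
    """
    data = list(text.encode("utf-8"))
    remainder = len(data) % patch_size
    if remainder:
        data += [pad_byte] * (patch_size - remainder)
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


if __name__ == "__main__":
    for patch in to_patches("To be!"):
        print(patch)
```

With this layout, a 2x4x2 model would run its encoder and decoder layers over individual bytes and its latent layers over one vector per patch, so the latent stack sees a sequence 4x shorter than the byte sequence.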

liked a model 20 days ago

deepseek-ai/DeepSeek-V3-Base

Updated 16 days ago • 13.8k • 1.24k

replied to their post 25 days ago

btw the background songs in the videos are actually what I listen to during implementation

posted an update 25 days ago

Post

1794

Implemented a discrete flow matching (DFM) model for code generation from first principles: trained a small 2D DFM model on two variations of binary search code. The result was amazing.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb

1 reply

liked a dataset about 1 month ago

HuggingFaceFW/fineweb-2

Viewer • Updated 7 days ago • 12.5B • 76.5k • 392

posted an update about 1 month ago

Post

598

In Honour of This Year's NeurIPS Test of Time Paper Awardees
This year's NeurIPS Test of Time Paper Awards went to two groundbreaking papers:
1. Generative Adversarial Nets (Goodfellow et al.)
2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)
Let's explore how these papers helped pioneer breakthroughs in today's AI:

Full Article: https://huggingface.co/blog/Jaward/nip

published an article about 1 month ago

Article

In Honour of This Year's NeurIPS Test of Time Paper Awardees

Dec 10, 2024

• 2

posted an update about 1 month ago

Post

643

Lightweight implementation of the seminal paper “Sequence to Sequence Learning with Neural Networks”

Built, trained, and evaluated a 2-layer seq2seq LSTM-based model (~10M params) on the German-English corpus of the Multi30K dataset. In honor of Ilya Sutskever et al. for winning this year's NeurIPS Test of Time paper award 🫡

Code: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb

posted an update about 1 month ago

Post

485

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

As a young researcher, I've often pondered the limitations of backpropagation, especially when compared with how learning occurs in the human brain. While backpropagation has been the workhorse of deep learning, it isn't without flaws. In this post, I aim to share some thoughts on these shortcomings from first principles.

Full article
https://huggingface.co/blog/Jaward/rethinking-backpropagation

posted an update about 2 months ago

Post

2425

Implements the compute-efficient DeepPCR algorithm, which parallelizes sequential operations and thereby speeds up inference and training of neural networks. DeepPCR can significantly reduce the time complexity of operations such as denoising in latent diffusion space from O(L) to O(log₂ L).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
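
To give a feel for where the O(L) to O(log₂ L) reduction comes from, here is a toy sketch of cyclic-reduction-style stride doubling on a linear recurrence x_i = a_i * x_(i-1) + b_i. It is not the notebook's code and not the full DeepPCR method: it assumes the recurrence is already linear, all names are hypothetical, and the inner loop is written sequentially even though each pass has no dependencies between indices and could run in parallel.

```python
def solve_sequential(a, b, x0):
    """Baseline: L dependent steps, one after another."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def solve_by_doubling(a, b, x0):
    """Solve the same recurrence in ceil(log2(L)) passes.

    Invariant: x_i = a[i] * x_(i - stride) + b[i]; once a[i] == 0,
    b[i] already holds the final value x_i.
    Returns (solution, number_of_passes).
    """
    n = len(a)
    if n == 0:
        return [], 0
    a, b = list(a), list(b)
    # Fold the known initial value into the first equation.
    b[0] = a[0] * x0 + b[0]
    a[0] = 0
    stride, passes = 1, 0
    while stride < n:
        new_a, new_b = a[:], b[:]
        # Every i in this loop is independent of the others, so on
        # parallel hardware the whole pass would be a single step.
        for i in range(stride, n):
            new_a[i] = a[i] * a[i - stride]
            new_b[i] = a[i] * b[i - stride] + b[i]
        a, b = new_a, new_b
        stride *= 2
        passes += 1
    return b, passes


if __name__ == "__main__":
    a = [2, 3, 1, 2, 1, 3, 2, 1]
    b = [1, 0, 2, 1, 1, 0, 3, 2]
    xs, passes = solve_by_doubling(a, b, x0=1)
    print(xs == solve_sequential(a, b, x0=1), passes)
```

Each pass substitutes the equation `stride` positions back into the current one, so the dependency distance doubles every time; for 8 steps that is 3 passes instead of 8 sequential updates.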

liked a dataset about 2 months ago

osunlp/Multimodal-Mind2Web

Viewer • Updated Jun 5, 2024 • 14.2k • 1.21k • 56

posted an update about 2 months ago

Post

1237

This is supercool!!
Explores o1-like multimodal reasoning.
Multi-agents with DPO is a nice touch 👍
Paper: https://arxiv.org/pdf/2411.14432
Code: https://github.com/dongyh20/Insight-V

upvoted 2 papers about 2 months ago

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 113