minimal single-script implementation of knowledge distillation for LLMs. We use GPT-2 (124M) as the student and GPT-2 Medium (355M) as the teacher, training the student to minimize the reverse Kullback-Leibler (KL) divergence against the teacher's output distribution on a small chunk of OpenWebText.
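
A minimal sketch of what such a reverse-KL distillation step could look like, assuming a PyTorch setup with Hugging Face `transformers`. The names here (`reverse_kl_loss`, `train_step`, the learning rate, the temperature) are illustrative choices, not taken from the repo's script. Note the direction of the divergence: forward KD minimizes KL(teacher ‖ student), whereas reverse KL minimizes KL(student ‖ teacher), which is mode-seeking and discourages the student from placing probability mass where the teacher assigns very little.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
student = GPT2LMHeadModel.from_pretrained("gpt2").to(device)         # 124M params
teacher = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device)  # 355M params
teacher.eval()  # teacher is frozen; only the student gets gradient updates

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL: KL(student || teacher), averaged over token positions.

    Computes sum_x p_s(x) * (log p_s(x) - log p_t(x)) over the vocab,
    then takes the mean over all positions in the batch.
    (Temperature is an illustrative knob, not necessarily in the repo.)
    """
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_probs = s_logprobs.exp()
    return (s_probs * (s_logprobs - t_logprobs)).sum(dim=-1).mean()

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

def train_step(input_ids):
    # teacher forward pass carries no gradients
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    student_logits = student(input_ids).logits
    loss = reverse_kl_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# one toy step on a short text (stand-in for a batch from the OpenWebText chunk)
batch = tokenizer(
    "The quick brown fox jumps over the lazy dog.", return_tensors="pt"
).input_ids.to(device)
print(train_step(batch))
```

In a real run the toy batch above would be replaced by tokenized OpenWebText sequences, and the step would sit inside the usual training loop with whatever schedule and batch size the script uses.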