ModernBART wen?

#38
by Fizzarolli - opened

Title is /j, but in all seriousness: is there any interest out there in producing a BART/T5-like encoder-decoder model with the improvements here (Flash Attention, RoPE, etc.)?

Fizzarolli changed discussion status to closed
Fizzarolli changed discussion status to open

(misclick xD)

An encoder-decoder model could even reuse the current checkpoint, if ModernBERT is supported:
https://github.com/huggingface/transformers/issues/35385
https://discuss.huggingface.co/t/training-modernbert-gpt2/134398/2
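For anyone who wants to try the idea above, here is a minimal sketch using the generic `EncoderDecoderModel` wrapper in transformers. It is not a tested recipe: the checkpoint ids (`answerdotai/ModernBERT-base`, `gpt2`) and the helper name `build_seq2seq` are my own assumptions for illustration, and whether the wrapper accepts ModernBERT at all depends on your installed transformers version, which is what the linked issue is about.

```python
# Hypothetical sketch: pair a ModernBERT encoder with a GPT-2 decoder.
# Both checkpoint ids are assumptions chosen for illustration.
ENCODER_ID = "answerdotai/ModernBERT-base"
DECODER_ID = "gpt2"


def build_seq2seq(encoder_id=ENCODER_ID, decoder_id=DECODER_ID):
    """Warm-start an encoder-decoder model from two pretrained checkpoints."""
    # Imported lazily so this file loads even without transformers installed.
    from transformers import EncoderDecoderModel

    # The decoder's cross-attention layers are newly initialized, so the
    # combined model needs fine-tuning before it is useful.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_id, decoder_id
    )

    # GPT-2 has no dedicated pad token; reusing BOS/EOS here is an assumption.
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
    model.config.pad_token_id = model.config.decoder.eos_token_id
    return model
```

Calling `build_seq2seq()` downloads both checkpoints, so it needs network access. Note this only reuses the encoder weights; the decoder side gets none of the ModernBERT improvements, which is why a proper ModernBART would still be nice.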
