ModernBART wen?
#38 by Fizzarolli - opened
Title is /j, but in all seriousness, is there any interest out there in producing a BART/T5-like encoder-decoder model with the improvements here (Flash Attention, RoPE, etc.)?
Fizzarolli changed discussion status to closed
Fizzarolli changed discussion status to open
(misclick xD)
Encoder-decoder models could even reuse the current checkpoint, if ModernBERT is supported:
https://github.com/huggingface/transformers/issues/35385
https://discuss.huggingface.co/t/training-modernbert-gpt2/134398/2