Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published 6 days ago • 9
Article Exploring Hard Negative Mining with NV-Retriever in Korean Financial Text By Albertmade • 3 days ago • 8
Translation Alignment Analysis Collection Related paper: "Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis" (accepted at WMT 2024) • 55 items • Updated Oct 2, 2024 • 1
Skywork-Reward-Data-Collection Collection Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated Oct 12, 2024 • 13
MagpieLM Collection Aligning LMs with Fully Open Recipe + Synthetic Data Generated from Open-Source LMs. • 9 items • Updated 2 days ago • 16
Direct Preference Optimization Datasets Collection Datasets suitable for Direct Preference Optimization based on their column names • 1597 items • Updated Jul 10, 2024 • 2
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated Jun 26, 2024 • 12
Datasets built with ⚗️ distilabel Collection This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 8 items • Updated Dec 11, 2024 • 12
Translated (En->Ko) dataset Collection Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Nov 23, 2024 • 3
Synthetic (text) Dataset Generation Collection Papers about synthetic dataset generation • 9 items • Updated Jun 21, 2024 • 8
synthetic-data-generation-demos Collection A collection of demos for various approaches to synthetic data generation • 4 items • Updated Jun 25, 2024 • 14
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 225
Korean Datasets I've released so far. Collection A collection of the Korean datasets I have uploaded so far. • 8 items • Updated May 24, 2024 • 16