
Suzie Oh

ohsuz


AI & ML interests

None yet

Recent Activity

upvoted a paper about 14 hours ago

Demystifying Domain-adaptive Post-training for Financial LLMs

updated a dataset 2 days ago

KKACHI-HUB/Reward-Preference-35K

updated a dataset 2 days ago

KKACHI-HUB/Reward-Preference-24K



ohsuz's activity

upvoted a paper about 14 hours ago

Demystifying Domain-adaptive Post-training for Financial LLMs

Paper • 2501.04961 • Published 6 days ago • 9

upvoted an article 3 days ago

Article

Exploring Hard Negative Mining with NV-Retriever in Korean Financial Text

3 days ago • 8

upvoted a collection 7 days ago

Translation Alignment Analysis

Related paper: "Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis" (accepted at WMT 2024) • 55 items • Updated Oct 2, 2024 • 1

upvoted an article 7 days ago

Article

The Rise of Agentic Data Generation

Jul 15, 2024 • 81

upvoted a collection 7 days ago

Deita

14 items • Updated May 20, 2024 • 10

upvoted an article 3 months ago

Article

Navigating Korean LLM Research #1: Models

Oct 22, 2024 • 23

upvoted 2 collections 3 months ago

Skywork-Reward-Data-Collection

Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated Oct 12, 2024 • 13

MagpieLM

Aligning LMs with Fully Open Recipe + Synthetic Data Generated from Open-Source LMs. • 9 items • Updated 2 days ago • 16

upvoted 3 collections 5 months ago

Direct Preference Optimization Datasets

Datasets suitable for Direct Preference Optimization, based on their column names • 1597 items • Updated Jul 10, 2024 • 2

Probably DPO datasets

A collection of datasets that probably support DPO • 146 items • Updated Jun 26, 2024 • 12

Datasets built with ⚗️ distilabel

This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 8 items • Updated Dec 11, 2024 • 12

upvoted a collection 6 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9, 2024 • 9

upvoted 3 collections 7 months ago

Translated (En->Ko) dataset

Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Nov 23, 2024 • 3

Synthetic (text) Dataset Generation

Papers about synthetic dataset generation • 9 items • Updated Jun 21, 2024 • 8

synthetic-data-generation-demos

A collection of demos for various approaches to synthetic data generation • 4 items • Updated Jun 25, 2024 • 14

upvoted 2 collections 10 months ago

Model Merging

Model merging is currently a very popular technique for LLMs. Here is a chronological list of papers in the space to help you get started with it! • 30 items • Updated Jun 12, 2024 • 225

Korean Datasets I've released so far.

A collection of the Korean datasets I have uploaded so far. • 8 items • Updated May 24, 2024 • 16