# rlhf

Here are 117 public repositories matching this topic...

princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

alignment large-language-models rlhf preference-alignment

Updated Jun 2, 2024
Python
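SimPO's "reference-free reward" is the policy's length-normalized log-likelihood of a response, scaled by β. The loss compares the chosen and rejected rewards against a target margin γ. The sketch below covers a single preference pair and assumes the summed token log-probabilities and token lengths are already computed. The function names and the default β and γ values are illustrative assumptions, not the SimPO repository's API or its tuned settings.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(
    logp_chosen: float,
    len_chosen: int,
    logp_rejected: float,
    len_rejected: int,
    beta: float = 2.0,   # reward scale; illustrative value (assumption)
    gamma: float = 0.5,  # target reward margin; illustrative value (assumption)
) -> float:
    """SimPO-style loss for one preference pair.

    logp_* are summed token log-probabilities of each response under the
    policy, and len_* are response lengths in tokens. No reference model is
    involved: the implicit reward is the length-normalized log-likelihood.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    return -log_sigmoid(reward_chosen - reward_rejected - gamma)


# Usage: a chosen response with higher per-token likelihood yields a lower loss.
print(simpo_loss(-12.0, 20, -30.0, 25))
```

Dividing by length is the design choice that distinguishes this reward from a raw sequence log-probability. Without it, shorter responses would be favored simply because they accumulate fewer negative log-probability terms.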

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Jun 2, 2024
Jupyter Notebook

Esmail-ibraheem / Axon

AI research lab🔬: implementations of AI papers and theoretical research, including InstructGPT, LLaMA, transformers, diffusion models, and RLHF.

transformers pytorch llama research-paper paper-implementations arxiv-papers llms rlhf

Updated Jun 2, 2024
Jupyter Notebook

jazelly / FinetuneLLMs

Fine-tune an LLM within a few clicks!

python mac ui ai llama train lora finetune sft llm rlhf

Updated Jun 2, 2024
JavaScript

hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Updated Jun 2, 2024
Python

Aligner2024 / aligner

Achieving Efficient Alignment through Learned Correction

alignment aligner llm rlhf weak-to-strong

Updated Jun 1, 2024
Python

log10-io / log10

Python client library for improving the accuracy of your LLM app

python debugging ai monitoring evaluations feedback logging artificial-intelligence openai agents autonomous-agents fine-tuning llms rlhf llmops anthropic

Updated May 31, 2024
Python

argilla-io / argilla

Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated May 31, 2024
Python

argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated May 31, 2024
Python

InternLM / InternLM

Official release of the InternLM2 7B and 20B base and chat models, with 200K context support.

chatbot chinese gpt pretrained-models llm long-context rlhf large-language-model flash-attention fine-tuning-llm

Updated May 31, 2024
Python

jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLaMA, Llama 2, Llama 3, Qwen, Baichuan, GLM, and Falcon): efficient quantized training and deployment of large models.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated Jun 1, 2024
Python

ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated May 30, 2024
Python
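Of the losses HALOs lists, DPO is the most compact to illustrate. The sketch below implements the standard DPO objective for one preference pair. It assumes summed log-probabilities of the chosen and rejected responses are available from both the trainable policy and a frozen reference model. The names and the default β = 0.1 are illustrative assumptions, not HALOs' interface.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # strength of the implicit KL constraint; illustrative (assumption)
) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model.
    """
    # How much the policy has moved away from the reference on each response.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Usage: the policy raised the chosen response and lowered the rejected one.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

Unlike the reference-free SimPO reward, DPO measures each response relative to the reference model. The loss therefore rewards changes from the starting point rather than absolute likelihood.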

TideDra / VL-RLHF

An RLHF infrastructure for vision-language models

vlm lmm dpo llm rlhf mllm

Updated May 30, 2024
Python

xtreme1-io / xtreme1

Xtreme1 is an all-in-one data labeling and annotation platform for multimodal training data, supporting 3D LiDAR point clouds, images, and LLM data.

computer-vision image-annotation annotation point-cloud image-classification annotation-tool 3d-annotation labeling-tool multimodal image-labelling-tool rlhf

Updated May 30, 2024
TypeScript

MOONLAPSED / cognOS

Python package for cognosis kb, syntax, and markup language. Under construction.

agent rlhf local-llm llama2

Updated May 30, 2024
Python

RLHFlow / RLHF-Reward-Modeling

Recipes for training reward models for RLHF.

llm rlhf reward-models llama3

Updated May 30, 2024
Python
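Reward models for RLHF are commonly trained with a pairwise Bradley-Terry loss, which pushes the reward of the chosen response above that of the rejected one. The sketch below shows that common formulation over a batch of precomputed scalar rewards. It is not necessarily the exact recipe in RLHFlow's repository. The function names are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bradley_terry_loss(
    rewards_chosen: Sequence[float], rewards_rejected: Sequence[float]
) -> float:
    """Mean of -log sigmoid(r_chosen - r_rejected) over a batch of pairs."""
    if len(rewards_chosen) != len(rewards_rejected) or not rewards_chosen:
        raise ValueError("expected equal-length, non-empty batches")
    losses = [-log_sigmoid(c - r) for c, r in zip(rewards_chosen, rewards_rejected)]
    return sum(losses) / len(losses)


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response a is preferred over b."""
    return math.exp(log_sigmoid(reward_a - reward_b))


# Usage: rewards the (hypothetical) reward model assigned to two preference pairs.
print(bradley_terry_loss([2.0, 0.5], [0.0, 1.0]))
print(preference_probability(2.0, 0.0))
```

Only reward differences enter the loss, so the learned reward is defined up to an additive constant. Downstream RL typically tolerates this, or normalizes rewards.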

CyberAgentAILab / annotation-efficient-po

Code of "Annotation-Efficient Preference Optimization for Language Model Alignment"

alignment llm rlhf

Updated May 29, 2024
Python

AlignInc / aligner-replication

A reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction"

alignment aligner rlhf

Updated May 29, 2024
Python

mengdi-li / awesome-RLAIF

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

alignment rl llms rlhf rlaif

Updated May 28, 2024

natolambert / rlhf-book

Textbook on reinforcement learning from human feedback

ai alignment rlhf

Updated May 27, 2024
HTML
