SimPO: Simple Preference Optimization | Syed Hasan posted on the topic | LinkedIn (2024)

Syed Hasan

Machine Learning Engineer | GPU Poor


SimPO: Simple Preference Optimization, an alternative to DPO with several key benefits:
- Simpler and more effective than existing approaches like DPO
- Compute- and memory-efficient, eliminating the need for a reference model
- Improved performance, with a larger margin between winning and losing responses

SimPO uses the average log probability of a sequence as the implicit reward and adds a target reward margin to the Bradley-Terry objective.

I tested it on meta-llama-3-8b-instruct using Maxime Labonne's orpo-dpo-mix-40k dataset for 200 steps; SimPO consistently outperforms existing approaches without substantially increasing response length. According to the authors, it surpasses DPO by up to 6.4 points on AlpacaEval 2 and up to 7.5 points on Arena-Hard.

Explore more about SimPO:
🤗 HuggingFace: https://lnkd.in/gY_93tAu
🏵 Github: https://lnkd.in/gFmkMmSf
🗞 Paper: https://lnkd.in/gNkkXvT4

#PreferenceOptimization #Algorithm #Innovation

Syed-Hasan-8503/Llama-3-8b-instruct-SimPO · Hugging Face huggingface.co
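For readers who want the gist in code: below is a minimal sketch of a SimPO-style loss in PyTorch. It assumes you have already computed the summed log-probabilities and token counts of the chosen and rejected responses; the beta and gamma values are illustrative placeholders, not the paper's tuned hyperparameters.

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps,
               chosen_lengths, rejected_lengths,
               beta=2.0, gamma=1.0):
    """SimPO-style objective: length-normalized log-probabilities act as
    the implicit reward (no reference model needed), and a target margin
    gamma is added to the Bradley-Terry term."""
    # average (per-token) log-probability of each sequence, scaled by beta
    chosen_reward = beta * chosen_logps / chosen_lengths
    rejected_reward = beta * rejected_logps / rejected_lengths
    # penalize cases where the winning response does not beat the
    # losing one by at least the target margin gamma
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```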


Muhammad hammad jamil

ML | LLM | RAG | Autoagent


Hasan, have you tried other approaches like KTO and ORPO training?


More Relevant Posts

  • Syed Hasan

    Machine Learning Engineer | GPU Poor


💡 Here comes the SUPRA!!

Linear transformers, though promising, struggle with scaling and performance. SUPRA (Scalable UPtraining for Recurrent Attention) offers a cost-effective solution by uptraining pre-trained transformers into RNNs, achieving competitive results at just 5% of the training cost ("Linearizing Large Language Models").

To see how effective this approach is, I pretrained a linear model (87M params) from scratch on a subset of the RedPajama dataset for 1 epoch. Training took only 4 hours on 1x A4000. As expected, the model was not producing coherent English, but it wasn't gibberish either. You can also try their uptraining script for the Mistral-7B linear model if you have the compute, or extend my experiment on the 87M-param model. The checkpoints are available on my Hugging Face.

🤗 Huggingface: Syed-Hasan-8503/Linear_Tiny_87M · Hugging Face
🗞 Paper: https://lnkd.in/dEuUXyiv
⚙ Github: https://lnkd.in/dg-jn7vq
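For intuition on why linear attention can run as an RNN at inference time, here is a hedged sketch of one recurrent update step in PyTorch. The elu+1 feature map and sum normalization are common choices from the linear-attention literature, used here for illustration; they are not necessarily the exact SUPRA recipe.

```python
import torch
import torch.nn.functional as F

def linear_attention_step(state, norm, q_t, k_t, v_t):
    """One recurrent step of kernelized linear attention: the running
    state replaces the softmax attention's growing KV cache, so memory
    stays constant in sequence length. q_t, k_t: (d_k,), v_t: (d_v,),
    state: (d_k, d_v), norm: (d_k,)."""
    phi_q = F.elu(q_t) + 1.0                    # positive feature map
    phi_k = F.elu(k_t) + 1.0
    state = state + torch.outer(phi_k, v_t)     # running sum of phi(k) v^T
    norm = norm + phi_k                         # running sum of phi(k)
    out = (phi_q @ state) / (phi_q @ norm).clamp(min=1e-6)
    return out, state, norm
```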


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


⚡ Seeking faster inference speeds without sacrificing accuracy?

To answer this question, I recently tried out the latest 💡 QoQ (quattuor-octo-quattuor), a W4A8KV4 quantization algorithm with 4-bit weights, 8-bit activations, and a 4-bit KV cache, on 🔍 Llama-3-8B-Instruct-262k. The first step was to generate QoQ-quantized checkpoints using LMQuant and dump the fake-quantized models. Afterwards, QServe provides a checkpoint converter to real-quantize and pack the model into QServe format.

I ran the throughput benchmark on 1x A100 to compare the findings with the QServe documented values for Llama-3-8B on A100. 📈 Impressive results: with an average throughput of 2925 tok/s over 3 rounds at a batch size of 256, QoQ showcases its efficiency and scalability.

🤗 Huggingface: https://lnkd.in/dsmd5qxq
⚙️ Qserve: https://lnkd.in/duHyQx7U
⚙️ lmquant: https://lnkd.in/dq4XhDMM

    Syed-Hasan-8503/Llama-3-8B-Instruct-262k-Qserve · Hugging Face huggingface.co
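For intuition on what "fake-quantized" means in a W4A8 setup, here is a hedged, illustrative sketch of symmetric per-channel 4-bit weight quantization in PyTorch. The real LMQuant/QoQ pipeline adds progressive group quantization and smoothing, and QServe then packs the result into its own kernel format; none of that is shown here.

```python
import torch

def quantize_per_channel(w, n_bits):
    """Symmetric per-output-channel quantization: the core arithmetic
    behind W4A8-style schemes (toy version, for illustration only)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q, scale

w = torch.randn(4096, 4096)                   # a stand-in weight matrix
w4, s = quantize_per_channel(w, n_bits=4)     # 4-bit integer weights + scales
w_fake = w4 * s                               # "fake-quantized" weights kept in float
```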


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


Say hello to 🔥 NOLA, a novel fine-tuning approach!!!

Similar to LoRA, NOLA uses a low-rank decomposition of weight matrices for the fine-tuning step. However, LoRA faces two primary limitations:
▪ The parameter count is lower-bounded by the rank-one decomposition
▪ The extent of reduction is heavily influenced by both the model architecture and the chosen rank

To test this technique out, I fine-tuned Meta Llama-3-8B on the oasst1 dataset for only 100 steps due to limited compute. The parameter counts for LoRA rank 16 and NOLA basis 512 were: trainable params: 0.2M || all params: 4.5B || trainable: 0.005

🖇 wandb: https://lnkd.in/dxrkY4vc
⚙ Github: https://lnkd.in/diQ5J2t4
🗞 Paper: https://lnkd.in/dervqhr8
🤗 Huggingface: https://lnkd.in/dgYv4pqd

    Syed-Hasan-8503/Llama-3-8B-NOLA · Hugging Face huggingface.co
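To make the "basis 512" number concrete, here is a hedged sketch of a NOLA-style adapter around a linear layer: the low-rank factors are frozen random bases mixed by trainable scalar coefficients, so only 2·k scalars per adapted layer are learned. Shapes, scaling, and initialization here are assumptions for illustration, not the authors' implementation (which can regenerate the bases from a seed instead of storing them).

```python
import torch
import torch.nn as nn

class NOLALinear(nn.Module):
    """NOLA-style adapter sketch: A and B are linear combinations of
    frozen random basis matrices; only the k mixing coefficients for
    each factor are trainable."""
    def __init__(self, base: nn.Linear, rank: int = 16, k: int = 512, seed: int = 0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        g = torch.Generator().manual_seed(seed)
        out_f, in_f = base.out_features, base.in_features
        # frozen random bases (stored as buffers for simplicity)
        self.register_buffer("A_basis", torch.randn(k, rank, in_f, generator=g) / in_f ** 0.5)
        self.register_buffer("B_basis", torch.randn(k, out_f, rank, generator=g) / rank ** 0.5)
        self.alpha = nn.Parameter(torch.zeros(k))   # trainable coefficients for A
        self.beta = nn.Parameter(torch.zeros(k))    # trainable coefficients for B

    def forward(self, x):
        A = torch.einsum("k,kri->ri", self.alpha, self.A_basis)   # (rank, in)
        B = torch.einsum("k,kor->or", self.beta, self.B_basis)    # (out, rank)
        return self.base(x) + x @ A.T @ B.T

layer = NOLALinear(nn.Linear(256, 256), rank=16, k=512)   # toy layer
y = layer(torch.randn(4, 256))
```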


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


Unlocking Efficiency: ✂ Layer Pruning in LLMs ✨

Just pruned 12 layers out of Llama-2-13B using the epfl-llm/guidelines dataset, reducing the model size to almost 8B. The model's generation capabilities are still intact, but there is an expected performance drop from 51.26 to 48.75 on the Open Medical LLM Leaderboard. This approach suggests that deeper layers in these LLMs are often more redundant than previously thought.

After pruning, the model can be "healed" using Parameter-Efficient Fine-Tuning (PEFT) methods like QLoRA to recover from the pruning-induced performance loss. ❗ Due to limited compute resources, I have left the healing phase out of this experiment. Feel free to perform healing and share the results. Pruning was done using Muhammad Bin Usman's 🛠 AutoPrune Notebook.

🤗 Huggingface Repo: https://lnkd.in/dRYCqZBh
🗞 Paper: https://lnkd.in/dCFjn99w
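As a minimal sketch of what the pruning step itself can look like with transformers, assuming a Llama-style model: drop a contiguous block of decoder layers and fix up the config. The layer indices below are illustrative; the paper picks the block whose layer inputs and outputs are most similar rather than a hard-coded range.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Llama-2-13B has 40 decoder layers; drop an illustrative block of 12.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")  # gated repo
start, n_prune = 24, 12
keep = [layer for i, layer in enumerate(model.model.layers)
        if not (start <= i < start + n_prune)]
model.model.layers = nn.ModuleList(keep)
model.config.num_hidden_layers = len(keep)

# Keep KV-cache indexing consistent after pruning (attribute name in
# recent transformers versions; may differ in older releases).
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i

# The pruned model would then be "healed" with QLoRA fine-tuning.
```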


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


🔥 Introducing Llama-3-openhermes-reft! 🦙

Llama-3-openhermes-reft is a fine-tuned version of meta-llama/Meta-Llama-3-8B on a 10K subset of the teknium/OpenHermes-2.5 dataset using the cutting-edge Representation Fine-Tuning (ReFT) technique! 🌟

🔍 But what exactly is ReFT? Unlike traditional fine-tuning methods, ReFT operates on a frozen base model and learns task-specific interventions on hidden representations. It's all about maximizing efficiency and performance! 💡

💻 The model has been trained for 1 epoch using PyReFT. 🚀 PyReFT is a Python library for adapting internal language model representations via trainable interventions. Built on top of pyvene, it integrates with any pretrained LM available on HuggingFace; with PyReFT, fine-tuning with ReFT methods becomes a breeze, and you can easily upload your fine-tuned models to HuggingFace.

Feel free to check out my Medium blog on fine-tuning Llama-3 using ReFT.

📰 Medium Blog: https://lnkd.in/dFUaxj_c
👉 PyReFT on Github: https://lnkd.in/dUUjfxYc
🤗 Huggingface Repo: https://lnkd.in/dMpYanxe

#NLP #ReFT #PyReFT #Llama3 #HuggingFace #AGI #LanguageModels #AI #DeepLearning #medium #LLMs #FineTuning #PeFT
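Conceptually, the learned intervention is small. Below is a hedged sketch of the LoReFT-style edit, h' = h + Rᵀ(Wh + b − Rh), written in plain PyTorch for illustration rather than through the PyReFT API.

```python
import torch
import torch.nn as nn

class LoReftIntervention(nn.Module):
    """Conceptual LoReFT-style intervention: the frozen model's hidden
    state h is edited in a low-rank subspace; only R, W, b are trained."""
    def __init__(self, hidden_size: int, rank: int = 4):
        super().__init__()
        self.R = nn.Parameter(torch.empty(rank, hidden_size))
        nn.init.orthogonal_(self.R)                 # low-rank projection with orthonormal rows
        self.proj = nn.Linear(hidden_size, rank)    # computes W h + b

    def forward(self, h):                           # h: (..., hidden_size)
        return h + (self.proj(h) - h @ self.R.T) @ self.R
```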


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


💥 Idefics2-8B-SFT

Idefics2-8B-SFT is a fine-tuned version of HuggingFaceM4/idefics2-8b on a 35k-example TextVQA dataset. This fine-tuned variant achieves a Levenshtein score of 82.29% (TextVQA), compared to 65.50% when fine-tuned on the nielsr/docvqa_1200_examples dataset.

Idefics2-8B-SFT can be used for inference on multimodal (image + text) tasks in which the input is a text query along with one or more images. Training was performed on an RTX A5000 for almost 10 hrs.

🤗 https://lnkd.in/dhx8_fSC

#VLM #AI #FineTuning #Multimodal #HuggingFace #MachineLearning #ModelPerformance
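Inference on such a checkpoint follows the standard Idefics2 recipe in transformers; a hedged sketch is below. The base model id is used as a stand-in (swap in the fine-tuned repo linked above), and the image path and question are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/idefics2-8b"   # replace with the fine-tuned checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

image = Image.open("example.png")         # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What does the sign say?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```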


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


🌟 Introducing Mistral_classification_head_qlora

Just wrapped up an experiment where I attached a new transformer head for a classification task to Mistral AI's Mistral-7B-Instruct-v0.2 using the transformer-heads library. The goal was to perform a completely different task by adding a new head, e.g., attaching a sequence classification head to a causal LM and fine-tuning it for sentiment classification.

I did exactly that by adding an additional head at the third-to-last layer and fine-tuning it for 1 epoch on dair-ai's emotion dataset using QLoRA on 🏵 1x A40 GPU. The evaluation loss for the new head (emotion-head-3) came out to be approximately 1.313. You can also extend this to joint multi-task learning (many heads doing completely different tasks + QLoRA, all trained at the same time).

🤗 Huggingface => https://lnkd.in/dCuT4QKE
⚙ Transformer-heads Library => https://lnkd.in/dqJjzEWZ

#AI #Innovation #Classification #TransformerHeads #QloRA #EmotionDetection #TechBreakthroughs
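A conceptual sketch of the idea, in plain PyTorch rather than the transformer-heads API: read the hidden states at the third-to-last layer of the (frozen) causal LM and feed them to a fresh linear head. QLoRA on the backbone is omitted; the 6-class output size matches dair-ai/emotion.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, output_hidden_states=True)
head = nn.Linear(lm.config.hidden_size, 6)   # new trainable classification head

inputs = tok("i feel absolutely wonderful today", return_tensors="pt")
with torch.no_grad():
    hidden = lm(**inputs).hidden_states[-3]   # hidden states at the third-to-last layer
logits = head(hidden[:, -1, :].float())       # classify from the last token's state
```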


  • Syed Hasan

    Machine Learning Engineer | GPU Poor


🌟 Phi-2-ORPO: a fine-tuned version of Phi-2 on argilla's dpo-mix-7k dataset using Odds Ratio Preference Optimization (ORPO). The model was trained for 1 epoch on 1x A40 GPU.

Odds Ratio Preference Optimization (ORPO) proposes a new way to train LLMs by combining SFT and alignment into a single objective (loss function), achieving state-of-the-art results. ORPO beats SFT and SFT+DPO on Phi-2, Llama 2, and Mistral (e.g., 📊 Mistral-ORPO achieves 12.20% on AlpacaEval 2.0, 66.19% on IFEval, and 7.32 on MT-Bench).

Here's how Phi-2-ORPO did on other benchmarks:
ARC: 22.7
HellaSwag: 25.04
MMLU: 23.12
Winogrande: 49.57

Training was done using LazyORPO (by Zain ul abideen), an automation tool for ORPO training!

🤗 Hugging Face: https://lnkd.in/dcyFq96E
⚙ LazyORPO: https://lnkd.in/di3TzPdH
📃 Paper: https://lnkd.in/dmWTjNHs

#LLM #NLP #AI #MachineLearning #ORPO #SFT #FineTuning #StateOfTheArt #Argilla #DPO #LazyORPO #HuggingFace

    Syed-Hasan-8503/phi-2-ORPO · Hugging Face huggingface.co
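As a rough illustration of the combined objective, here is a hedged sketch of an ORPO-style loss in PyTorch: the standard SFT (NLL) term on the chosen response plus a log-odds-ratio penalty between chosen and rejected responses. The λ weight and the length-normalized likelihood convention are assumptions for illustration, not the exact trainer settings used above.

```python
import torch
import torch.nn.functional as F

def orpo_loss(nll_chosen, chosen_logps, rejected_logps,
              chosen_lengths, rejected_lengths, lam=0.1):
    """ORPO-style objective: SFT loss on the chosen response plus an
    odds-ratio term pushing the chosen response's odds above the
    rejected one's. Inputs are summed log-probs and token counts."""
    def log_odds(logps, lengths):
        avg = logps / lengths                                   # length-normalized log-likelihood
        return avg - torch.log1p(-torch.exp(avg).clamp(max=1 - 1e-6))
    ratio = log_odds(chosen_logps, chosen_lengths) - log_odds(rejected_logps, rejected_lengths)
    return nll_chosen - lam * F.logsigmoid(ratio).mean()
```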

