quantization

This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.

Updated Jun 11, 2024
Python

intel / auto-round

SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".

rounding quantization awq int4 gptq neural-compressor weight-only

Updated Jun 11, 2024
Python

sony / model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.

machine-learning deep-neural-networks deep-learning neural-network tensorflow optimizer pytorch quantization qat network-quantization network-compression edge-ai ptq

Updated Jun 10, 2024
Python

autohdw / QuBLAS

Quantized BLAS

template cpp blas quantization meta-programming cpp23

Updated Jun 10, 2024
C++

Victorletzelter / VoronoiWTA

Official code of the ICML24 paper: "Winner-takes-all learners are geometry-aware conditional density estimators"

quantization uncertainty-quantification density-estimation voronoi-tessellation

Updated Jun 10, 2024
Python

huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

training optimization intel transformers inference pytorch quantization onnx tflite onnxruntime graphcore habana

Updated Jun 10, 2024
Python

Aisuko / notebooks

Implementations of different ML tasks on the Kaggle platform with GPUs.

natural-language-processing computer-vision neural-network accelerator transformers pytorch kaggle quantization visulization fine-tuning peft multimodal wandb renforcement-learning large-language-models

Updated Jun 11, 2024
Jupyter Notebook

ModelTC / TFMQ-DM

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

highlight quantization cvpr ldm diffusion-models post-training-quantization ddim stable-diffusion cvpr2024

Updated Jun 9, 2024
Jupyter Notebook

swastikmaiti / Llama-2-7B-Chat-PEFT

PEFT is a wonderful tool that enables training a very large model in a low-resource environment. Quantization and PEFT will enable widespread adoption of LLMs.

quantization peft huggingface llama2-7b peft-fine-tuning-llm

Updated Jun 9, 2024
Jupyter Notebook

quantization

Here are 580 public repositories matching this topic...

intel / neural-compressor

quic / aimet

openvinotoolkit / training_extensions

hiyouga / LLaMA-Factory

quic / aimet-pages

Picovoice / picollm

satabios / sconce

huggingface / optimum-intel

OpenNMT / CTranslate2

huggingface / optimum-quanto

openvinotoolkit / nncf

ModelTC / llmc

intel / auto-round

sony / model_optimization

autohdw / QuBLAS

Victorletzelter / VoronoiWTA

huggingface / optimum

Aisuko / notebooks

ModelTC / TFMQ-DM

swastikmaiti / Llama-2-7B-Chat-PEFT
