quantization
Here are 580 public repositories matching this topic.
Unify Efficient Fine-Tuning of 100+ LLMs (Updated Jun 12, 2024 · Python)
The official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends. (Updated Jun 12, 2024 · Python)
Neural Network Compression Framework (NNCF) for enhanced OpenVINO™ inference (Updated Jun 12, 2024 · Python)
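A minimal post-training quantization sketch using NNCF's `nncf.quantize` and `nncf.Dataset`; the torchvision model and random calibration tensors below are placeholders standing in for a real trained model and representative data.

```python
# Minimal NNCF post-training quantization sketch; the torchvision model
# and random calibration tensors are placeholders.
import torch
import torchvision
import nncf

model = torchvision.models.resnet18(weights=None).eval()

# Calibration data: an iterable whose items are already valid model inputs.
calibration_items = [torch.randn(1, 3, 224, 224) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_items)

# Produces an INT8-quantized model suitable for export to OpenVINO.
quantized_model = nncf.quantize(model, calibration_dataset)
```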
[CVPR 2024 Highlight] The official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models". (Updated Jun 12, 2024 · Jupyter Notebook)
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime (Updated Jun 12, 2024 · Python)
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools (Updated Jun 12, 2024 · Python)
Train, Evaluate, Optimize, and Deploy Computer Vision Models via OpenVINO™ (Updated Jun 12, 2024 · Python)
Dataflow compiler for quantized neural network (QNN) inference on FPGAs (Updated Jun 12, 2024 · Python)
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models. (Updated Jun 12, 2024 · Python)
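A short sketch of AIMET's quantization-simulation flow for PyTorch, assuming the `aimet_torch.quantsim.QuantizationSimModel` API; the torchvision model and random dummy input are placeholders.

```python
# Sketch of AIMET's quantization-simulation flow for a PyTorch model.
# The torchvision model and random dummy input are placeholders.
import torch
import torchvision
from aimet_torch.quantsim import QuantizationSimModel

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Insert simulated quantization ops (8-bit weights/activations by default).
sim = QuantizationSimModel(model, dummy_input=dummy_input)

# Calibration pass: AIMET calls this back with the wrapped model to compute encodings.
def forward_pass(model, _):
    with torch.no_grad():
        model(dummy_input)

sim.compute_encodings(forward_pass, forward_pass_callback_args=None)
# sim.model now simulates quantized inference and can be evaluated, fine-tuned, or exported.
```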
AIMET GitHub Pages documentation (Updated Jun 11, 2024 · HTML)
On-device LLM Inference Powered by X-Bit Quantization (Updated Jun 11, 2024 · Python)
Model Compression/Inference Made Easy (Updated Jun 11, 2024 · Jupyter Notebook)
🤗 Optimum Intel: Accelerate inference with Intel optimization tools (Updated Jun 12, 2024 · Jupyter Notebook)
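A brief sketch of running a Hugging Face causal LM through Optimum Intel's OpenVINO backend via `OVModelForCausalLM`; the `gpt2` checkpoint and prompt are illustrative placeholders.

```python
# Sketch: running a Hugging Face causal LM through Optimum Intel's OpenVINO backend.
# The "gpt2" checkpoint and prompt are illustrative placeholders.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Quantization reduces", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```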
Fast inference engine for Transformer models (Updated Jun 11, 2024 · C++)
OpenMMLab Model Compression Toolbox and Benchmark (Updated Jun 11, 2024 · Python)
SOTA weight-only quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs". (Updated Jun 11, 2024 · Python)
Model Compression Toolkit (MCT) is an open-source project for optimizing neural network models for efficient deployment on constrained hardware. It provides researchers, developers, and engineers with advanced quantization and compression tools for deploying state-of-the-art neural networks. (Updated Jun 10, 2024 · Python)
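A hedged post-training quantization sketch with MCT: the API name `mct.ptq.pytorch_post_training_quantization`, the torchvision model, and the random calibration generator are assumptions that may differ across MCT versions; check the repository documentation for the release you use.

```python
# Post-training quantization sketch with Model Compression Toolkit (MCT).
# The API name (mct.ptq.pytorch_post_training_quantization), the torchvision model,
# and the random calibration generator are assumptions; verify against your MCT version.
import torch
import torchvision
import model_compression_toolkit as mct

model = torchvision.models.mobilenet_v2(weights=None).eval()

def representative_data_gen():
    # Yields batches of calibration inputs; random tensors stand in for real images.
    for _ in range(10):
        yield [torch.randn(1, 3, 224, 224)]

quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
    model, representative_data_gen
)
```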
Official code of the ICML 2024 paper "Winner-takes-all learners are geometry-aware conditional density estimators". (Updated Jun 10, 2024 · Python)