🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Characterization study repository for pruning, a popular way to compress DL models. The repo also investigates optimal sparse tensor layouts for pruned networks.
Neural Network Compression Framework for enhanced OpenVINO™ inference
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
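To make the low-bit schemes above concrete, here is a minimal, illustrative sketch of symmetric per-tensor INT8 quantization in plain Python. The function names (`quantize_int8`, `dequantize_int8`) are invented for this example and do not come from any of the listed libraries; real toolkits add per-channel scales, zero points, and calibration.

```python
# Symmetric per-tensor INT8 quantization sketch (illustrative only).
# A single scale maps floats into [-128, 127]; dequantization
# multiplies the integer codes back by that scale.

def quantize_int8(values):
    """Map floats to int8 codes sharing one scale (hypothetical helper)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.01, 1.0]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
```

The reconstruction error per weight is bounded by half the scale, which is why outlier values (which inflate `max_abs`) are the main difficulty these libraries work around with per-channel or per-group scaling.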
This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Pruning tool to identify small subsets of network partitions that are significant from the perspective of stochastic block model inference. This method works for single-layer and multi-layer networks, as well as for restricting focus to a fixed number of communities when desired.
Framework for analyzing pruning methods using the PyTorch `torch.nn.utils.prune` module
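The core idea behind L1 unstructured pruning (as in `torch.nn.utils.prune.l1_unstructured`) is simple enough to sketch without PyTorch: zero out the fraction of weights with the smallest absolute value via a binary mask. The `l1_prune` helper below is a hypothetical, dependency-free illustration, not the library API.

```python
# Magnitude (L1) unstructured pruning sketch (illustrative only):
# build a 0/1 mask that drops the smallest-|w| fraction of weights.

def l1_prune(weights, amount):
    """Return (pruned_weights, mask) with the smallest-magnitude
    `amount` fraction of weights zeroed out."""
    n_prune = int(len(weights) * amount)
    # Indices ordered by ascending absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    mask = [0.0 if i in dropped else 1.0 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask
```

In PyTorch the mask is kept as a buffer and reapplied on every forward pass, so pruned weights stay zero during fine-tuning.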
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
PyTorch Lightning implementation of the paper Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. This repository reproduces the main findings of the paper on the MNIST and Imagenette datasets.
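The final stage of the Deep Compression pipeline entropy-codes the shared-weight indices with a Huffman code. A minimal stdlib sketch of that stage, assuming the symbols are the cluster indices left after pruning and weight sharing (the `huffman_codes` helper is invented for illustration):

```python
import heapq
from collections import Counter

# Huffman-coding sketch (illustrative only): build a prefix-free code
# over a stream of symbols, giving frequent symbols shorter codes.

def huffman_codes(symbols):
    """Return {symbol: bitstring} for the given symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes.
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, counter, merged))
        counter += 1
    return heap[0][2]
```

Because pruning plus weight sharing makes the index distribution highly skewed (many zeros, few distinct clusters), Huffman coding recovers a meaningful extra compression factor on top of the first two stages.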
Sparsity-aware deep learning inference runtime for CPUs
Tutorial notebooks for hls4ml
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
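Structural pruning, as in the repo above, differs from the mask-based approach: instead of zeroing individual weights it removes whole rows or channels, so the tensor actually shrinks and dense hardware gets the speedup. A hypothetical, dependency-free sketch of the row-norm criterion (the `prune_rows` helper and its norm-based ranking are illustrative assumptions, not the repo's algorithm):

```python
# Structured pruning sketch (illustrative only): drop whole rows
# (output channels) with the smallest L2 norm, keeping row order.

def prune_rows(matrix, keep):
    """Keep the `keep` rows with the largest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: -norms[i])
    kept = sorted(ranked[:keep])  # preserve original ordering
    return [matrix[i] for i in kept]
```

The hard part in real networks, which tools like the one above automate, is tracing which downstream layers must shrink in tandem once a channel is removed.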
Chess engine
AIMET GitHub pages documentation
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
OpenMMLab Model Compression Toolbox and Benchmark.
Hung-yi Lee's Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍); PDF download: https://github.com/datawhalechina/leedl-tutorial/releases