🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Characterization study repository for pruning, a popular way to compress DL models. The repo also investigates optimal sparse tensor layouts for pruned networks.
Neural Network Compression Framework for enhanced OpenVINO™ inference
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
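To make the low-bit schemes above concrete, here is a minimal, illustrative sketch of symmetric per-tensor INT8 quantization in plain Python. The function names (`quantize_int8`, `dequantize_int8`) are invented for this example and do not come from any of the listed libraries; real toolkits add per-channel scales, zero points, and calibration.

```python
# Symmetric per-tensor INT8 quantization sketch (illustrative only).
# A single scale maps floats into [-128, 127]; dequantization
# multiplies the integer codes back by that scale.

def quantize_int8(values):
    """Map floats to int8 codes sharing one scale (hypothetical helper)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.01, 1.0]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
```

The reconstruction error per weight is bounded by half the scale, which is why outlier values (which inflate `max_abs`) are the main difficulty these libraries work around with per-channel or per-group scaling.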
This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Pruning tool to identify small subsets of network partitions that are significant from the perspective of stochastic block model inference. This method works for single-layer and multi-layer networks, as well as for restricting focus to a fixed number of communities when desired.
Framework for analyzing pruning methods using the PyTorch `torch.nn.utils.prune` module
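The core idea behind L1 unstructured pruning (as in `torch.nn.utils.prune.l1_unstructured`) is simple enough to sketch without PyTorch: zero out the fraction of weights with the smallest absolute value via a binary mask. The `l1_prune` helper below is a hypothetical, dependency-free illustration, not the library API.

```python
# Magnitude (L1) unstructured pruning sketch (illustrative only):
# build a 0/1 mask that drops the smallest-|w| fraction of weights.

def l1_prune(weights, amount):
    """Return (pruned_weights, mask) with the smallest-magnitude
    `amount` fraction of weights zeroed out."""
    n_prune = int(len(weights) * amount)
    # Indices ordered by ascending absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    mask = [0.0 if i in dropped else 1.0 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask
```

In PyTorch the mask is kept as a buffer and reapplied on every forward pass, so pruned weights stay zero during fine-tuning.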
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
PyTorch Lightning implementation of the paper Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. This repository reproduces the main findings of the paper on the MNIST and Imagenette datasets.
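The final stage of the Deep Compression pipeline entropy-codes the shared-weight indices with a Huffman code. A minimal stdlib sketch of that stage, assuming the symbols are the cluster indices left after pruning and weight sharing (the `huffman_codes` helper is invented for illustration):

```python
import heapq
from collections import Counter

# Huffman-coding sketch (illustrative only): build a prefix-free code
# over a stream of symbols, giving frequent symbols shorter codes.

def huffman_codes(symbols):
    """Return {symbol: bitstring} for the given symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak id, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes.
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, counter, merged))
        counter += 1
    return heap[0][2]
```

Because pruning plus weight sharing makes the index distribution highly skewed (many zeros, few distinct clusters), Huffman coding recovers a meaningful extra compression factor on top of the first two stages.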
Sparsity-aware deep learning inference runtime for CPUs
Tutorial notebooks for hls4ml
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
[CVPR 2023] Towards Any Structural Pruning; LLMs / SAM / Diffusion / Transformers / YOLOv8 / CNNs
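Structural pruning, as in the repo above, differs from the mask-based approach: instead of zeroing individual weights it removes whole rows or channels, so the tensor actually shrinks and dense hardware gets the speedup. A hypothetical, dependency-free sketch of the row-norm criterion (the `prune_rows` helper and its norm-based ranking are illustrative assumptions, not the repo's algorithm):

```python
# Structured pruning sketch (illustrative only): drop whole rows
# (output channels) with the smallest L2 norm, keeping row order.

def prune_rows(matrix, keep):
    """Keep the `keep` rows with the largest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: -norms[i])
    kept = sorted(ranked[:keep])  # preserve original ordering
    return [matrix[i] for i in kept]
```

The hard part in real networks, which tools like the one above automate, is tracing which downstream layers must shrink in tandem once a channel is removed.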
Chess engine
AIMET GitHub pages documentation
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.
OpenMMLab Model Compression Toolbox and Benchmark.
Hung-yi Lee's Deep Learning Tutorial (recommended by Prof. Hung-yi Lee 👍); PDF download: https://github.com/datawhalechina/leedl-tutorial/releases