A high-throughput and memory-efficient inference and serving engine for LLMs
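For context, a minimal sketch of offline generation with vLLM's Python API; the model name and prompt are illustrative, and the snippet assumes the `vllm` package is installed:

```python
# Minimal offline-inference sketch with vLLM (model name and prompt are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # loads weights and allocates the KV-cache pool
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)                       # first sampled completion for each prompt
```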
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
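As a sketch of what that "simple interface" can look like, here is an illustrative use of SkyPilot's Python API; the cluster name, accelerator type, and training command are assumptions, not taken from the project:

```python
# Illustrative SkyPilot launch (cluster name, accelerator, and command are hypothetical).
import sky

task = sky.Task(run="python train.py")
task.set_resources(sky.Resources(accelerators="A100:1"))  # SkyPilot picks a cloud/region with capacity

sky.launch(task, cluster_name="my-cluster")               # provisions a VM and runs the task on it
```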
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (with GPUs planned in the future; PRs welcome).
Everything you want to know about Google Cloud TPU
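As a quick, hedged companion example for working with Cloud TPUs: the sketch below checks that a TPU runtime is visible from JAX. It assumes a Cloud TPU VM with the `jax[tpu]` extra installed:

```python
# Quick check that a TPU runtime is visible to JAX (assumes a Cloud TPU VM
# with `pip install "jax[tpu]"` already done).
import jax

devices = jax.devices()                          # e.g. [TpuDevice(id=0), ...] on a TPU host
print(f"{len(devices)} device(s):", devices)
print("default backend:", jax.default_backend()) # prints "tpu" when TPUs are attached
```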
TPU Pod Commander is a package for managing and launching jobs on Google Cloud TPU pods.
DECIMER: Deep Learning for Chemical Image Recognition using EfficientNetV2 + Transformer
Everything we actually know about the Apple Neural Engine (ANE)
Artificial Intelligence
cBLUE is a tool to calculate the total propagated uncertainty of bathymetric lidar data.
Testing framework for Deep Learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)
Differentiable Fluid Dynamics Package
Benchmarking suite to evaluate 🤖 robotics computing performance. Vendor-neutral. ⚪ Grey-box and ⚫ Black-box approaches.
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
Google Coral TPU DKMS Driver package for Fedora, RHEL, OpenSUSE, and OpenMandriva
Solana TpuClient TypeScript Implementation