The design philosophy of ggml/llama.cpp is to avoid external dependencies if at all possible. I was recently informed by an NVIDIA engineer that the way to go for tensor cores is to write PTX code directly (the NVIDIA equivalent of assembly), so I may take a look at the project from that angle.
Also, I know that you're an AMD user, so I would advise you not to count your chickens before they hatch. If the project does what I think it does, it would need significant effort to write the equivalent of PTX code for AMD (at least if the performance is supposed to be actually good), so I'm skeptical about AMD support "soon" (but I'll gladly let myself be proven wrong).
Feature Description
ThunderKittens is an embedded domain-specific language (DSL) within CUDA designed to simplify the development of high-performance AI kernels on GPUs. It provides abstractions for working with small tiles (e.g., 16x16) of data, which aligns well with the capabilities of modern GPU architectures and tensor cores.
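To illustrate the tile-based decomposition that such a DSL is built around (this is not ThunderKittens code, just a minimal NumPy sketch of the idea), a matrix multiply can be split into small fixed-size tiles that each map naturally onto a tensor-core MMA operation, with a per-tile accumulator standing in for the register tile a GPU kernel would hold on-chip:

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    """Compute C = A @ B one tile x tile block at a time.

    Mirrors how a tensor-core kernel accumulates small MMA tiles:
    each output block is built up in a local accumulator before
    being written back. Assumes dimensions divide evenly by `tile`.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % tile == 0 and N % tile == 0 and K % tile == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((tile, tile), dtype=A.dtype)  # "register" tile
            for k in range(0, K, tile):
                acc += A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
            C[i:i + tile, j:j + tile] = acc
    return C
```

The point of a tile DSL is that the programmer writes against tile objects like these blocks, while the library handles the layout and hardware-instruction details.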
Performance: Despite its simplicity, kernels written in ThunderKittens can match or outperform hand-written CUDA kernels. For example, on the H100 GPU, a ThunderKittens implementation of the forward flash attention kernel outperforms FlashAttention-2 by around 30%.
On 4090s and A100s, TK matches FA2 performance in just a few lines of code.
On H100s, TK is faster than FA2 both forward and backward by quite a bit -- so there is no tradeoff between clean code and speed (in this case!)
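For reference, the flash attention forward pass discussed above processes K/V in tiles with a running (online) softmax, so the full N x N score matrix is never materialized. The following is a minimal NumPy sketch of that algorithm, not ThunderKittens or FlashAttention source code:

```python
import numpy as np

def flash_attention_forward(Q, K, V, tile=16):
    """Tiled (flash-style) attention forward pass.

    Streams over K/V tiles while maintaining a running row-wise max
    and softmax normalizer, rescaling the partial output whenever the
    max changes -- the same trick a fused GPU kernel uses to keep the
    working set in on-chip tiles.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(N, -np.inf)   # running row-wise max of scores
    row_sum = np.zeros(N)           # running softmax normalizer
    for j in range(0, N, tile):
        Kj = K[j:j + tile]
        Vj = V[j:j + tile]
        S = (Q @ Kj.T) * scale                      # scores for this tile
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)      # rescale old accumulators
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vj
        row_max = new_max
    return out / row_sum[:, None]
```

A real kernel additionally fuses the Q @ K.T and P @ V matmuls into tensor-core operations and parallelizes across query tiles; this sketch only shows the tiling and online-softmax structure.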
Tiles Seem Pretty General
Coming soon --
ThunderKittens on AMD hardware!
Motivation
To give us a better flash attention implementation with less code?
Possible Implementation
https://hazyresearch.stanford.edu/blog/2024-05-12-tk
https://github.com/HazyResearch/ThunderKittens