This repository provides implementations designed to overcome the quadratic growth in complexity of Scaled Dot-Product Attention on long sequences. It aims to build more efficient models by applying a linear projection along the sequence-length dimension within various modeling techniques.
The idea of projecting along the sequence dimension is inspired by the paper "Linformer: Self-Attention with Linear Complexity." This concept is extended here so that it can be applied differently to encoders and decoders, and the effects of both linear and non-linear projections are explored.
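As a rough illustration of the core idea, the sketch below shows Linformer-style attention in PyTorch. This is not this repository's implementation: the module and parameter names (`LinformerSelfAttention`, `proj_k`, `proj_v`, `nonlinear`) are assumptions for illustration, and the optional GELU stands in for the non-linear projection variant mentioned above.

```python
# Minimal sketch of Linformer-style attention (illustrative, not this repo's API).
import math
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    """Self-attention whose keys/values are projected from length n down to k."""

    def __init__(self, d_model: int, seq_len: int, k: int = 64, nonlinear: bool = False):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Projections along the sequence dimension: n -> k
        self.proj_k = nn.Linear(seq_len, k)
        self.proj_v = nn.Linear(seq_len, k)
        # Hypothetical non-linear variant: apply an activation after the projection
        self.act = nn.GELU() if nonlinear else nn.Identity()
        self.scale = 1.0 / math.sqrt(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model)
        q = self.q(x)                                      # (B, n, d)
        k = self.k(x).transpose(1, 2)                      # (B, d, n)
        v = self.v(x).transpose(1, 2)                      # (B, d, n)
        k = self.act(self.proj_k(k))                       # (B, d, k)
        v = self.act(self.proj_v(v))                       # (B, d, k)
        attn = torch.softmax(q @ k * self.scale, dim=-1)   # (B, n, k): linear in n
        return attn @ v.transpose(1, 2)                    # (B, n, d)

# Example usage:
# x = torch.randn(2, 512, 256)
# out = LinformerSelfAttention(d_model=256, seq_len=512, k=64)(x)  # (2, 512, 256)
```

Because keys and values are compressed from length `n` to a fixed `k`, the attention map shrinks from `n x n` to `n x k`, so the cost scales linearly with sequence length.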