NUmerical methods for Compression and LEarning

Name: NUmerical methods for Compression and LEarning
Start: 2022-05-11T10:00:00+02:00
End: 2022-05-13T19:00:00+02:00
Location: Gran Sasso Science Institute

11–13 May 2022

Gran Sasso Science Institute

Europe/Rome timezone

Contact

numerics@gssi.it

DCT-Former: Efficient Self-Attention with Discrete Cosine Transform

Not scheduled

20 min

Gran Sasso Science Institute

Viale Francesco Crispi 7 67100 L'Aquila (AQ) Italy

Poster

Speaker: Carmelo Scribano (University of Modena and Reggio Emilia)

The Transformer family of deep learning models is emerging as the dominant paradigm for both natural language processing and, more recently, computer vision applications.
An intrinsic limitation of these "fully attentive" architectures is the computation of dot-product attention, whose memory consumption and number of operations both grow as $O(n^2)$, where $n$ is the input sequence length. This limits applications that require modeling very long sequences. Several approaches have been proposed in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from lossy data compression to derive an approximation of the attention module that leverages the properties of the Discrete Cosine Transform. An extensive experimental analysis shows that our method uses less memory and computation while achieving similar performance, and drastically reduces inference times.
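To make the cost argument and the compression idea concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' DCT-Former implementation (see the linked repository for that): it assumes an orthonormal DCT-II applied along the sequence axis, keeps only the first $m$ low-frequency coefficients, and runs plain scaled dot-product attention among those $m$ coefficients, so the score matrix is $m \times m$ instead of $n \times n$. The function names (`dct_matrix`, `truncate_and_reconstruct`, `dct_domain_attention`) are hypothetical, and the sketch omits the learned query/key/value projections a real model would use.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix; row k holds frequency k."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(A, B):
    """Plain list-of-lists matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax_rows(S):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in S:
        top = max(row)
        e = [math.exp(x - top) for x in row]
        total = sum(e)
        out.append([x / total for x in e])
    return out

def attention(Q, K, V):
    """Scaled dot-product attention; the score matrix is len(Q) x len(K)."""
    d = len(Q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, transpose(K))]
    return matmul(softmax_rows(scores), V)

def truncate_and_reconstruct(x, m):
    """Lossy compression: keep the first m DCT-II coefficients of x, then invert."""
    n = len(x)
    Dm = dct_matrix(n)[:m]  # m x n, low-frequency rows only
    coeffs = [sum(w * v for w, v in zip(row, x)) for row in Dm]
    return [sum(Dm[k][i] * coeffs[k] for k in range(m)) for i in range(n)]

def dct_domain_attention(X, m):
    """Hypothetical sketch: compress X (n x d) to m DCT coefficients along the
    sequence axis, attend among them (m x m scores), then map back to length n
    with the transposed truncated DCT. Q = K = V = compressed input here."""
    n = len(X)
    Dm = dct_matrix(n)[:m]            # m x n
    Xc = matmul(Dm, X)                # m x d compressed sequence
    Yc = attention(Xc, Xc, Xc)        # m x m scores instead of n x n
    return matmul(transpose(Dm), Yc)  # n x d output

n, m = 64, 8
ramp = [i / (n - 1) for i in range(n)]
approx = truncate_and_reconstruct(ramp, m)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(ramp, approx)))
print(f"relative error keeping {m}/{n} coefficients:",
      err / math.sqrt(sum(v * v for v in ramp)))

X = [[math.sin(0.1 * i + j) for j in range(4)] for i in range(n)]
Y = dct_domain_attention(X, m)
print(len(Y), len(Y[0]))  # 64 4
```

The ramp example shows the lossy-compression property the approach relies on: a smooth signal concentrates its energy in a few low-frequency DCT coefficients, so discarding the rest loses little. The sketch makes no claim that its attention output matches full attention; in a trained model the projections would be learned with the compression in place.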
We hope that our results can serve as a starting point for a broader class of deep neural models with a reduced memory footprint.
The implementation is publicly available at https://github.com/cscribano/DCT-Former-Public.

Primary authors: Carmelo Scribano (University of Modena and Reggio Emilia), Dr Giorgia Franchini (University of Modena and Reggio Emilia)

Co-authors: Prof. Marco Prato (University of Modena and Reggio Emilia), Prof. Marko Bertogna (University of Modena and Reggio Emilia)

Presentation materials: none available yet.
