Shady Agwa

Research Fellow






+44 (0) 7950676030


School of Engineering

The University of Edinburgh

1.24D Murchison House, King's Buildings Campus, Edinburgh, EH9 3BF, UK



TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling


Journal article


Cristian Sestito, Shady O. Agwa, T. Prodromakis
2024


APA
Sestito, C., Agwa, S. O., & Prodromakis, T. (2024). TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling.


Chicago/Turabian
Sestito, Cristian, Shady O. Agwa, and T. Prodromakis. “TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling” (2024).


MLA
Sestito, Cristian, et al. TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling. 2024.


BibTeX

@article{cristian2024a,
  title = {TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Dataflow and Analytical Modelling},
  year = {2024},
  author = {Sestito, Cristian and Agwa, Shady O. and Prodromakis, T.}
}

Abstract

To keep pace with the ever-growing computational complexity and data intensity of state-of-the-art AI models, new computing paradigms are being proposed. These paradigms aim for high energy efficiency by mitigating the Von Neumann bottleneck, that is, the energy cost of moving data between the processing cores and the memory. Convolutional Neural Networks (CNNs) are susceptible to this bottleneck, given the massive amount of data they have to manage. Systolic Arrays (SAs) are promising architectures for mitigating the data transmission cost, thanks to the high data utilization of their Processing Elements (PEs). These PEs continuously exchange and process data locally according to specific dataflows (such as weight stationary and row stationary), which in turn reduces the number of accesses to the main memory. In SAs, convolutions are handled either as matrix multiplications or by exploiting the raster-order scan of sliding windows. However, data redundancy is a primary concern, affecting area, power and energy. In this paper, we propose TrIM: a novel dataflow for SAs based on a Triangular Input Movement and compatible with CNN computing. TrIM maximizes the local input utilization, minimizes the weight data movement and solves the data redundancy problem. Furthermore, TrIM does not incur the significant on-chip memory penalty introduced by the row stationary dataflow. Compared to state-of-the-art SA dataflows, the high data utilization offered by TrIM guarantees ~10x fewer memory accesses. Moreover, because PEs continuously overlap multiplications and accumulations, TrIM achieves high throughput (up to 81.8% higher than row stationary) while requiring a limited number of registers (up to 15.6x fewer registers than row stationary).
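
The data redundancy mentioned in the abstract can be illustrated with a minimal sketch of the matrix-multiplication approach to convolution. This is not the TrIM dataflow, whose details the abstract does not give; it is the generic im2col unrolling (stride 1, no padding, single channel), and the function names `im2col`, `conv_as_matmul` and `redundancy_factor` are hypothetical names chosen for illustration. The sketch shows how many times input elements are duplicated when each sliding window is stored as its own row.

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k sliding window of a 2-D input into one row.

    Assumes stride 1 and no padding (illustrative choice).
    """
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Overlapping windows copy the same input elements repeatedly.
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv_as_matmul(x, kernel):
    """CNN-style convolution (no kernel flip) computed as a matrix-vector product."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (im2col(x, k) @ kernel.ravel()).reshape(out_h, out_w)

def redundancy_factor(h, w, k):
    """Ratio of elements stored in the unrolled matrix to elements in the input."""
    return ((h - k + 1) * (w - k + 1) * k * k) / (h * w)

x = np.arange(36, dtype=np.float64).reshape(6, 6)
kernel = np.ones((3, 3))
y = conv_as_matmul(x, kernel)

print(y.shape)                     # (4, 4)
print(redundancy_factor(6, 6, 3))  # 4.0
```

For this 6x6 input and 3x3 kernel the unrolled matrix holds four times as many values as the input itself, which is the kind of overhead a dataflow that reuses inputs locally inside the array tries to avoid.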

