Shady Agwa

Research Fellow




+44 (0) 7950676030


School of Engineering

The University of Edinburgh

1.24D Murchison House, King's Buildings Campus, Edinburgh, EH9 3BF, UK



Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator using In-Situ Processing-In-SRAM


Conference paper


Khalid Al-Hawaj, O. Afuye, Shady Agwa, A. Apsel, C. Batten
International Symposium on Circuits and Systems, 2020

APA
Al-Hawaj, K., Afuye, O., Agwa, S., Apsel, A., & Batten, C. (2020). Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator using In-Situ Processing-In-SRAM. International Symposium on Circuits and Systems.


Chicago/Turabian
Al-Hawaj, Khalid, O. Afuye, Shady Agwa, A. Apsel, and C. Batten. “Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator Using In-Situ Processing-In-SRAM.” International Symposium on Circuits and Systems (2020).


MLA
Al-Hawaj, Khalid, et al. “Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator Using In-Situ Processing-In-SRAM.” International Symposium on Circuits and Systems, 2020.


BibTeX

@inproceedings{khalid2020a,
  title = {Towards a Reconfigurable Bit-Serial/Bit-Parallel Vector Accelerator using In-Situ Processing-In-SRAM},
  year = {2020},
  booktitle = {International Symposium on Circuits and Systems},
  author = {Al-Hawaj, Khalid and Afuye, O. and Agwa, Shady and Apsel, A. and Batten, C.}
}

Abstract

Vector accelerators can efficiently execute regular data-parallel workloads, but they require expensive multi-ported register files to feed large vector ALUs. Recent work on in-situ processing-in-SRAM shows promise in enabling area-efficient vector acceleration. This work explores two different approaches to leveraging in-situ processing-in-SRAM: BS-VRAM, which uses bit-serial execution, and BP-VRAM, which uses bit-parallel execution. The two approaches have very different latency vs. throughput trade-offs. BS-VRAM requires more cycles per operation, but is able to execute thousands of operations in parallel, while BP-VRAM requires fewer cycles per operation, but can only execute hundreds of operations in parallel. This paper is the first work to perform a rigorous evaluation of bit-serial vs. bit-parallel in-situ processing-in-SRAM. Our results show that both approaches have similar area overheads. For 32-bit arithmetic operations, BS-VRAM improves throughput by 1.3–5.0× compared to BP-VRAM, while BP-VRAM improves latency by 3.0–23.0× compared to BS-VRAM.
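
To make the latency vs. throughput trade-off concrete, the following minimal Python sketch emulates bit-serial addition on transposed data. It is an illustration under stated assumptions, not the BS-VRAM design from the paper: the helper names (to_bit_planes, from_bit_planes, bit_serial_add) are hypothetical, each Python integer stands in for one row holding a single bit position across all lanes, and one loop iteration stands in for one step of the array. The step count is illustrative only and does not reproduce the paper's cycle counts.

```python
def to_bit_planes(values, width):
    """Transpose unsigned ints into `width` bit-planes (hypothetical layout).

    Plane i is a Python int whose bit `lane` holds bit i of values[lane].
    """
    planes = []
    for i in range(width):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    values = []
    for lane in range(lanes):
        value = 0
        for i, plane in enumerate(planes):
            value |= ((plane >> lane) & 1) << i
        values.append(value)
    return values


def bit_serial_add(a_planes, b_planes):
    """Add two transposed vectors one bit position per step.

    Every step applies a full adder to all lanes at once, so the number
    of steps equals the bit width regardless of how many lanes there are.
    The result wraps modulo 2**width (the final carry is dropped).
    """
    carry = 0
    out = []
    steps = 0
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # sum bit for every lane
        carry = (a & b) | (carry & (a ^ b))  # carry-out for every lane
        steps += 1
    return out, steps


a = [1, 2, 3, 0xFFFFFFFF]
b = [4, 5, 6, 1]
planes, steps = bit_serial_add(to_bit_planes(a, 32), to_bit_planes(b, 32))
print(from_bit_planes(planes, len(a)))  # [5, 7, 9, 0]
print(steps)                            # 32
```

In this sketch, adding more lanes does not change the step count, which mirrors the abstract's point that bit-serial execution trades more cycles per operation for much wider parallelism. A bit-parallel organization would instead process all bits of an element together, finishing each operation in fewer steps but fitting fewer elements in the same array.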

