Accelerating Graph Analytics Using Near Data Processing

Dwaipayan Choudhury

doi:10.7273/000005087

Graph application workloads are dominated by random memory accesses with poor locality. Because most real-world graph applications are inherently irregular, data communication becomes the bottleneck for both performance and energy efficiency when they are mapped to manycore architectures. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the network topology can be designed to approximate the traffic patterns that emerge from graph application workloads. Hence, we first study the mix of long- and short-range on-chip traffic generated by graph workloads, and subsequently use the findings to guide the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose a small-world NoC (SWNoC)-enabled manycore architecture in which the placement of the links follows a power-law distribution. Moreover, as graph applications are inherently memory intensive, off-chip data movement to external DRAM introduces latency and energy overheads. In conventional manycore architectures, the main memory is not integrated with the logic, so off-chip data movement degrades overall performance and increases energy consumption. In such cases, Processing-in-Memory (PIM) or Near Data Processing (NDP) offers an effective paradigm for reducing data movement overheads by moving computation closer to the data stored in memory. This enables faster data transfer between memory and the logic layer, and hence reduces both latency and energy consumption. NDP can take advantage of emerging 3D-stacked memory and logic devices (such as Micron’s Hybrid Memory Cube, or HMC) to enable high-bandwidth, low-latency, and low-energy memory accesses.
In data-intensive applications, a manycore architecture with NDP is capable of breaking the barrier between memory access and computational efficiency, but its potential is yet to be adequately demonstrated through careful design and performance evaluation.
Another way to tackle the irregular and sparse nature of graph computation is the recently proposed class of ReRAM-based Processing-in-Memory (PIM) architectures. Most of these ReRAM designs focus on mapping graph computations onto a set of multiply-and-accumulate (MAC) operations. Because ReRAM allows computation to take place in memory, it also reduces the latency of data movement between cores and memory. However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges: significant storage requirements (particularly due to storage wasted on zero-valued cells) and a significant amount of on-chip traffic. Our proposed architecture addresses both. First, it incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. Second, its 3D NoC-enabled design reduces on-chip communication latency. Overall, in this work we propose novel NDP designs for manycore GPU architectures and ReRAM-based PIM architectures to accelerate graph applications.
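
The storage challenge mentioned above can be illustrated with a small sketch. It is not the reordering algorithm proposed in this work. It only assumes that an adjacency matrix is cut into fixed-size square tiles (standing in for ReRAM crossbars), that a tile is stored only if it holds at least one edge, and that every zero inside a stored tile counts as wasted storage. The function name `crossbar_usage`, the tile size, the toy graph, and the hand-picked node ordering are all hypothetical.

```python
from typing import Iterable

Edge = tuple[int, int]


def crossbar_usage(edges: Iterable[Edge], order: list[int], tile: int) -> tuple[int, int]:
    """Return (tiles_used, zero_cells) for a directed edge list.

    order[i] is the node placed at row/column i of the adjacency matrix.
    Assumption: only tile x tile blocks holding at least one edge are stored.
    """
    position = {node: i for i, node in enumerate(order)}
    edge_cells = {(position[u], position[v]) for u, v in edges}
    tiles = {(r // tile, c // tile) for r, c in edge_cells}
    zero_cells = len(tiles) * tile * tile - len(edge_cells)
    return len(tiles), zero_cells


# Toy graph: two 4-node cliques, one on the even nodes and one on the odd nodes.
groups = ([0, 2, 4, 6], [1, 3, 5, 7])
edges = [(u, v) for group in groups for u in group for v in group if u != v]

# Natural node order interleaves the two cliques across all tiles.
natural = crossbar_usage(edges, order=[0, 1, 2, 3, 4, 5, 6, 7], tile=4)

# Hand-picked order that places each clique in its own diagonal tile.
reordered = crossbar_usage(edges, order=[0, 2, 4, 6, 1, 3, 5, 7], tile=4)

print(natural)    # (4, 40)
print(reordered)  # (2, 8)
```

In this toy case, grouping connected nodes halves the number of stored tiles and cuts the zero cells from 40 to 8, which is the kind of saving a crossbar-aware ordering aims for on real graphs.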
