We are at the cusp of a technological revolution driven by deep generative models, also referred to as foundation models (FMs), such as ChatGPT for text and Stable Diffusion for images. These models, containing hundreds of billions of parameters trained on massive unlabeled datasets, have redefined the computational scale of modern artificial intelligence. In the near future, fine-tuning foundation models on domain-specific and privacy-sensitive data will emerge as a central challenge. The dominant architecture enabling this new era is the Transformer, whose scalability and generalization capabilities have made it indispensable across natural language processing, computer vision, and multi-modal learning.

However, the exponential growth in model size and computational demand has outpaced the capabilities of monolithic System-on-Chip (SoC) designs. The yield, compute, and power-delivery limits of a monolithic die have led to chiplet-based architectures, in which multiple smaller dies, called chiplets, are interconnected through an interposer. This paradigm provides modularity, higher yield, and cost efficiency, but it also introduces new design challenges related to communication bottlenecks, defect tolerance, thermal balance, and heterogeneous integration.
This dissertation addresses these challenges through a series of analytical, architectural, and optimization frameworks that together enable scalable, reliable, and energy-efficient heterogeneous multi-chiplet systems for deep learning acceleration.
First, SWAP introduces a communication-aware framework that co-optimizes chiplet and link placement for server-scale deep learning workloads. Building on this foundation, Florets for chiplets proposes a dataflow-aware Network-on-Interposer (NoI) that uses space-filling curves (SFCs) to align computational and communication locality, achieving high performance and robustness even in the presence of defective chiplets. Extending the space-filling-curve paradigm, the TODAES paper presents a heterogeneous multi-chiplet accelerator for Transformer models that optimizes data-access patterns to improve latency and energy efficiency compared to state-of-the-art accelerators.
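To make the space-filling-curve idea concrete, the sketch below maps a linear chiplet pipeline onto a 2D interposer grid in Hilbert-curve order, so that consecutive (heavily communicating) stages land on physically adjacent sites. This is a minimal illustration, not the dissertation's actual placement algorithm: the grid size, function names, and the assumption of a purely linear dataflow are invented for this example.

```python
# Sketch: place a linear chiplet pipeline along a Hilbert space-filling curve
# so that consecutive pipeline stages occupy adjacent interposer sites.
# Grid size (4x4) and the linear-pipeline assumption are illustrative only.

def xy2d(n, x, y):
    """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_placement(num_chiplets, n):
    """Assign pipeline stage i to the i-th grid cell in Hilbert order."""
    cells = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda c: xy2d(n, c[0], c[1]))
    return {stage: cells[stage] for stage in range(num_chiplets)}

if __name__ == "__main__":
    placement = sfc_placement(num_chiplets=16, n=4)
    for stage, (x, y) in placement.items():
        print(f"stage {stage:2d} -> interposer site ({x}, {y})")
```

Because consecutive cells on a Hilbert curve are always exactly one grid hop apart, inter-stage traffic stays nearest-neighbor; and if a site is defective, advancing to the next cell along the curve preserves near-adjacency, which suggests why SFC-based placements degrade gracefully under faults.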
To exploit device- and architecture-level heterogeneity, HeMu introduces a multi-objective optimization framework that integrates diverse processing-in-memory (PIM) technologies (SRAM, ReRAM, and their architectural variants) in a unified 2.5D platform. It determines optimal chiplet configurations, achieving an order-of-magnitude improvement in energy efficiency over homogeneous baselines. Building on this, HetOU extends the optimization to the operation-unit level, dynamically managing wordline and bitline activations in PIM crossbars for fine-grained energy control. Finally, paving the way toward the next generation of multi-chiplet architectures, an architecture-package co-design framework for glass-interposer-based systems is proposed. This framework bridges architecture and packaging, ensuring mechanical integrity and sustained high performance within thermal limits.
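The core selection step in any such multi-objective exploration is Pareto filtering: keeping only those configurations that no alternative beats on every objective. The sketch below shows that step over a handful of invented SRAM/ReRAM chiplet mixes with made-up latency and energy numbers; HeMu's actual search space and cost models are far richer than this illustration.

```python
# Sketch of Pareto-dominance filtering over candidate chiplet configurations.
# Configuration names, latencies, and energies below are fabricated examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str          # e.g., a mix of SRAM- and ReRAM-based PIM chiplets
    latency_ms: float
    energy_mj: float

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    return (a.latency_ms <= b.latency_ms and a.energy_mj <= b.energy_mj
            and (a.latency_ms < b.latency_ms or a.energy_mj < b.energy_mj))

def pareto_front(cands):
    """Keep configurations that no other candidate dominates."""
    return [c for c in cands
            if not any(dominates(o, c) for o in cands if o is not c)]

if __name__ == "__main__":
    cands = [Config("8xSRAM",         latency_ms=1.0, energy_mj=9.0),
             Config("8xReRAM",        latency_ms=3.0, energy_mj=2.0),
             Config("4xSRAM+4xReRAM", latency_ms=1.6, energy_mj=3.5),
             Config("2xSRAM+6xReRAM", latency_ms=2.5, energy_mj=3.8)]  # dominated
    for c in pareto_front(cands):
        print(c)
```

In a full framework, a filter like this sits inside a design-space search loop that proposes new chiplet mixes and evaluates them with architecture-level latency and energy models before the Pareto set is reported to the designer.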
Collectively, these contributions establish a unified design methodology spanning communication, architecture, device, and package co-optimization, forming a complete stack for next-generation heterogeneous computing systems. The proposed frameworks not only improve scalability, performance, and thermal balance, but also lay the foundation for adaptive, ML-driven design-space exploration in future computing paradigms.
Details
Title
AI-DRIVEN DESIGN AND OPTIMIZATION OF HETEROGENEOUS CHIPLET SYSTEMS FOR SERVER-SCALE AI WORKLOADS
Creators
Harsh Sharma
Contributors
Dr. Partha Pande (Advisor)
Dr. Jana Doppa (Advisor)
Dr. Ganapati Bhat (Committee Member)
Awarding Institution
Washington State University
Academic Unit
Voiland College of Engineering and Architecture
Type
Theses and Dissertations
Degree
Doctor of Philosophy (PhD), Washington State University