Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures

Book Synopsis Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures by : Aqeeb Iqbal Arka

Download or read book Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures written by Aqeeb Iqbal Arka and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Big data applications such as deep learning and graph analytics require hardware platforms that are energy-efficient yet computationally powerful. 3D manycore architectures are the key to efficiently executing such compute- and data-intensive applications. Through-silicon-via (TSV)-based 3D manycore systems are a promising solution in this direction, as they enable the integration of disparate heterogeneous computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (the wires within each planar die). Moreover, current TSV-based 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology for achieving "More Moore and More Than Moore," opens up the possibility of designing cores and associated network routers across multiple layers by utilizing monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D ICs, M3D offers the "true" benefits of the vertical dimension for system integration: an MIV is over 100x smaller than a TSV. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance, thermal, etc.) due to the presence of a mix of computing elements and communication methodologies, each with a different requirement for high performance.
To overcome the difficult optimization challenges posed by the large design space and the complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) in an M3D-based manycore chip, machine learning algorithms are a promising solution. The first part of this dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by M3D vertical integration and data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and cache as the choice of hardware platform in this part of the work. The disparate nature of these processing elements introduces conflicting design requirements that need to be satisfied simultaneously. Moreover, the on-chip traffic pattern exhibited by different big-data applications (such as many-to-few-to-many traffic in CPU/GPU-based manycore architectures) needs to be incorporated into the design process for an optimal power-performance trade-off. In this dissertation, we first design an M3D-enabled heterogeneous manycore architecture and demonstrate the efficacy of machine learning algorithms for efficiently exploring a large design space. For large design-space exploration problems, the proposed machine learning algorithm can find good solutions in significantly less time than existing state-of-the-art counterparts. However, the M3D-enabled heterogeneous manycore architecture is still limited by the inherent memory bandwidth bottlenecks of traditional von Neumann architectures. As a result, later in this dissertation we focus on Processing-in-Memory (PIM) architectures tailor-made to accelerate deep learning applications such as Graph Neural Networks (GNNs), as such architectures can achieve massive data parallelism and do not suffer from memory bandwidth-related issues.
We choose GNNs as an example workload because GNNs are more complex than traditional deep learning applications: they simultaneously exhibit attributes of both deep learning and graph computation, and are therefore both compute- and data-intensive in nature. The high amount of data movement required by GNN computation poses a challenge to conventional von Neumann architectures (such as CPUs, GPUs, and heterogeneous systems-on-chip (SoCs)), as they have limited memory bandwidth. Hence, we propose the use of PIM-capable non-volatile memory such as Resistive Random-Access Memory (ReRAM). We leverage the efficient matrix operations enabled by ReRAM and design manycore architectures that can facilitate the unique computation and communication needs of large-scale GNN training. We then exploit various techniques, such as regularization methods, to further accelerate GNN training on ReRAM-based manycore systems. Finally, we streamline the GNN training process by reducing the amount of redundant information in both the GNN model and the input graph. Overall, this work focuses on the design challenges of high-performance and energy-efficient manycore architectures for machine learning applications. We propose novel architectures that use M3D or ReRAM-based PIM to accelerate such applications. Moreover, we focus on hardware/software co-design to ensure the best possible performance.
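
The ReRAM primitive this abstract leans on, computing matrix-vector products directly inside a memristive crossbar, can be sketched numerically. The conductance range, quantization levels, and weight-to-conductance mapping below are illustrative assumptions, not parameters from the dissertation:

```python
import numpy as np

def crossbar_mvm(weights, x, g_min=1e-6, g_max=1e-4, levels=16):
    """Idealized ReRAM crossbar matrix-vector multiply.

    Weights are mapped to one of `levels` quantized conductance states in
    [g_min, g_max]; the analog dot product is the per-bit-line current sum
    (Kirchhoff's current law), which is then mapped back to weight units.
    """
    w_min, w_max = weights.min(), weights.max()
    span = w_max - w_min + 1e-12
    scale = (weights - w_min) / span
    # Quantize each weight to a discrete conductance state.
    g = g_min + np.round(scale * (levels - 1)) / (levels - 1) * (g_max - g_min)
    # Inputs drive word-line voltages; currents sum along each bit line.
    currents = x @ g
    # Invert the linear weight-to-conductance mapping to recover the product.
    return (currents - x.sum() * g_min) / (g_max - g_min) * span + x.sum() * w_min

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x = rng.normal(size=8)
exact = x @ W
approx = crossbar_mvm(W, x)
```

The residual error comes entirely from conductance quantization, which is one reason analog PIM favors error-tolerant workloads such as neural network training and inference.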

Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning

Publisher : Springer Nature
ISBN 13 : 3031382307
Total Pages : 199 pages

Book Synopsis Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning by : Vikram Jain

Download or read book Towards Heterogeneous Multi-core Systems-on-Chip for Edge Machine Learning written by Vikram Jain and published by Springer Nature. This book was released on 2023-09-15 with total page 199 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book explores and motivates the need for building homogeneous and heterogeneous multi-core systems for machine learning to enable flexibility and energy efficiency. Coverage focuses on a key aspect of the challenges of (extreme-)edge computing, i.e., the design of energy-efficient and flexible hardware architectures, and hardware-software co-optimization strategies that enable early design-space exploration of hardware architectures. The authors investigate possible design solutions for building single-core specialized hardware accelerators for machine learning, and motivate the scaling to homogeneous and heterogeneous multi-core systems for flexibility and energy efficiency. The advantages of scaling to heterogeneous multi-core systems are shown through the implementation of multiple test chips and architectural optimizations.

In-Memory Computing Hardware Accelerators for Data-Intensive Applications

Publisher : Springer Nature
ISBN 13 : 303134233X
Total Pages : 145 pages

Book Synopsis In-Memory Computing Hardware Accelerators for Data-Intensive Applications by : Baker Mohammad

Download or read book In-Memory Computing Hardware Accelerators for Data-Intensive Applications written by Baker Mohammad and published by Springer Nature. This book was released on 2023-10-27 with total page 145 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book describes the state of the art of technology and research on in-memory computing hardware accelerators for data-intensive applications. The authors discuss how processing-centric computing has become insufficient to meet target requirements and how memory-centric computing may be better suited for the needs of current applications, revealing how current and emerging memory technologies are driving a shift in the computing paradigm. The authors provide deep-dive discussions of volatile and non-volatile memory technologies, covering their basic memory cell structures, operations, different computational memory designs, and the associated challenges. Specific case studies and potential applications are provided, along with their current status and commercial availability in the market.

Hardware Accelerators in Data Centers

Publisher : Springer
ISBN 13 : 3319927922
Total Pages : 280 pages

Book Synopsis Hardware Accelerators in Data Centers by : Christoforos Kachris

Download or read book Hardware Accelerators in Data Centers written by Christoforos Kachris and published by Springer. This book was released on 2018-08-21 with total page 280 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides readers with an overview of the architectures, programming frameworks, and hardware accelerators for typical cloud computing applications in data centers. The authors present the most recent and promising solutions, using hardware accelerators to provide high throughput, reduced latency and higher energy efficiency compared to current servers based on commodity processors. Readers will benefit from state-of-the-art information regarding application requirements in contemporary data centers, computational complexity of typical tasks in cloud computing, and a programming framework for the efficient utilization of the hardware accelerators.

Machine Learning-Enabled Vertically Integrated Heterogeneous Manycore Systems for Big-Data Analytics

Total Pages : 101 pages

Book Synopsis Machine Learning-Enabled Vertically Integrated Heterogeneous Manycore Systems for Big-Data Analytics by : Biresh Kumar Joardar

Download or read book Machine Learning-Enabled Vertically Integrated Heterogeneous Manycore Systems for Big-Data Analytics written by Biresh Kumar Joardar and published by . This book was released on 2020 with total page 101 pages. Available in PDF, EPUB and Kindle. Book excerpt: The rising use of deep learning and other big-data algorithms has led to an increasing demand for hardware platforms that are computationally powerful, yet energy-efficient. Heterogeneous manycore architectures that integrate multiple types of cores on a single chip present a promising direction in this regard. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance, power, thermal, reliability, etc.) due to the presence of a mix of computing elements and communication methodologies, each with a different requirement for high performance. This has made the design and evaluation of new architectures an increasingly challenging problem. Machine learning algorithms are a promising solution to this problem and should be investigated further. This dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and Resistive Random-Access Memory (ReRAM) as the choice of hardware platform in this work. The disparate nature of these processing elements introduces conflicting design requirements that need to be satisfied simultaneously. In addition, novel design techniques like processing-in-memory and 3D integration introduce additional design constraints (such as temperature, noise, etc.) that need to be considered in the design process.
Moreover, the on-chip traffic pattern exhibited by different big-data applications (such as many-to-few-to-many traffic in CPU/GPU-based manycore architectures) needs to be incorporated into the design process for an optimal power-performance trade-off. However, optimizing all these objectives simultaneously leads to an exponential increase in the design space of possible architectures. Existing optimization algorithms do not scale well to such large design spaces and often require more time to reach a good solution. In this work, we highlight the efficacy of machine learning algorithms for efficiently designing a suitable heterogeneous manycore architecture. For large design-space exploration problems, the proposed machine learning algorithm can find good solutions in significantly less time than existing state-of-the-art counterparts. Overall, this work focuses on the design challenges of high-performance and energy-efficient architectures for big-data applications, and proposes machine learning algorithms capable of addressing these challenges.
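
The abstract's central claim, that a learned surrogate can explore a huge design space with far fewer expensive evaluations than exhaustive search, can be illustrated with a toy loop. The cost model, configuration space, and linear surrogate below are invented for illustration and are not the dissertation's actual algorithm:

```python
import numpy as np

def true_cost(cfg):
    """Toy stand-in for a slow cycle-accurate simulation of one design point.
    cfg = (num_cpus, num_gpus, cache_mb); lower combined cost is better."""
    c, g, m = cfg
    perf = 1.0 / (0.5 * c + 2.0 * g + 0.1 * m)   # more resources -> faster
    power = 0.8 * c + 3.0 * g + 0.05 * m         # ... but more power-hungry
    return perf + 0.01 * power                   # combined objective

def surrogate_search(space, n_seed=8, n_steps=12, seed=1):
    """Fit a least-squares surrogate on evaluated points, then greedily
    evaluate the not-yet-seen configuration it predicts to be best."""
    rng = np.random.default_rng(seed)
    space = np.array(space, dtype=float)
    costs = {int(i): true_cost(space[i])
             for i in rng.choice(len(space), n_seed, replace=False)}
    for _ in range(n_steps):
        X = np.column_stack([space[list(costs)], np.ones(len(costs))])
        y = np.array([costs[i] for i in costs])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        preds = np.column_stack([space, np.ones(len(space))]) @ beta
        for j in np.argsort(preds):              # best predicted, not yet run
            if int(j) not in costs:
                costs[int(j)] = true_cost(space[j])
                break
    best = min(costs, key=costs.get)
    return space[best], costs[best], len(costs)

space = [(c, g, m) for c in range(1, 9) for g in range(1, 5) for m in (1, 2, 4, 8)]
best_cfg, best_cost, evals = surrogate_search(space)
```

Only 20 of the 128 design points are ever "simulated", which is the essential economy the dissertation exploits at much larger scale.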

Scientific Computing with Multicore and Accelerators

Publisher : CRC Press
ISBN 13 : 1439825378
Total Pages : 495 pages

Book Synopsis Scientific Computing with Multicore and Accelerators by : Jakub Kurzak

Download or read book Scientific Computing with Multicore and Accelerators written by Jakub Kurzak and published by CRC Press. This book was released on 2010-12-07 with total page 495 pages. Available in PDF, EPUB and Kindle. Book excerpt: The hybrid/heterogeneous nature of future microprocessors and large high-performance computing systems will result in a reliance on two major types of components: multicore/manycore central processing units and special-purpose hardware/massively parallel accelerators. While these technologies have numerous benefits, they also pose substantial performance …

Deep In-memory Architectures for Machine Learning

Publisher : Springer Nature
ISBN 13 : 3030359719
Total Pages : 181 pages

Book Synopsis Deep In-memory Architectures for Machine Learning by : Mingu Kang

Download or read book Deep In-memory Architectures for Machine Learning written by Mingu Kang and published by Springer Nature. This book was released on 2020-01-30 with total page 181 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book describes the recent innovation of deep in-memory architectures for realizing AI systems that operate at the edge of energy-latency-accuracy trade-offs. From first principles to lab prototypes, this book provides a comprehensive view of this emerging topic for both the practicing engineer in industry and the researcher in academia. The book is a journey into the exciting world of AI systems in hardware.

Many-Core Computing

Publisher : Computing and Networks
ISBN 13 : 1785615823
Total Pages : 601 pages

Book Synopsis Many-Core Computing by : Bashir M. Al-Hashimi

Download or read book Many-Core Computing written by Bashir M. Al-Hashimi and published by Computing and Networks. This book was released on 2019-04 with total page 601 pages. Available in PDF, EPUB and Kindle. Book excerpt: The primary aim of this book is to provide a timely and coherent account of recent advances in many-core computing research. Starting with programming models, operating systems, and their applications, it presents runtime management techniques, followed by system modelling, verification and testing methods, and architectures and systems.

Research Infrastructures for Hardware Accelerators

Publisher : Springer Nature
ISBN 13 : 3031017501
Total Pages : 85 pages

Book Synopsis Research Infrastructures for Hardware Accelerators by : Yakun Sophia Shao

Download or read book Research Infrastructures for Hardware Accelerators written by Yakun Sophia Shao and published by Springer Nature. This book was released on 2022-05-31 with total page 85 pages. Available in PDF, EPUB and Kindle. Book excerpt: Hardware acceleration in the form of customized datapath and control circuitry tuned to specific applications has gained popularity for its promise to utilize transistors more efficiently. Historically, the computer architecture community has focused on general-purpose processors, and extensive research infrastructure has been developed to support research efforts in this domain. Envisioning future computing systems with a diverse set of general-purpose cores and accelerators, computer architects must add accelerator-related research infrastructures to their toolboxes to explore future heterogeneous systems. This book serves as a primer for the field, as an overview of the vast literature on accelerator architectures and their design flows, and as a resource guidebook for researchers working in related areas.

Machine Learning-Inspired Resource Management in M3D-Enabled Manycore Architectures

Book Synopsis Machine Learning-Inspired Resource Management in M3D-Enabled Manycore Architectures by : Anwesha Chatterjee

Download or read book Machine Learning-Inspired Resource Management in M3D-Enabled Manycore Architectures written by Anwesha Chatterjee and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Monolithic 3D (M3D) integration has emerged as an enabling technology for designing high-performance and energy-efficient circuits and systems. The smaller dimension of vertical monolithic inter-tier vias (MIVs) lowers the effective wirelength and allows high integration density. Designing an energy-efficient manycore architecture necessitates efficient resource management of the full SoC in terms of both power and performance. Voltage/frequency-island (VFI)-based power management is a popular methodology for designing energy-efficient manycore architectures without incurring significant performance overhead. In an M3D chip, the vertical layers introduce inter-tier process variations that affect the performance of transistors and interconnects in different layers. Therefore, VFI-based power management in M3D manycore systems requires consideration of inter-tier process-variation effects. In this dissertation, we undertake the problem of resource management in M3D manycore architectures degraded by the inter-tier process-variation effects inherent in M3D chips. First, we present the design of an imitation learning (IL)-enabled VFI-based power management strategy that considers the inter-tier process-variation effects in M3D manycore chips. We demonstrate that the IL-based power management strategy can be fine-tuned based on the M3D characteristics. Our policy generates suitable V/F levels based on the computation and communication characteristics of the system for both process-oblivious and process-aware configurations. Subsequently, we propose a machine learning-based online update strategy for IL-based DVFI policies for process-degraded M3D architectures.
We demonstrate that, with no prior knowledge of the process-variation parameters, our online strategy captures the inter-tier process variations in the M3D system, achieving a better power-performance trade-off than a process-oblivious offline DVFI policy for the degraded M3D manycore architecture. Furthermore, we show that the online update strategy improves overall energy efficiency for unseen workloads that were not considered during offline DVFI policy creation.
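
The VFI idea can be reduced to a toy controller: each island maps its utilization to a voltage/frequency level, and an online rule adjusts the decision thresholds when the chosen level proves wrong (as it may on a process-degraded island). The V/F levels, thresholds, and update rule below are illustrative assumptions, not the dissertation's IL policy:

```python
# (volts, GHz) pairs available to each voltage/frequency island (illustrative).
VF_LEVELS = [(0.7, 0.8), (0.9, 1.2), (1.1, 1.6)]

class IslandController:
    def __init__(self, thresholds=(0.4, 0.75)):
        self.thresholds = list(thresholds)

    def choose(self, utilization):
        """Map a utilization in [0, 1] to a V/F level index."""
        for level, t in enumerate(self.thresholds):
            if utilization < t:
                return level
        return len(self.thresholds)

    def online_update(self, utilization, missed_deadline, lr=0.05):
        """If a deadline was missed at this utilization, lower the threshold
        that gated the next level, so the island upclocks sooner next time."""
        level = self.choose(utilization)
        if missed_deadline and level < len(self.thresholds):
            self.thresholds[level] = max(0.0, self.thresholds[level] - lr)

ctrl = IslandController()
low = ctrl.choose(0.2)    # light load -> lowest V/F level
high = ctrl.choose(0.9)   # heavy load -> highest V/F level
volts, ghz = VF_LEVELS[high]
# A process-degraded island misses a deadline at 0.72 utilization: the online
# update shifts the threshold so the same load now maps to a higher level.
ctrl.online_update(0.72, missed_deadline=True)
adapted = ctrl.choose(0.72)
```

The online update plays the role of the dissertation's learned policy refinement: the offline policy is a reasonable prior, and measured behavior of the degraded chip adjusts it.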

Many-core Architecture for Programmable Hardware Accelerator

Book Synopsis Many-core Architecture for Programmable Hardware Accelerator by : Junghee Lee

Download or read book Many-core Architecture for Programmable Hardware Accelerator written by Junghee Lee and published by . This book was released on 2013 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: As the further development of single-core architectures faces seemingly insurmountable physical and technological limitations, computer designers have turned their attention to alternative approaches. One such promising alternative is the use of several smaller cores working in unison as a programmable hardware accelerator. It is clear that the vast -- and, as yet, largely untapped -- potential of hardware accelerators is coming to the forefront of computer architecture. There are many challenges that must be addressed for the programmable hardware accelerator to be realized in practice. In this thesis, load balancing, on-chip communication, and an execution model are studied. An imbalanced distribution of workloads across the processing elements constitutes a wasteful use of resources, which degrades the performance of the system. In this thesis, a hardware-based load-balancing technique is proposed, which is demonstrated to be more scalable than state-of-the-art load-balancing techniques. To facilitate efficient communication among an ever-increasing number of cores, a scalable communication network is imperative. Packet-switched networks-on-chip (NoCs) are considered a viable candidate for a scalable communication fabric. The size of the flit, the unit of flow control in an NoC, is one of the important design parameters that determine the latency, throughput, and cost of NoC routers. How to determine an optimal flit size is studied in this thesis, and a novel router architecture is proposed that overcomes a problem related to flit size. This thesis also includes a new execution model and its supporting architecture. An event-driven model that is an extension of hardware description languages is employed as the execution model.
The dynamic scheduling and module-level prefetching for supporting the event-driven execution model are evaluated.
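
The flit-size trade-off the thesis studies shows up even in a first-order wormhole-switching latency model: wider flits shorten packet serialization at the cost of wider links and buffers. The cycle counts below are illustrative assumptions, not figures from the thesis:

```python
def noc_packet_latency(packet_bits, flit_bits, hops, router_cycles=2, link_cycles=1):
    """Zero-contention wormhole-switched NoC latency: the head flit pays the
    per-hop router + link delay, and each remaining flit of the packet adds
    one serialization cycle behind it."""
    flits = -(-packet_bits // flit_bits)          # ceiling division
    head_latency = hops * (router_cycles + link_cycles)
    serialization = flits - 1
    return head_latency + serialization

# A 512-bit packet over 4 hops: wider flits cut serialization latency,
# but every link and buffer must grow to the flit width.
narrow = noc_packet_latency(512, 32, hops=4)    # 16 flits
wide = noc_packet_latency(512, 128, hops=4)     # 4 flits
```

Since serialization shrinks while router cost grows with flit width, the optimum depends on traffic and area budget, which is exactly why the thesis treats flit size as a first-class design parameter.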

Heterogeneous Computing Architectures

Publisher : CRC Press
ISBN 13 : 042968004X
Total Pages : 316 pages

Book Synopsis Heterogeneous Computing Architectures by : Olivier Terzo

Download or read book Heterogeneous Computing Architectures written by Olivier Terzo and published by CRC Press. This book was released on 2019-09-10 with total page 316 pages. Available in PDF, EPUB and Kindle. Book excerpt: Heterogeneous Computing Architectures: Challenges and Vision provides an updated vision of the state of the art of heterogeneous computing systems, covering all the aspects related to their design: from the architecture and programming models to hardware/software integration and orchestration, and on to real-time and security requirements. The transitions to multicore processors, GPU computing, and Cloud computing are not separate trends but aspects of a single trend: mainstream computers, from desktops to smartphones, are being permanently transformed into heterogeneous supercomputer clusters. The reader will get an organic perspective of modern heterogeneous systems and their future evolution.

Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network

Total Pages : 224 pages

Book Synopsis Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network by : Adwaya Amey Kulkarni

Download or read book Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network written by Adwaya Amey Kulkarni and published by . This book was released on 2017 with total page 224 pages. Available in PDF, EPUB and Kindle. Book excerpt: Lightweight Machine Learning (ML) and Convolutional Neural Network (CNN) algorithms can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, the implementation of ML and CNN kernels is computationally intensive and faces memory-storage issues on tiny embedded platforms. In recent years, heterogeneous hardware and acceleration, where compute-intensive tasks are performed on kernel-specific cores, have gained attention, with growing interest in the industry in developing tiny, lightweight manycore accelerators that address these issues.

Accelerator Architecture for Secure and Energy Efficient Machine Learning

Book Synopsis Accelerator Architecture for Secure and Energy Efficient Machine Learning by : Mohammad Hossein Samavatian

Download or read book Accelerator Architecture for Secure and Energy Efficient Machine Learning written by Mohammad Hossein Samavatian and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: ML applications are driving the next computing revolution. In this context, both performance and security are crucial, and we propose hardware/software co-design solutions to address both. First, we propose RNNFast, an accelerator for Recurrent Neural Networks (RNNs). RNNs are particularly well suited to machine learning problems in which context is important, such as language translation. RNNFast leverages an emerging class of non-volatile memory called domain-wall memory (DWM). We show that DWM is very well suited to RNN acceleration due to its very high density and low read/write energy. RNNFast is very efficient and highly scalable, with a flexible mapping of logical neurons to RNN hardware blocks. The accelerator is designed to minimize data movement by closely interleaving DWM storage and computation. We compare our design with a state-of-the-art GPGPU and find 21.8X higher performance with 70X lower energy. Second, we bring ML security into ML accelerator design for greater efficiency and robustness. Deep Neural Networks (DNNs) are employed in an increasing number of applications, some of which are safety-critical. Unfortunately, DNNs are known to be vulnerable to so-called adversarial attacks. In general, the proposed defenses have high overhead; some require attack-specific re-training of the model or careful tuning to adapt to different attacks. We show that these approaches, while successful for a range of inputs, are insufficient to address stronger, high-confidence adversarial attacks. To address this, we propose HASI and DNNShield, two hardware-accelerated defenses that adapt the strength of the response to the confidence of the adversarial input.
Both techniques rely on approximation or random noise deliberately introduced into the model. HASI uses direct noise injection into the model at inference. DNNShield uses approximation that relies on dynamic and random sparsification of the DNN model to achieve inference approximation efficiently and with fine-grain control over the approximation error. Both techniques use the output distribution characteristics of noisy/sparsified inference compared to a baseline output to detect adversarial inputs. We show an adversarial detection rate of 86% when applied to VGG16 and 88% when applied to ResNet50, which exceeds the detection rate of the state-of-the-art approaches, with a much lower overhead. We demonstrate a software/hardware-accelerated FPGA prototype, which reduces the performance impact of HASI and DNNShield relative to software-only CPU and GPU implementations.
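
The detection idea described here, comparing noisy-inference outputs against the clean prediction and flagging inputs that flip too often, can be sketched with a toy linear classifier. The model, noise level, and threshold below are illustrative assumptions, not the configurations used by HASI or DNNShield:

```python
import numpy as np

def detect_adversarial(model_logits, x, n_runs=32, sigma=0.3, threshold=0.05, seed=0):
    """Flag an input as adversarial if deliberately-noised inference disagrees
    with the clean prediction unusually often: benign inputs sit far from the
    decision boundary, adversarial ones hug it, so noise flips them."""
    rng = np.random.default_rng(seed)
    clean_label = np.argmax(model_logits(x))
    flips = 0
    for _ in range(n_runs):
        noisy = x + rng.normal(scale=sigma, size=x.shape)
        if np.argmax(model_logits(noisy)) != clean_label:
            flips += 1
    return flips / n_runs > threshold

# Toy linear two-class "model": a benign input lies far from the decision
# boundary, while the "adversarial" one is nudged just barely across it.
W = np.array([[2.0, -1.0], [-1.0, 2.0]])
model = lambda x: W @ x
benign = np.array([3.0, 0.0])          # confidently class 0
adversarial = np.array([1.02, 1.0])    # class 0 by a hair

flag_benign = detect_adversarial(model, benign)
flag_adv = detect_adversarial(model, adversarial)
```

HASI realizes the noise step by direct injection, and DNNShield by random sparsification, but the decision rule in both cases compares output distributions just as this sketch does.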

Efficient Processing of Deep Neural Networks

Publisher : Springer Nature
ISBN 13 : 3031017668
Total Pages : 254 pages

Book Synopsis Efficient Processing of Deep Neural Networks by : Vivienne Sze

Download or read book Efficient Processing of Deep Neural Networks written by Vivienne Sze and published by Springer Nature. This book was released on 2022-05-31 with total page 254 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics—such as energy-efficiency, throughput, and latency—without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.

Scalable Near-data Processing Systems for Data-intensive Applications

Book Synopsis Scalable Near-data Processing Systems for Data-intensive Applications by : Mingyu Gao

Download or read book Scalable Near-data Processing Systems for Data-intensive Applications written by Mingyu Gao and published by . This book was released on 2018 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Emerging big data applications, such as deep learning, graph processing, and data analytics, process massive data sets within rigorous time constraints. For such data-intensive workloads, the frequent and expensive data movement between memory and compute modules dominates both execution time and energy consumption, seriously impeding future performance scaling. Moreover, the end of silicon scaling has made all compute systems energy-constrained. It now becomes increasingly critical to address this energy bottleneck for data-intensive applications. One promising way to alleviate the inefficiencies of data movement is to avoid it altogether by executing computations closer to data locations, an approach commonly referred to as Near-Data Processing (NDP). Recent advances in integration technology allow us to implement NDP systems in a practical way by vertically stacking logic chips and memory modules. Hence, it is now time to develop architectural support across both hardware and software levels for NDP. This involves developing practical system architectures and programming models as an easy-to-use hardware/software interface, designing efficient processing logic hardware to exploit the abundant 3D memory bandwidth, and investigating scalable software dataflow schemes that achieve optimized scheduling on the hardware resources. The focus of this dissertation is to architect practical, efficient, and scalable NDP systems for data-intensive processing. To this end, we present a coherent set of hardware and software solutions to address architectural challenges for both general-purpose and specialized computing platforms.
First, we propose a practical and scalable NDP system architecture for big data applications such as deep learning and graph analytics. The architecture features simple yet efficient support for virtual memory, cache coherence, and data communication, which leads to a 2.5x energy efficiency improvement over prior NDP designs and 16x over conventional systems. Second, we design HRL, an efficient NDP compute logic that uses a reconfigurable array with fine-grained compute units for efficient arithmetic computations and coarse-grained logic blocks for flexible data and control flows. HRL improves energy efficiency by 2x over conventional fine-grained and coarse-grained reconfigurable circuits. Third, we investigate domain-specific NDP accelerators for deep learning and develop TETRIS, a neural network accelerator using 3D-stacked DRAM. We develop both the hardware architecture and dataflow scheduling for TETRIS, enabling 4x higher performance and 1.5x better energy efficiency compared to state-of-the-art accelerators. Finally, we present the enabling techniques for using dense commodity DRAM arrays as a fine-grained reconfigurable fabric called DRAF. DRAF is 10x denser and 3x more power-efficient than conventional FPGAs, and also supports multiple design contexts. These features make DRAF appropriate for cost- and power-constrained applications in multi-tenancy environments such as datacenters and mobile devices.
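The dataflow-scheduling idea behind an accelerator like TETRIS can be illustrated by partitioning one layer's work across the vaults of a 3D memory stack so that each vault's compute engine touches only locally stored weights. This is a hypothetical sketch of that style of partitioning (names such as num_vaults are illustrative, and no claim is made that this matches the dissertation's actual scheduler):

```python
import numpy as np

def partitioned_fc(x, weights, num_vaults):
    """Compute y = weights @ x for a fully connected layer, with the
    weight matrix's rows (output neurons) striped across memory vaults.

    Each vault computes only its own output slice from its local weight
    shard, so no weight data crosses vault boundaries.
    """
    out_dim = weights.shape[0]
    # Row boundaries assigning a contiguous output slice to each vault.
    bounds = np.linspace(0, out_dim, num_vaults + 1, dtype=int)
    y = np.empty(out_dim)
    for v in range(num_vaults):
        lo, hi = bounds[v], bounds[v + 1]
        y[lo:hi] = weights[lo:hi] @ x  # vault-local computation
    return y

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
x = rng.standard_normal(128)
# The partitioned result must equal the monolithic matrix-vector product.
assert np.allclose(partitioned_fc(x, W, 4), W @ x)
```

The design choice being modeled is that partitioning trades a single large computation for independent vault-local ones, exposing the stack's aggregate internal bandwidth.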

Accelerator Architectures for Deep Learning and Graph Processing

Download Accelerator Architectures for Deep Learning and Graph Processing PDF Online Free

Author :
Publisher :
ISBN 13 :
Total Pages : 157 pages
Book Rating : 4.6/5 (721 download)

DOWNLOAD NOW!


Book Synopsis Accelerator Architectures for Deep Learning and Graph Processing by : Linghao Song

Download or read book Accelerator Architectures for Deep Learning and Graph Processing written by Linghao Song and published by . This book was released on 2020 with total page 157 pages. Available in PDF, EPUB and Kindle. Book excerpt: Deep learning and graph processing are two big-data applications that are widely applied in many domains. The training of deep learning models is essential for inference and has not yet been fully studied. With forward data propagation, backward error propagation, and gradient calculation, deep learning training is a more complicated process with higher computation and communication intensity. Distributing computations on multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. In this dissertation, I present AccPar, a principled and systematic method of determining the tensor partition across multiple heterogeneous accelerators for efficient training acceleration. Emerging resistive random access memory (ReRAM) is promising for processing in memory (PIM). For high-throughput training acceleration in a ReRAM-based PIM accelerator, I present PipeLayer, an architecture for layer-wise pipelined parallelism. Graph processing is well known for poor locality and high memory bandwidth demand. In conventional architectures, graph processing incurs a significant amount of data movement and energy consumption. I present GraphR, the first ReRAM-based graph processing accelerator, which follows the principle of near-data processing and explores the opportunity of performing massively parallel analog operations at low hardware and energy cost. Sparse matrix-vector multiplication (SpMV), a subset of graph processing, is the key computation in iterative solvers for scientific computing. Efficiently accelerating floating-point processing in ReRAM, however, remains a challenge.
In this dissertation, I present ReFloat, a data format and supporting accelerator architecture for low-cost floating-point processing in ReRAM for scientific computing.
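The SpMV kernel the abstract identifies as central to iterative solvers is worth pinning down concretely. Below is a plain reference implementation over the standard compressed sparse row (CSR) layout; a ReRAM crossbar accelerator in the GraphR/ReFloat vein would evaluate the per-row dot products in analog, but any such design must match this functional behavior. The CSR arrays and example matrix here are generic illustrations, not taken from the dissertation.

```python
def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    indptr[i]:indptr[i+1] delimits row i's nonzeros; indices holds
    their column positions and data holds their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Accumulate row i's dot product over its nonzero entries only.
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Example: A = [[4, 0, 1],
#               [0, 2, 0],
#               [3, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 2.0, 3.0, 5.0]
print(spmv_csr(indptr, indices, data, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 8.0]
```

The poor locality the abstract mentions is visible in the inner loop: the gather `x[indices[k]]` jumps irregularly through memory, which is precisely what near-data and in-memory designs try to make cheap.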