Data-Level Parallelism in Vector, SIMD, and GPU Architectures: Introduction, Vector Architecture, SIMD Instruction Set Extensions for Multimedia, Graphics Processing Units, Detecting and Enhancing Loop-Level Parallelism, Crosscutting Issues, Putting it All Together: Mobile versus Server GPUs and Tesla versus Core i7, Fallacies and Pitfalls, Concluding Remarks, Historical Perspective and References, Case Study and Exercises by Jason D. Bakos.
Thread-Level Parallelism: Introduction, Centralized Shared-Memory Architectures, Performance of Symmetric Shared-Memory Multiprocessors, Distributed Shared-Memory and Directory-Based Coherence, Synchronization: The Basics, Models of Memory Consistency: An Introduction, Crosscutting Issues, Putting it All Together: Multicore Processors and Their Performance, Fallacies and Pitfalls, Concluding Remarks, Historical Perspective and References, Case Studies and Exercises by Amr Zaky and David A. Wood.
Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism: Introduction, Programming Models and Workloads for Warehouse-Scale Computers, Computer Architecture of Warehouse-Scale Computers, Physical Infrastructure and Costs of Warehouse-Scale Computers, Cloud Computing: The Return of Utility Computing, Crosscutting Issues, Putting it All Together: A Google Warehouse-Scale Computer, Fallacies and Pitfalls, Concluding Remarks, Historical Perspective and References, Case Studies and Exercises by Parthasarathy Ranganathan.
Vector Processors in More Depth: Why Vector Processors?, Basic Vector Architecture, Two Real-World Issues: Vector Length and Stride, Enhancing Vector Performance, Effectiveness of Compiler Vectorization, Putting it All Together: Performance of Vector Processors, A Modern Vector Supercomputer: The Cray X1, Fallacies and Pitfalls, Concluding Remarks, Historical Perspective and References, Exercises.
Hardware and Software for VLIW and EPIC: Introduction: Exploiting Instruction-Level Parallelism Statically, Detecting and Enhancing Loop-Level Parallelism, Scheduling and Structuring Code for Parallelism, Hardware Support for Exposing Parallelism: Predicated Instructions, Hardware Support for Compiler Speculation, The Intel IA-64 Architecture and Itanium Processor, Concluding Remarks.