Adaptive Two-level Thread Management for MPI Execution on Multiprogrammed Shared Memory Machines
Kai Shen, Hong Tang, and Tao Yang
Department of Computer Science, University of California, Santa Barbara
2017/12/21 1 Shen, Tang, and Yang @ Supercomputing'99

MPI-Based Computation on Shared Memory Machines
- Shared Memory Machines (SMMs) and SMM clusters have become popular for high-performance computing.
- MPI is a portable, high-performance parallel programming model.
- Threads are easy to program, but MPI is still used on SMMs because of:
  - better portability for running on other platforms (e.g., SMM clusters);
  - good data locality due to data partitioning.

Scheduling for Parallel Jobs on Multiprogrammed SMMs
- Gang scheduling
  - Good for parallel programs that synchronize frequently;
  - Hurts resource utilization (processor fragmentation; not enough parallelism to use the allocated resources).
- Space/time sharing
  - Time sharing combined with dynamic partitioning;
  - High throughput;
  - Popular in current operating systems (e.g., IRIX).
- Impact on MPI program execution
  - Not all MPI nodes are scheduled simultaneously;
  - The number of available processors for each application may change dynamically.
- Optimization is needed for fast MPI execution on SMMs.

Techniques Studied
- Thread-based MPI execution [PPoPP'99]
  - Compile-time transformation for thread-safe MPI execution;
  - Fast context switch and synchronization;
  - Communication through address sharing.
- Two-level thread management for multiprogrammed environments
  - Even faster context switch/synchronization;
  - Uses scheduling information to guide synchronization.
- Our prototype system: TMPI

Impact of Synchronization on Coarse-Grain Parallel Programs
- Running a communication-infrequent MPI program (SWEEP3D) on 8 SGI Origin 2000 processors with multiprogramming degree 3.
- Synchronization costs 43%-84% of total time.
- Execution time breakdown for TMPI and SGI MPI: [figure]