Course: High Performance Computer Architecture

Survey Project: Optimizing Application Code for Simultaneous Multi-threading and Chip Multiprocessing

Contributed as a member of a team of three graduate students that identified the trend in microprocessor architecture toward increased levels of on-chip multiprocessing and multi-threading.  The goal of the project was to identify and report on software techniques that can be used to optimize code to run on these microprocessors.  The methodology was to survey the current state of the art in research, drawing from sources such as IEEE journal articles and technical conference proceedings.
 

The academic community has identified a bottleneck in computer architecture: the large number of cycles required to access main memory versus the few needed to operate on data that is already loaded in cache.  Prediction-based data prefetching has received broad support as a way to hide this latency.  Instruction Level Parallelism (ILP), once a hot topic in compiler and microarchitecture research, has matured and is now demonstrating diminishing returns on research investment.  (As Dr. Ben Lee of Oregon State University has declared, “there's only so much ILP you can extract.”  That statement is echoed by nearly every reference in the paper's bibliography.)  Therefore, researchers are now looking for opportunities to exploit parallelism by way of Simultaneous Multithreading (SMT).
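The scale of this gap can be illustrated with the standard average-memory-access-time (AMAT) relation.  The latency numbers below are illustrative assumptions, not figures from the report:

```python
# AMAT = hit_time + miss_rate * miss_penalty
# (cycle counts below are hypothetical, chosen only to show the effect)
hit_time = 4        # cycles for a cache hit
miss_penalty = 200  # cycles to fetch from main memory

for miss_rate in (0.01, 0.05, 0.10):
    amat = hit_time + miss_rate * miss_penalty
    print(f"miss rate {miss_rate:4.0%}: AMAT = {amat:.0f} cycles")
```

Even a 5% miss rate more than triples the average access time, which is why so much research effort targets hiding or avoiding misses.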

 

A promising new research area is "pre-execution," and thread-based prefetching in particular.  Instead of trying to predict program flow, as prediction-based prefetching does, pre-execution techniques actually execute a subset of the original code in a thread that runs alongside the main thread.  A “helper” thread (HELPER) is spawned to run ahead of the main thread (MAIN) and trigger cache misses early, so that the appropriate data is already in cache by the time MAIN needs it.  It is important that HELPER not get too far ahead of MAIN: it could fill cache lines that will not be needed for an extended period, “polluting” the cache when that space could be used for something more useful.  Likewise, if HELPER falls too far behind, it will not fill the cache lines in time, and MAIN would be better off fetching its data from main memory as it normally would.
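The run-ahead discipline described above can be sketched in a minimal form.  This is an illustrative simulation, not code from the report: the list stands in for a large working set, the `touched` array models cache lines warmed by HELPER, and the window constant caps how far HELPER may lead MAIN:

```python
import threading

DATA = list(range(200_000))    # stands in for a large working set
RUN_AHEAD = 64                 # max distance HELPER may lead MAIN

touched = [False] * len(DATA)  # models cache lines warmed by HELPER
main_pos = 0                   # MAIN's progress, read by HELPER
done = threading.Event()

def helper():
    # HELPER: run ahead of MAIN, "touching" upcoming elements so they are
    # warm when MAIN arrives.  The window check keeps HELPER from getting
    # so far ahead that it would pollute the cache; when it reaches the
    # edge of the window it simply spins until MAIN catches up.
    i = 0
    while i < len(DATA) and not done.is_set():
        if i - main_pos < RUN_AHEAD:
            _ = DATA[i]        # the prefetch-style access
            touched[i] = True
            i += 1

def run_main():
    # MAIN: the real computation, consuming data HELPER has warmed.
    global main_pos
    total = 0
    for main_pos in range(len(DATA)):
        total += DATA[main_pos]
    done.set()                 # tell HELPER to stop
    return total

t = threading.Thread(target=helper)
t.start()
result = run_main()
t.join()
```

A real implementation would use hardware thread contexts and actual prefetch instructions rather than Python threads, but the throttling logic (the window check) is the same idea the paragraph describes.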

 

The report includes a survey-style review of

  • the different considerations for architecting HELPERs well, such as dispatching and synchronizing them effectively

  • architecting HELPERs so that program correctness is not compromised

  • the industry tools used to assist HELPER code development

  • examples of the different hardware platforms and software applications on which researchers have implemented HELPERs

© 2013 by Richard F. Crispo.
 
