A common approach to improve memory access in NUMA machines exploits operating system (OS) page protection mechanisms to induce faults to determine which pages are accessed by what thread, so as to move the thread and its working-set of pages to the same NUMA node. However, existing proposals do not fully fit the requirements of truly multi-thread applications with non-partitioned accesses to virtual pages. In fact, these proposals exploit (induced) faults on a same page-table for all the threads of a same process to determine the access pattern. Hence, the fault by one thread (and the consequent re-opening of the access to the corresponding page) would mask those by other threads on the same page. This may lead to inaccuracy in the estimation of the working-set of individual threads. We overcome this drawback by presenting a lightweight operating system support for Linux, referred to as multi-view address space, explicitly targeting accuracy of per-thread working-set estimation in truly multi-thread applications with non-partitioned accesses, and an associated thread/data migration policy. Our solution is fully transparent to user-space code. It is embedded in a Linux/x86-64 module that installs any required modification to the original kernel image by solely relying on dynamic patching. A motivated case study in the context of HPC is also presented for an assessment of our proposal.
Di Gennaro, I., Pellegrini, A., Quaglia, F. (2016). OS-Based NUMA Optimization: Tackling the Case of Truly Multi-thread Applications with Non-partitioned Virtual Page Accesses. In Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on (pp.291-300). IEEE [10.1109/CCGrid.2016.91].
OS-Based NUMA Optimization: Tackling the Case of Truly Multi-thread Applications with Non-partitioned Virtual Page Accesses
Alessandro Pellegrini;Francesco Quaglia
2016-05-01
Abstract
A common approach to improve memory access in NUMA machines exploits operating system (OS) page protection mechanisms to induce faults to determine which pages are accessed by what thread, so as to move the thread and its working-set of pages to the same NUMA node. However, existing proposals do not fully fit the requirements of truly multi-thread applications with non-partitioned accesses to virtual pages. In fact, these proposals exploit (induced) faults on a same page-table for all the threads of a same process to determine the access pattern. Hence, the fault by one thread (and the consequent re-opening of the access to the corresponding page) would mask those by other threads on the same page. This may lead to inaccuracy in the estimation of the working-set of individual threads. We overcome this drawback by presenting a lightweight operating system support for Linux, referred to as multi-view address space, explicitly targeting accuracy of per-thread working-set estimation in truly multi-thread applications with non-partitioned accesses, and an associated thread/data migration policy. Our solution is fully transparent to user-space code. It is embedded in a Linux/x86-64 module that installs any required modification to the original kernel image by solely relying on dynamic patching. A motivated case study in the context of HPC is also presented for an assessment of our proposal.File | Dimensione | Formato | |
---|---|---|---|
DiG16.pdf
solo utenti autorizzati
Tipologia:
Documento in Pre-print
Licenza:
Copyright dell'editore
Dimensione
197.33 kB
Formato
Adobe PDF
|
197.33 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.