libcrpm: Improving the Checkpoint Performance of NVM
Time: Wednesday, July 13th, 3:50pm - 4:10pm PDT
Location: 3004, Level 3
Event Type: Research Manuscript
Embedded Memory, Storage and Networking
Embedded Systems
Description: libcrpm is a new programming library that improves checkpoint performance for applications running on NVM. It proposes a failure-atomic differential checkpointing protocol that addresses two problems in current NVM-based checkpoint-recovery libraries: (1) high write amplification when page-granularity incremental checkpointing is used, and (2) high persistence costs from excessive memory fence instructions when fine-grained undo logging or copy-on-write is used. Evaluation results show that libcrpm reduces checkpoint overhead in realistic workloads. For MPI-based parallel applications such as LULESH, the checkpoint overhead of libcrpm is only 44.78% of that of FTI, an application-level checkpoint-recovery library.
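
The write-amplification problem in (1) can be illustrated with a small back-of-the-envelope sketch. This is not libcrpm's protocol or API; it only counts how many bytes a checkpoint would copy when dirty data is tracked at page granularity versus a finer granularity. The 4096-byte page size, the 256-byte block size, and the function names are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
BLOCK_SIZE = 256   # hypothetical finer tracking granularity, for comparison only


def checkpoint_bytes(dirty_offsets, granularity):
    """Bytes copied by a checkpoint that tracks dirty data in `granularity`-byte units."""
    dirty_units = {offset // granularity for offset in dirty_offsets}
    return len(dirty_units) * granularity


def write_amplification(dirty_offsets, granularity):
    """Ratio of bytes copied to bytes the application actually modified."""
    modified = len(set(dirty_offsets))
    return checkpoint_bytes(dirty_offsets, granularity) / modified


# Sparse update pattern: one modified byte in each of 64 different pages.
dirty = [i * PAGE_SIZE for i in range(64)]

page_bytes = checkpoint_bytes(dirty, PAGE_SIZE)    # every touched page is copied whole
block_bytes = checkpoint_bytes(dirty, BLOCK_SIZE)  # only the touched blocks are copied

print(page_bytes, block_bytes)
print(write_amplification(dirty, PAGE_SIZE), write_amplification(dirty, BLOCK_SIZE))
```

With sparse updates like these, page-granularity tracking copies a full page for every modified byte, which is the kind of amplification the description refers to. Finer tracking copies less data, but it does not by itself address the persistence cost named in (2).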