The processor industry is at an inflection point. In the past, performance was the driving force behind the processor industry. But in the coming many-core era, improving programmability and reliability of the system will be at least as important as improving raw performance. To meet this vision, this thesis presents a processor feature that assists programmers in understanding software failures. Reproducing software failures is a significant challenge. The problem is severe especially for multi-threaded programs because the causes of failure can be non-deterministic in nature. The proposed processor feature continuously logs a program's execution while sacrificing very little performance (̃1\ %). If the program crashes, the developer can use the log to debug the failure by deterministically replaying every single instruction executed as part of the failed program's execution. Two key mechanisms enable this deterministic replay feature. One is BugNet, a checkpointing technique, which logs all of the non- deterministic input to a thread by logging the values of load instructions. The other is Strata, a logging primitive for recording shared-memory dependencies in a snoop-based or a directory-based shared-memory multi- processor. The former is sufficient for uni-processor systems and the later is required for multi-processor systems. As a proof-of-concept, this thesis presents a software implementation of BugNet replayer built using the Pin instrumentation tool. To understand the space requirements of the BugNet recorder for debugging, this thesis empirically quantifies how much of a program's execution need to be logged and replayed in order to understand the root cause of a majority of bugs. Finally, to demonstrate the utility of the deterministic replay feature, this thesis presents a software tool built using a deterministic replayer that finds data race bugs in shared-memory multi-threaded programs and automatically prioritizes them. The data race detection tool was built in collaboration with Microsoft. It has been used to find and fix data race bugs in production code, including Windows Vista and Internet Explorer