# **UC Santa Cruz**

# **UC Santa Cruz Previously Published Works**

## **Title**

TSPC-DICE: a single phase clock high performance SEU hardened flip-flop

### **Permalink**

https://escholarship.org/uc/item/1zv715np

# **Author**

Islam, Riadul

# **Publication Date**

2016-09-04

Peer reviewed

# TSPC-DICE: A Single Phase Clock High Performance SEU Hardened Flip-flop

Shah M. Jahinuzzaman and Riadul Islam

Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada

shah@ece.concordia.ca

Abstract: This paper presents a true single-phase clock (TSPC) flip-flop that is robust against radiation-induced single event upsets (SEUs), or soft errors. The flip-flop consists of an input stage that uses a single phase clock to pass the data to a storage unit at the positive edge of the clock. The single phase clock enables a power-efficient, easily routed clock tree and reduces the NBTI effect on the setup and hold times. The storage unit is the SEU robust dual interlocked cell (DICE), which has four nodes that replicate the data bit and its complement in order to recover from a single event transient (SET). Two nodes with the same logic value inside the storage unit drive a C-element at the output. The C-element masks the propagation of any SET from the internal nodes of the storage unit to the output. The proposed flip-flop consists of only 22 transistors, occupies less area, and exhibits as much as 12% lower power-delay product when compared with a recently reported SEU robust flip-flop implemented in a commercial 65 nm CMOS technology.

Keywords: cosmic radiation, single event upset, flip-flop, fault tolerance.

#### I. INTRODUCTION

Cosmic radiation-induced single event transients (SETs) have emerged as a critical reliability concern for integrated circuits in sub-100 nanometre CMOS technologies [1].
Cosmic radiation, which primarily comprises neutrons at the ground level, originates from intergalactic rays and, as such, has a higher flux density at aircraft altitudes. However, the ground level neutron flux (~20 neutrons/cm²/s at New York City) is sufficient to interact with the silicon atoms in the substrate and generate unwanted charge. The amount of unwanted charge is comparable to the signal charge of a circuit in nanometric technologies [2]. In particular, when collected by a sensitive circuit node, the unwanted charge causes a voltage transient at the collecting node. The transient is referred to as an SET, which can alter a logic value ('0' to '1' or vice versa) if the amplitude and duration of the SET are large enough. When an SET changes the stored value in a memory element or latch, it is referred to as a single event upset (SEU), or often a 'soft error', as it does not permanently damage the device. However, SEUs cause computational errors, which can lead to system malfunctions. Accordingly, state-of-the-art microprocessors require SEU protection [3], [4]. Since a microprocessor or a system-on-chip (SOC) consists of a large number of flip-flops, making the flip-flops SEU robust is vital to ensure the overall reliability of the system.

Figure 1. SEU mechanisms in a typical master-slave D flip-flop.

Typically, a flip-flop experiences an SEU through two possible mechanisms: i) by latching an SET that arrives on the input data line during the latching window of the clock and ii) by having an SET at a sensitive node of the latch inside the flip-flop. Fig. 1 illustrates these two mechanisms for a master-slave D flip-flop. In the first mechanism, since the SET cannot be distinguished from the data, managing the resulting SEU is very difficult and incurs an unacceptable performance penalty. In contrast, the flip-flop can be protected against the second SEU mechanism by applying circuit techniques while still satisfying the required performance metrics.
This paper presents a high performance flip-flop that belongs to this latter approach. In particular, we propose a true single phase clock (TSPC) flip-flop that is based on the radiation-hardened, i.e., SEU tolerant, dual interlocked cell (DICE) [5]. The key contributions of the paper are: i) use of an efficient TSPC type input stage to write the data into the DICE cell, ii) use of a C-element as the output buffer, and iii) the area-power-performance comparison of the proposed flip-flop with other DICE-based and conventional flip-flops. The proposed flip-flop has been designed and laid out in a commercial 65 nm CMOS process. Post layout simulations confirm up to 12% lower power-delay product (PDP) when compared with a DICE-based flip-flop reported in [3].

Figure 2. a) Spatial and b) temporal redundancy schemes, and c) the SEU robust DICE latch for HBD schemes.

#### II. SEU HARDENING TECHNIQUES FOR FLIP-FLOPS

Unlike cache memories, flip-flops are distributed irregularly across the chip, which makes it difficult to protect them using parity checks or error correction codes (ECC). Instead, the protection techniques involve either redundancy or circuit hardening by design (HBD). Redundancy can in turn be spatial or temporal. The most commonly used spatial redundancy method is triple modular redundancy (TMR). TMR replicates the hardware, such as a flip-flop, three times and applies majority voting to extract the correct data in the case of an SEU (see Fig. 2(a)). While this technique corrects an SEU in any latch inside the replicated flip-flops (mechanism ii of Fig. 1), it fails to detect and correct an SEU caused by an SET on the data line (mechanism i of Fig. 1). The temporal redundancy technique, on the other hand, samples the data at different times (Clk1, Clk2, and Clk3 in Fig. 2(b)) with an interval greater than the pulse width of the SET.
Then it stores the sampled values in different latches and uses majority voting to determine the correct data [6]. This technique can detect and correct an SEU caused by an SET on the data line. In addition, since it involves majority voting of replicated latches, it can correct an SEU that occurs inside any of the latches. However, both of these redundancy techniques incur large area and power penalties (~3× for TMR) in the replicated circuits and a performance penalty in the sampling and/or voter circuit.

In contrast, HBD techniques employ SEU immune latches instead of replicating the hardware. In the event of an SET at any of the sensitive nodes, these latches prevent flipping of the data stored in the flip-flops [3], [5], [7], [8]. Although the HBD techniques cannot correct SEUs caused by data line SETs, they are more attractive than the redundancy schemes because of their significantly lower area, power, and delay penalties.

The most commonly used HBD flip-flops are based on the eight-transistor DICE cell shown in Fig. 2(c) [5]. The cell stores a logic '0' or a logic '1' as a combination of four node voltages, with two nodes holding the original data and two nodes holding its complement. When the state of any node is modified by an SET, the unaffected nodes help to restore the correct value of the affected node. This is because one transistor of the inverter driving the affected node is itself driven by an unaffected node (see Fig. 2(c)). That transistor can supply the current needed to restore the correct logic value at the affected node. Thus, as long as only one node is affected by an SET, the DICE cell shows excellent SEU immunity.

Figure 3. Proposed TSPC-DICE flip-flop.

Typically, DICE-based flip-flops use either a single DICE cell as the storage element or two DICE cells in a master-slave configuration. An example of the former is the flip-flop proposed in [3].
We refer to this flip-flop as "pulsed DICE" as it consists of a pulsed transfer gate coupled with the DICE cell. While this flip-flop has no sizing constraints on the transistors inside the DICE cell, it shows higher power consumption, particularly at low data activities. Naseer and Draper proposed a delay filtered DICE (DF-DICE) flip-flop [7], in which the input data signal and a delayed version of it drive a C-element that conditionally passes the data to the DICE storage cell. Consequently, the DF-DICE suffers from a significant performance penalty. Similarly, the master-slave DICE (MS-DICE) flip-flop suffers from a large area overhead and speed penalty [8]. This necessitates the design of an SEU robust flip-flop with minimal power and performance penalty in order to meet the overall power budget and reliability of microprocessors and systems-on-chip (SOCs).

#### III. PROPOSED TSPC-DICE FLIP-FLOP

In this section we propose a DICE-based true single phase clock (TSPC) flip-flop that offers SEU immunity at low power and area penalties. Fig. 3 shows the proposed flip-flop. It consists of a TSPC input stage, the SEU hardened DICE latch, and a C-element output stage. An equalizer transistor M18 works in conjunction with the input stage to enable writing into the DICE latch at the rising edge of the clock (clk). For a stored data value of '1' in the flip-flop, the voltages at internal nodes A, B, C, and D are '1', '0', '1', and '0'. For a stored data value of '0', the node voltages are the opposite.

Figure 4. Simulation waveforms of the internal nodes and output of the TSPC-DICE flip-flop.

The operation of the flip-flop can be described with reference to Fig. 3 and Fig. 4. When clk='0', node X is precharged to the complement of the data while node Y is precharged to '1'. Consequently, M7 and M8 are OFF, leaving node B at a logic value determined by the DICE latch. When clk becomes '1', the data is written into the DICE latch in two ways.
If the data is '1' and clk='1', node X is '0' and node Y remains at '1' (see Fig. 4), which pulls down node B and turns on M18. A low-impedance path through M18 then pulls down node D, changing the voltages at nodes A and C from '0' to '1'. Since nodes B and D are both '0', output node O is pulled up to '1', which is the same as the input data. On the other hand, if the data is '0' and clk='1', node X is '1' and node Y is pulled down to '0'. This pulls up node B through M7 if node B (and hence node D) was previously holding '0'. Subsequently, node D is also pulled up through M18 (and M16), and nodes A and C are updated. Pulling up node D through the equalizer M18 requires that M18 and M7 be large enough to quickly overpower M17, which is driven by node A. In addition, M13 and M15 are made slightly larger than the minimum sized M11 and M17 in order to speed up the write process.

In effect, we write into the DICE latch by driving both nodes B and D to the same potential. In contrast, an SET is assumed to affect only one node of the DICE latch and therefore fails to upset it. In order to validate this assumption in the implemented design, we place the nodes that hold the same potential (nodes B and D, or nodes A and C) as far apart as possible in the layout. Such a layout minimizes charge sharing between neighbouring nodes, which can potentially upset the DICE latch in nanometric technologies [9], [10].

#### IV. IMPLEMENTATION AND PERFORMANCE EVALUATION

We have designed and laid out the proposed TSPC-DICE flip-flop, the conventional master-slave D flip-flop, a master-slave DICE flip-flop similar to [8] but without preset and clear, and the pulsed DICE flip-flop in a commercial 65 nm CMOS technology. The layout areas of these flip-flops are listed in Table I, which shows that the proposed TSPC-DICE flip-flop occupies a comparable, slightly smaller area than the pulsed DICE flip-flop reported in [3]. The performance of the flip-flops is evaluated using post layout simulations at a clock frequency of 2 GHz and a supply voltage of 1 V.

TABLE I. AREA AND DELAY COMPARISON OF FLIP-FLOPS

| Flip-flop type         | Number of transistors | Layout area (µm²) | C-Q delay (ps) |
|------------------------|-----------------------|-------------------|----------------|
| Master-slave D         | 22                    | 12.75             | 36.8           |
| Master-slave DICE      | 36                    | 23.09             | 74.3           |
| Pulsed DICE (Ref. [3]) | 32                    | 18.83             | 63.7           |
| TSPC-DICE (this work)  | 22                    | 18.62             | 63.5           |

Figure 5. a) Monte-Carlo simulations of C-Q delay of the TSPC-DICE flip-flop and b) testbench for the power measurement.

#### A. Delay and Power

The clock-to-output (C-Q) delays of the flip-flops are measured under relaxed timing conditions and listed in Table I. The distribution of the C-Q delay of the TSPC-DICE flip-flop under varying process and mismatch conditions at 27 °C is shown in Fig. 5(a). Once the C-Q delays are characterized, the setup time of each flip-flop is extracted by moving the data edge closer to the rising edge of the clock until the C-Q delay begins to rise. We define the setup time as the point where the C-Q delay is 20% greater than the nominal C-Q delay. This point is obtained using an OCEAN script, which automatically varies the clock and data edges. The worst-case setup time thus found for the proposed TSPC-DICE flip-flop is 18 ps, which is only 8% higher than that of the master-slave D flip-flop.

In order to measure the power consumption, both the internal power of a flip-flop and the loading on the clock and data lines are considered. To quantify the clock and data loading by the flip-flop, the average currents through the final inverters in the clock and data buffers are measured.
These currents give the clock and data power ($P_{\rm clk}$ and $P_{\rm data}$), respectively (see Fig. 5(b)). Then, the current into the flip-flop itself for a fan-out of five is measured and used to calculate its internal power ($P_{\rm int}$). The total power of a flip-flop is then given by $P_{\rm FF} = P_{\rm int} + P_{\rm clk} + P_{\rm data}$. Fig. 6 compares $P_{\rm FF}$ and the power-delay (C-Q) product (PDP) at different data activities. Since the master-slave DICE flip-flop incurs significantly larger area and delay penalties, we exclude it from the power-delay analysis.

Figure 6. a) Power consumption and b) power-delay product of flip-flops at different data activities.

Figure 7. Simulated transient waveforms showing the response of the internal nodes and output of the TSPC-DICE flip-flop to SETs.

#### B. SEU Immunity

The SEU robustness of the proposed TSPC-DICE flip-flop is verified in SPICE by injecting an exponential current pulse at nodes A, B, C, and D, mimicking a particle-induced SET. Results show that all nodes are capable of recovering from both a '1'-to-'0' SET and a '0'-to-'1' SET. Fig. 7 shows a transient simulation of two SETs occurring at nodes A and B at different times. As is evident, the data stored in the latch is unchanged and the output (Q) is not disturbed at all during either SET. The latter property of the flip-flop is very advantageous because it prevents the SET from propagating to the next logic stage and causing a computational error in the pipeline.

#### V. DISCUSSIONS

The proposed TSPC-DICE flip-flop has the same number of transistors as the conventional D flip-flop, yet it offers robustness against SEUs. The single phase architecture enables the design of a power-efficient clock tree that occupies less area.
In addition, it limits the effects of negative bias temperature instability (NBTI), a mechanism that increases the PMOS threshold voltage over time due to hydrogen diffusion in the gate dielectric of an 'ON' PMOS. Since the PMOS transistors in the TSPC input stage are 'OFF' for more than half of their lifecycle, the NBTI effect on these transistors, and hence on the setup and hold times of the flip-flop, is minimal.

The TSPC-DICE flip-flop offers SEU robustness in two ways. First, it is robust against an SET in the latch while holding the data (static). Second, unlike the pulsed DICE flip-flop in [3], it prevents propagation of any SET from an internal node to the output (dynamic). The SEU robustness comes at the expense of higher power consumption than the conventional D flip-flop (see Fig. 6(a)). However, the TSPC-DICE flip-flop consumes less power than the pulsed DICE flip-flop for moderate ($\sim$50%) to low data activities. In particular, for a similar C-Q delay, the TSPC-DICE flip-flop exhibits 12% lower PDP at a data activity of 12.5%. It should be noted that, despite fewer transistors and a smaller $P_{\rm int}$, the total power consumption of the proposed TSPC-DICE flip-flop is not dramatically lower than that of the pulsed DICE flip-flop. This is because the former exhibits a larger clock loading ($P_{\rm clk}$).

#### VI. CONCLUSION

We have presented an area and power efficient true single phase clock radiation-hard flip-flop. The single phase architecture is attractive for designing power-efficient clock trees and limiting the NBTI effect on the setup and hold times. The use of a C-element output buffer masks the propagation of any transient from the internal nodes of the flip-flop to the next logic stage in the pipeline. Thus, the flip-flop can be used to realize fault tolerant SOCs and microprocessors, which experience single event transients in nanometric technologies.

#### ACKNOWLEDGMENT

The authors are thankful to Dr.
David Rennie of the University of Waterloo for his help with the layout.

#### REFERENCES

- [1] R. C. Baumann, "Soft errors in advanced computer systems," *IEEE Des. Test Comput.*, vol. 22, no. 3, pp. 258–266, May/Jun. 2005.
- [2] P. E. Dodd and L. W. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 3, pp. 583–602, Jun. 2003.
- [3] D. Krueger, E. Francom, and J. Langsdorf, "Circuit design for voltage scaling and SER immunity on a quad-core Itanium® processor," *ISSCC Dig. Tech. Papers*, pp. 94–95, 2008.
- [4] S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, "Robust system design with built-in soft-error resilience," *Computer*, vol. 38, no. 2, pp. 43–52, Feb. 2005.
- [5] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 6, pp. 2874–2878, Dec. 1996.
- [6] D. G. Mavis and P. H. Eaton, "Soft error rate mitigation techniques for modern microcircuits," in *Proc. Int. Rel. Phys. Symp.*, Dallas, TX, Apr. 2002, pp. 216–225.
- [7] R. Naseer and J. Draper, "DF-DICE: a scalable solution for soft error tolerant circuit design," in *Proc. IEEE Int. Symp. on Circuits and Systems*, Island of Kos, Greece, May 2006, pp. 3890–3893.
- [8] W. Wang and H. Gong, "Edge triggered pulse latch design with delayed latching edge for radiation hardened application," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 6, pp. 3626–3630, Dec. 2004.
- [9] O. A. Amusan, L. W. Massengill, M. P. Baze, A. L. Sternberg, A. F. Witulski, B. L. Bhuva, and J. D. Black, "Single event upsets in deep-submicrometer technologies due to charge sharing," *IEEE Trans. Device Mater. Rel.*, vol. 8, no. 3, pp. 582–589, Sep. 2008.
- [10] M. Haghi and J. Draper, "The 90 nm Double-DICE storage element to reduce Single-Event upsets," in *IEEE Int. Midwest Symp. on Circuits and Syst.*, Cancun, Mexico, 2009, pp. 463–466.
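
The single-node recovery of the DICE latch (Section II), the two-node write (Section III), and the masking by the output C-element can be illustrated with a small behavioural model. The Python sketch below is not part of the reported design flow and is no substitute for the SPICE simulations of Section IV. It assumes the conventional DICE ring, in which each node is pulled up by a PMOS gated by the previous node and pulled down by an NMOS gated by the next node. It also assumes that a node under contention, or left floating, keeps its previous value, whereas in silicon this depends on transistor sizing and stored charge. The function names (`dice_step`, `settle`, `c_element`, `store`, `inject_set`, `write`) are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Optional, Tuple

NODES = ("A", "B", "C", "D")  # storage node names as in Fig. 3
State = Dict[str, int]


def dice_step(state: State, forced: Optional[State] = None) -> State:
    """Evaluate the four-node ring once.

    Assumed topology: a node is pulled up when the previous node is '0'
    (PMOS on) and pulled down when the next node is '1' (NMOS on).
    Contention or a floating node keeps its old value (modelling assumption).
    """
    cur = {**state, **(forced or {})}
    nxt = dict(cur)
    for i, name in enumerate(NODES):
        if forced and name in forced:
            continue  # externally driven node keeps the forced value
        pull_up = cur[NODES[i - 1]] == 0
        pull_down = cur[NODES[(i + 1) % 4]] == 1
        if pull_up != pull_down:  # exactly one device conducts
            nxt[name] = 1 if pull_up else 0
    return nxt


def settle(state: State, forced: Optional[State] = None, steps: int = 4) -> State:
    """Repeat the evaluation a few times so the ring reaches a stable state."""
    for _ in range(steps):
        state = dice_step(state, forced)
    return state


def c_element(b: int, d: int, previous_out: int) -> int:
    """Inverting C-element: the output changes only when both inputs agree."""
    return 1 - b if b == d else previous_out


def store(bit: int) -> State:
    """Node values for a stored bit: A, B, C, D = 1, 0, 1, 0 for a '1'."""
    return {"A": bit, "B": 1 - bit, "C": bit, "D": 1 - bit}


def inject_set(state: State, node: str, out: int) -> Tuple[State, int, int]:
    """Flip one node, hold it briefly, release it, and track the output."""
    during = settle(state, forced={node: 1 - state[node]})
    out_during = c_element(during["B"], during["D"], out)
    after = settle(during)
    return after, out_during, c_element(after["B"], after["D"], out_during)


def write(state: State, bit: int) -> State:
    """Drive B and D together to the complement of the data, then release."""
    return settle(settle(state, forced={"B": 1 - bit, "D": 1 - bit}))


if __name__ == "__main__":
    latch, q = store(1), 1
    for node in NODES:
        latch, q_during, q = inject_set(latch, node, q)
        print(node, latch, q_during, q)
    print(write(latch, 0))
```

Under these assumptions, a flip of any single node is restored once the disturbance is released, and the output holds its value throughout because nodes B and D never change together. Forcing B and D to the same value, as the input stage and the equalizer M18 do, is what flips the stored bit.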