# **UC Irvine**

## **UC Irvine Electronic Theses and Dissertations**

#### **Title**

New Circuit Techniques Enabling Millimeter-Wave and Terahertz Transceivers in Nanoscale Silicon

#### **Permalink**

https://escholarship.org/uc/item/6v50776h

#### **Author**

Wang, Zheng

#### **Publication Date**

2014

Peer reviewed|Thesis/dissertation

#### UNIVERSITY OF CALIFORNIA,

#### **IRVINE**

# New Circuit Techniques Enabling Millimeter-Wave and Terahertz Transceivers in Nanoscale Silicon

#### **DISSERTATION**

submitted in partial satisfaction of the requirements for the degree of

#### DOCTOR OF PHILOSOPHY

in Electrical and Computer Engineering

by

Zheng Wang

Dissertation Committee:

- Professor Payam Heydari, Chair
- Professor Michael Green
- Professor Ender Ayanoglu

To my parents and Qian

## **TABLE OF CONTENTS**

- LIST OF FIGURES
- LIST OF TABLES
- ACKNOWLEDGEMENTS
- CURRICULUM VITAE
- ABSTRACT OF THE DISSERTATION
- Chapter 1 Introduction
  - 1.1 Motivation
  - 1.2 Organization
- Chapter 2 Amplifier in near-$f_{max}$ Region
  - 2.1 Neutralization Technique for a differential pair
  - 2.2 Revisiting the Neutralization Technique
  - 2.3 Study on achieving the upper limit of power gain
    - 2.3.1 Relationship between $U$ and $G_{max}$
    - 2.3.2 The Gain-Plane
    - 2.3.3 Moving locus of Y, Z embedding network in the Gain-Plane
  - 2.4 Amplifier Design in near-$f_{max}$ Region: Application of Gain-Plane Technique
  - 2.5 Chapter Summary
- Chapter 3 A CMOS 210GHz Fundamental Transceiver
  - 3.1 System Architecture
  - 3.2 On-Chip Antenna and Balun Design
    - 3.2.1 2×2 Antenna Array
    - 3.2.2 Marchand Balun
  - 3.3 Active Circuits Design
    - 3.3.1 200GHz Transistor Layout for Power Amplifier
    - 3.3.1 210GHz Power Amplifier Design
    - 3.3.2 210GHz Low Noise Amplifier Design
    - 3.3.3 210GHz Voltage-Controlled Oscillator Design
    - 3.3.4 210GHz Power Detector Design
  - 3.4 Measurement Results
  - 3.5 Chapter Summary
- Chapter 4 A W-band Passive Imaging with Spatial-Overlapping Super Pixels
  - 4.1 Proposed 9-Element Receiver Array
  - 4.2 Design of a W-band Power Splitter with Built-in True Time Delay Circuit
  - 4.3 Measurement Results
  - 4.4 Chapter Summary
- Chapter 5 Conclusions
- Bibliography

## LIST OF FIGURES

- Fig. 1.1 The attenuation of electromagnetic waves in the air [6]
- Fig. 1.2 The ITRS 2008 [16]
- Fig. 2.1 Neutralization technique for differential pair (a) schematic (b) MAG
- Fig. 2.2 RF small signal model for MOSFET
- Fig. 2.3 Maximum power gain with respect to neutralization capacitance at (a) 100GHz (b) 200GHz
- Fig. 2.4 The $G_{max}$ curves with different $C_n$ cross over at the frequency point ($f_{max}$)
- Fig. 2.5 A conjugate-matched amplifier employing an embedded transistor
- Fig. 2.6 Stability regions in the *U/A* plane
- Fig. 2.7 The graphic representation of normalized unity gain $G_{max}/U$ in the $U/A$ plane
- Fig. 2.8 The loci of normalized gain $G_{max}/U$ in the $U/A$ plane
- Fig. 2.9 Y-embedding for a transistor
- Fig. 2.10 Z-embedding for a transistor
- Fig. 2.11 Moving locus of Y, Z embedding network in the gain-plane
- Fig. 2.12 Y-embedding (adding neutralization cap) for 100GHz transistor (a) gain-plane (b) $G_{max}$-$C_n$ plot
- Fig. 2.13 Y-embedding (adding neutralization cap) for 200GHz transistor (a) gain-plane (b) $G_{max}$-$C_n$ plot
- Fig. 2.14 Combining Y-embedding and Z-embedding for a transistor
- Fig. 3.1 The 210GHz fully integrated differential TRX architecture
- Fig. 3.2 The novel balun-based differential power distribution network
- Fig. 3.3 On-chip shielded dipole antenna (a) structure (b) radiation efficiency with respect to frequency
- Fig. 3.4 Antenna parameters (a) gain and return loss (b) pattern at 210GHz
- Fig. 3.5 2×2 antenna array parameters (a) 3-D pattern (b) antenna coupling
- Fig. 3.6 Marchand balun
- Fig. 3.7 Marchand balun performance S-parameters and phase difference
- Fig. 3.8 200GHz transistor layout issues (a) U's sensitivity to $C_{gs}$ and $C_{gd}$ (b) post-layout $f_{max}$ with different finger width
- Fig. 3.9 Floorplan for transistor layout (a) top view (b) side view
- Fig. 3.10 210GHz CMOS power amplifier schematic
- Fig. 3.11 Over-neutralization technique (a) $C_{gd}$ of similar MOSFET as $C_n$ (b) gain boosting
- Fig. 3.12 Measurement result for PA (a) Pout, gain and PAE with respect to Pin (b) power gain with respect to frequency
- Fig. 3.13 Low noise amplifier design (a) $NF_{min}$ with respect to current density (b) simultaneous noise and power match (c) trade-off between reflection and $G_{max}$ (d) $NF_{min}$ and $K_f$ with respect to $L_{deg}$
- Fig. 3.14 210GHz CMOS LNA schematic
- Fig. 3.15 Measurement results for LNA
- Fig. 3.16 Source degeneration for cross coupled pair (a) schematic (b) equivalent circuit
- Fig. 3.17 Negative resistive source degeneration (a) resistance only (b) with parasitic capacitance
- Fig. 3.18 Simulated effective parallel resistance $R_p$ and effective parallel capacitance $C_p$ for cross-coupled pair with source degeneration (a) sweep $R_s$, when $C_s = 0$ (b) sweep $C_s$, when $-R_s = -9$
- Fig. 3.19 210GHz CMOS VCO schematic
- Fig. 3.20 Measurement results for VCO (a) tuning range and output power (b) phase noise
- Fig. 3.21 The schematic of the 210GHz CMOS OOK envelope detector (*M1* and *M2*) and baseband amplifier
- Fig. 3.22 Simulation results for detector (a) input return loss (b) output swing of $V_D$ with input signal of 200 GHz (c) responsivity versus $P_{\rm in}$ with input carrier frequency of 200 GHz and baseband signal with BW of 5 GHz
- Fig. 3.23 Die photo of the TRX
- Fig. 3.24 Transmitter spectrum measurement (a) test set up (b) measured IF spectrum after down-conversion
- Fig. 3.25 Transmitter power measurement (a) test set up (b) measured radiation pattern
- Fig. 3.26 Continuous wave wireless link over 3.5cm (a) test set up (b) measured spectrum after receiver
- Fig. 3.27 Measured output SNR with respect to baseband signal frequency
- Fig. 4.1 A 2×2 subarray
- Fig. 4.2 Conventional array of non-overlapping subarrays
- Fig. 4.3 New array of overlapping subarrays
- Fig. 4.4 Amplitude and delay weighting coefficient diagram
- Fig. 4.5 Single RF path circuit diagram
- Fig. 4.6 Single super-pixel circuit diagram
- Fig. 4.7 Complete array block diagram
- Fig. 4.8 Three-stage distributed true time delay circuit
- Fig. 4.9 1:2 Active power splitter with true time delay control
- Fig. 4.10 1:4 Active power splitter with true time delay control
- Fig. 4.11 Chip micrograph of the 1:4 active power splitter with true time delay control
- Fig. 4.12 Measured true time delay circuit S-parameters across all delay states. (a) S<sub>1</sub>
- Fig. 4.13 (a) Measured phase vs. frequency response and best-fit straight lines of the active power splitter with true time delay for 7 delay settings. (b) Error of measured delay with respect to best-fit data versus frequency
- Fig. 4.14 Die photo of the 9-element imaging array receiver
- Fig. 4.15 Subarray beam steering radiation patterns compared with single antenna patterns
- Fig. 4.16 Single antenna beam pattern for various VGA gain settings
- Fig. 4.17 Reconstructed images from the outputs of the four individual 2×2 subarrays and the composite image obtained by combining the four overlapping subarray images

## LIST OF TABLES

- Table 3.1 Performance summary of the 210GHz CMOS transceiver
- Table 3.2 Comparison table
- Table 4.1 Summary of the Receiver Array Performance
- Table 4.2 Performance Comparison of W-Band Imagers

#### **ACKNOWLEDGEMENTS**

First, I would like to express my deepest appreciation to my research advisor, Professor Payam Heydari, for giving me the great opportunity to work in the Nanoscale Communication Integrated Circuits Laboratory, and for his kind support and excellent guidance. Without his persistent support and guidance, this dissertation would not have been possible.
Along with Professor Heydari, I would also like to thank Professor Michael Green and Professor Ender Ayanoglu for serving on my dissertation committee and providing valuable suggestions.

In addition, I would like to thank Dr. Zhiming Chen and Dr. Chun-Cheng Wang; without their help, the challenging 210GHz TRX project could not have been delivered. I also thank Dr. Pei-Yuan Chiang, Mr. Peyman Nazari, Dr. Francis Caster, and Dr. Leland Gilreath, my teammates on the research projects, for their collaboration. Great discussions, great collaboration, great teamwork!

I would like to thank all the current members and alumni of the Nanoscale Communication Integrated Circuits Lab for the helpful suggestions, the friendships, and the excellent discussion environment.

I would like to thank the Leading Edge Access Program (LEAP), MOSIS, and TowerJazz for chip fabrication. I would also like to thank Professor Rebeiz's Lab, JPL, Rohde & Schwarz, and Anritsu for their kind help with the measurements.

I would also like to thank all my friends for their help. Because space is limited, I am sorry that I cannot list all of their names. Every one of you is very important to me.

Last, but not least, I want to express sincere gratitude to my family. Thank you for your support, understanding, and encouragement.

Finally, I would like to thank my wife Qian. You are my life!

#### **CURRICULUM VITAE**

## **Zheng Wang**

#### Education

09/2010-06/2014 Ph.D. in Electrical and Computer Engineering, University of California, Irvine. GPA: 4.0/4.0

09/2007-07/2010 M.S. in Microelectronics, Tsinghua University, China. GPA: 3.8/4.0

08/2003-07/2007 B.S. in Electronic Engineering, Tsinghua University, China. GPA: 3.7/4.0

## **Experience**

06/2013-03/2014 Engineering Intern, Broadcom Corporation, Irvine, CA

06/2012-12/2012 Engineering Intern, Broadcom Corporation, Irvine, CA

07/2011-12/2011 Engineering Intern, Quantenna Communications, Fremont, CA

#### **Publications**

- [1] P. Chiang, Z. Wang, O. Momeni, P. Heydari, "A Silicon-Based THz Frequency Synthesizer With Wide Locking Range", Invited to *IEEE J. Solid-State Circuits Special Issue of ISSCC* 2014, Dec. 2014.
- [2] F. Caster, L. Gilreath, S. Pan, Z. Wang, F. Capolino, P. Heydari, "Design and Analysis of a W-band 9-Element Imaging Array Receiver Using a New Concept of Spatial-Overlapping Super-Pixels in Silicon", *IEEE J. Solid-State Circuits*, vol. 49, no. 6, June 2014.
- [3] Z. Wang, P. Chiang, P. Nazari, C. Wang, Z. Chen, P. Heydari, "A CMOS 210GHz Fundamental Transceiver with OOK Modulation", *IEEE J. Solid-State Circuits*, vol. 49, no. 3, pp. 564-580, Mar. 2014.
- [4] P. Chiang, Z. Wang, O. Momeni, P. Heydari, "A 300GHz Frequency Synthesizer with 7.9% Locking Range in 90nm SiGe BiCMOS", *IEEE ISSCC Dig. Tech. Papers*, pp. 260-261, Feb. 2014.
- [5] P. Nazari, B. Chun, V. Kumar, E. Middleton, Z. Wang, and P. Heydari, "A 130nm CMOS Polar Quantizer for Cellular Applications", *IEEE RFIC Symp. Dig.*, Jun. 2013.
- [6] F. Caster, L. Gilreath, S. Pan, Z. Wang, F. Capolino, P. Heydari, "A 93-113GHz BiCMOS 9-element Imaging Array Receiver Utilizing Spatial-Overlapping Pixels with Wideband Phase and Amplitude Control", *IEEE ISSCC Dig. Tech. Papers*, pp. 144-145, Feb. 2013.
- [7] Z. Wang, P. Chiang, P. Nazari, C. Wang, Z. Chen, P. Heydari, "A 210GHz Fully Integrated Differential Transceiver with Fundamental Frequency VCO in 32 nm SOI CMOS", *IEEE ISSCC Dig. Tech. Papers*, pp. 136-137, Feb. 2013.
### ABSTRACT OF THE DISSERTATION

New Circuit Techniques Enabling Millimeter-Wave and Terahertz Transceivers in Nanoscale Silicon

By

## Zheng Wang

Doctor of Philosophy in Electrical and Computer Engineering

University of California, Irvine, 2014

Professor Payam Heydari, Chair

The vastly under-utilized spectrum in the sub-THz frequency range enables disruptive applications including 10Gb/s chip-to-chip wireless communications and imaging/spectroscopy. Owing to aggressive scaling in feature size and device $f_T/f_{max}$, nanoscale CMOS technology potentially enables integration of sophisticated systems at this frequency range. This dissertation mainly focuses on the design of a 210GHz fundamental transceiver and also covers the design of a W-band fully integrated imaging system utilizing a novel concept of spatial-overlapping super-pixels.

First, a 210GHz transceiver with OOK modulation in a 32nm SOI CMOS process ($f_T/f_{max}$ = 250/320GHz) is presented. The transmitter (TX) employs a 2×2 spatial combining array consisting of a double-stacked cross-coupled voltage-controlled oscillator (VCO) at 210GHz with an on-off-keying (OOK) modulator, a power amplifier (PA) driver, a novel balun-based differential power distribution network, four PAs, and an on-chip 2×2 dipole antenna array. The non-coherent receiver (RX) utilizes a direct-detection architecture consisting of an on-chip antenna, a low noise amplifier (LNA), and a power detector. The VCO generates a measured output power of -13.5dBm, and the PA shows a measured gain of 15dB and a $P_{sat}$ of 4.6dBm. The LNA exhibits a measured in-band gain of 18dB and a minimum in-band noise figure (NF) of 11dB. The TX achieves an EIRP of 5.13dBm at 10dB back-off from saturated power, and an estimated EIRP of 15.2dBm when the PAs are fully driven. This is the first demonstration of a fundamental frequency CMOS transceiver at the 200GHz frequency range.

Second, a W-band direct-detection-based receiver array in an advanced 0.18µm BiCMOS process is presented, which incorporates a new concept of spatial-overlapping super-pixels for millimeter-wave imaging applications. The use of spatial-overlapping super-pixels results in (1) improved SNR at the pixel level through a reduction of spillover losses, (2) partially correlated adjacent super-pixels, (3) a 2×2 window averaging function in the RF domain, (4) the ability to compensate for the systematic phase delay and amplitude variations due to the off-focal-point effect for antennas away from the focal point, and (5) the ability to compensate for mutual coupling effects among the array elements. The receiver chip achieves a measured peak coherent responsivity of 1,150MV/W, an incoherent responsivity of 1,000MV/W, a minimum noise-equivalent power (NEP) of 0.28fW/Hz<sup>1/2</sup>, and a front-end 3-dB bandwidth from 87–108GHz, while consuming 225mW per receiver element. The measured noise-equivalent temperature difference (NETD) of the SiGe receiver chip is 0.45K with a 20ms integration time.

# **Chapter 1 Introduction**

# 1.1 Motivation

The vastly under-utilized spectrum in the millimeter-wave (MMW) / terahertz (THz) frequency range enables disruptive applications including 10-gigabit chip-to-chip wireless communications and imaging/spectroscopy.

On the imaging applications front, THz imaging is considered to be one of the emerging technologies [1]. Electromagnetic waves at these frequencies can pass through non-conducting materials. Meanwhile, many materials have a fingerprint spectrum in the MMW/THz frequency range, which makes these frequencies suitable for non-ionizing imaging and material spectroscopy [1]-[3].
On the sensing and communications front, the availability of broad unlicensed spectrum across the MMW/THz frequency range opens up new ideas for super-precise sensing at the micrometer level and multi-10-gigabit instant wireless access at centimeter-level spacing between transmitter (TX) and receiver (RX) [4]-[5].

Fig. 1.1 The attenuation of electromagnetic waves in the air [6]

Within the 30-300GHz frequency range, there are propagation windows located near 35, 94, 140, and 220GHz, as shown in Fig. 1.1. The under-utilized spectrum at sub-THz frequencies enables a variety of exciting applications including 10-gigabit wireless communications and imaging.

Today, THz front-ends are mainly implemented using Schottky diodes [7], nonlinear optics [8], [9], or III-V devices [10]. A complete transceiver (TRX) in a 50 nm mHEMT technology has been developed for wireless links with up to 25Gbit/s data rate at 220GHz [11]. Owing to aggressive scaling in feature size and device $f_T/f_{max}$ (Fig. 1.2), nanoscale CMOS technology potentially enables integration of sophisticated systems in the THz frequency range, which could once be implemented only in compound semiconductor technologies. Recently, CMOS THz signal sources and TRXs have been reported [12]-[15], employing techniques such as the distributed active radiator (DAR) and super-harmonic signal generation.

Fig. 1.2 The ITRS 2008 [16]

According to the ITRS 2008 data shown in Fig. 1.2, before 2008 the device $f_T$ and $f_{max}$ were both below approximately 200GHz. As the CMOS process advanced to 32nm, the device $f_{max}$ reached around 350GHz, which enables 200GHz system design in CMOS.

This dissertation focuses on novel circuit techniques enabling the design and implementation of a 210GHz TRX with OOK modulation in a 32nm SOI CMOS process ($f_T/f_{max}$ = 250/320GHz). This fundamental frequency TRX incorporates a 2×2 TX antenna array, a 2×2 spatial combining power amplifier (PA), a fundamental frequency voltage-controlled oscillator (VCO), and a low noise amplifier (LNA). A short-range wireless test was carried out, showing the feasibility of a 10Gbps wireless data rate for chip-to-chip communications.

In addition, this dissertation covers the design of a highly integrated 9-element imaging receiver comprising a 2×2 array of spatial-overlapping super-pixels with integrated antennas in a low-cost BiCMOS process. A unique overlapping super-pixel concept was demonstrated, which employs mechanisms for 7-step amplitude and 9-step delay control settings.

# 1.2 Organization

The remainder of this dissertation is organized as follows. In Chapter 2, the design of amplifiers in the near-$f_{max}$ region is discussed, and a useful over-neutralization technique is proposed for THz amplifier design. Chapter 3 focuses on the design and implementation of a 210GHz fundamental transceiver, showing the feasibility of a 10Gbps wireless data rate for chip-to-chip communications. In Chapter 4, the design of a W-band 9-element imaging array receiver using a new concept of spatial-overlapping super-pixels is described. Its core building block, the true time delay circuit, is discussed in detail. Finally, Chapter 5 summarizes the dissertation.

# Chapter 2 Amplifier in near-$f_{max}$ Region

In general amplifier design theory, many design parameters such as noise and linearity need to be taken into account and traded off against each other. Gain, which is usually assumed to be high enough, receives less attention. In fact, too much gain from an amplifier in an advanced CMOS process may cause stability problems, so the gain needs to be well controlled in order to achieve a robust, stable amplifier design.
However, as research interest migrates from millimeter-wave toward terahertz frequencies, the system's operating frequency comes increasingly close to the maximum oscillation frequency $f_{max}$ of the transistor. The resulting lack of front-end amplification leads to high power consumption and noise figure. This chapter studies amplifiers that operate close to the $f_{max}$ of the transistor. Some techniques dating back to the 1960s, which have been overlooked by mainstream CMOS designers, are revisited and employed to obtain sufficient amplification at THz frequencies.

# 2.1 Neutralization Technique for a differential pair

The neutralization technique, shown in Fig. 2.1(a), uses a pair of cross-connected capacitors in a differential pair amplifier and has been widely used to stabilize the amplifier. The neutralization capacitors $C_n$ introduce an equivalent capacitance $-C_n$ that compensates for the effect of the intrinsic $C_{gd}$ of the transistor, thereby unilateralizing the device. In addition, the neutralization technique is also used to achieve higher maximum available power gain (MAG, also called $G_{max}$) [17]-[19]. The MAG of a fully unilateralized device is Mason's U, defined as [20]

$$U = \frac{|Y_{12} - Y_{21}|^2}{4(\text{Re}[Y_{11}]\text{Re}[Y_{22}] - \text{Re}[Y_{21}]\text{Re}[Y_{12}])}.$$ (2.1)

$C_{gd}$ is commonly considered to be the only contributor to $Y_{12}$; hence, $\text{Re}[Y_{12}]$ is zero. In this case, Eq. (2.1) is often expressed as [17]:

$$U = \frac{|\text{Re}[Y_{21}]|^2}{4 \,\text{Re}[Y_{11}] \,\text{Re}[Y_{22}]} = \frac{g_m^2}{4 g_g g_{ds}},$$ (2.2)

where $g_g$ and $g_{ds}$ are the gate and drain-source conductances of the device, respectively. Fig. 2.1(b) shows the MAG and Mason's U of a MOS device versus frequency. The knee point in the MAG curve of the device without neutralization corresponds to a stability factor $K_f$ of one and lies at roughly $1/3\sim2/3$ of $f_{max}$.
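As a quick numerical check of Eqs. (2.1) and (2.2), the following minimal sketch evaluates Mason's U directly from Y-parameters. The element values and the idealized two-port (no gate resistance, with a lumped gate conductance `GG` standing in for $g_g$) are illustrative assumptions, not values extracted in this work. Under these assumptions $Y_{12}$ is purely imaginary, and Eq. (2.1) reduces to Eq. (2.2).

```python
import math


def mason_u(y11, y12, y21, y22):
    """Mason's unilateral power gain U from two-port Y-parameters, Eq. (2.1)."""
    numerator = abs(y12 - y21) ** 2
    denominator = 4.0 * (y11.real * y22.real - y21.real * y12.real)
    return numerator / denominator


def idealized_y(freq_hz, gm, gg, gds, cgs, cgd, cds):
    """Idealized common-source Y-parameters with C_gd as the only feedback path.

    Assumption for illustration: r_g = 0, so Re[Y12] = 0 as presumed by Eq. (2.2).
    """
    w = 2.0 * math.pi * freq_hz
    y11 = complex(gg, w * (cgs + cgd))
    y12 = complex(0.0, -w * cgd)
    y21 = complex(gm, -w * cgd)
    y22 = complex(gds, w * (cds + cgd))
    return y11, y12, y21, y22


# Hypothetical element values (not from the dissertation).
GM, GG, GDS = 40e-3, 2e-3, 6e-3
CGS, CGD, CDS = 20e-15, 8e-15, 10e-15

u_general = mason_u(*idealized_y(100e9, GM, GG, GDS, CGS, CGD, CDS))
u_simplified = GM ** 2 / (4.0 * GG * GDS)  # Eq. (2.2)
print(f"U from Eq. (2.1): {u_general:.3f}, U from Eq. (2.2): {u_simplified:.3f}")
```

The two values agree only because the sketch forces $\text{Re}[Y_{12}] = 0$; once a gate resistance is included, $\text{Re}[Y_{12}]$ is no longer zero and the simplified form stops being exact.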
For frequencies below the knee point, the neutralization technique boosts the gain while stabilizing the device. Beyond the knee point, in what is defined here as the "near-$f_{max}$ region", only a slight difference exists between the MAG of the device without neutralization and Mason's invariant U.

Fig. 2.1 Neutralization technique for differential pair (a) schematic (b) MAG

# 2.2 Revisiting the Neutralization Technique

To boost the power gain in the near-$f_{max}$ region, the neutralization technique is revisited. First, the Y-parameters of the MOS common-source configuration, shown in Fig. 2.2, are derived:

$$Y_{11} = \omega^2 (C_{gs} + C_{gd})^2 r_g + j\omega (C_{gs} + C_{gd})$$ (2.3)

$$Y_{12} = -\omega^2 (C_{gs} + C_{gd}) C_{gd} r_g - j\omega C_{gd}$$ (2.4)

$$Y_{21} = g_m \Big[ 1 - j\omega (C_{gs} + C_{gd}) r_g \Big] - \omega^2 (C_{gs} + C_{gd}) C_{gd} r_g - j\omega C_{gd} \approx g_m - j\omega \Big[ g_m r_g (C_{gs} + C_{gd}) + C_{gd} \Big]$$ (2.5)

$$Y_{22} = g_{ds} + \omega^2 C_{gd} r_g \left[ g_m r_g (C_{gs} + C_{gd}) + C_{gd} \right] + j \omega \left[ C_{ds} + C_{gd} (1 + g_m r_g) \right] \approx g_{ds} + j \omega \left[ C_{ds} + C_{gd} (1 + g_m r_g) \right]$$ (2.6)

The above approximations hold under the condition that $f < 2f_{max}$, where $f_{max}$ is the maximum oscillation frequency of the transistor, $f_{max} = \frac{1}{4\pi} (g_m / [r_g C_{gd} (C_{gd} + C_{gs})])^{0.5}$.

Fig. 2.2 RF small signal model for MOSFET

Substituting Eqs. (2.3)-(2.6) into Eq. (2.1), Mason's U for the MOSFET is derived as

$$U \approx \frac{|g_m|^2}{4[\omega^2 (C_{gs} + C_{gd})^2 r_g g_{ds} + \omega^2 (C_{gs} + C_{gd}) C_{gd} r_g g_m]}.$$ (2.7)

Because $\text{Re}[Y_{12}]$ is non-zero due to the gate resistance $r_g$, the contribution of $Y_{12}$ to Mason's U is non-zero, which invalidates Eq. (2.2). In a capacitance-based neutralization topology, an extra equivalent capacitance $-C_n$ is effectively placed in parallel with the input and output ports of the original two-port network.
The new Y-parameters thus become

$$Y'_{11} = Y_{11} - j\omega C_n = \omega^2 (C_{gs} + C_{gd})^2 r_g + j\omega C_{gs} + j\omega (C_{gd} - C_n)$$ (2.8)

$$Y'_{12} = Y_{12} + j\omega C_n = -\omega^2 (C_{gs} + C_{gd})C_{gd}r_g - j\omega (C_{gd} - C_n)$$ (2.9)

$$Y'_{21} = Y_{21} + j\omega C_n \approx g_m - j\omega [g_m r_g (C_{gs} + C_{gd}) + C_{gd} - C_n]$$ (2.10)

$$Y'_{22} = Y_{22} - j\omega C_n \approx g_{ds} + j\omega C_{ds} + j\omega [C_{gd}(1 + g_m r_g) - C_n]$$ (2.11)

If $C_n = C_{gd}$, the imaginary part of $Y'_{12}$, $\text{Im}[Y'_{12}]$, is neutralized, while $\text{Re}[Y'_{12}]$ remains intact. This means that the device is not fully unilateralized by an external equivalent capacitance $-C_n$. This conclusion differs from what is conventionally assumed about capacitance-based neutralization, and it stems from the non-zero contribution of the gate resistance $r_g$ at high frequencies.

$K_f$ and $G_{max}$ are derived using the new Y-parameters in Eqs. (2.8)-(2.11) [21]:

$$K_f = \frac{2 \operatorname{Re}[Y_{11}] \operatorname{Re}[Y_{22}] - \operatorname{Re}[Y_{12}Y_{21}]}{|Y_{12}Y_{21}|}.$$ (2.12)

$$G_{\text{max}} = \left| \frac{Y_{21}}{Y_{12}} \right| \left( K_f - \sqrt{K_f^2 - 1} \right)$$ (2.13)

Fig. 2.3 Maximum power gain with respect to neutralization capacitance at (a) 100GHz (b) 200GHz

Fig. 2.3(a) shows $G_{max}$, Mason's U and $K_f$ with respect to $C_n$ at 100GHz, a frequency close to the knee frequency in Fig. 2.1(b). Because Mason's U is invariant under any linear lossless reciprocal embedding network [20], it remains constant with $C_n$. The bell-shaped $K_f$ curve reaches its peak value at $C_n = C_{gd}$. On the other hand, $G_{max}$ reaches its peak value at the two values of $C_n$ for which $K_f = 1$, and reaches a local minimum at $C_n = C_{gd}$ [17].
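The behavior described above can be explored numerically with a short sketch that evaluates Eqs. (2.8)-(2.13) and Eq. (2.1) over a sweep of $C_n$. The small-signal element values below are hypothetical and chosen only for illustration; they are not the extracted parameters of the 32nm device, and the function names are likewise assumptions of this sketch.

```python
import math

# Hypothetical small-signal values for illustration only.
GM, GDS, RG = 40e-3, 6e-3, 10.0
CGS, CGD, CDS = 20e-15, 8e-15, 10e-15


def neutralized_y(freq_hz, cn):
    """Y-parameters with an equivalent -Cn added, following Eqs. (2.8)-(2.11)."""
    w = 2.0 * math.pi * freq_hz
    cin = CGS + CGD
    y11 = complex(w ** 2 * cin ** 2 * RG, w * (CGS + CGD - cn))
    y12 = complex(-(w ** 2) * cin * CGD * RG, -w * (CGD - cn))
    y21 = complex(GM, -w * (GM * RG * cin + CGD - cn))
    y22 = complex(GDS, w * (CDS + CGD * (1.0 + GM * RG) - cn))
    return y11, y12, y21, y22


def mason_u(y11, y12, y21, y22):
    """Mason's unilateral power gain, Eq. (2.1)."""
    return abs(y12 - y21) ** 2 / (
        4.0 * (y11.real * y22.real - y21.real * y12.real)
    )


def stability_factor(y11, y12, y21, y22):
    """Stability factor K_f, Eq. (2.12)."""
    y12y21 = y12 * y21
    return (2.0 * y11.real * y22.real - y12y21.real) / abs(y12y21)


def g_max(y11, y12, y21, y22):
    """Maximum available gain, Eq. (2.13); None where K_f < 1 (MAG undefined)."""
    k = stability_factor(y11, y12, y21, y22)
    if k < 1.0:
        return None
    return abs(y21 / y12) * (k - math.sqrt(k * k - 1.0))


def sweep(freq_hz, cn_values):
    """Return (Cn, U, K_f, G_max) tuples for each neutralization capacitance."""
    rows = []
    for cn in cn_values:
        y = neutralized_y(freq_hz, cn)
        rows.append((cn, mason_u(*y), stability_factor(*y), g_max(*y)))
    return rows


if __name__ == "__main__":
    for cn, u, k, g in sweep(200e9, [i * 2e-15 for i in range(9)]):
        g_text = "n/a (K_f < 1)" if g is None else f"{g:.2f}"
        print(f"Cn = {cn * 1e15:4.1f} fF  U = {u:.2f}  K_f = {k:.2f}  G_max = {g_text}")
```

In this sketch the equivalent $-C_n$ only shifts the imaginary parts of the Y-parameters by equal amounts, so the computed U is the same for every $C_n$, and setting $C_n = C_{gd}$ cancels $\text{Im}[Y'_{12}]$ while leaving $\text{Re}[Y'_{12}]$ untouched.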
Interestingly, at 100GHz, $G_{max}$ after neutralization ( $C_n = C_{gd}$ ) is higher than the unilateral power gain U, justifying this important notion that the differential pair with neutralization capacitance cannot be treated as a fully unilateralized device because $Re[Y_{I2}]$ is not compensated. When the device is operating in its near- $f_{max}$ region (e.g. 200GHz), the K-factor of the original device itself is greater than unity, as shown in Fig. 2.1 (b). The effects of the neutralization capacitance are shown in Fig. 2.3(b). Since now $K_f > 1$ , even without any neutralization capacitance, the original device itself is unconditionally stable and the power gain is almost the same as that of the neutralized device. When $C_n$ keeps increasing into over-neutralization region, gain is boosted by pushing the device to the edge of the stability region. From Fig. 2.3(b), significant gain boosting is achieved by employing over-neutralization technique, indicating that it is much more effective in gain boosting compared to neutralization technique in the near- $f_{max}$ region. The maximum boosted gain achieved using over-neutralization is close to the upper limit of $2U - 1 + 2\sqrt{U(U - 1)}$ obtained in [22]. Based on our observation, the achievable power gain of the device is both upper and lower bounded when the device is located in the region of $K_f > 1$ , i.e., $$U \le G_{max} \le (2U - 1) + 2\sqrt{U(U - 1)}. \tag{2.14}$$ Another interesting observation is that at $f_{max}$ , both lower and upper limits of $G_{max}$ converge to unity value. Thus, $f_{max}$ is fixed regardless of what kind of lossless reciprocal network is placed around the device. Fig. 2.4 shows $G_{max}$ of devices with respect to frequency for different neutralization capacitances. All these $G_{max}$ curves intercept at one frequency, that is, $f_{max}$ . Fig. 
2.4 The $G_{max}$ curves with different $C_n$ cross over at the frequency point $(f_{max})$ # 2.3 Study on achieving the upper limit of power gain From previous investigation, the use of over-neutralization technique could significantly improve the power gain of a transistor. Based on our observation, the power gain will reach its local maximum when $K_f$ =1. However, the upper limit of $2U-1+2\sqrt{U(U-1)}$ predicted in [22] has not been achieved just by using over-neutralization technique. In other words, an important question arises; that is, what is best embedding network for a transistor to obtain its upper limit of power gain in Fig. 2.5? To answer this question, we obtain closed-form analytical expressions that related U to $G_{max}$ so as to achieve more insight and explore the necessary and sufficient condition to attain an upper limit. Fig. 2.5 A conjugate-matched amplifier employing an embedded transistor # 2.3.1 Relationship between U and $G_{max}$ To study on how to achieve the upper limit of power gain, it is worthwhile to revisit the general theory of linear two-port network. In fact, an attractively simple statement of $G_{max}$ performance was discovered by Singhakowinta in 1963 [23]. First, some new parameters are defined in order to simplify the derivation. $$g_{11} = \text{Re}[Y_{11}].$$ (2.15) $$g_{12} = \text{Re}[Y_{12}]. \tag{2.16}$$ $$g_{21} = \text{Re}[Y_{21}]. \tag{2.17}$$ $$g_{22} = \text{Re}[Y_{22}].$$ (2.18) $$M = \text{Re}[Y_{12}Y_{21}]. \tag{2.19}$$ $$N = \text{Im}[Y_{12}Y_{21}]. \tag{2.20}$$ $$L^{2} = M^{2} + N^{2} = |Y_{12}Y_{21}|. (2.21)$$ $$\eta = \frac{2g_{11}g_{22} - M}{L} \,. \tag{2.22}$$ $$A = \frac{Y_{21}}{Y_{12}} \,. \tag{2.23}$$ Here, $\eta$ is the same as $K_f$ as Eq. (2.12) and (2.22) are equivalent. A is also called the complex measure of nonreciprocity. Eq. (2.13) of power gain can also be re-written as $$G_{\text{max}} = \frac{|A|}{\eta + \sqrt{\eta^2 - 1}}.$$ (2.24) Starting from the definition of U from Eq. 
(2.1), it can be re-written as $$U = \frac{\left| Y_{21} - Y_{12} \right|^2}{4(g_{11}g_{22} - g_{12}g_{21})}.$$ (2.25) Dividing both numerator and denominator of (2.25) by $Y_{12}$ , and introducing the parameter of $\eta$ in this equation will yield: $$U = \frac{\left|\frac{Y_{21}}{Y_{12}} - 1\right|^2}{2\left[\frac{2g_{11}g_{22} - M}{\left|Y_{12}\right|^2}\right] - 2\left[\frac{2g_{12}g_{21} - M}{\left|Y_{12}\right|^2}\right]}$$ (2.26) $$= \frac{|A-1|^2}{\frac{2\eta L}{|Y_{12}|^2} - 2\left[\frac{\text{Re}\left[Y_{12}^*Y_{21}\right]}{|Y_{12}|^2}\right]} = \frac{|A-1|^2}{2\eta A - 2A_R}.$$ (2.27) where $A_R$ denotes the real part of A, i.e., Re[A]. Now, the Mason's U is expressed in terms of nonreciprocity A and the stability factor $\eta$ . We re-order (2.27) so that it becomes similar in form to Eq. (2.24): $$U = \frac{|A-1|^2}{|A|(\eta + \sqrt{\eta^2 - 1}) + |A|(\eta - \sqrt{\eta^2 - 1}) - 2A_R}$$ (2.28) $$= \frac{|A-1|^2}{|A|^2 \frac{\eta + \sqrt{\eta^2 - 1}}{|A|} + \frac{|A|}{\eta + \sqrt{\eta^2 - 1}} - 2A_R}$$ (2.29) Substituting Eqs. (2.24) into Eq. (2.29), the Mason's U is derived as $$U = \frac{|A-1|^2}{|A|^2 / G_{\text{max}} + G_{\text{max}} - 2A_R}$$ (2.30) Or, more concisely $$\frac{U}{G_{\text{max}}} = \left| \frac{A - 1}{A - G_{\text{max}}} \right|^2 \tag{2.31}$$ Comparing Eq. (2.24) and Eq. (2.31), at first glance, they convey similar physical meaning and implication, i.e., they relate $G_{max}$ to the parameter of device and the abstract Y-parameters are absent. However, Eq. (2.31) indeed gives more insight into the behavior of the embedded amplifier in Fig. 2.5. First, it shows dependency on two basic parameters U and A, which can be controlled independently; Second, based on Eq. (2.24), it is not straightforward to easily perceive insightful relationship between $\eta$ and the embedding network. In contrast, the simple and useful graphical representation (the gain-plane) of Eq. 
(2.31) provides a more intuitive view, as we can plot the loci of constant power gain in the U/A plane.

#### 2.3.2 The Gain-Plane

A complete representation of Eq. (2.31) requires a three-dimensional model in U, Re[A] and Im[A]. However, such a model is not the best medium for design. A two-dimensional tool [24] is usually much more useful, like the Smith chart for matching design. In practice it is normally true that

$$|A| \gg 1. \tag{2.32}$$

As a good approximation, Eq. (2.31) can then be expressed as

$$\sqrt{\frac{G_{\text{max}}}{U}} = \left| 1 - \frac{G_{\text{max}}}{U} \times \frac{U}{A} \right|. \tag{2.33}$$

Here, the normalized gain $G_{max}/U$ is expressed as a function of U/A only. Therefore, in the U/A plane (also called the *gain-plane*) with coordinate axes Re[U/A] and Im[U/A], we can locate unique loci of constant normalized gain $G_{max}/U$. Before studying the loci of constant normalized gain, the stability boundary can also be plotted in the gain-plane. From [25], the condition for unconditional stability is

$$\left(\operatorname{Im}\left[\frac{U}{A}\right]\right)^{2} < \operatorname{Re}\left[\frac{U}{A}\right] + \frac{1}{4}. \tag{2.34}$$

Inequality (2.34) describes the region inside a parabola in the gain-plane, shown in Fig. 2.6. The parabola intersects the real axis at -0.25 and the imaginary axis at $\pm 0.5$. The region within this parabola is the unconditionally stable region.

Fig. 2.6 Stability regions in the U/A plane

To locate the constant-gain loci within the unconditionally stable region of the gain-plane, Eq. (2.33) is rewritten as

$$\frac{U}{G_{\text{max}}} = \left[ \text{Re} \left[ \frac{U}{A} \right] - \frac{U}{G_{\text{max}}} \right]^2 + \left[ \text{Im} \left[ \frac{U}{A} \right] \right]^2. \tag{2.35}$$

Eq. (2.35) describes a family of circles in the gain-plane.
Each circle is the locus of a given value of normalized gain $G_{max}/U$, and has center $(U/G_{max}, 0)$ and radius $\sqrt{U/G_{max}}$. Fig. 2.7 shows the case $G_{max} = U$. The blue circle represents the unity normalized gain locus, which touches the parabola tangentially. However, the locus is an incomplete circle: only the segment inside the green identifier circle is valid. The reason is that the rest of the blue circle violates the implicit constraint $G_{max} < |A|$, which is equivalent to $K_f > 1$. It is not convenient, however, to draw an auxiliary green circle every time constant-gain loci for different gain values are needed. An insightful observation from Fig. 2.7 is that the intersection point of the green and blue circles coincides with the tangent point of the blue circle and the red parabola. Therefore, the green identifier circle is not needed: the two tangent points of the blue constant-gain circle and the boundary parabola delimit the valid part of the blue circle.

Fig. 2.7 The graphic representation of normalized unity gain $G_{max}/U$ in the U/A plane

The loci of normalized gain $G_{max}/U$ can then be plotted in the U/A plane, as shown in Fig. 2.8.

Fig. 2.8 The loci of normalized gain $G_{max}/U$ in the U/A plane

Any three-terminal network can be represented by a point in the gain-plane. The gain-plane gives an immediate appreciation of a number of involved relationships, in that the dependence of the gain on the parameters U and A is shown simultaneously. The advantage is that any change in the operating conditions of the conjugate-matched two-port amplifier, or in its own parameters, results in a corresponding locus for the 'operating point' in the gain-plane.
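The tangency between a constant-gain circle and the stability parabola can be checked numerically. The following Python sketch substitutes the boundary of (2.34) into the circle of Eq. (2.35) and solves the resulting quadratic in Re[U/A]; a zero discriminant means the two curves touch rather than cross. The function name and the sampled gain values are illustrative assumptions, not part of the original analysis.

```python
def circle_parabola_contact(g):
    """Contact of the constant-gain circle (2.35) with the parabola of (2.34).

    g is the normalized gain G_max/U. With c = U/G_max = 1/g, the circle is
    (x - c)^2 + y^2 = c and the stability boundary is y^2 = x + 1/4, where
    x = Re[U/A] and y = Im[U/A]. Eliminating y^2 gives
    x^2 - (2c - 1) x + (c^2 - c + 1/4) = 0.
    Returns (discriminant, x at contact, y^2 at contact).
    """
    c = 1.0 / g
    b = -(2.0 * c - 1.0)
    k = c * c - c + 0.25
    disc = b * b - 4.0 * k
    x = -b / 2.0      # double root of the quadratic when disc == 0
    y2 = x + 0.25     # y^2 taken from the parabola
    return disc, x, y2


for g in (0.5, 1.0, 2.0, 4.0):
    disc, x, y2 = circle_parabola_contact(g)
    print(f"G_max/U = {g}: discriminant = {disc:.1e}, "
          f"Re[U/A] = {x:+.3f}, Im[U/A]^2 = {y2:.3f}")
```

Under these assumptions the discriminant vanishes for every sampled gain, so each circle is tangent to the parabola. The value of Im[U/A]^2 at the contact point shrinks to zero as the normalized gain approaches 4 and becomes negative beyond it, which is one way to see why no valid locus exists for larger normalized gain under the $|A| \gg 1$ approximation.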
Equally, any design technique, such as the application of feedback or the use of lossy padding at the ports, causes the operating point to follow a certain locus, and the design process may be interpreted as one of confining this locus to a region of the plane suitable for achieving the desired gain performance. It should be noted that the locus for $G_{max} = 4U$ is, in fact, a single point, and the maximum possible gain is 4U. It follows directly from Fig. 2.8 that this maximum is achieved if and only if the imaginary part of A is zero and the device is at the edge of the stability region. This is the necessary and sufficient condition for achieving the theoretical upper limit of power gain:

$$\begin{cases} \operatorname{Im} \left[ \frac{U}{A} \right] = 0 \\ K_f = 1 \end{cases} \tag{2.36}$$

One small correction to this conclusion is that the maximum gain of 4U relies on the assumption $|A| \gg 1$. Substituting Eq. (2.36) into the exact formula, Eq. (2.31), gives the maximum value of $G_{max}$, which is the same result as in [23]:

$$\max\{G_{\max}\} = (2U - 1) + 2\sqrt{U(U - 1)}. \tag{2.37}$$

#### 2.3.3 Moving locus of Y, Z embedding network in the Gain-Plane

By means of the graphical tool of the gain-plane, the necessary and sufficient condition for achieving the theoretical upper limit of power gain has been obtained in Eq. (2.36). In this section, the relationship between the embedding network and the movement in the gain-plane is discussed [25].

Fig. 2.9 Y-embedding for a transistor

First, consider a CS transistor with a single admittance $Y_f$ connected directly between the input and output ports, as shown in Fig. 2.9. The new admittance matrix is then

$$\begin{bmatrix} Y_{11} + Y_f & Y_{12} - Y_f \\ Y_{21} - Y_f & Y_{22} + Y_f \end{bmatrix}. \tag{2.38}$$

Therefore the value of 1/A of the embedded device is $(Y_{12} - Y_f) / (Y_{21} - Y_f)$.
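A minimal numeric sketch of this Y-embedding step is given below in Python. It applies the matrix of Eq. (2.38) to a set of Y-parameters and evaluates Mason's U from Eq. (2.25) before and after embedding. The Y-parameter values and the feedback susceptance are hypothetical numbers chosen only for illustration, and the helper names are not from the original work.

```python
def mason_u(y11, y12, y21, y22):
    """Mason's U from Eq. (2.25), using the real parts of the Y-parameters."""
    num = abs(y21 - y12) ** 2
    den = 4.0 * (y11.real * y22.real - y12.real * y21.real)
    return num / den


def y_embed(y11, y12, y21, y22, yf):
    """Admittance matrix after connecting Yf from input to output, Eq. (2.38)."""
    return y11 + yf, y12 - yf, y21 - yf, y22 + yf


# Hypothetical Y-parameters in siemens (illustration only).
Y = (2e-3 + 8e-3j, -0.2e-3 - 3e-3j, 30e-3 - 10e-3j, 4e-3 + 5e-3j)
yf = -2e-3j  # hypothetical purely reactive (lossless) feedback admittance

Y_emb = y_embed(*Y, yf)
inv_a = Y[1] / Y[2]              # 1/A of the bare device
inv_a_emb = Y_emb[1] / Y_emb[2]  # 1/A of the embedded device

print("U before embedding:", mason_u(*Y))
print("U after embedding: ", mason_u(*Y_emb))
print("1/A before:", inv_a)
print("1/A after: ", inv_a_emb)
```

Because a reactive $Y_f$ changes neither $Y_{21} - Y_{12}$ nor any of the real parts in Eq. (2.25), the two values of U agree, while 1/A moves. This is the property that makes U a convenient fixed scale for the gain-plane when lossless embedding is applied.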
Using a prime (') to denote the embedded device, we can see that, under the condition $|Y_f| \ll |Y_{21}|$ and with $U = U'$,

$$\frac{U'}{A'} = \frac{U}{A} - \frac{UY_f}{Y_{21}} = \frac{U}{A} + \frac{U|Y_f|}{|Y_{21}|} \angle \left(\pm \frac{\pi}{2} - \arg Y_{21}\right). \tag{2.39}$$

In fact, Eq. (2.39) describes a straight line in the gain-plane (U/A plane) drawn at an angle of $\pm \pi/2 - \arg Y_{21}$ to the Re[U/A] axis. Y-embedding therefore leads to movement along this line; the direction is dictated by the nature of $Y_f$ (L or C), and the distance moved is proportional to its magnitude. The simplicity of this movement constitutes one of the advantages of the gain-plane representation of embedded networks.

Another simple form of embedding is the connection of an impedance in series with the common terminal of the transistor, as in Fig. 2.10. Following a similar discussion, the embedded device is described as

$$\frac{U'}{A'} = \frac{U}{A} + \frac{U|Z_f|}{|Z_{21}|} \angle \left(\pm \frac{\pi}{2} - \arg Z_{21}\right). \tag{2.40}$$

Similarly, Eq. (2.40) describes another straight line in the gain-plane. The graphical representation of the Y- and Z-embedding networks is plotted in Fig. 2.11.

Fig. 2.11 Moving locus of Y, Z embedding network in the gain-plane

# **2.4** Amplifier Design in near- $f_{max}$ Region – Application of Gain-Plane Technique

Earlier in this chapter, the use of the over-neutralization technique in amplifier design in the near-$f_{max}$ region was discussed. However, the plots of $G_{max}$ with respect to $C_n$ do not provide much insight. In this section, a useful tool, the gain-plane, is employed to represent the over-neutralization technique. For a transistor model with no gate resistance (or at low frequency, where the effect of gate resistance is negligible), U/A is approximately proportional to $-j\omega C_{gd}/g_m$, a point located on the negative imaginary axis of the gain-plane.
As long as the angle of $Y_{21}$ (which is simply $g_m$) is approximately 0, the locus of Y-embedding (adding the neutralization capacitor $C_n$) is the y-axis of the gain-plane. When $C_n = C_{gd}$, the device is moved to the origin of the gain-plane, which is called unilateralization. In addition, the symmetry of the $G_{max}$ versus $C_n$ plot [17] is easily explained, since the constant-gain contours are symmetric about the x-axis in the gain-plane.

Fig. 2.12 shows two representations for the transistor at 100GHz. In this case, the effect of gate resistance is not negligible. The device before embedding no longer lies on the imaginary axis, and the locus of Y-embedding rotates counterclockwise as the frequency increases. However, the effect of gate resistance is not yet significant: Fig. 2.12(b) still exhibits a degree of symmetry in the $G_{max}$-$C_n$ plot.

Fig. 2.12 Y-embedding (adding neutralization cap) for 100GHz transistor (a) gain-plane (b) $G_{max}$-$C_n$ plot

Fig. 2.13 shows two representations for the transistor at 200GHz. In this case, the effect of gate resistance is significant. The device before embedding now lies inside the stability boundary, which means the device itself is unconditionally stable. The locus of Y-embedding is now at 117 degrees to the x-axis. Therefore, the transistor behaves quite differently at 200GHz than at 100GHz, as shown in Fig. 2.13(b). Another important observation from Fig. 2.13(a) is that the far-end intersection of the locus and the stability boundary, which represents the highest gain available from the over-neutralization technique, is very close to the theoretical upper limit.

Fig. 2.13 Y-embedding (adding neutralization cap) for 200GHz transistor (a) gain-plane (b) $G_{max}$-$C_n$ plot

Fig. 2.13 illustrates that the over-neutralization technique is very effective in pushing a common-source transistor toward its upper limit.
However, Y-embedding alone does not give enough flexibility to the design. Fig. 2.14 shows the combination of Y-embedding and Z-embedding. As Z-embedding provides another locus, different from that of Y-embedding in the gain-plane, the upper-limit point is achievable by properly choosing the values of the embedding network. In practice, the circuit topology is expected to be the combination of a neutralization capacitor and capacitive source degeneration.

Fig. 2.14 Combining Y-embedding and Z-embedding for a transistor

# 2.5 Chapter Summary

In this chapter, the conventional neutralization and over-neutralization techniques were discussed for THz amplifier design. It was shown that over-neutralization is much more effective in gain boosting than neutralization in the near-$f_{max}$ region. A classic technique dating back to the 1960s, the gain-plane, was revisited. It was demonstrated that the gain-plane graphical representation gives insight into the relationship between $G_{max}$, U and A. It is useful for finding the theoretical upper limit of unconditionally stable power gain for a given U, and the necessary and sufficient condition for achieving it.

# Chapter 3 A CMOS 210GHz Fundamental Transceiver

## 3.1 System Architecture

Harmonic-based TRXs reported to date in (Bi-)CMOS processes [14], [15] all suffer from high power consumption and noise figure (NF) due to the lack of front-end amplification. On the TX side, the frequency multiplier, usually placed as the last stage prior to the antenna, exhibits negative power gain in dB (e.g., –10dB). Therefore, to generate adequate output power, a signal stronger than the TX power (e.g., 10dB higher) needs to be generated by a lower-frequency pre-PA, resulting in low efficiency and high power consumption. On the RX side, due to the lack of an LNA in the chain, the noise contribution from the subsequent stages cannot be suppressed, thereby leading to poor NF and poor RX sensitivity.
This work addresses the above issues by implementing a TRX architecture that operates at the TRX's fundamental frequency. The TRX system architecture is shown in Fig. 3.1 [26], and is integrated in a nanoscale CMOS process alongside an on-chip antenna array [27]. It employs a fully differential topology, which is inherently robust to common-mode substrate noise and power/ground-induced noise, and exhibits better linearity than a single-ended topology.

Fig. 3.1 The 210GHz fully integrated differential TRX architecture

The TX incorporates a 2×2 spatial power-combining array architecture, consisting of a new double-stacked cross-coupled VCO at 210GHz with an OOK modulator, a PA driver, a novel balun-based differential power distribution network, four PAs and an on-chip 2×2 dipole antenna array. The non-coherent RX employs a direct-detection architecture comprising an on-chip antenna, an LNA, and a power detector. Fig. 3.2 shows the balun-based differential power distribution network, which is amenable to high frequencies. In the conventional distribution network (Fig. 3.2), the undesired cross-overs in the layout result in routing-related problems and signal integrity issues, such as delay/amplitude imbalance, crosstalk, and EM coupling. In the proposed structure, these unwanted cross-overs are avoided by alternating power splitters and baluns instead of using two stages of power splitters.

Fig. 3.2 The novel balun-based differential power distribution network

# 3.2 On-Chip Antenna and Balun Design

### 3.2.1 2×2 Antenna Array

Fig. 3.3(a) shows an on-chip dipole antenna with a surrounding ground shield, which is integrated in a 32nm SOI CMOS process with a substrate resistivity of 13.5 $\Omega$-cm. The high conductivity of the low-resistivity substrate of a CMOS process, compared to an off-chip substrate, is one of the most crucial contributors to the poor radiation efficiency of on-chip antennas.
The substrate thickness of $300\mu m$ (i.e., the default post-fabrication thickness) is close to $3\lambda/4$ in the substrate at 210GHz, so the constructive reflection from the ground plane underneath the silicon substrate helps boost the radiation efficiency to as high as 24%, as shown in Fig. 3.3(b).

Fig. 3.3 On-chip shielded dipole antenna (a) structure (b) radiation efficiency with respect to frequency

The dipole antenna, shown in Fig. 3.3(a), is implemented in the topmost metal layer. The length of the dipole is chosen to be 360 $\mu$m to maximize the radiation efficiency. The width of the antenna is chosen to be 40 $\mu$m to broaden the bandwidth while achieving 50-$\Omega$ impedance matching. Moreover, a ground plane is placed underneath the silicon substrate to help improve the radiation efficiency, as mentioned above. Furthermore, in order to shield neighboring circuits from coupling from the antenna, an extra ground ring extending from the bottom to the top metal layer surrounds each antenna. The simulated antenna gain and return loss versus frequency are shown in Fig. 3.4(a). The antenna gain at 210GHz is -2.5dBi and the antenna bandwidth is 40GHz centered at 200GHz. Fig. 3.4(b) shows the simulated radiation pattern of the shielded dipole antenna. The constructive reflection from the bottom ground contributes to another peak in the H-plane at $\theta = 70^{\circ}$, where $\theta$ denotes the inclination angle in the spherical coordinate system.

Fig. 3.4 Antenna parameters (a) gain and return loss (b) pattern at 210GHz

A 2×2 antenna array with 0.57λ spacing between elements (i.e., 820μm) at 210GHz is designed to achieve a high directivity [28]. Fig. 3.5(a) shows the radiation pattern of this 2×2 antenna array, where the overall antenna array gain is 4.5dBi. The mutual coupling between antennas was simulated as a function of frequency (see Fig. 3.5(b)).
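The two geometric figures quoted for the antenna, the substrate thickness in wavelengths and the element spacing, can be sanity-checked with a short Python sketch. The relative permittivity of silicon used here (11.9) is an assumption that is not stated in the text, and substrate loss is ignored.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_um(freq_hz, eps_r=1.0):
    """Wavelength in micrometers in a lossless medium of relative permittivity eps_r."""
    return C0 / (freq_hz * math.sqrt(eps_r)) * 1e6


F_HZ = 210e9
EPS_SI = 11.9  # assumed relative permittivity of silicon (not from the text)

lam_air = wavelength_um(F_HZ)
lam_si = wavelength_um(F_HZ, EPS_SI)

print(f"free-space wavelength at 210 GHz: {lam_air:.0f} um")
print(f"wavelength in silicon at 210 GHz: {lam_si:.0f} um")
print(f"300 um substrate thickness: {300.0 / lam_si:.2f} wavelengths")
print(f"0.57-wavelength element spacing: {0.57 * lam_air:.0f} um")
```

With this assumed permittivity, the 300 μm substrate comes out at roughly 0.72 wavelengths, close to three quarters of a wavelength, and the 0.57λ spacing comes out at roughly 814 μm, in line with the 820 μm quoted for the array.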
For frequencies from 200 to 220GHz, the simulated antenna coupling in the E- and H-planes stays below –20dB and –30dB, respectively. This low mutual coupling guarantees a negligible effect on the array's attributes, such as array factor and input impedance.

Fig. 3.5 2×2 antenna array parameters (a) 3-D pattern (b) antenna coupling

### 3.2.2 Marchand Balun

Owing to its wideband characteristics and ease of implementation, the Marchand balun is chosen as an integral part of the proposed power distribution network [29]-[31]. Fig. 3.6 shows the balun structure, incorporating a cascade of two quarter-wavelength couplers. The low-THz signal is fed to Port 1 on the topmost metal layer. By carefully choosing the length and impedance of the coupler, half of the input signal power is coupled through the lower metal layer to Port 2. The remaining half is reflected back at the far-end open terminal, where its phase is reversed, and then coupled through the lower metal layer to Port 3. Thus, the signals appearing at Port 2 and Port 3 exhibit a 180° phase difference.

Fig. 3.6 Marchand balun

Defining the coupling strength as $\rho_C$, the complete S-matrix of the Marchand balun is expressed as [29]

$$[S]_{balun} = \begin{bmatrix} \frac{1 - 3\rho_C^2}{1 + \rho_C^2} & j\frac{2\rho_C\sqrt{1 - \rho_C^2}}{1 + \rho_C^2} & -j\frac{2\rho_C\sqrt{1 - \rho_C^2}}{1 + \rho_C^2} \\ j\frac{2\rho_C\sqrt{1 - \rho_C^2}}{1 + \rho_C^2} & \frac{1 - \rho_C^2}{1 + \rho_C^2} & \frac{2\rho_C^2}{1 + \rho_C^2} \\ -j\frac{2\rho_C\sqrt{1 - \rho_C^2}}{1 + \rho_C^2} & \frac{2\rho_C^2}{1 + \rho_C^2} & \frac{1 - \rho_C^2}{1 + \rho_C^2} \end{bmatrix} \tag{3.1}$$

From the S-matrix, the difference between $\angle S_{21}$ and $\angle S_{31}$ is 180° and independent of the coupling strength $\rho_C$; thus the phase difference is insensitive to frequency variation. To achieve $50\Omega$ input matching ($S_{11}=0$), $\rho_C$ needs to be designed to be $1/\sqrt{3}$.
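The matching condition can be checked directly from Eq. (3.1). The Python sketch below builds the ideal S-matrix for a given $\rho_C$ and evaluates it at $\rho_C = 1/\sqrt{3}$. It also converts the coupling factor to even- and odd-mode impedances using the standard coupled-line relations $Z_{oe} = Z_0\sqrt{(1+\rho)/(1-\rho)}$ and $Z_{oo} = Z_0\sqrt{(1-\rho)/(1+\rho)}$ with $Z_0 = 50\Omega$; that mapping is an assumption brought in for illustration, since the text does not state which relation was used.

```python
import cmath
import math


def marchand_s(rho):
    """S-matrix of the ideal Marchand balun, Eq. (3.1), for coupling factor rho."""
    d = 1.0 + rho ** 2
    t = 2j * rho * math.sqrt(1.0 - rho ** 2) / d  # transmission term, S21
    return [
        [(1.0 - 3.0 * rho ** 2) / d, t, -t],
        [t, (1.0 - rho ** 2) / d, 2.0 * rho ** 2 / d],
        [-t, 2.0 * rho ** 2 / d, (1.0 - rho ** 2) / d],
    ]


def coupled_line_impedances(rho, z0=50.0):
    """Even- and odd-mode impedances for voltage coupling rho (assumed relation)."""
    z_oe = z0 * math.sqrt((1.0 + rho) / (1.0 - rho))
    z_oo = z0 * math.sqrt((1.0 - rho) / (1.0 + rho))
    return z_oe, z_oo


rho = 1.0 / math.sqrt(3.0)
S = marchand_s(rho)
s11, s21, s31 = S[0][0], S[1][0], S[2][0]
phase_diff_deg = math.degrees(cmath.phase(s21) - cmath.phase(s31))
z_oe, z_oo = coupled_line_impedances(rho)

print(f"|S11| = {abs(s11):.2e}")
print(f"|S21|^2 = {abs(s21) ** 2:.3f}, |S31|^2 = {abs(s31) ** 2:.3f}")
print(f"phase(S21) - phase(S31) = {phase_diff_deg:.1f} deg")
print(f"Z_oe = {z_oe:.1f} ohm, Z_oo = {z_oo:.1f} ohm")
```

At this coupling factor the input reflection vanishes and the input power splits equally between Ports 2 and 3 with opposite phase. Under the assumed coupled-line relations, the mode impedances come out near 97 Ω and 26 Ω.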
Therefore, the odd- and even-mode impedances, $Z_{oo}$ and $Z_{oe}$, of the quarter-wavelength coupler in the balun should be 26 $\Omega$ and 96 $\Omega$, respectively. Although both $Z_{oo}$ and $Z_{oe}$ vary with the interlayer dielectric thickness, the width of the signal line, and the spacing between signal and ground lines, strict constraints exist on these geometrical parameters in a CMOS process. This makes it difficult to design the coupler to achieve the desired $Z_{oo}$ and $Z_{oe}$ values. A vertical coupler structure with an overlap offset is thus employed to realize the Marchand balun. This offset provides an additional degree of freedom in adjusting $Z_{oo}$ and $Z_{oe}$. Fig. 3.7 shows the simulated S-parameters and phase difference versus frequency. The balun's bandwidth is greater than 100GHz. The amplitude imbalance between Ports 2 and 3 is only 0.2 dB, while the phase imbalance is less than $\pm 2^{\circ}$ around 180° over the 100GHz bandwidth.

Fig. 3.7 Marchand balun performance: S-parameters and phase difference

# 3.3 Active Circuits Design

## 3.3.1 200GHz Transistor Layout for Power Amplifier

For millimeter-wave/THz circuit design, the effect of layout parasitics is considered to be one of the critical issues [32]–[34]. As the frequency approaches half the $f_{max}$ of a 32 nm CMOS transistor, the parasitics associated with the transistor layout (e.g., $C_{gs,e}$, $C_{gd,e}$, $r_{g,e}$) are directly absorbed into the transistor's RF model and thus degrade its performance; i.e., a poor layout may lead to a negative $G_{max}$ (in dB) at such high frequencies. Prior work has investigated ways of mitigating these parasitics [35], [36]. Although Mason's U is smaller than $G_{max}$, its invariance to any externally added lossless network makes it a good candidate for studying the layout parasitics. Fig. 3.8(a) shows the effects of the layout-induced parasitic capacitors $C_{gs,e}$ and $C_{gd,e}$ on the device's U.
From this figure, the U degradation due to $C_{gd,e}$ is around twice that due to $C_{gs,e}$. Therefore, the interconnect routing of the source and drain terminals is done in such a way as to minimize $C_{gd,e}$. For instance, external access to the drain terminal is made through the top metal layer so as to separate the gate and drain metal lines.

Fig. 3.8 200GHz transistor layout issues (a) U's sensitivity to $C_{gs}$ and $C_{gd}$ (b) Post-layout $f_{max}$ with different finger widths

Fig. 3.9 Floorplan for transistor layout (a) top view (b) side view

Fig. 3.9 shows the transistor layout in the IBM 32nm SOI CMOS process, used in the 210GHz PA. A multi-finger configuration is used to reduce the finger width and thus the parasitic gate resistance, which is a crucial contributor to the U degradation. However, a smaller finger width leads to a larger number of fingers. As the PA design requires a very wide transistor, two double-sided-gate configurations in parallel are employed to prevent the layout from being stretched in one dimension (Fig. 3.9(a)). Two metal layers, M1 and M2, are stacked for the gate interconnection, and the gate line is intentionally widened to reduce its resistance. To minimize $C_{gd,e}$, external access to the drain is made through the top layers E1 and MA to further separate the gate and drain lines (Fig. 3.9(b)). The middle layers B1, B2 and B3 are used for the source interconnection (Fig. 3.9(b)). This layout configuration can also support a large current density, as the drain and source connections use thick metal to account for electromigration. Three MOSFETs (width of 32 $\mu$m) with different finger widths are laid out and their $f_{max}$ values are shown in Fig. 3.8(b). The finger width of 640nm is finally chosen to obtain the maximum $f_{max}$.

## 3.3.1 210GHz Power Amplifier Design

Fig. 3.10 shows the schematic of the 210GHz CMOS PA, which comprises a three-stage differential amplifier using the over-neutralization technique.

Fig.
3.10 210GHz CMOS power amplifier schematic

The differential topology eliminates the source-degenerative effects of parasitics by providing a shorter physical interconnection from the transistors' source terminals to ground compared to a single-ended counterpart, and is also insensitive to the modeling inaccuracy of decoupling capacitors at 210GHz. In addition, the $C_{gd}$-neutralization capacitor is simply realized by the $C_{gd}$ of a similar MOSFET to mitigate the mismatch between the neutralization capacitor and the main transistor's $C_{gd}$, as shown in Fig. 3.11(a).

Fig. 3.11 Over-neutralization technique (a) $C_{gd}$ of similar MOSFET as $C_n$ (b) gain boosting

The transistor's intrinsic $G_{max}$ after layout is only 4.5dB and, after deducting the loss of the matching network (roughly 2.5dB per stage), the achievable power gain per stage is only 2dB. The 2dB gain per stage raises two concerns: (1) it yields poor power-added efficiency (PAE), i.e., only around one third of the drain efficiency, and (2) it requires a large number of cascaded stages for high amplification (e.g., 10dB), which in turn leads to higher power consumption and smaller bandwidth. In order to overcome this problem, the over-neutralization technique has been employed. The PA's main transistors are intentionally pushed to the edge of the stability region, resulting in higher gain, as shown in Fig. 3.11(b). By choosing a proper neutralization capacitance $C_n$, $G_{max}$ can be boosted by as much as 4dB. To leave a margin for stability, a 3dB gain boost, corresponding to a $K_f$ of 1.1, is chosen.

Considering that the quarter-wavelength is only around 180 $\mu$m at 210GHz, ground-shielded CPW lines are utilized for impedance matching. All PA stages are interstage-matched to $50\Omega$ to make the design more robust to process-dependent uncertainties in passive components at this frequency, which, in turn, leads to more flexibility in layout. The PA's output matching network is designed for maximum $P_{sat}$.
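The two concerns raised by a 2dB stage gain can be put into numbers with a short Python sketch. It uses the definition PAE = (P_out - P_in)/P_DC, which gives PAE = (drain efficiency) × (1 - 1/G) for linear power gain G. As a simplifying assumption, the 3dB boost is taken to add directly to the per-stage gain while the 2.5dB matching loss stays unchanged; the 10dB target is the example figure from the text.

```python
import math


def db_to_lin(db):
    """Convert a power ratio from dB to linear."""
    return 10.0 ** (db / 10.0)


def pae_over_drain_eff(gain_db):
    """Ratio PAE / drain efficiency = 1 - 1/G for a stage with power gain G."""
    return 1.0 - 1.0 / db_to_lin(gain_db)


def stages_needed(total_gain_db, gain_per_stage_db):
    """Smallest number of identical stages reaching the total gain."""
    return math.ceil(total_gain_db / gain_per_stage_db)


G_MAX_DB = 4.5       # post-layout G_max quoted in the text
MATCH_LOSS_DB = 2.5  # matching-network loss per stage quoted in the text
BOOST_DB = 3.0       # over-neutralization boost chosen in the text
TARGET_DB = 10.0     # example amplification target from the text

cases = {
    "no boost": G_MAX_DB - MATCH_LOSS_DB,
    "3 dB boost": G_MAX_DB + BOOST_DB - MATCH_LOSS_DB,  # assumed additive
}
for label, stage_gain_db in cases.items():
    ratio = pae_over_drain_eff(stage_gain_db)
    n = stages_needed(TARGET_DB, stage_gain_db)
    print(f"{label}: {stage_gain_db:.1f} dB/stage, "
          f"PAE/DE = {ratio:.2f}, stages for {TARGET_DB:.0f} dB = {n}")
```

For the unboosted 2dB stage, the ratio is close to the "one third of drain efficiency" figure, and five such stages would be needed for 10dB. The boosted case is only as accurate as the additive-gain assumption.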
The extra loss added by the matching networks makes the amplifier more stable: the stability factor of the overall PA is greater than unity at all frequencies, which means it is unconditionally stable.

Fig. 3.12 Measurement result for PA (a) Pout, Gain and PAE with respect to Pin (b) Power gain with respect to frequency

The PA core occupies $150 \times 400 \mu m^2$ of die area (excluding pads). The PA breakout was tested using a G-band (140-220GHz) RF probe and power meters. To this end, on-chip baluns were used to convert the input and output differential signals to single-ended. The loss of the on-chip balun was calibrated using a back-to-back configuration and was de-embedded from the PA output power. The PA circuit exhibits a measured peak gain of 15dB, an OP1dB of 2.7dBm, a $P_{sat}$ of 4.6dBm, and a peak PAE of 6%, as shown in Fig. 3.12(a). Fig. 3.12(b) shows a 3-dB bandwidth of more than 14GHz. The measured PA bandwidth is limited by the highest measurable frequency (220GHz) of the test equipment. The system measurement demonstrates a wireless link with 10GHz baseband bandwidth, indirectly implying that the PA bandwidth is actually around 20GHz. The PA draws 40mA from a 1V supply.

#### 3.3.2 210GHz Low Noise Amplifier Design

Common-source (CS) and cascode topologies are the most popular topologies for LNA design [37]. However, at frequencies close to $f_T/f_{max}$, not only does the gain of a cascode amplifier drop to almost that of a CS amplifier (due to the parasitic capacitance seen at the intermediate node of the cascode), but the noise contribution of the common-gate (CG) device to the cascode's NF also becomes significant ($\sim$0.5-1dB higher NF compared to a CS amplifier). Nevertheless, a shunt or series inductance [38] can be placed between the CS and the CG devices, as an inter-stage matching network, to increase the impedance seen by the CG device, thereby alleviating the gain and NF degradation issues.
However, the CS topology was chosen for this design since, for the given technology and after accounting for the loss of the required inter-stage matching network, it outperforms the cascode topology in terms of gain and NF.

Fig. 3.13 Low noise amplifier design (a) $NF_{min}$ with respect to current density (b) simultaneous noise and power match (c) trade-off between reflection and $G_{max}$ (d) $NF_{min}$ and $K_f$ with respect to $L_{deg}$

Since the very first stage of an LNA mostly determines its NF, the design goal for the first stage of the LNA was to find the optimum current density ($J_{opt}$) and finger width for $NF_{min}$. However, the maximum achievable gain of the device was too low when biased at $J_{opt}$ ($\sim 2\text{-}3\text{dB}$ including the loss of the input/output matching networks). Consequently, the first stage was unable to suppress the noise contribution of the subsequent stages; hence, the overall NF was drastically degraded. Therefore, a higher current density of $0.22\text{mA/}\mu\text{m}$, as opposed to $J_{opt} = 0.16\text{mA/}\mu\text{m}$, was chosen to minimize the overall NF for a given finger width of 400nm. $NF_{min}$ of the first stage was degraded by only 0.05dB at this current density, as shown in Fig. 3.13(a).

Inductive degeneration is commonly used in LNA design to transform the input impedance $Z_{in}$ of the device to a value close to $Z_{opt}^*$ to achieve simultaneous noise and power match [37]. In this approach, $R_{opt}$ = Re[$Z_{opt}$] is assumed to be independent of the degeneration inductance $L_{deg}$. However, as CMOS technology scales further into the nanoscale regime, $R_{opt}$ becomes a stronger function of $L_{deg}$. Thus, the effect of $L_{deg}$ on both $R_{opt}$ and $B_{opt}$ = Im[$Z_{opt}$] should be taken into account. The goal is, therefore, to minimize the Euclidean distance between $Z_{opt}$ and $Z_{in}^*$, rather than focusing only on the real part of the input impedance.

Fig.
3.13(b) shows $Z_{opt}$ and $Z_{in}^*$ on the Smith chart as $L_{deg}$ varies, clearly indicating the variation in $R_{opt}$. The distance between $Z_{opt}$ and $Z_{in}^*$ ($\Delta\Gamma$) is calculated and plotted in Fig. 3.13(c). During the design, the trade-off between $\Delta\Gamma$ and $G_{max}$ was accounted for so as to avoid too much gain degradation. At the same time, $K_f$ is kept greater than unity (i.e., unconditional stability), as shown in Fig. 3.13(d). The $NF_{min}$ variation versus $L_{deg}$ is also depicted on the same plot. As expected, the inductive degeneration slightly reduces $NF_{min}$ [39].

Fig. 3.14 210GHz CMOS LNA schematic

Fig. 3.14 shows the schematic of the seven-stage differential LNA. It turns out that the gain of the first stage is still insufficient to suppress the noise contribution of the second stage. Therefore, the current density of the second stage is chosen to be identical to that of the first stage to minimize its noise contribution to the overall NF. The succeeding stages are biased at a current density of $0.56\text{mA/}\mu\text{m}$, corresponding to maximum $f_{max}$, so that the highest gain per stage is achieved. For the first two stages, simple second-order matching networks are used to minimize the noise contribution of lossy on-chip passives. For the succeeding five stages, fourth-order matching networks have been incorporated to achieve a wide 3-dB gain bandwidth (>15GHz).

Fig. 3.15 Measurement results for LNA

The LNA core occupies only 400×650 µm². It was tested with a setup consisting of a G-band (140GHz-220GHz) RF probe and a vector network analyzer (VNA). The LNA exhibits a measured peak gain of 18dB with a bandwidth of at least 15GHz, as shown in Fig. 3.15. The measured bandwidth in this design is limited by the highest measurable frequency (220GHz) of the test equipment. Input/output return losses are better than 8dB. The LNA's in-band NF has been estimated from the RX's output SNR measurement.
Given the measured 18dB gain of the LNA, the receiver's NF is mostly dominated by the LNA's NF. Therefore, the LNA's NF is upper-bounded by the receiver's NF, which varies approximately between 11dB and 12dB, as shown in Fig. 3.15. The LNA draws 44.5mA from a 1V supply.

## 3.3.3 210GHz Voltage-Controlled Oscillator Design

As was discussed in the previous section, the transconductance $g_m$ degrades significantly as the operation frequency increases towards half the $f_{max}$ of the device. Moreover, varactor loss becomes the dominant contributor to the Q-factor degradation of the oscillator. As a consequence, new circuit techniques need to be examined in the design of a fundamental VCO at 200GHz to overcome these limitations. Inductive tuning was demonstrated to be more amenable to high frequencies than varactor tuning [40]. Capacitive source degeneration at the buffer stage was proposed to increase the equivalent negative resistance for millimeter-wave design [41]-[43]. Also, capacitive source degeneration below the cross-coupled pair can decrease undesired parasitic capacitance [44], [45]. In this work, the use of source degeneration in a cross-coupled pair has been extended to a complex impedance $Z_s$.

Fig. 3.16 Source degeneration for cross coupled pair (a) schematic (b) equivalent circuit

By applying a test voltage source $V_t$ to the cross-coupled pair with source degeneration impedance $Z_s$ shown in Fig. 3.16(a), the equivalent admittance is obtained as

$$Y_{in} = \frac{-g_m + sC_{gs}}{2(1 + g_m Z_s + sC_{gs} Z_s)} = \frac{1}{-\frac{2}{g_m} - 2Z_s - \frac{2sC_{gs} Z_s}{g_m}} + \frac{1}{\frac{2}{sC_{gs}} + \frac{2g_m Z_s}{sC_{gs}} + 2Z_s}. \tag{3.2}$$

Therefore, the equivalent circuit of the cross-coupled pair with source degeneration is the parallel combination of branch A and branch B in Fig. 3.16(b), corresponding to the two terms in Eq. (3.2).
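The two-branch decomposition in Eq. (3.2) can be verified numerically. The Python sketch below evaluates the left-hand side and the sum of the two branch admittances at one frequency and compares them. The device values ($g_m$, $C_{gs}$) and the complex $Z_s$ are hypothetical numbers chosen only to exercise the formula.

```python
import math


def y_in(gm, cgs, zs, omega):
    """Input admittance of the degenerated cross-coupled pair, left side of Eq. (3.2)."""
    s = 1j * omega
    return (-gm + s * cgs) / (2.0 * (1.0 + gm * zs + s * cgs * zs))


def branch_impedances(gm, cgs, zs, omega):
    """Series impedances Z_A and Z_B of the two parallel branches in Eq. (3.2)."""
    s = 1j * omega
    z_a = -2.0 / gm - 2.0 * zs - 2.0 * s * cgs * zs / gm
    z_b = 2.0 / (s * cgs) + 2.0 * gm * zs / (s * cgs) + 2.0 * zs
    return z_a, z_b


# Hypothetical values, for illustration only.
gm = 20e-3                      # transconductance, S
cgs = 15e-15                    # gate-source capacitance, F
omega = 2.0 * math.pi * 200e9   # angular frequency at 200 GHz, rad/s
zs = 5.0 - 30.0j                # an arbitrary complex degeneration impedance, ohm

z_a, z_b = branch_impedances(gm, cgs, zs, omega)
lhs = y_in(gm, cgs, zs, omega)
rhs = 1.0 / z_a + 1.0 / z_b

print("Y_in (closed form):   ", lhs)
print("1/Z_A + 1/Z_B:        ", rhs)
print("relative difference:  ", abs(lhs - rhs) / abs(lhs))
```

The two expressions agree to numerical precision for any $Z_s$ that keeps the denominator nonzero, which confirms that branches A and B in Fig. 3.16(b) together reproduce $Y_{in}$.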
Assuming $Z_s$ is a pure negative resistance (i.e., $Z_s = -R_s$), the resistance $-2/g_m$ in branch A is partially neutralized by $2R_s$. The Q factors of branch A (defined as $Q_A$) and branch B (defined as $Q_B$) are obtained as

$$\begin{aligned} |Q_A| &= \left| \frac{\operatorname{Im}[Z_A]}{\operatorname{Re}[Z_A]} \right| = \left| \frac{2\omega C_{gs} R_s / g_m}{-2/g_m + 2R_s} \right| = \left| \frac{\omega C_{gs} R_s}{1 - g_m R_s} \right|, \\ |Q_B| &= \left| \frac{\operatorname{Im}[Z_B]}{\operatorname{Re}[Z_B]} \right| = \left| \frac{-2/\omega C_{gs} + 2g_m R_s / \omega C_{gs}}{-2R_s} \right| = \left| \frac{1 - g_m R_s}{\omega C_{gs} R_s} \right|. \end{aligned} \tag{3.3}$$

Also, under the condition $R_s < \frac{1}{g_m} \cdot \frac{1}{1 + \omega / \omega_T}$, $Q_B > 1$ and $Q_A < 1$. Re[$Y_{in}$] is thus dominated by branch A, i.e.,

$$\begin{aligned} \operatorname{Re}[Y_{in}] &= \operatorname{Re}[Y_A] + \operatorname{Re}[Y_B] = \frac{1}{1 + Q_A^2} \cdot \frac{1}{\operatorname{Re}[Z_A]} + \frac{1}{1 + Q_B^2} \cdot \frac{1}{\operatorname{Re}[Z_B]} \\ &\approx \frac{1}{\operatorname{Re}[Z_A]} = -\frac{g_m}{2(1 - g_m R_s)} < -\frac{g_m}{2}. \end{aligned} \tag{3.4}$$

Therefore, the negative real part of the overall admittance of the equivalent circuit representing the cross-coupled pair with negative resistance at the source terminal is enhanced in magnitude compared to the conventional cross-coupled pair. This negative resistance can readily be realized using an additional cross-coupled pair, as will be explained later in this section.

Fig. 3.17 Negative resistive source degeneration (a) resistance only (b) with parasitic capacitance

In order to verify the above first-order analysis, two cross-coupled pairs with two different source degenerations, shown in Fig. 3.17, are simulated. Fig. 3.18(a) shows the effective parallel resistance $R_p$, defined as $R_p=1/\text{Re}[Y_{in}]$, and the effective parallel capacitance $C_p$, defined as $C_p=\text{Im}[Y_{in}]/\omega$, at 200GHz for the circuit with source degeneration of only negative resistance $R_s$ in Fig. 3.17(a). Simulation shows that there is an optimum value for $R_p$ when $R_s$ is around 90 $\Omega$.
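The first-order analysis of Eqs. (3.3) and (3.4) can be reproduced with the simple $g_m$-$C_{gs}$ model behind Eq. (3.2). The Python sketch below computes the bound on $R_s$, the branch Q factor $Q_A$, and the exact and approximate values of Re[$Y_{in}$] for $Z_s = -R_s$. The device values are hypothetical, and the simple model ignores the parasitics included in the simulations of Fig. 3.18, so the numbers are not expected to match the simulated optimum.

```python
import math


def re_y_in(gm, cgs, rs, omega):
    """Exact Re[Y_in] from Eq. (3.2) with Z_s = -R_s."""
    s = 1j * omega
    zs = -rs
    y = (-gm + s * cgs) / (2.0 * (1.0 + gm * zs + s * cgs * zs))
    return y.real


def q_a(gm, cgs, rs, omega):
    """|Q_A| from Eq. (3.3); |Q_B| is its reciprocal."""
    return abs(omega * cgs * rs / (1.0 - gm * rs))


def rs_limit(gm, cgs, omega):
    """Upper bound on R_s for Q_A < 1, with omega_T = gm / Cgs."""
    omega_t = gm / cgs
    return 1.0 / (gm * (1.0 + omega / omega_t))


# Hypothetical device values, for illustration only.
gm = 20e-3                      # S
cgs = 15e-15                    # F
omega = 2.0 * math.pi * 200e9   # rad/s

limit = rs_limit(gm, cgs, omega)
print(f"R_s bound for Q_A < 1: {limit:.1f} ohm")
for rs in (0.0, 5.0, 10.0, 20.0):
    exact = re_y_in(gm, cgs, rs, omega)
    approx = -gm / (2.0 * (1.0 - gm * rs))  # branch-A approximation, Eq. (3.4)
    print(f"R_s = {rs:4.1f} ohm: Q_A = {q_a(gm, cgs, rs, omega):.2f}, "
          f"Re[Y_in] exact = {exact * 1e3:.2f} mS, approx = {approx * 1e3:.2f} mS")
```

For these assumed values, Re[$Y_{in}$] starts at $-g_m/2$ for $R_s = 0$ and becomes more negative as $R_s$ grows toward the bound, in line with the inequality in Eq. (3.4). The branch-A approximation captures the trend, while the exact value also includes the smaller contribution of branch B.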
A proper choice of negative resistance at the source of the cross-coupled pair will increase $R_p$ by three times, leading to higher loop gain. Moreover, a decrease in effective capacitance $C_p$ also helps improve the tuning range and relaxes the choice of tank inductance value.

Fig. 3.18 Simulated effective parallel resistance $R_p$ and effective parallel capacitance $C_p$ for cross-coupled pair with source degeneration (a) sweep $R_s$, when $C_s = 0$ (b) sweep $C_s$, when $-R_s = -90\Omega$

One effective way of realizing the negative resistance is by another cross-coupled pair. However, the corresponding parasitic capacitance cannot be neglected. Fig. 3.18(b) shows the effective parallel resistance $R_p$ and parallel capacitance $C_p$ at 200GHz for the circuit with source degeneration of both negative resistance $R_s$ and parasitic capacitance $C_s$ in Fig. 3.17(b). It indicates that both $R_p$ and $X_p$ (=1/($j\omega C_p$)) are lowered as the source parasitic capacitance $C_s$ increases. Therefore, an extra inductor $L_s$ is added between the source terminals of the cross-coupled pair to resonate out this undesired parasitic capacitance (Fig. 3.17(b)).

Fig. 3.19 210GHz CMOS VCO schematic

Fig. 3.19 shows the fundamental double-stacked cross-coupled VCO and the OOK modulator. The overall negative resistance of this oscillator is increased due to an additional negative source degeneration resistance provided by M1-M2. This negative resistance compensates for the excessive varactor loss at very high frequencies, thereby improving overall loop gain. As mentioned above, the 30pH inductor $L_S$ in Fig. 3.19 mitigates the detrimental effect of the parasitic capacitance of the bottom cross-coupled pair M1-M2. The interstage matching network between the VCO buffer and the OOK modulator is realized by transformers, leading to a compact layout.
The OOK modulator utilizes a cascode topology M7(M8)–M9(M10), where the modulating signal is applied to the gate of transistor M9(M10). The output of the OOK modulator is matched to 50$\Omega$ using transformers.

Fig. 3.20 Measurement results for VCO (a) tuning range and output power (b) phase noise

The VCO core and modulator occupy $100\times400\mu\text{m}^2$ of die area. The circuit was characterized using a G-band (140GHz-220GHz) RF probe, power meters, and a sub-harmonic mixer. The VCO exhibits a measured output power of -13.5dBm, a tuning range of 8GHz (204.7-212.7GHz), and a phase noise of -81dBc/Hz at 1MHz offset at 209GHz (Fig. 3.20). The VCO, buffer, and OOK modulator together consume a total of 42mA from a 1V supply.

#### 3.3.4 210GHz Power Detector Design

Fig. 3.21 shows the schematic of the OOK envelope detector followed by a baseband amplifier. Series-shunt CPW transmission lines are used for input matching.

Fig. 3.21 The schematic of the 210GHz CMOS OOK envelope detector (*M1* and *M2*) followed by a baseband amplifier

As shown in Fig. 3.22(a), the input reflection coefficient stays below -10dB over a 20GHz bandwidth centered at 200GHz. For the envelope detector, the transistors M1 and M2 are biased in the class-AB operation region so as to provide a square-law relationship between $V_{in}$ and $V_D$. When the differential voltage $V_{in}$ (= $V_{in+}$ – $V_{in-}$) is larger than zero, M2 turns off and M1 turns on, drawing a current that is proportional to $V_{in}^2$. This square-law characteristic has been verified by simulation of the detector, as shown in Fig. 3.22(b). $V_D$ is expressed as [41], [46]

$$V_D \cong \frac{1}{4} \mu_n C_{ox} \left(\frac{W}{L}\right)_{1,2} R_L V_{in}^2. \tag{3.5}$$

Fig.
3.22 Simulation results for the detector (a) input return loss (b) output swing of $V_D$ versus $V_{in}$ for an input signal of 200GHz (c) responsivity versus $P_{in}$ with an input carrier frequency of 200GHz and a baseband signal with BW of 5GHz

Operation in the square-law region increases the swing of $V_D$ and improves the detector's responsivity. Fig. 3.22(c) shows the simulated detector responsivity versus input power at an input carrier frequency of 200GHz and a baseband signal bandwidth of 5GHz. As can be seen, the responsivity varies within 10% around the nominal value of 2.5kV/W as the input power $P_{in}$ varies from -30dBm to -15dBm. This result is consistent with square-law operation. The responsivity can be further improved with an additional baseband amplifier at the expense of lower dynamic range. Note that the load resistor $R_L$ and the parasitic capacitance of the baseband amplifier form a low-pass filter, which filters out the undesired harmonics generated by M1 and M2 and the 200GHz ripple from the input signal.

## 3.4 Measurement Results

The 210GHz TRX chip has been fabricated in a 32nm SOI CMOS process with $f_T/f_{max}$ = 250GHz/320GHz. Fig. 3.23 shows the die photo of the TRX chip. The TX and RX occupy $1.4\times2.5$ mm<sup>2</sup> and $0.8\times1.4$ mm<sup>2</sup> of chip area, respectively (including pads). The on-chip antennas are fabricated on a substrate with a resistivity of $13.5\Omega$-cm without any post-processing. The array elements are separated by 820µm, which corresponds to approximately $0.57\times\lambda$ at 210GHz. A chip-on-board assembly is used to characterize the performance of the 210GHz TRX chip. Owing to the on-chip antenna integration, low-cost assembly was achieved without any millimeter-wave bonding.

Fig. 3.23 Die photo of the TRX

The TX spectral measurement was carried out using the setup shown in Fig. 3.24(a).
The transmitted signal from the TX chip is captured by a VDI WR5.1 horn antenna and then down-converted by a VDI WR5.1 sub-harmonic mixer. The W-band LO generation chain is composed of a signal generator and a W-band tripler. The down-converted IF signal is monitored by a spectrum analyzer. The IF spectrum exhibits a modulated signal with the center tone located at 3.97GHz, as shown in Fig. 3.24(b). The harmonic number for the WR5.1 sub-harmonic mixer is two, and the actual RF frequency from the TX is 210.97GHz. The two sidebands in Fig. 3.24(b) represent the OOK modulated signals, which are 1GHz apart from the carrier tone. Since the conversion gain of the sub-harmonic mixer was not calibrated, accurate power measurement needs to be done by another method, as explained below.

Fig. 3.24 Transmitter spectrum measurement (a) test setup (b) measured IF spectrum after down-conversion

Considering that the path loss from the TX antenna to the RX antenna for a 210GHz signal is quite high (i.e., about 61.8dB for a TX-RX distance of 14cm), the power captured by the power sensor is less than $1\mu$W. This makes it quite difficult to measure the TX power directly using a power sensor. Placing the RX antenna closer to the TX can potentially yield higher power. However, the path loss estimated by the Friis formula is not accurate when the RX antenna is not located in the far-field region. The measurement setup of Fig. 3.25(a) can measure the TX power for this OOK-modulation-based TX. A 1kHz signal is used to modulate the 210GHz carrier signal. After being captured by the WR5.1 horn antenna, the signal is detected by a G-band detector. A lock-in amplifier senses the voltage difference between the on-state and off-state signals. The responsivity of the external G-band detector has been calibrated using an external G-band source at different input power levels. Accounting for a responsivity of 440V/W for the detector and 21dBi of gain for the VDI WR5.1 horn antenna in Fig.
3.25(a), the captured power translates to a broadside EIRP of 5.13dBm. This 5.13dBm EIRP exhibits 10dB back-off from the saturated power of 15.2dBm (4.6dBm Psat of one PA + 6dB combining gain + 4.5dBi antenna gain), which is due to insufficient power from the on-chip LO and routing loss. By rotating the horn antenna, the TX radiation pattern is also measured, as shown in Fig. 3.25(b). The measured beamwidth remains almost constant across the band, and it is 57° and 54° at 208GHz and 212GHz, respectively.

Fig. 3.25 Transmitter power measurement (a) test setup (b) measured radiation pattern

A modulated continuous-wave (CW) wireless test between the TX and RX chips was performed over a 3.5cm distance, as shown in Fig. 3.26(a). A baseband signal is sent to the TX chip, modulated onto a 210GHz carrier, and radiated. The RX chip captures the 210GHz signal and detects the baseband signal, which is then monitored using a spectrum analyzer.

Fig. 3.26 Continuous-wave wireless link over 3.5cm (a) test setup (b) measured baseband spectrum after receiver

Fig. 3.27 Measured output SNR with respect to baseband signal frequency

The output noise spectral density was also measured with the spectrum analyzer. Accounting for an RX bandwidth of 20GHz, the output signal-to-noise ratio (SNR) was then calculated. An external low-noise baseband amplifier is used to help increase the strength of the received signal. The amplifier's noise has a negligible contribution to the output SNR, as it is attenuated by the LNA gain. By sweeping the frequency of the baseband signal, the output SNR for different CW frequencies has been obtained, as shown in Fig. 3.27. Considering that the TRX system operates in the linear region (TX is 8dB back off from OP1dB), thermal noise rather than non-linearity limits the performance of the wireless link.
In this case, with a required SNR of 13dB (corresponding to a BER of $10^{-5}$ for non-coherent OOK modulation), the maximum baseband frequency is 10GHz, which corresponds to a potential data rate between 10Gbps (without Root-Raised-Cosine (RRC) filtering) and 20Gbps (with ideal filtering). The RX sensitivity of -47dBm corresponds to an RX NF of around 12dB. The measured performance of the 210GHz TRX is summarized in Table 3.1.

Table 3.1 Performance summary of the 210GHz CMOS transceiver

| VCO | | LNA | |
|---|---|---|---|
| Output Power | -13.5dBm | S21 | 18dB |
| Tuning Range | 204.7-212.7GHz | Bandwidth | >15GHz* |
| Phase Noise | -81dBc/Hz @ 1MHz offset | Return Loss | >8dB |
| Power Consumption | 42mW | | |
| **PA** | | **TRX** | |
| | | EIRP | 5.13dBm |
| Power Gain | 15dB | RX's minimum NF | 11dB |
| OP1dB | 2.7dBm | Bandwidth | >14GHz* |
| Psat | 4.6dBm | Beamwidth | 57° |
| PAE | 6% | TX Power Consumption | 240mW |
| Power Consumption | 40mW | RX Power Consumption | 68mW |

<sup>\*</sup> Measurement limited to 220GHz.

Table 3.2 compares this work with other state-of-the-art silicon-based signal sources or TRXs. Since this design is a transceiver operating at the fundamental frequency instead of a harmonic-based solution, it exhibits much higher efficiency than prior art.
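The EIRP/P<sub>DCTX</sub> row of Table 3.2 follows from converting each EIRP from dBm to mW and dividing by the TX DC power. The short sketch below repeats that arithmetic using the EIRP and DC-power entries of the table; the function names and the descriptive labels are illustrative choices, and the 480mW figure for the fully driven case is the assumption stated in the table footnote.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def eirp_over_pdc_percent(eirp_dbm, p_dc_mw):
    """EIRP divided by TX DC power, expressed in percent."""
    return 100 * dbm_to_mw(eirp_dbm) / p_dc_mw


# (label, EIRP in dBm, TX DC power in mW), values as listed in Table 3.2.
ENTRIES = [
    ("291GHz 2x2 DAR", -1.0, 74.8),
    ("280GHz 4x4 DAR", 9.0, 430.0),
    ("380GHz quadrupler-based TRX", -13.0, 182.0),
    ("260GHz quadrupler-based TRX", 5.0, 688.0),
    ("This work, measured", 5.13, 240.0),
    ("This work, PAs fully driven (480mW assumed)", 15.2, 480.0),
]

ratios = {label: eirp_over_pdc_percent(eirp, pdc) for label, eirp, pdc in ENTRIES}
for label, ratio in ratios.items():
    print(f"{label:<45} {ratio:6.3f} %")
```

Note that EIRP includes antenna and array gain, so this ratio is a figure of merit for comparison rather than a true power efficiency.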
Table 3.2 Comparison table

| | [12] | [13] | [14] | [15] | This work |
|---|---|---|---|---|---|
| Technology | 45nm SOI CMOS | 45nm SOI CMOS | 0.13μm BiCMOS | 65nm CMOS | 32nm SOI CMOS |
| Frequency | 291GHz | 280GHz | 380GHz | 260GHz | 210GHz |
| Architecture | 2×2 DAR | 4×4 DAR | Quadrupler-based TRX | 2×2 Quadrupler-based TRX | 2×2 Fundamental Frequency TRX |
| Modulation | None | None | FMCW | OOK | OOK |
| EIRP [dBm] | -1 | 9 | -13 | 5 | 5.13 (15.2 @ P<sub>sat</sub>) * |
| P<sub>DCTX</sub> [mW] | 74.8 | 430 | 182 | 688 | 240 |
| EIRP/P<sub>DCTX</sub> | 1.1% | 2% | 0.028% | 0.46% | 1.4% (>6.9% @ P<sub>sat</sub>) * |
| Area [mm<sup>2</sup>] | 0.64 | 7.29 | 4.18 | 6 | 3.5 (TX) + 1.12 (RX) |

<sup>\*</sup> The EIRP if the PAs are fully driven is 15.2dBm (4.6dBm Psat of one PA + 6dB combining gain + 4.5dBi antenna gain). With a stronger PA driver, the power consumption is assumed to double (a conservative estimate) to 480mW, and the expected EIRP/P<sub>DCTX</sub> is 6.9%.

#### 3.5 Chapter Summary

A fully-integrated 210GHz fundamental TRX with on-chip antennas is demonstrated in 32nm SOI CMOS technology. To the best of the author's knowledge, this is the first demonstration of a 210GHz fundamental CMOS TRX. The neutralization technique was revisited, and an over-neutralization technique is proposed to effectively boost the power gain of transistors in the near-$f_{max}$ region. A first CMOS PA is demonstrated at 210GHz with a measured gain of 15dB (5dB gain per stage) and a saturated output power of 4.6dBm. A CMOS LNA with 18dB gain was achieved at 210GHz with more than 15GHz bandwidth. The 2×2 spatial-combining TX achieves an EIRP of 5.13dBm at 10dB back-off from saturated power. It achieves an estimated EIRP of 15.2dBm when the PAs are fully driven.
This work demonstrates the feasibility of using CMOS technology for chip-to-chip wireless communication with a potential data rate between 10Gbps (without RRC filtering) and 20Gbps (with ideal filtering).

# **Chapter 4 A W-band Passive Imaging with Spatial-Overlapping Super Pixels**

#### 4.1 Proposed 9-Element Receiver Array

A traditional focal plane array is implemented as an $M \times N$ array of elements, with each element comprising an antenna and its associated LNA. In this traditional array, each element feeds one detector, representing one pixel of an image. Image acquisition time for systems employing focal plane arrays is determined by several factors, including the array size, the pixel spacing, the focusing system, and the desired spatial resolution of the object. If the object cannot be imaged with one scan, then decreasing the scan time requires one or more of the following actions: increasing the array size, reducing the pixel spacing, or changing the focusing system. A 2×2 array of elements feeding one detector and constituting one pixel is depicted in Fig. 4.1. If wideband delay and amplitude control are incorporated in each RF path from element to detector, then this 2×2 array becomes a super-pixel, with delay and gain weightings for each RF path. These weightings enable the super-pixel to constrict the field of view of the feed antenna so that more energy illuminates the lens and less is lost to spillover, with a resulting SNR improvement that depends on the original spillover losses.

Fig. 4.1 A 2×2 subarray

Fig. 4.2 Conventional array of non-overlapping subarrays

Fig. 4.3 New array of overlapping subarrays

A straightforward way of aligning super-pixels within a larger array is to place each super-pixel such that the element-to-element spacing of adjacent super-pixels remains at d/2, as shown in Fig. 4.2.
In this case, although the element spacing remains the same when compared to a traditional array, the pixel spacing is increased resulting in a physically larger array for the same number of pixels. While the optics can be chosen to match this array design, the increased physical size of the array may be a problem in some applications. A hybrid architecture has been developed to take advantage of the traditional focal plane array pixel spacing and the super-pixel capability. In this new architecture (Fig. 4.3), the super-pixels overlap and share neighboring elements, and hence, are referred to as spatial-overlapping super-pixels. Each detector is still fed by four elements as in a conventional array of super-pixels, but now each element feeds up to four detectors depending on the element's location within the array (Fig. 4.3). In an imaging receiver, trade-off exists between scene coverage in the far field and efficiency due to spillover losses. Ideally, one would like to produce patterns that exactly fill the primary optics without losing energy on the sides (i.e., spillover) while simultaneously covering the field of view without gaps. While the feed element spacing depends on the geometry of the primary optics, specifically the F-number (defined as the ratio of the focal length to the lens diameter), the use of overlapped super-pixels will reduce spillover losses of the primary optics. Moreover, spillover and taper efficiency can be controlled by coherently combining the outputs of a number of array elements [46] (e.g., four in this work), hence, the concept of overlapping super-pixels. The spatial-overlapping super-pixels can provide subarray patterns that cover the field of view (FOV), thereby reducing spillover losses of the primary optics. The spatial-overlapping super-pixels will help improve feed efficiency, while maintaining good far field coverage. Fig. 
4.4 Amplitude and delay weighting coefficient diagram

Suppose that in an $M \times N$-element array, $V_{in}(k,l,t)$ denotes the signal received by each element (k,l), $1 \le k \le M$ and $1 \le l \le N$; and $V_C(m,n,t)$ is the signal at the combiner's output (m,n), $1 \le m \le M-1$ and $1 \le n \le N-1$ (cf. Fig. 4.4). From each element (k, l), up to four paths emanate to separate super-pixels $(\mu,\nu)$, where $k-1 \le \mu \le k$ and $l-1 \le \nu \le l$, each of which has its own amplitude and time-delay weighting coefficient, as shown in Fig. 4.4. $V_C$ is related to $V_{in}$ of each element, as shown in Eq. (4.1):

$$V_C(m,n,t) = \sum_{k=m}^{m+1} \sum_{l=n}^{n+1} A_{k,l}^{(m,n)} V_{in}(k,l,t-\tau_{k,l}^{(m,n)}).$$ (4.1)

where $A_{k,l}^{(m,n)}$ and $\tau_{k,l}^{(m,n)}$ represent the gain and delay weights associated with element (k, l) going to super-pixel (m,n). As is clear from Eq. (4.1) and Fig. 4.4, each super-pixel $V_C$ shares two signals $V_{in}$ with adjacent super-pixels. This means that each super-pixel is now "intentionally" correlated with its adjacent super-pixels, which leads to an image smoothing mechanism in the RF domain prior to power detection. This averaging mechanism can be turned off with proper selection of the element weightings. More precisely, turning off three RF paths forces the detector to process only one of the elements, thereby disabling the super-pixel functionality. In addition to an element's RF path being either on or off, there are multiple gain and delay states, which allows flexibility in signal combining. The use of amplitude and phase (or time delay for a wideband system) control offers a variety of application-specific advantages:

(1) In an imaging array receiver with a reflecting lens, the focus spot starts broadening out as the received beam angle moves away from the reflector boresight, resulting in a wider spot size [46]-[50].
Therefore, it is not possible to collect all the energy by just stepping through feed antennas successively further from the boresight. However, it is possible to collect the energy in the broadened focus spot by applying proper phase and amplitude weightings to the array elements [46]-[51]. More precisely, if the focal system tries to image an object away from the main axis, the corresponding image appears distorted and loss of spatial resolution may occur. "Super-pixels" are able to compensate for these effects with phase and amplitude adjustments and potentially increase the FOV of the focal plane array with a focusing system without resorting to mechanical scans [46], [48], [50]. (2) The concept of using a phased array within the focal plane of a reflector has been used in radio astronomy [46], [52] and array-fed reflectors [51]. The primary motivation for focal plane phased array feeds (PAFs) in reflector antennas in these systems is to achieve wide fields of view with multiple, electronically steered beams so as to form a radio image with a single dish pointing at the target. (3) The use of phase and amplitude control in radio astronomy systems reduces side lobes and positions the location of nulls to reject interfering signals, which leads to an improvement of SNR. (4) The use of phase and amplitude control allows for compensation of mutual couplings between elements, as stated in [46]. (5) For a pixel realized using a single element, directivity decreases with increasing scan angle (relative to pixels located away from the focal point). Instead, using super-pixels with the ability to control amplitude and phase, the directivity can be kept approximately constant with increasing scan angle. Fig. 4.5 Single RF path circuit diagram To successfully leverage any correlation present among the element signals, the gain and delay of each RF path from element to detector must be wideband and adjustable. 
This requires an adjustable wideband true time delay (TTD) circuit and a wideband VGA in each RF path. Taking all of this into consideration, a single RF path can be defined as shown in Fig. 4.5. The 1:4 power splitter is required so that each element may feed up to four power detectors, and the 4:1 power combiner is required so that each power detector is fed by four elements. The entire RF path before the detector is constructed as a ground-shielded coplanar waveguide (CPW). The TTD circuit has been incorporated into the splitter, providing a significant area savings. A single super-pixel is therefore constructed by combining four RF paths to feed one detector. A 3-D diagram of a complete super-pixel circuit is shown in Fig. 4.6.

Noticeably missing from Fig. 4.6 is a Dicke switch. Taking into account (1) the high pre-detector gain (i.e., 40–48dB for LNA+VGA), (2) the low 1/f noise corner typical of SiGe radiometers [53], and (3) the use of a cooling system to thermally stabilize the chip, the traditional Dicke architecture was not used in this design. However, the ability to electronically chop noise by modulating the input to the detector through software control was included as a backup. Knowing the advantage of a Dicke switch based on our prior research [54], [41], in this work we focus on introducing the new concept of spatially overlapping super-pixels in the design.

Fig. 4.6 Single super-pixel circuit diagram

Fig. 4.7 Complete array block diagram

The complete $3\times3$-element array block diagram corresponding to a $2\times2$ super-pixel array with spatial-overlapping super-pixels is shown in Fig. 4.7. The detectors corresponding to the four super-pixels are labeled 1-4. Two unused groups of combiners and detectors have been circled. They are not used in the $3\times3$ design, but are included for layout symmetry and to demonstrate the architecture's extensibility to any arbitrary $M\times N$ array of elements.
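The combining rule of Eq. (4.1) and the sharing of elements among super-pixels in the 3×3-element array can be sketched as follows. The sketch uses zero-based indices, a hypothetical element signal `v_in` (a 94GHz tone with arbitrary per-element phase), and dictionary-based weight tables; these names and values are assumptions made for illustration and do not describe the chip's control interface.

```python
import math

M, N = 3, 3  # element array size; yields (M-1) x (N-1) = 2 x 2 super-pixels
PIXELS = [(m, n) for m in range(M - 1) for n in range(N - 1)]


def v_in(k, l, t, freq_hz=94e9):
    """Hypothetical element signal: unit-amplitude tone with an arbitrary phase."""
    return math.cos(2 * math.pi * freq_hz * t + 0.1 * (k + 2 * l))


def paths(m, n):
    """The four (k, l) elements that feed super-pixel (m, n), zero-based."""
    return [(k, l) for k in (m, m + 1) for l in (n, n + 1)]


def combine(m, n, t, gain, delay):
    """Eq. (4.1): weighted, delayed sum of the four element signals."""
    return sum(
        gain[(m, n, k, l)] * v_in(k, l, t - delay[(m, n, k, l)])
        for k, l in paths(m, n)
    )


# Number of detectors each element feeds (1 at corners, 2 at edges, 4 at center).
fanout = {
    (k, l): sum((k, l) in paths(m, n) for m, n in PIXELS)
    for k in range(M) for l in range(N)
}

# Unity gain and zero delay on every path as a starting point.
gain = {(m, n, k, l): 1.0 for m, n in PIXELS for k, l in paths(m, n)}
delay = {key: 0.0 for key in gain}

# Turning off three of the four paths reduces super-pixel (0, 0) to one element.
single = dict(gain)
for k, l in paths(0, 0)[1:]:
    single[(0, 0, k, l)] = 0.0

t0 = 1.0e-12
print("fan-out per element:", fanout)
print("super-pixel (0,0), all paths on:", combine(0, 0, t0, gain, delay))
print("super-pixel (0,0), one path on :", combine(0, 0, t0, single, delay))
```

Keeping one weight per (super-pixel, element) pair mirrors the statement that every path has its own amplitude and delay coefficient, so a shared element can be weighted differently for each detector it feeds.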
## 4.2 Design of a W-band Power Splitter with Built-in True Time Delay Circuit

The details of the building blocks are given in [55]. This section focuses on one of the most important building blocks: the true time delay circuit. As illustrated in the previous section, each element of the proposed imaging array receiver of Fig. 4.5 feeds up to four detectors via a wideband 1:4 power splitter. Therefore, the super-pixels in this imaging system need to provide a wideband delay control mechanism. Owing to its inherently wideband characteristics, a distributed architecture is considered a favorable choice for the realization of the TTD circuit [56], [57].

Fig. 4.8 Three-stage distributed true time delay circuit

Shown in Fig. 4.8 is an example of the distributed TTD circuit comprising three unit cells with variable delay distributed along CPW-based T-lines. Although the prototype unit cell is based on [57], there are distinct differences between them: the inductors are replaced with CPW T-lines, a series capacitor is added at the base of the $g_m$ transistor $Q_3$, and an additional T-line $T_e$ is added at the common emitter nodes of the differential pair to increase bandwidth by resonating out the undesired parasitic capacitance at these nodes. The unit cell's output phase is [57]:

$$\angle V_{o1} = -\tan^{-1} \frac{\sin(\omega \tau_d)}{a_2/a_1 + \cos(\omega \tau_d)}.$$ (4.2)

with

$$\tau_d = \sqrt{L_{eff}C_{eff}}, \quad a_1 = \frac{g_{m1}}{g_{m1} + g_{m2}}, \quad a_2 = \frac{g_{m2}}{g_{m1} + g_{m2}}.$$ (4.3)

where $\tau_d$ denotes the unit delay of each unit cell ($L_{eff}$ and $C_{eff}$ are the inductance and capacitance of the T-line section within the cell, respectively), and $a_1$ and $a_2$ are the current gains of the differential pair transistors ($a_1 + a_2 = 1$).
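The frequency dependence of the delay implied by Eq. (4.2) can be explored numerically by differentiating the phase with respect to $\omega$. The sketch below does this over 80–105GHz for a few bias splits. The unit delay `TAU_D` of 0.5ps is an assumed illustrative value, and the function names are hypothetical; the phase is written with `atan2` so that the $a_1 = 0$ case needs no division.

```python
import math


def phase(a1, a2, omega, tau_d):
    """Output phase of Eq. (4.2); atan2 avoids dividing by a1 when a1 = 0."""
    return -math.atan2(a1 * math.sin(omega * tau_d),
                       a2 + a1 * math.cos(omega * tau_d))


def group_delay_numeric(a1, a2, omega, tau_d, step=2 * math.pi * 1e8):
    """Central-difference estimate of -d(phase)/d(omega)."""
    return -(phase(a1, a2, omega + step, tau_d)
             - phase(a1, a2, omega - step, tau_d)) / (2 * step)


def delay_spread(a1, a2, tau_d, f_start=80e9, f_stop=105e9, points=26):
    """Return (mean delay, max-min spread) over the band, both in seconds."""
    delays = []
    for i in range(points):
        f = f_start + (f_stop - f_start) * i / (points - 1)
        delays.append(group_delay_numeric(a1, a2, 2 * math.pi * f, tau_d))
    return sum(delays) / points, max(delays) - min(delays)


TAU_D = 0.5e-12  # assumed unit delay in seconds, for illustration only

results = {
    "a1=1, a2=0": delay_spread(1.0, 0.0, TAU_D),
    "a1=0, a2=1": delay_spread(0.0, 1.0, TAU_D),
    "a1=a2": delay_spread(0.5, 0.5, TAU_D),
    "a1=0.8, a2=0.2": delay_spread(0.8, 0.2, TAU_D),
}
for label, (mean, spread) in results.items():
    print(f"{label:>15}: mean = {mean / TAU_D:.3f} tau_d, "
          f"spread = {spread / TAU_D:.2e} tau_d")
```

Under these assumptions the delay is frequency independent only when all the current flows in one transistor or when it is split equally, while an intermediate split such as 0.8/0.2 shows a small but nonzero variation across the band.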
Hence, the group delay $T_{delay}$ of this unit cell is

$$T_{delay}(\omega) = -\frac{d \angle V_{o1}(\omega)}{d\omega} = \tau_d \frac{1 + a_2 / a_1 \cdot \cos(\omega \tau_d)}{1 + (a_2 / a_1)^2 + 2a_2 / a_1 \cdot \cos(\omega \tau_d)}.$$ (4.4)

For flat delay across frequency, the group delay should be frequency independent, i.e.,

$$\frac{dT_{delay}(\omega)}{d\omega} = 0.$$ (4.5)

$$\Rightarrow \tau_d^2 \frac{-a_2/a_1 \sin(\omega \tau_d) \cdot (a_2^2/a_1^2 - 1)}{\left[1 + (a_2/a_1)^2 + 2a_2/a_1 \cdot \cos(\omega \tau_d)\right]^2} = 0 \quad \Rightarrow \quad a_1 = 0, \text{ or } a_2 = 0, \text{ or } a_1 = a_2$$ (4.6)

Eq. (4.6) indicates that only three bias conditions ($a_1$=0, $a_2$=0 or $a_1$=$a_2$) for the differential pair of Fig. 4.8 will lead to a broadband flat group delay. Furthermore, because the condition $a_1$=$a_2$ is more vulnerable to process/voltage/temperature variation, binary tuning ($a_1a_2$=01 or $a_1a_2$=10) is used to realize binary broadband delay (0 or $\tau_d$). The delay of each cell within the distributed TTD is controlled by complementary digital signals applied at the inputs of the differential pair in each cell. This ensures that for all delay settings the $g_m$ transistor's DC current and parasitic capacitor $C_\pi$ will remain constant, and in so doing maintain constant characteristic impedance of the input/output T-lines.

Fig. 4.9 1:2 Active power splitter with true time delay control

Fig. 4.10 1:4 Active power splitter with true time delay control

A 1:2 distributed active power splitter with TTD control (Fig. 4.9) is constructed by sharing the input base T-lines of two identical distributed TTD circuits, while a 1:4 power splitter with TTD control is constructed by cascading two 1:2 power splitters. The input power is equally split among the 4 paths, with independent delay control of each path. The circuit layout of this power splitter is designed symmetrically; therefore, the routing loss through the splitter is the same for every path.

Fig.
4.11 Chip micrograph of the 1:4 active power splitter with true time delay control Fig. 4.12 Measured true time delay circuit s-parameters across all delay states. (a) $S_{11}$ (b) $S_{22}$ (c) $S_{21}$ A breakout circuit of the TTD/splitter block is shown in Fig. 4.11. The measurements (Fig. 4.12) are done using on-wafer probe testing and a VNA, and show less than 3dB gain roll-off across all the delay settings from 80–105GHz; absolute gain is -2.2dB ±0.6dB at 100GHz. The gain variation across all delay states is compensated by a subsequent VGA stage; therefore the gain variation does not significantly affect the antenna pattern. Fig. 4.13 (a) Measured phase vs. frequency response and best-fit straight lines of the distributed active power splitter with true time delay for 7 delay settings. (b) Error of measured delay with respect to best-fit data versus frequency Fig. 4.13(a) shows the measured phase response with respect to frequency for the distributed active power splitter and true time delay breakout circuit along with best-fit straight lines for 7 different delay settings. This measured linear response for each delay setting across the 80-105GHz range demonstrates the nearly constant group delay versus frequency characteristic for each delay setting. The error of the measured delay with respect to the best-fit straight line data versus frequency across the 80-105GHz range is shown in Fig. 4.13(b). The measured wideband variable delay of each path is controllable from 0ps to $\pm 2$ ps with 0.5ps steps, corresponding to $0^{\circ}$ to $\pm 18^{\circ}$ for beam steering with $4.5^{\circ}$ steps. #### 4.3 Measurement Results The proposed 3×3-element array was fabricated in a SiGe BiCMOS process and a die photo is shown in Fig. 4.14. The entire imaging chip occupies an area of 6×6mm<sup>2</sup>, including all the pads. Fig. 
4.14 Die photo of the 9-element imaging array receiver A super-pixel is used for antenna beam steering measurements to demonstrate the wideband delay and amplitude control of the super-pixel. The super-pixel was programmed for beam angles from -18° to 18° in 4.5° steps. For each setting the transmit power and frequency remained constant, the receiver-to-transmitter angle was varied from -60° to +60° in 1° increments and the received power was measured. A single RF path was then turned on and the test was repeated. Fig. 4.15 Subarray beam steering radiation patterns compared with single antenna radiation patterns Fig. 4.15 shows super-pixel beam steering radiation patterns compared with a single antenna radiation pattern. Next, with a single RF path on and with constant transmit power and frequency, the receiver-to-transmitter angle was varied from -60° to 60° in 1° increments. The VGA gain settings were stepped from 3 to 9dB (i.e., 6dB of gain tuning range) in 1dB increments at each angle and the received power was measured. Fig. 4.16 shows a single antenna's radiation pattern in one quadrant under these settings. Fig. 4.16 Single antenna beam pattern for various VGA gain settings Fig. 4.17 shows the images from the outputs of four individual super-pixels, obtained by 4mm spatial sampling, compared with the image obtained by combining the four overlapping super-pixels' data, resulting in 2mm sampling. It can be visibly seen that the 2mm over-sampled image results in a sharper spatial resolution. Fig. 4.17 Reconstructed images from the outputs of the four individual 2×2 subarrays, along with the composite image obtained by combining the four overlapping subarray images The complete measured performance of the proposed 9-element imaging array receiver is summarized in Table 4.1. The proposed 6mm×6mm BiCMOS chip with 9 on-chip antennas forming a 2×2 array of super-pixels displays a measured NETD of 0.45K and a minimum NEP of 0.28fW/Hz<sup>1/2</sup>. 
Table 4.2 shows comparisons between this work and other W-band imaging receiver chips. The proposed imaging receiver reports the lowest measured NETD and NEP of silicon-based imaging receivers along with the highest level of integration of any technology.

Table 4.1 Summary of the Receiver Array Performance

| Parameter | Value |
|---|---|
| Pre-detection Gain | 43dB |
| Receiver Array 3-dB Bandwidth | 87-108GHz |
| Minimum/Maximum Subarray Beam-Steering Angle | -18°/+18° |
| Beam-Steering Angular Resolution | 4.5° |
| Measured Average Incoherent Responsivity (Passive Hot/Cold Measurement) | 800MV/W |
| Measured NETD (Averaged Across Multiple Hot/Cold Measurements) | 0.45K |
| Measured NEP | 0.28fW/Hz<sup>1/2</sup> |
| Power Dissipation | 2.03W (225mW for each of the 9 elements) |
| Process Technology | 0.18μm SiGe BiCMOS |
| Level of Integration | 9 On-Chip Antennas forming 4 Overlapping 2×2 Super-pixels with Baseband Circuitry |
| Die Area | 6mm×6mm |

Table 4.2 Performance Comparison of W-Band Imagers

| Reference | Technology | Level of Integration | Responsivity | Integration Time | Min NEP | NETD |
|---|---|---|---|---|---|---|
| [2] | 0.13μm SiGe BiCMOS | Dicke Switch + LNA + Detector | 5MV/W | 30ms | 21fW/Hz<sup>1/2</sup> | 0.83K (Calculated) |
| [3] | 65nm CMOS | Dicke Switch + LNA + Detector | 0.09MV/W | 30ms | 200fW/Hz<sup>1/2</sup> | 10K (Calculated) |
| [5] | 0.18μm SiGe BiCMOS | Dicke Switch + LNA + Detector + Baseband | 45MV/W | 30ms | 10fW/Hz<sup>1/2</sup> | 0.4K (Calculated) |
| [6] | 0.18μm SiGe BiCMOS | 4 Element Focal Plane Array with On-chip Antennas and PLL | 48MV/W (285MV/W not including on-chip antenna loss) | 30ms | 8.1fW/Hz<sup>1/2</sup> | 3K (Calculated) |
| [7] | InP HEMT | LNA + Detector Chipset | 0.5MV/W | 3.125ms | Not Reported | 0.45K (Measured) |
| This work | 0.18μm SiGe BiCMOS | 9-Element Array with On-chip Antennas and Baseband | 800MV/W | 20ms | 0.28fW/Hz<sup>1/2</sup> | 0.45K (Measured) |

#### 4.4 Chapter Summary

A highly integrated 9-element imaging receiver comprising a 2×2 super-pixel array with spatial-overlapping super-pixels and integrated antennas was designed and fabricated in a low-cost BiCMOS process. A unique overlapping super-pixel concept was demonstrated, which employed mechanisms for 7-step amplitude and 9-step delay control settings. Passive hot/cold tests were performed and a thermal resolution of 0.45K was measured. To the author's knowledge, this is the first reported measured NETD of a fully-integrated silicon radiometer containing integrated antennas directly measuring an object's temperature.

## **Chapter 5 Conclusions**

In this dissertation, a CMOS solution for a fully-integrated 210GHz fundamental transceiver with on-chip antennas is presented. To the best of the author's knowledge, this is the first demonstration of a 210GHz fundamental CMOS TRX. The neutralization technique was revisited, and an over-neutralization technique is proposed to effectively boost the power gain of transistors in the near-$f_{max}$ region. A first CMOS PA is demonstrated at 210GHz with a measured gain of 15dB (5dB gain per stage) and a saturated output power of 4.6dBm. A CMOS LNA with 18dB gain was achieved at 210GHz with more than 15GHz bandwidth. The 2×2 spatial-combining TX achieves an EIRP of 5.13dBm at 10dB back-off from saturated power. It achieves an estimated EIRP of 15.2dBm when the PAs are fully driven.
This work demonstrates the feasibility of using CMOS technology for chip-to-chip wireless communication, with a potential data rate between 10Gbps (no RRC filtering) and 20Gbps (ideal filtering).

A W-band direct-detection-based receiver array is also presented, using a new concept of spatial-overlapping super-pixels for millimeter-wave imaging applications in an advanced 0.18µm BiCMOS process. The receiver chip achieves a measured peak coherent responsivity of 1,150MV/W, an incoherent responsivity of 1,000MV/W, a minimum NEP of 0.28fW/Hz<sup>1/2</sup>, and a front-end 3-dB bandwidth of 87–108GHz, while consuming 225mW per receiver element. The measured NETD of the SiGe receiver chip is 0.45K with a 20ms integration time.

### **Bibliography**

- [1] P. Siegel, "Terahertz Technology," *IEEE Trans. Microwave Theory & Tech.*, vol. 50, no. 3, pp. 910-928, Mar. 2002.
- [2] D. Woolard, et al., "Terahertz Frequency Sensing and Imaging: A Time of Reckoning Future Applications?" *IEEE Proceedings*, vol. 93, no. 10, pp. 1722-1743, Oct. 2005.
- [3] W. Withayachumnankul, et al., "T-Ray Sensing and Imaging," *IEEE Proceedings*, vol. 95, no. 8, pp. 1528-1558, Aug. 2007.
- [4] K. Huang, et al., "Terahertz Terabit Wireless Communication," *IEEE Microwave Magazine*, vol. 12, no. 4, pp. 108-116, Jun. 2011.
- [5] I. Hosako, et al., "At the Dawn of a New Era in Terahertz Technology," *IEEE Proceedings*, vol. 95, no. 8, pp. 1611-1623, Aug. 2007.
- [6] U. R. Pfeiffer, E. Öjefors, A. Lisauskas, and H. G. Roskos, "Opportunities for silicon at mmWave and Terahertz frequencies," in *Bipolar/BiCMOS Circuits and Technology Meeting, BCTM 2008*, Oct. 2008, pp. 149-156.
- [7] C. Jastrow, et al., "Wireless digital data transmission at 300 GHz," *IEEE Electron. Lett.*, vol. 46, no. 9, pp. 661-663, Apr. 2010.
- [8] A. Hirata, et al., "120-GHz wireless link using photonic techniques for generation, modulation, and emission of millimeter-wave signals," *IEEE J. Lightw. Technol.*, vol. 21, no. 10, pp. 2145-2153, Oct. 2003.
- [9] H. Song, et al., "Multi-gigabit wireless data transmission at over 200-GHz," in *34th Int. Conf. on Infrared, Millim., and Terahertz Waves (IRMMW-THz 2009)*, pp. 1-2, 2009.
- [10] T. Nagatsuma, et al., "Millimeter-wave photonic integrated circuit technologies for high-speed wireless communications applications," *IEEE ISSCC Dig. Tech. Papers*, pp. 448-449, Feb. 2004.
- [11] I. Kallfass, et al., "All Active MMIC-Based Wireless Communication at 220 GHz," *IEEE Trans. Terahertz Science and Technology*, vol. 1, no. 2, pp. 477-487, Nov. 2011.
- [12] K. Sengupta, et al., "Distributed active radiation for THz signal generation," *IEEE ISSCC Dig. Tech. Papers*, pp. 288-289, Feb. 2011.
- [13] K. Sengupta, et al., "A 0.28THz 4×4 power-generation and beam-steering array," *IEEE ISSCC Dig. Tech. Papers*, pp. 256-258, Feb. 2012.
- [14] J. Park, et al., "A 0.38THz fully integrated transceiver utilizing quadrature push-push circuitry," *IEEE Symp. VLSI Circuits*, pp. 22-23, June 2011.
- [15] J. Park, et al., "A 260 GHz Fully Integrated CMOS Transceiver for Wireless Chip-to-Chip Communication," *IEEE Symp. VLSI Circuits*, pp. 48-49, June 2012.
- [16] *International Technology Roadmap for Semiconductors*: 2008 Edition. [On-line: http://www.itrs.net].
- [17] Z. Deng, et al., "A layout-based optimal neutralization technique for mm-wave differential amplifiers," *IEEE RFIC Symp. Dig.*, pp. 355-358, Jun. 2010.
- [18] W. Chan, et al., "A 58-65 GHz Neutralized CMOS Power Amplifier With PAE Above 10% at 1-V Supply," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 554-564, Mar. 2010.
- [19] D. Chowdhury, et al., "Design Considerations for 60 GHz Transformer-Coupled CMOS Power Amplifiers," *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2733-2744, Oct. 2009.
- [20] S. Mason, "Power gain in feedback amplifier," *IEEE Trans. Circuit Theory*, vol. CT-1, no. 2, pp. 20-25, 1954.
- [21] D. Pozar, *Microwave engineering*, 4th ed., Hoboken: Wiley, 2012.
- [22] M. Gupta, "Power gain in feedback amplifiers, a classic revisited," *IEEE Trans. Microwave Theory & Tech.*, vol. 40, no. 5, pp. 864-879, May 1992.
- [23] A. Singhakowinta and A. R. Boothroyd, "On linear two-port amplifier," *IEEE Trans. Circuit Theory*, vol. CT-11, no. 1, p. 169, 1964.
- [24] A. Singhakowinta and A. R. Boothroyd, "Gain Capability of Two-port Amplifiers," *International Journal of Electronics*, vol. 21, no. 6, pp. 549-560, 1966.
- [25] R. Spence, *Linear Active Networks*, Wiley-Interscience, 1970.
- [26] Z. Wang, et al., "A 210GHz fully integrated differential transceiver with fundamental-frequency VCO in 32nm SOI CMOS," *IEEE ISSCC Dig. Tech. Papers*, pp. 136-137, Feb. 2013.
- [27] Z. Wang, et al., "A CMOS 210-GHz Fundamental Transceiver With OOK Modulation," *IEEE J. Solid-State Circuits*, vol. 49, no. 3, pp. 564-580, Mar. 2014.
- [28] C. Balanis, *Antenna theory: analysis and design*, 2nd ed., New York: Wiley, 1997.
- [29] K. Ang, et al., "Analysis and design of impedance-transforming planar Marchand baluns," *IEEE Trans. Microwave Theory & Tech.*, vol. 49, no. 2, pp. 402-406, Feb. 2001.
- [30] M. Chongcheawchamnan, et al., "On miniaturization isolation network of an all-ports matched impedance-transforming Marchand Balun," *IEEE Microwave and Wireless Components Letters*, vol. 13, no. 7, pp. 281-283, Jul. 2003.
- [31] A. Chen, et al., "A Novel Broadband Even-Mode Matching Network for Marchand Baluns," *IEEE Trans. Microwave Theory & Tech.*, vol. 57, no. 12, pp. 2973-2980, Dec. 2009.
- [32] C. Liang, et al., "Systematic transistor and inductor modeling for millimeter-wave design," *IEEE J. Solid-State Circuits*, vol. 44, no. 2, pp. 450-457, Feb. 2009.
- [33] C. Chan, et al., "Wiring effect optimization in 65-nm low-power NMOS," *IEEE Electron Device Lett.*, vol. 29, no. 11, pp. 1245-1248, Nov. 2008.
- [34] C. Liang, et al., "A layout technique for millimeter-wave PA transistors," *IEEE RFIC Symp. Dig.*, pp. 1-4, Jun. 2011.
- [35] H. Jhon, et al., "fmax Improvement by Controlling Extrinsic Parasitics in Circuit-Level MOS Transistor," *IEEE Electron Device Lett.*, vol. 30, no. 12, Dec. 2009.
- [36] C. Chan, et al., "Wiring Effect Optimization in 65-nm Low-Power NMOS," *IEEE Electron Device Lett.*, vol. 29, no. 11, pp. 1245-1248, Nov. 2008.
- [37] S. Voinigescu, *High Frequency Integrated Circuits*, Cambridge University Press, 2013.
- [38] P. Heydari, "Design and Analysis of a Performance-Optimized CMOS UWB Distributed LNA," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1892-1905, Sept. 2007.
- [39] J. Goo, et al., "A noise optimization technique for integrated low-noise amplifiers," *IEEE J. Solid-State Circuits*, vol. 37, no. 8, pp. 994-1002, Aug. 2002.
- [40] P. Chiang, et al., "A Highly Efficient 0.2 THz Varactor-Less VCO with -7 dBm Output Power in 130nm SiGe," *IEEE CSICS Symp.*, pp. 1-4, Oct. 2012.
- [41] Z. Chen, et al., "A BiCMOS W-Band 2×2 Focal-Plane Array With On-Chip Antenna," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2355-2371, Oct. 2012.
- [42] Z. Chen, et al., "W-band frequency synthesis using a Ka-band PLL and two different frequency triplers," *IEEE RFIC Symp. Dig.*, pp. 1-4, Jun. 2011.
- [43] C. Wang, et al., "W-Band Silicon-Based Frequency Synthesizers Using Injection-Locked and Harmonic Triplers," *IEEE Trans. Microwave Theory & Tech.*, vol. 60, no. 5, pp. 1307-1320, May 2012.
- [44] B. Jung, et al., "High-frequency LC VCO design using capacitive degeneration," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2359-2370, Dec. 2004.
- [45] J. Hong, et al., "Low Phase Noise Gm-Boosted Differential Gate-to-Source Feedback Colpitts CMOS VCO," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 3079-3091, Nov. 2009.
- [46] J. Lee, et al., "A Low-Power Low-Cost Fully-Integrated 60-GHz Transceiver System With OOK Modulation and On-Board Antenna Assembly," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 264-275, Feb. 2010.
- [47] J. Richard Fisher, "Phased Array Feeds for Low Noise Reflector Antennas," National Radio Astronomy Observatory, Green Bank, West Virginia, 1996.
- [48] R. M. Davis, et al., "A Scanning Reflector Using an Off-Axis Space-Fed Phased-Array Feed," *IEEE Transactions on Antennas and Propagation*, 1991.
- [49] A. V. Mrstik, et al., "Scanning Capabilities of Large Parabolic Cylinder Reflector Antennas with Phased-Array Feeds," *IEEE Transactions on Antennas and Propagation*, 1981.
- [50] S. J. Blank, et al., "Array Feed Synthesis for Correction of Reflector Distortion and Vernier Beamsteering," *IEEE Transactions on Antennas and Propagation*, 1998.
- [51] Y. Rahmat-samii, et al., "Directivity of Planar Array Feeds for Satellite Reflector Applications," *IEEE Transactions on Antennas and Propagation*, 1983.
- [52] J. Landon, et al., "Phased Array Feed Calibration, Beamforming, and Imaging," *The Astronomical Journal*, 139:1154-1167, March 2010.
- [53] E. Dacquay, et al., "D-Band Total Power Radiometer Performance Optimization in an SiGe HBT Technology," *IEEE Transactions on Microwave Theory and Techniques*, vol. 60, no. 3, pp. 813-826, March 2012.
- [54] L. Gilreath, V. Jain, and P. Heydari, "Design and Analysis of a W-Band SiGe Direct-Detection-Based Passive Imaging Receiver," *IEEE J. Solid-State Circuits*, vol. 46, no. 10, pp. 2240-2252, Oct. 2011.
- [55] F. Caster, et al., "Design and Analysis of a W-band 9-Element Imaging Array Receiver Using a New Concept of Spatial-Overlapping Super-Pixels in Silicon," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, Jun. 2014.
- [56] P. Heydari, "Design and Analysis of a Performance-Optimized CMOS UWB Distributed LNA," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1892-1905, Sept. 2007.
- [57] A. Safarian, L. Zhou, and P. Heydari, "CMOS Distributed Active Power Combiners and Splitters for Multi-Antenna UWB Beamforming Transceivers," *IEEE J. Solid-State Circuits*, vol. 42, no. 7, pp. 1481-1491, July 2007.