eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Targeted Learning of High-dimensional Parameters and Its Finite Sample Inference

Abstract

Targeted maximum likelihood estimation (and semiparametric efficient estimation in general) involves deriving the efficient influence function of the target parameter and adjusting an initial estimate of the data distribution toward the target estimand. This adjustment step requires fitting a least favorable submodel through the initial estimator whose dimension matches that of the target parameter, which can become unstable when the target parameter is high-dimensional.
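As a concrete illustration of the adjustment step, the following is a minimal sketch of a one-dimensional targeting update. It assumes a binary outcome, a binary treatment, and the average treatment effect as the target parameter; none of these choices is stated in this paragraph. It uses the familiar local least favorable submodel, not the universal submodel studied in the dissertation. The names `fit_epsilon` and `tmle_ate` are hypothetical, and the initial outcome predictions `q1`, `q0` and propensity scores `g` are taken as given.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds, with p clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return math.log(p / (1.0 - p))


def fit_epsilon(y, offset, h, max_iter=100, tol=1e-12):
    """MLE of epsilon in logit P(Y=1) = offset + epsilon * h, via Newton's method."""
    eps = 0.0
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for yi, oi, hi in zip(y, offset, h):
            p = expit(oi + eps * hi)
            score += hi * (yi - p)           # derivative of the log-likelihood
            info += hi * hi * p * (1.0 - p)  # negative second derivative
        if info == 0.0:
            break
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


def tmle_ate(a, y, q1, q0, g):
    """One targeting step for the ATE (illustrative assumption, binary Y and A).

    a: treatments (0/1), y: outcomes (0/1),
    q1, q0: initial estimates of E[Y | A=1, W] and E[Y | A=0, W],
    g: estimates of P(A=1 | W).
    """
    # "Clever covariate": the part of the efficient influence function
    # that multiplies the outcome residual.
    h = [ai / gi - (1 - ai) / (1.0 - gi) for ai, gi in zip(a, g)]
    offset = [logit(q1i) if ai == 1 else logit(q0i)
              for ai, q1i, q0i in zip(a, q1, q0)]
    eps = fit_epsilon(y, offset, h)
    # Fluctuate the initial fit along the submodel and plug in.
    q1_star = [expit(logit(q) + eps / gi) for q, gi in zip(q1, g)]
    q0_star = [expit(logit(q) - eps / (1.0 - gi)) for q, gi in zip(q0, g)]
    ate = sum(u - v for u, v in zip(q1_star, q0_star)) / len(y)
    return ate, eps, q1_star, q0_star
```

After the update, the empirical mean of the clever covariate times the outcome residual is (numerically) zero, which is the efficient score equation for the outcome regression component. With a d-dimensional target parameter, `eps` becomes a d-vector fitted jointly, which is where the instability described above can arise.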

A second direction that would substantially improve the credibility of these semiparametric estimators is improving the finite-sample coverage of their confidence intervals.

In this dissertation, we first study the robust estimation of high-dimensional target parameters. We then investigate how to perform finite-sample inference in a large semiparametric model. We also build an estimator that is simultaneously efficient for a large family of target parameters by undersmoothing a single regression.

In Chapter 1, we propose using a universal least favorable submodel to robustly estimate high-dimensional target parameters, with applications to survival analysis. We establish a novel connection between the universal least favorable submodel and moving along a sparse local least favorable submodel, and we demonstrate the extension to survival analysis, where the whole survival curve must be estimated nonparametrically and accompanied by statistical inference. We assess finite-sample performance in both a simulation study and an observational study of monoclonal gammopathy.

In Chapter 2, we develop and extend the theory of nonparametric bootstrap inference for the targeted maximum likelihood estimator (TMLE). We establish a formal theorem showing that the nonparametric bootstrap is an asymptotically valid procedure for finite-sample TMLE inference when highly-adaptive LASSO (HAL) is used as the nuisance parameter estimator, and we demonstrate coverage superior to that of existing influence-function-based methods. This chapter addresses the problem of applying semiparametric models and machine learning algorithms to small datasets while still obtaining honest causal and statistical inference. Prior to this work, one had to either run the nonparametric bootstrap under small parametric models or estimate in a large semiparametric model in which the nonparametric bootstrap has no theoretical guarantee. We propose an effective tuning parameter selection method that optimizes confidence interval coverage (rather than estimation precision) and shows good coverage even for non-doubly-robust causal parameters.
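For readers unfamiliar with the resampling scheme itself, here is a minimal sketch of a nonparametric bootstrap percentile interval for a generic estimator. It is only the basic resampling loop: the dissertation's procedure applies the bootstrap to a HAL-based TMLE and selects a tuning parameter for coverage, neither of which is reproduced here. The function name `bootstrap_percentile_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from the nonparametric bootstrap.

    data: list of observations; estimator: function mapping a list to a number.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the empirical distribution.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(resample))
    stats.sort()
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates.
    lo_idx = int((alpha / 2.0) * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Usage on simulated data, with the sample mean standing in for the estimator.
_rng = random.Random(1)
sample = [_rng.gauss(0.0, 1.0) for _ in range(200)]
lower, upper = bootstrap_percentile_ci(sample, statistics.mean)
print(lower, upper)
```

The estimator is passed in as a function so that the same loop could in principle wrap a full estimation pipeline, at the cost of refitting it once per bootstrap sample.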

In Chapter 3, we propose two efficient estimators based on highly-adaptive LASSO (HAL): targeted HAL and undersmoothed HAL.

Using undersmoothed HAL to estimate the likelihood gives us an efficient estimator for a large family of target parameters. The key is a strategy for choosing the tuning parameter so that the resulting fit has a larger sectional variation norm than the fit selected by cross-validation. In this chapter, we propose a 'multi-task tuning' method that can be applied to a wide range of target parameters.
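To fix notation for what "a larger sectional variation norm" means, the following is a sketch of the usual HAL parameterization. It is the standard formulation from the HAL literature as best understood here, not a formula taken from this abstract; the symbols $u_{s,j}$ (knot points), $L$ (loss), $C$ (variation norm bound) and $\lambda$ (LASSO penalty) are notation introduced for illustration.

```latex
% Assumed notation: x in [0,1]^d, s ranges over nonempty subsets of {1,...,d},
% x_s is the subvector of x indexed by s, and u_{s,j} are knot points.
\[
  f_\beta(x) \;=\; \beta_0 + \sum_{s} \sum_{j} \beta_{s,j}\,
  \mathbf{1}\{\, x_s \ge u_{s,j} \,\},
  \qquad
  \hat\beta_C \;=\; \arg\min_{\beta \,:\, \sum_{s,j} |\beta_{s,j}| \,\le\, C}
  \; \frac{1}{n} \sum_{i=1}^{n} L(f_\beta)(O_i).
\]
% Undersmoothing: choose a bound larger than the cross-validated one.
\[
  C_{\mathrm{under}} \;>\; C_{\mathrm{cv}}
  \quad\text{(in penalized form, } \lambda_{\mathrm{under}} < \lambda_{\mathrm{cv}}\text{)}.
\]
```

In this parameterization the $L_1$ norm of the coefficients plays the role of the sectional variation norm, so undersmoothing amounts to moving further along the LASSO path than cross-validation would.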

The second method, targeted HAL, solves the efficient score equations by adding to the LASSO design matrix a covariate that targets the statistical parameter of interest.

We provide examples of our methods for estimating the average treatment effect and illustrate them in two simulations: one that favors inverse probability weighting methods (such as estimating equations and TMLE) and a more challenging design with a practical violation of the positivity assumption.
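To make the positivity issue concrete, here is a minimal sketch of a plain inverse probability weighted estimate of the average treatment effect, together with the largest weight it uses. It assumes a binary treatment and known (or already estimated) propensity scores; the function names `ipw_ate` and `max_weight` are hypothetical, and this is not the simulation design from the dissertation.

```python
def ipw_ate(a, y, g):
    """Inverse probability weighted estimate of E[Y(1)] - E[Y(0)].

    a: treatments (0/1), y: outcomes, g: propensity scores P(A=1 | W).
    """
    n = len(y)
    total = 0.0
    for ai, yi, gi in zip(a, y, g):
        # Treated units are weighted by 1/g, control units by 1/(1-g).
        total += ai * yi / gi - (1 - ai) * yi / (1.0 - gi)
    return total / n


def max_weight(a, g):
    """Largest inverse probability weight actually used in the sample."""
    return max(1.0 / gi if ai == 1 else 1.0 / (1.0 - gi)
               for ai, gi in zip(a, g))


# Usage: a treated unit with propensity 0.01 receives weight 100.
print(ipw_ate([1, 0, 1, 0], [1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))
print(max_weight([1, 0], [0.01, 0.5]))
```

When some propensity scores are close to 0 or 1, a few observations carry very large weights, which is the practical positivity violation that makes weighting-based estimators unstable.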

We demonstrate the strong performance of undersmoothed HAL in both scenarios.

We also present theoretical results that shed light on why undersmoothed HAL performs well under data-generating distributions in which the positivity assumption is violated.
