This dissertation introduces a time-varying unobserved group-period fixed effect estimator designed to address specific challenges in causal inference. The proposed estimator accommodates scenarios in which treated individuals can transition between unobserved groups following treatment. Developed within a difference-in-differences framework, it is particularly valuable for controlling for violations of parallel trends that arise from unobserved group changes. For example, when estimating the impact of job loss on health outcomes without observing insurance status, the estimator helps account for the confounding effect on health of losing insurance as a result of job loss. The approach is also useful for estimating the average treatment effect on the treated (ATT) when treatment compliance is unobserved.
The second chapter introduces a mixed integer optimization (MIO) procedure for estimating individual group assignments. While prior literature has often relied on K-means clustering to identify unobserved group membership, that approach lacks asymptotic guarantees and reliable finite-sample performance in the presence of non-spherical distributions and outliers. The MIO formulation, by contrast, provides global optimality and asymptotic guarantees, ensuring accurate estimation of group membership and convergence to our theoretical characterization of the estimator's distribution. However, due to computational limitations, the MIO approach becomes infeasible for datasets with more than 200 entities.
To address MIO's computational constraints, the third chapter presents a novel branch-and-bound algorithm that leverages a proof that our estimator's decision boundary is linear. Instead of searching directly over individual group memberships, the algorithm searches for the linear decision boundary that determines group assignments. This method significantly improves computational efficiency, allowing it to handle large-scale problems. For instance, while the MIO formulation may take months to solve a problem with 1,000 entities, the branch-and-bound algorithm can solve it within seconds. Although the current implementation is limited to low-dimensional settings with two unobserved groups, the framework holds promise for extension to high-dimensional settings involving multiple groups.
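The intuition behind searching over a boundary rather than over memberships can be illustrated with a minimal sketch. It is not the dissertation's branch-and-bound algorithm: it assumes a toy one-dimensional setting in which each entity is summarized by a single hypothetical scalar, uses a within-group sum of squares as a stand-in objective, and relies on the known fact that in one dimension the optimal two-group split is contiguous. All names and data below are illustrative assumptions.

```python
from itertools import product

def within_ss(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_by_enumeration(x):
    """Search all 2^N assignments of N entities to two groups."""
    best = float("inf")
    for labels in product((0, 1), repeat=len(x)):
        g0 = [v for v, g in zip(x, labels) if g == 0]
        g1 = [v for v, g in zip(x, labels) if g == 1]
        best = min(best, within_ss(g0) + within_ss(g1))
    return best

def best_by_threshold(x):
    """Search only the N+1 splits of the sorted values (a 1-D boundary)."""
    s = sorted(x)
    return min(within_ss(s[:k]) + within_ss(s[k:]) for k in range(len(s) + 1))

# Hypothetical entity-level summaries; not data from the dissertation.
x = [0.1, 0.4, 0.35, 2.0, 2.2, 1.9, 0.2, 2.1]
obj_enum = best_by_enumeration(x)   # examines 2**8 = 256 assignments
obj_thr = best_by_threshold(x)      # examines 9 candidate boundaries
```

In this toy case both searches reach the same objective value, but the number of candidates grows exponentially in the number of entities for enumeration and only linearly for the boundary search, which conveys why a boundary-based search can scale to problems that are out of reach for a membership-based formulation.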