Even though compliance issues are central to taxation policies in emerging economies, convincing empirical research on tax compliance has been scarce. Through the three chapters of my dissertation, I bridge this gap by using detailed value-added tax (VAT) micro-data from Delhi, India.
A key advantage of VAT-type systems is that they allow transactions to be corroborated using the returns of interacting firms. In chapter 1, coauthor Aprajit Mahajan and I evaluate the effect of a technology reform that improved the Delhi tax authority’s ability to cross-check buyer reports against seller reports within the VAT system. Before the technology change, such cross-checks could only be accomplished by auditing both parties, a relatively rare and time-consuming activity. After the policy change, the tax authority could (and did) relatively easily cross-check the information declared by buyers against the corresponding information from sellers, directly on its own servers and without initiating an audit. We use a difference-in-differences approach to show that the policy had a large and significant effect on wholesalers relative to retailers. A wholesaler is more likely to sell to registered firms, whereas a retailer is more likely to sell to final customers, where the paper trail breaks down. Therefore, on the output side, the self-enforcing mechanism of the VAT is more likely to break down for retailers than for wholesalers. We also find significant heterogeneity, with almost the entire increase driven by changes in the behavior of the largest tax-paying firms. This result sheds light on the limits of third-party verification in a context with limited audit resources and where the majority of firms do not remit any tax.
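As a minimal illustration of the difference-in-differences logic, the sketch below computes the simple two-group, two-period estimate from cell means. It is not the chapter's actual specification, which is not given here; the function name `did_estimate`, the row layout, and the numbers in the usage note are all hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences from cell means.

    rows: iterable of (is_wholesaler, is_post, outcome) tuples.
    This layout is an assumption made for illustration only.
    Raises statistics.StatisticsError if any of the four cells is empty.
    """
    # Collect outcomes into the four (group, period) cells.
    cells = {(g, p): [] for g in (False, True) for p in (False, True)}
    for is_wholesaler, is_post, outcome in rows:
        cells[(bool(is_wholesaler), bool(is_post))].append(outcome)

    m = {key: mean(values) for key, values in cells.items()}

    # Change for wholesalers minus change for retailers.
    wholesaler_change = m[(True, True)] - m[(True, False)]
    retailer_change = m[(False, True)] - m[(False, False)]
    return wholesaler_change - retailer_change


# Made-up outcomes, purely to exercise the function.
example_rows = [
    (True, False, 10.0), (True, False, 12.0),   # wholesalers, before
    (True, True, 15.0), (True, True, 17.0),     # wholesalers, after
    (False, False, 8.0), (False, False, 10.0),  # retailers, before
    (False, True, 9.0), (False, True, 11.0),    # retailers, after
]
did = did_estimate(example_rows)
```

With these made-up numbers, wholesalers change by 5 and retailers by 1, so `did` is 4.0. The estimate attributes the difference to the reform only under the usual assumption that the two groups would otherwise have followed parallel trends.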
In low-compliance environments, a common strategy to manipulate the third-party verification system is to establish fraudulent (“bogus”) firms. Bogus firms help genuine firms reduce their tax burden by issuing fake receipts. A tax authority determines the existence of bogus firms by first filtering down based on a few preliminary indicators, and then undertaking physical inspections. Given the authority’s limited resources, these inspections are only done sporadically. A key challenge in improving tax compliance, then, is to regularly, cheaply and reliably identify such bogus firms. In chapter 2, coauthors Aprajit Mahajan, Ofir Reich and I apply a machine learning classifier to the same tax dataset to identify bogus firms that can then be targeted for physical inspections. We face a nonstandard applied machine learning scenario. First, one-sided labels: firms that have not been caught as bogus are of unknown class (bogus or legitimate), and we need not only to use them to train the classifier but also to make predictions on them. Second, multiple time periods: each firm files several periodic VAT returns but its class is fixed, so predictions need to be made at the firm level rather than the firm-period level. Third, point-in-time simulation: we estimate the revenue-saving potential of our model by simulating the implementation of our system in the past. We do this by rolling back the data to the state of knowledge at a specific time and calculating the revenue impact of acting on our model’s recommendations and catching the bogus firms. We estimate US$40 million in recovered revenue.
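The three features of this setting can be made concrete with a small sketch. It assumes that some classifier has already produced a score for each filed return, and it is not the chapter's model. The function `rank_firms_as_of`, the mean aggregation rule, the integer period encoding, and all data in the usage example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_firms_as_of(return_scores, caught_on, cutoff, top_k):
    """Rank not-yet-caught firms by a firm-level score, as of `cutoff`.

    return_scores: list of (firm_id, period, score), one row per filed return.
    caught_on: dict mapping firm_id to the period in which the firm was
        identified as bogus. Firms absent from it are unlabeled, not
        known to be legitimate (one-sided labels).
    cutoff: last period whose information may be used (point-in-time rollback).
    """
    # Roll back: keep only returns filed on or before the cutoff.
    scores_by_firm = defaultdict(list)
    for firm_id, period, score in return_scores:
        if period <= cutoff:
            scores_by_firm[firm_id].append(score)

    # Only labels already known at the cutoff count as known.
    known_bogus = {f for f, p in caught_on.items() if p <= cutoff}

    # Aggregate firm-period scores to one score per firm.
    # Taking the mean is an assumption made for this sketch.
    firm_score = {
        f: mean(s) for f, s in scores_by_firm.items() if f not in known_bogus
    }
    return sorted(firm_score, key=firm_score.get, reverse=True)[:top_k]


# Made-up scores and catch dates, purely to exercise the function.
scores = [
    ("A", 1, 0.9), ("A", 2, 0.7),
    ("B", 1, 0.2), ("B", 3, 0.95),
    ("C", 1, 0.6), ("C", 2, 0.8),
    ("D", 1, 0.99),
]
caught = {"D": 1, "C": 4}
to_inspect = rank_firms_as_of(scores, caught, cutoff=2, top_k=2)
```

Here firm D is excluded because it was already caught by the cutoff, firm C remains a candidate because it was caught only later, and firm B's period-3 return is ignored. The call returns `["A", "C"]`.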
Tax authorities commonly apply size-based regulations to firms. If firms are concerned about compliance costs, then such regulations create adverse incentives for firms to stay small. These regulations also increase the monitoring effort needed from tax officials. In the first two years of our dataset, the Delhi tax system had multiple turnover-based filing frequency thresholds. Firms with declared turnover in the previous year of less than Rs. 1 million had to file returns annually; those between Rs. 1 and 5 million, semiannually; those between Rs. 5 and 50 million, quarterly; and those above Rs. 50 million, monthly. In years 3, 4 and 5 of our dataset, this turnover-based filing policy was first weakened and then completely disbanded. In chapter 3, coauthor Jan Luksic and I first show that this policy resulted in bunching of firms below the thresholds at all levels. Using the change in these reporting policies, we provide further evidence that such sharp bunching indeed occurs due to the VAT reporting frequency thresholds. Second, we calculate the VAT revenue losses due to such bunching and document the longer-term impact of the VAT reporting frequency thresholds. Finally, the subsequent withdrawal of the policy allows us to show that in a regime with size-dependent reporting requirements, more frequent reporting does not lead to greater levels of VAT collection.
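The filing schedule and a crude check for bunching can be sketched as follows. The threshold values come from the description above; how a firm sitting exactly on a threshold is treated is an assumption, and `bunching_ratio` is a simple illustrative diagnostic, not the estimator used in the chapter. All names and the sample turnovers are hypothetical.

```python
# Upper turnover bounds in rupees and the filing frequency below each bound.
# Assigning a firm exactly at a bound to the more frequent schedule is an
# assumption made for this sketch.
THRESHOLDS = [
    (1_000_000, "annual"),
    (5_000_000, "semiannual"),
    (50_000_000, "quarterly"),
]


def filing_frequency(prev_year_turnover_rs):
    """Return the filing frequency implied by last year's declared turnover."""
    for upper_bound, frequency in THRESHOLDS:
        if prev_year_turnover_rs < upper_bound:
            return frequency
    return "monthly"


def bunching_ratio(turnovers, threshold, window):
    """Count of firms just below the threshold relative to just above it.

    Values well above 1 are suggestive of bunching below the threshold.
    """
    below = sum(1 for t in turnovers if threshold - window <= t < threshold)
    above = sum(1 for t in turnovers if threshold <= t < threshold + window)
    return below / above if above else float("inf")


# Made-up declared turnovers around the Rs. 1 million threshold.
sample = [950_000, 980_000, 990_000, 1_020_000]
ratio = bunching_ratio(sample, threshold=1_000_000, window=100_000)
```

In this made-up sample three firms declare turnover just below Rs. 1 million and one just above, so `ratio` is 3.0. A proper bunching analysis would compare the observed distribution with an estimated counterfactual density rather than with the raw count above the threshold.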