Author : Elan Segarra
Book Synopsis: Essays on the Econometrics of Data Quality, by Elan Segarra
Essays on the Econometrics of Data Quality was written by Elan Segarra and released in 2021. Book excerpt: This dissertation consists of three essays that explore scenarios where data quality issues interfere with the goals of empirical research. These situations motivate closer analysis of existing econometric methods and, in some cases, the provision of new methods to account for the deficiencies present in the data. In all three cases, the work aims to provide clarity and advice so that researchers can accomplish their primary objective while managing the shortcomings in their data.

In the first chapter I consider survival analysis when durations are mismeasured because of record linkage errors that arise during data collection and processing. Panel data have a long history of use across the social sciences; however, they can be imperfect representations of reality when record linkage methods are employed in their creation. When conducting survival analysis (e.g. of firm death, mortality, or emigration), missed linkages induce error in the observed lifetime durations, and thus inconsistency in standard survival estimators. New methods are developed that restore consistency of the parameter estimators without correcting the linkages. This work makes three distinct theoretical contributions under increasingly relaxed assumptions. First, under the strong assumption of a known, independent linkage error process, I show that the marginal distribution of time to death is nonparametrically identified from linkage-error-induced durations. Second, when data on start and end dates are introduced, I show that nonparametric point identification of the joint distribution of lifetimes and linkage error is typically achieved.
Third, when no restriction is placed on the dependence structure, I apply partial identification methods to derive sharp, informative bounds on the marginal distribution of lifetimes. New estimators and inference methods are introduced across all scenarios, and their validity is established formally. The methods are applied to longitudinal business data (where linkage error occurs due to establishment relocation) and show that establishment death rates in the first 3 years can be overestimated by as much as 10 percentage points with naive methods, while the methods proposed here are able to recover true rates of survival from mis-linked data.

The second chapter investigates the estimation of discrete choice models when market size is unobserved or mismeasured. Estimates of elasticities are a common output of interest in discrete choice models; however, they can be significantly biased when the population size is misspecified. In this chapter we decompose the bias in elasticity estimates in the logit model into a direct effect and an indirect effect coming from bias in the structural parameter estimates. Since these effects can go in opposite directions, addressing bias from the indirect channel via market fixed effects has an indeterminate effect on the total bias in the elasticity. We provide a complete characterization of when including market fixed effects will mitigate versus exacerbate elasticity bias. Our results reveal that, for own-characteristic elasticities, products with small shares will typically benefit most from market fixed effects, while the benefit (or detriment) for cross-characteristic elasticities is independent of share.

The third chapter explores instrumental variables estimation in the presence of outcome attrition and presents a novel estimator to handle this missingness. Instrumental variables (IV) methods are a ubiquitous tool for estimating causal effects.
However, when data are subject to missingness, the exclusion restriction can be violated, leading to significant bias in IV estimators. This work proposes a new method, termed the missingness instrumental variables (MIV) estimator, to recover causal effects in the presence of outcome attrition. The method leverages statistical independences to replace the infeasible moments of the IV estimator with moments that can be estimated using data subject to missingness. Just like IV methods with complete data, MIV can estimate many causal effects of interest, including average treatment effects, local average treatment effects, and marginal treatment effects. The method is compared with inverse probability weighting and multiple imputation methods, and Monte Carlo simulations highlight how MIV fares better than these alternatives when positivity is violated or when error distributions are misspecified.
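
As a rough illustration of the second chapter's point that elasticities depend on the assumed market size, the sketch below uses the textbook logit elasticity formulas with a share computed as quantity divided by market size. It is not taken from the dissertation: the function name, the numbers, and the choice to hold the coefficient `beta` fixed are all assumptions made for illustration. Because `beta` is held fixed, it shows only the direct channel through the share, not the indirect channel through biased parameter estimates.

```python
def logit_char_elasticities(quantity_j, x_j, beta, market_size):
    """Own- and cross-characteristic elasticities in a plain logit model.

    Textbook logit formulas, with the share built from an assumed market size:
        s_j   = quantity_j / market_size
        own   = beta * x_j * (1 - s_j)   # effect of x_j on product j's share
        cross = -beta * x_j * s_j        # effect of x_j on any other product's share
    """
    share_j = quantity_j / market_size
    own = beta * x_j * (1.0 - share_j)
    cross = -beta * x_j * share_j
    return own, cross


# Hypothetical inputs: 100 units sold, characteristic value 2.0, coefficient -1.5.
quantity_j, x_j, beta = 100, 2.0, -1.5

# Same data, two different assumptions about market size.
true_own, true_cross = logit_char_elasticities(quantity_j, x_j, beta, market_size=1000)
mis_own, mis_cross = logit_char_elasticities(quantity_j, x_j, beta, market_size=500)

print("own elasticity, market size 1000:", round(true_own, 3))
print("own elasticity, market size 500: ", round(mis_own, 3))
print("cross elasticity, market size 1000:", round(true_cross, 3))
print("cross elasticity, market size 500: ", round(mis_cross, 3))
```

Halving the assumed market size doubles the implied share, which changes both elasticities even though the sales data and `beta` are unchanged.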