ALI-Tax-Analyzer: Model Validation

Model Validation Summary

The validation process employed here is a type of unguided differential fuzzing in which the same sample of filing units is processed by the ALI-Tax-Analyzer and by an Excel PIT model that was recently developed for the Albania government to conduct tax expenditure analysis. The resulting tax liabilities for each individual are then compared to see if they are the same. Non-small differences between the liability amounts are investigated. Investigating such differences can expose bugs in either or both models, so this is a useful model validation process for both models.

In this validation work we use an Excel model (called PIT_model_v4.1.xlsx) that simulates current-law policy for a sample of 9196 individuals who filed an annual declaration for tax year 2018. And we use the exact same sample of individuals as input into the ALI-Tax-Analyzer. The data preparation involves exporting the 'input data' sheet of that Excel model as an CSV file and converting that CSV file into an ALI-Tax-Analyzer input file using the prep_diva18.py script.

When comparing the annual PIT liability for an individual generated by the two models, we define as small any difference that is less than or equal to four lekë in absolute value. All non-small differences among the 9196 individuals were investigated.

The initial comparison of the tax liabilities from each model for the 9196 individuals revealed 7845 non-small differences, which represents about 85 percent of the sample. Most of these individuals had differences in the tax on employment income, while only 127 individuals had a non-small difference in the tax on non-employment income.

After considerable investigation it was found that the Excel model had two flaws that prevented it from accurately simulating 2018 PIT policy.

The first flaw was that the sheet that computes taxes under current-law (or 'base') policy had inadvertently omitted a column of formulas that should have copied capital gains on owned property (Box 7 on the annual declaration form) from the 'input data' sheet to the 'base case' sheet. It turns out that 127 individuals in the sample had such capital gains income, and therefore, the Excel model produced non-employment taxes for those 127 individuals that were too small. The amount of missing income was about 0.65 billion lekë, which is a small faction of the total gross income for this sample (64.80 billion lekë).

The second flaw was more serious: there was an error in the formula used to compute the tax on employment income. The formula inadvertently caused the marginal tax rate in the top bracket to be 36 percent instead of 23 percent. The flawed formula did not stop using the 13% tax rate in the 23% bracket, and hence, the top rate was 13% plus 23% (not simply 23% as stated in the law and as the Excel model makes clear was the model's intention).

In order to compare the results of the two models, we specified a "reform" that emulates these two Excel flaws. This "reform" has two provisions: (1) exclude capital gains on owned property from gross income, and (2) raise the top bracket tax rate on employment income from 23% to 36%. Such a "reform" emulates the known problems with the Excel model.

Under this "reform" the two models produce the same results: there are no non-small differences among the 9196 individuals and the aggregate PIT revenue estimates from the two models are the same: 15.637 billion lekë. When not emulating these two known flaws in the Excel model, the ALI-Tax-Analyzer produces an aggregate PIT revenue estimate of 11.597 billion lekë. So, the two flaws in the Excel spreadsheet caused aggregate PIT revenue to be overestimated by 4.040 billion lekë, which is an overestimate of almost 35 percent.

Note that the preparation of the 2018 validation data and the comparison of results generated by the two models is conducted on the diva18 branch of the source code repository.