Using Data Analysis Software to Conduct Custom Analysis
This document describes how to conduct custom analysis using procedural programming. The alternative, a declarative programming approach, uses SQL to construct custom tables.
The GUI and CLI provide basic tables and several kinds of graphs that show the effects of a reform, so they meet the needs of most tax reform analysis projects.
However, sometimes there is a need to produce a table or graph that is specific to the reform being analyzed. The tax model itself has no way of knowing in advance what kind of custom table or graph is needed for a particular reform, so its approach is to provide the basic information required for any custom analysis and let users perform custom analysis using the software of their choice.
The model provides this basic information by writing out a CSV-formatted dump output file containing, for each tax filing unit in the sample, the input variables and the variables calculated by the model under the specified reform. Two such output files, typically one for current-law policy and the other for some reform policy, can support any kind of custom analysis of the effects of a reform.
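As a minimal sketch of how two such files can be compared, the Python code below reads two dump files and computes the weighted aggregate change in tax liability. The column names (id, weight, pitax) and the sample values are hypothetical stand-ins, since the actual variable names depend on the model that wrote the dump file, and the sketch assumes every column is numeric.

```python
import csv
import io


def read_dump(source):
    """Read a CSV dump from an open file into a list of dicts of floats.

    Assumes every column is numeric; the column names are whatever the
    header row of the dump file contains.
    """
    reader = csv.DictReader(source)
    return [{name: float(value) for name, value in row.items()} for row in reader]


def weighted_total(records, variable, weight="weight"):
    """Sum a variable over all filing units, scaled by the sampling weight."""
    return sum(rec[variable] * rec[weight] for rec in records)


# Made-up stand-ins for a current-law dump and a reform dump.
# The names id, weight, and pitax are hypothetical.
baseline_csv = io.StringIO("id,weight,pitax\n1,100,0\n2,50,2000\n3,10,30000\n")
reform_csv = io.StringIO("id,weight,pitax\n1,100,0\n2,50,1800\n3,10,33000\n")

baseline = read_dump(baseline_csv)
reform = read_dump(reform_csv)

change = weighted_total(reform, "pitax") - weighted_total(baseline, "pitax")
print(f"Aggregate tax change: {change:,.0f}")
```

With real dump files, the io.StringIO objects would be replaced by files opened with open(path, newline="").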
The key to this approach to using the tax model is having experience with some kind of data analysis software. Python and R are among the most popular open-source data analysis environments, while Stata and SAS are among the most popular proprietary tools. If you don't have any prior experience with this kind of software and you want to do custom tax analysis, the most sensible approach is to learn how to use Python and its matplotlib graphing package, which are already installed on your computer. There is also code in the Tax-Analyzer-Framework analyzer.py and utils.py modules that can help you get started making a time-series graph or a cross-section graph.
We illustrate this approach using Python, but users of other data analysis tools (e.g., R, Stata, or SAS) will see immediately how to conduct this kind of custom tax analysis in the software of their choice.
The basic approach is to use the GUI or CLI to produce a CSV-formatted dump output file for each of two tax policies, and then to use the data analysis software to produce the custom tables and graphs needed to understand the effects of moving from the first policy to the second policy.
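One possible custom table is the weighted tax change by income group. The sketch below assumes the two dump files have already been read into lists of records. The field names (id, weight, income, pitax), the income group boundaries, and the sample values are all hypothetical and would need to be replaced with the variables actually present in the dump files.

```python
from bisect import bisect_right


def tax_change_by_group(baseline, reform, edges):
    """Tabulate weighted units and weighted tax change by income group.

    Units are matched across the two policies by id and assigned to a
    group using their baseline income. A unit whose income equals an
    edge falls into the higher group.
    """
    reform_tax = {rec["id"]: rec["pitax"] for rec in reform}
    table = [{"units": 0.0, "change": 0.0} for _ in range(len(edges) + 1)]
    for rec in baseline:
        row = table[bisect_right(edges, rec["income"])]
        row["units"] += rec["weight"]
        row["change"] += rec["weight"] * (reform_tax[rec["id"]] - rec["pitax"])
    return table


# Made-up records standing in for the contents of two dump files.
baseline = [
    {"id": 1, "weight": 100, "income": 20_000, "pitax": 0},
    {"id": 2, "weight": 50, "income": 60_000, "pitax": 2_000},
    {"id": 3, "weight": 10, "income": 400_000, "pitax": 30_000},
]
reform = [
    {"id": 1, "weight": 100, "income": 20_000, "pitax": 0},
    {"id": 2, "weight": 50, "income": 60_000, "pitax": 1_800},
    {"id": 3, "weight": 10, "income": 400_000, "pitax": 33_000},
]

edges = [50_000, 100_000]  # hypothetical income group boundaries
labels = ["below 50,000", "50,000 to 100,000", "100,000 and above"]
table = tax_change_by_group(baseline, reform, edges)

print(f"{'income group':<20}{'units':>10}{'tax change':>14}{'average':>12}")
for label, row in zip(labels, table):
    average = row["change"] / row["units"] if row["units"] else 0.0
    print(f"{label:<20}{row['units']:>10,.0f}{row['change']:>14,.0f}{average:>12,.1f}")
```

Grouping on baseline income keeps each filing unit in the same row under both policies, so each row's change reflects only the difference in tax liability.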
One approach is to produce the dump output files beforehand, using either the GUI or the CLI at the command line. A second approach is to create the dump output files by calling the CLI as a preliminary step in the custom data analysis program. The advantage of the second approach is that all the information about the two policies being compared is contained in one place. Either approach is fine; we illustrate the second, all-in-one-place approach below so that users can decide how they want to work.
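The general shape of the second approach can be sketched with Python's subprocess module. The actual CLI command and its options are not shown here, so the sketch uses a stand-in command (a Python one-liner that writes a tiny CSV file) in place of the real model call. The file names, column names, and values are hypothetical.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path


def produce_dumps(commands, workdir):
    """Run each command in workdir and return the CSV files found there.

    check=True raises CalledProcessError if a command fails, so the
    analysis never runs against missing or stale dump files.
    """
    for command in commands:
        subprocess.run(command, cwd=workdir, check=True)
    return sorted(Path(workdir).glob("*.csv"))


def stand_in(filename, tax):
    """Build a placeholder command that writes a one-unit CSV file.

    This only imitates a model run; replace it with the real CLI
    command and its documented options.
    """
    code = f"open({filename!r}, 'w').write('id,weight,pitax\\n1,100,{tax}\\n')"
    return [sys.executable, "-c", code]


with tempfile.TemporaryDirectory() as workdir:
    # Step 1: produce one dump file per policy.
    commands = [stand_in("baseline.csv", 500), stand_in("reform.csv", 650)]
    paths = produce_dumps(commands, workdir)

    # Step 2: analyze the dump files in the same program.
    totals = {}
    for path in paths:
        with open(path, newline="") as f:
            totals[path.stem] = sum(
                float(row["pitax"]) * float(row["weight"])
                for row in csv.DictReader(f)
            )

change = totals["reform"] - totals["baseline"]
print(f"Aggregate tax change: {change:,.0f}")
```

Passing each command as a list of arguments, rather than as a single shell string, avoids quoting problems in file names and policy options.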
Examples of doing this with a dump output file produced by the Malaysia PIT model, which is called MYI-Tax-Analyzer, can be seen in this documentation.