Q&A about the Malaysia Personal Income Tax Microsimulation Model
Model overview is available on the model home page.
Questions posed in the 2020-11-29 MOF memo and answered below:
- How to handle missing PCB returns?
- How to handle missing data on B/BE returns?
- How to identify late B/BE returns?
- Can model use either all data or sample data?
- What projections can be used for growth factors?
- What kinds of reforms can the model simulate?
- Can a decrease in a gift/relief cap be simulated?
- Can an increase in a gift/relief cap be simulated?
- What skills do model users need?
- Who is responsible for maintaining the model?
- What skills do model maintainers need?
Question: How to handle missing PCB returns?
The MOF memo states:
The set of data previously submitted to WBG consists only of information on taxpayers who submitted return forms both electronically and manually. Salaried individuals who elect their monthly tax deduction (PCB) as final tax are not required to submit return forms. Therefore, the information on this group is excluded.
Subsequently, MOF staff supplied WBG staff with data available for 2018 PCB returns and information about how employers are to calculate the tax withheld and sent to the government.
The PCB return data contain the 2018 PIT amount withheld by the employer, but no information about the salary or gift or relief amounts the employer used to compute the withheld amount. There are 387,332 of these PCB returns and the aggregate total of withheld taxes is about 0.594 billion ringgit. (This 0.6 billion compares with the 26.1 billion PIT among those filing Form B or BE returns.)
If the MOF wants to include these PCB filers in the microsimulation model, all the missing input variables (required to compute the tax amount in the model) would need to be imputed. One approach to imputation would be to use the variable values for a Form BE return that has the same PIT amount. If there is no exact match, the variable values from the BE return with the closest PIT amount would be used. If there are multiple exact matches, one BE return would be selected at random.
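The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the model: the function names, the dictionary layout, and the pit and empinc values are hypothetical, and the BE returns are assumed to be a non-empty list already sorted by PIT amount.

```python
import bisect
import random

def find_donors(pcb_pit, be_sorted, pits):
    """Return the BE returns whose PIT amount is closest to pcb_pit.

    be_sorted is a list of dicts sorted by "pit"; pits is the matching
    list of PIT amounts. An exact match has a gap of zero, so it is
    always preferred over a near match.
    """
    pos = bisect.bisect_left(pits, pcb_pit)
    # Only the neighbours of the insertion point can be the closest.
    near = [pits[i] for i in (pos - 1, pos) if 0 <= i < len(pits)]
    gap = min(abs(p - pcb_pit) for p in near)
    donors = []
    for p in sorted({p for p in near if abs(p - pcb_pit) == gap}):
        lo = bisect.bisect_left(pits, p)
        hi = bisect.bisect_right(pits, p)
        donors.extend(be_sorted[lo:hi])
    return donors

def impute_pcb_inputs(pcb_pit, be_sorted, pits, rng):
    """Pick one donor at random and return its input variables."""
    donor = rng.choice(find_donors(pcb_pit, be_sorted, pits))
    return {name: value for name, value in donor.items() if name != "pit"}

# Hypothetical BE returns, sorted by PIT amount.
be_returns = [
    {"pit": 100, "empinc": 30000},
    {"pit": 250, "empinc": 42000},
    {"pit": 250, "empinc": 45000},
    {"pit": 900, "empinc": 80000},
]
be_pits = [r["pit"] for r in be_returns]
imputed = impute_pcb_inputs(250, be_returns, be_pits, random.Random(0))
```

A PCB return with a withheld amount of 250 has two exact-match donors here, so one of them is chosen at random; a withheld amount of 120 would take its inputs from the return with a PIT of 100.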
This imputation project should be undertaken only if the MOF judges its benefits to exceed its costs. Besides the obvious cost of completing this work, the imputation approach would introduce errors into the complete data set analogous to the sampling error introduced by using a stratified random sample of all the returns. The benefits would be that the model results would be comprehensive rather than being for just Form B and Form BE filers.
Given the modest relative size of taxes contributed by the PCB returns, it would be prudent to gain some experience using the initial version of the model before deciding whether or not to undertake this imputation project.
Question: How to handle missing data on B/BE returns?
The model calculates all computed variables (for example, business income net of prior-year losses) from the basic variables provided on the form (in the net business income example, gross business income and prior-year losses). So, the only missing data relevant to the model are basic variables.
We have identified four computed variables whose missing basic variables could reasonably be imputed: businc_net (the example used above), agginc, gift_total, and relief_total. In each of those four cases, it would be possible to impute the value of a zero basic variable when all of the basic variables are zero and the computed variable is positive. So, for example, if on a Form BE return, empinc, rentinc, and intinc are all zero, but agginc is positive, it would be possible to assign the positive agginc value to one of its component elements, most likely the empinc variable.
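The agginc example can be written as a small rule. The sketch below is illustrative only: the function name is hypothetical, and it assumes that empinc, rentinc, and intinc are the only basic components of agginc on a Form BE return, as in the example above.

```python
def fill_empinc_from_agginc(ret):
    """Assign a positive agginc to empinc when every component is zero.

    ret is a dict holding one return's variables. Returns that are
    already consistent are passed through unchanged.
    """
    components = ("empinc", "rentinc", "intinc")  # assumed component list
    if all(ret[name] == 0 for name in components) and ret["agginc"] > 0:
        fixed = dict(ret)
        fixed["empinc"] = ret["agginc"]
        return fixed
    return ret

# Hypothetical inconsistent return: components are zero, total is not.
broken = {"empinc": 0, "rentinc": 0, "intinc": 0, "agginc": 52000}
repaired = fill_empinc_from_agginc(broken)
```

The rule leaves the original record untouched and changes nothing when at least one component is already nonzero.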
But this kind of data inconsistency occurs in only a very small number of returns and the money amounts of the discrepancies are very small. The details of tabulations showing this are presented in the missing data section of the data-preparation document. Our judgment is that fixing these few discrepancies is a low-priority project.
Question: How to identify late B/BE returns?
We could add a variable to the data input file that indicates whether or not a return was filed late. Did the Form B and Form BE data files provided by the MOF contain such a variable? If so, let us know which variable indicates a late return and we will add it to the model input data file.
Question: Can model use either all data or sample data?
Yes, the model will be able to use either the complete data or a stratified random sample. The small sampling error and the ten-times speed-up when using the random sample are described in the sampling section of the data-preparation document.
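For readers unfamiliar with stratified sampling, the following sketch shows the general idea: returns are grouped into strata, a fixed share of each stratum is drawn at random, and each sampled return carries a weight so that weighted totals still represent the complete data. The stratum definition, the ten-percent rate, and all names are assumptions made for illustration; the sample design actually used is the one described in the data-preparation document.

```python
import random
from collections import defaultdict

def stratified_sample(returns, stratum_of, rate, rng):
    """Draw a share `rate` of each stratum and attach sampling weights."""
    strata = defaultdict(list)
    for ret in returns:
        strata[stratum_of(ret)].append(ret)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        n_draw = max(1, round(len(members) * rate))
        weight = len(members) / n_draw  # returns represented by each draw
        for ret in rng.sample(members, n_draw):
            sample.append({**ret, "weight": weight})
    return sample

# Hypothetical data: 1,000 returns in ten income bands of 100 returns each.
all_returns = [{"agginc": i * 1000} for i in range(1000)]
sample = stratified_sample(
    all_returns,
    stratum_of=lambda ret: ret["agginc"] // 100000,
    rate=0.1,
    rng=random.Random(2018),
)
weighted_count = sum(ret["weight"] for ret in sample)
```

In this example the sample holds one tenth of the returns, and the weights add up to the number of returns in the complete data.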
Question: What projections can be used for growth factors?
On data sources for growth projections, the memo says:
IRBM suggests using 5-year data to calculate income growth and make future projections. Income growth projection inputs can be obtained from:
- Growth of new work force by the Department of Statistics Malaysia
- Provident fund and social security contribution data from KWSP and Perkeso
These are very helpful suggestions. We have pursued this suggested approach using MOF projections provided by Yew Keat Chong.
Question: What kinds of reforms can the model simulate?
The model can simulate a wide range of tax reforms because more than fifty aspects of PIT policy are characterized by policy parameters that can be changed in one or more years during the simulation period. The simulation output includes the policy parameter values, a set of three standard tables, and dump output containing information on each individual in the simulated sample. These outputs provide basic estimates for each year in the simulation period of the aggregate and distributional effects of a reform, and support any kind of custom tabulation required.
In addition to these reform capabilities, the model can easily be told to use alternative economic projections, which allows sensitivity testing of, for example, the effects of an economic recession.
Question: Can a decrease in a gift/relief cap be simulated?
Yes, the caps on gifts and relief amounts are policy parameters that can be lowered in a reform by any amount for any combination of gifts and relief amounts.
Our understanding is that none of the cap amounts expressed in ringgit is indexed for inflation (that is, they do not automatically increase from year to year according to some price index). Please let us know if this is a correct understanding of the Malaysia PIT law.
The model has the capability of simulating reforms in which one or more policy parameters are switched in any year from being not indexed to being indexed for inflation.
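The difference between a non-indexed and an indexed cap can be illustrated with a short calculation. This is a sketch of the general idea, not the model's own indexing logic: the function name, the cap amount, the inflation rates, and the convention that a cap grows by the prior year's inflation rate in each year after indexing begins are all assumptions.

```python
def cap_path(base_cap, years, inflation, indexed_from=None):
    """Return the cap amount in each year.

    If indexed_from is None, the cap stays at base_cap. Otherwise the
    cap in each year after indexed_from grows by the inflation rate of
    the preceding year (an assumed convention).
    """
    path = {}
    cap = base_cap
    for i, year in enumerate(years):
        if indexed_from is not None and year > indexed_from:
            cap *= 1.0 + inflation[years[i - 1]]
        path[year] = cap
    return path

years = [2021, 2022, 2023]
inflation = {2021: 0.02, 2022: 0.03, 2023: 0.02}  # hypothetical rates
current_law = cap_path(3000.0, years, inflation)
reform = cap_path(3000.0, years, inflation, indexed_from=2021)
```

Under these assumed rates the current-law cap stays at 3000 ringgit in every year, while the indexed cap rises to about 3060 in 2022 and about 3152 in 2023.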
Question: Can an increase in a gift/relief cap be simulated?
Yes, the policy parameters that represent the cap amount can be increased. But because the variable values for the gift and relief amounts in the tax return data are capped at the current-law maximums, the data never show an amount above the current cap, and so this kind of reform will not reduce the simulated amount of tax owed. So, the problem is not with the model, but with the data used as input to the model.
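A small numerical illustration may help. All amounts and names below are hypothetical; the point is only that an amount recorded at the current-law cap cannot respond to a higher cap.

```python
def allowed_relief(amount, cap):
    """Relief allowed in the tax calculation: the amount, up to the cap."""
    return min(amount, cap)

current_cap = 6000   # hypothetical current-law cap (ringgit)
reform_cap = 8000    # hypothetical higher cap under a reform
true_spending = 7500                            # what the taxpayer spent
recorded = min(true_spending, current_cap)      # what the return data hold

# With the true amount, the reform raises the allowed relief to 7500.
relief_if_true_known = allowed_relief(true_spending, reform_cap)
# With the recorded amount, the allowed relief stays at 6000.
relief_from_data = allowed_relief(recorded, reform_cap)
```

Because the input data hold 6000 rather than 7500, the simulated relief is the same under both caps, and the reform appears to have no effect on this taxpayer.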
There are several ways of fixing this kind of data problem and each approach involves making certain assumptions about the distribution of gift/relief amounts above the cap.
Perhaps the easiest way to impute amounts that are capped in the tax return data is to assume that the distribution of uncapped values can be used to impute uncapped amounts for the returns that contain capped amounts. This approach uses the return data to estimate a right-censored regression model. Then the estimated regression model is used to impute an uncapped amount for each return on which the amount is capped. An example of estimating such a model (for the relief_medexps variable) is in the medexps.R script, the output of which is contained in the medexps.exp file.
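To make the two steps concrete, here is a deliberately simplified Python sketch. It is not a translation of the medexps.R script: it assumes a normal distribution with no explanatory variables (an intercept-only censored model), fits it with a basic compass search rather than a statistical package, and runs on simulated data. All names and numbers are hypothetical. A production version would more likely include covariates and use an established censored-regression routine.

```python
import math
import random
from statistics import NormalDist

def loglike(mu, sigma, uncapped, n_capped, cap):
    """Log-likelihood of a normal model that is right-censored at cap."""
    const = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    ll = sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in uncapped)
    # A capped return only tells us the true amount is at least cap.
    survival = 1.0 - NormalDist(mu, sigma).cdf(cap)
    return ll + n_capped * math.log(max(survival, 1e-300))

def fit_censored_normal(values, cap, max_iter=500):
    """Estimate (mu, sigma) by maximizing the censored log-likelihood."""
    uncapped = [v for v in values if v < cap]
    n_capped = len(values) - len(uncapped)
    mu = sum(uncapped) / len(uncapped)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in uncapped) / len(uncapped))
    step = sigma / 2.0
    best = loglike(mu, sigma, uncapped, n_capped, cap)
    for _ in range(max_iter):
        if step < 1e-3:
            break
        improved = False
        for d_mu, d_sigma in ((step, 0), (-step, 0), (0, step), (0, -step)):
            new_mu, new_sigma = mu + d_mu, sigma + d_sigma
            if new_sigma <= 0:
                continue
            ll = loglike(new_mu, new_sigma, uncapped, n_capped, cap)
            if ll > best:
                mu, sigma, best, improved = new_mu, new_sigma, ll, True
        if not improved:
            step /= 2.0  # no direction helped, so search more finely
    return mu, sigma

def impute_uncapped(mu, sigma, cap, n, rng):
    """Draw n amounts from the fitted distribution, conditional on >= cap."""
    dist = NormalDist(mu, sigma)
    p_cap = dist.cdf(cap)
    draws = []
    for _ in range(n):
        u = p_cap + rng.random() * (1.0 - p_cap)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf in its domain
        draws.append(max(cap, dist.inv_cdf(u)))
    return draws

# Simulated example: true amounts are capped at 6000 in the recorded data.
rng = random.Random(2018)
cap = 6000.0
true_amounts = [rng.gauss(5000.0, 2000.0) for _ in range(1000)]
recorded = [min(cap, x) for x in true_amounts]
mu_hat, sigma_hat = fit_censored_normal(recorded, cap)
n_capped = sum(1 for v in recorded if v >= cap)
imputed = impute_uncapped(mu_hat, sigma_hat, cap, n_capped, rng)
```

Every imputed amount is at or above the cap, so a simulated increase in the cap can then change the allowed relief for the affected returns.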
Question: What skills do model users need?
The model has been designed so that it can be used in a variety of ways depending on the skills and preferences of the user. The most important point is that the full capabilities of the model are available to users via the model's command line interface (CLI) in a way that requires no computer programming. More advanced analysis of model output (for example, custom tabulations and graphing) can be done with the user's favorite data analysis tool. There is a list of the various ways to use the model that provides links to more detailed explanations and usage examples.
Question: Who is responsible for maintaining the model?
The initial development has been the responsibility of the WBG. The goal of that initial development work is to provide a PIT microsimulation model that is sufficiently realistic that MOF staff can be trained to use it to conduct useful tax analysis.
If that goal is achieved, the WBG is prepared to train a subset of the trained users so that they have the skills needed to develop the model. By develop, we mean do things like: add new policy parameters and associated tax logic enabling the analysis of novel tax reforms, impute missing and/or capped values for tax variables, upgrade the model when new (post-2018) tax return data become available, and distribute new model versions and data to users.
Once the initial training of users and developers is completed, the WBG hopes that the MOF will take on the responsibility of maintaining the model. But the WBG is committed to long-term support of both model users and developers if unexpected questions or problems arise after the initial training period.
Question: What skills do model maintainers need?
The tax model has been developed using the open-source Tax Analyzer Framework, which enables the development of tax microsimulation models with minimal coding. Development of a model involves specifying parameters in more than a dozen JSON files and preparing tax-filing-unit data in three CSV files. The logic of tax calculation is written in code that is very simple and would be easy to write for anyone with experience in any programming language.
Given the use of the Tax Analyzer Framework, the Malaysia PIT model can be maintained by people experienced with any procedural programming language. (Developers with some familiarity with Python would be ideal, but not required.) For safe and efficient model development, developers would also need to know how to use a version control system and a project Makefile. All these things would be included in the WBG training of the developers.
The most productive computer environments in which to conduct model development are the Linux and Mac OS X operating systems. Windows could be used, but it is not well suited for this kind of development work, as described in this document.