I’m a little late getting to this paper, seeing as it came out a couple of years ago, but I figured it’s still relevant given the rapid acceleration of the decline of the west. New ideas are desperately needed, and perhaps this paper can yield some fruitful discussion that can, at the very least, slow down that decline and unlock some hidden productivity gains.
In 2021 a paper dropped, from the folks over at Salesforce, that applied deep reinforcement learning to designing economic policy. More specifically, they looked at both simple and complex economies made up of rational actors as well as a central planner tasked with setting tax policy. The overarching goal is to simultaneously maximize economic productivity and social mobility. The idea is that limited social mobility reduces quality of life, but improving it trades off against maximum productivity.
Simulations in a realistic yet reduced economy reveal some interesting behaviors. In particular, reinforcement learning agents learn to specialize in activities that align with their natural talents, as well as to develop strategies to game the tax system. This is a positive sign, as it’s precisely what we see in the real world, and it shows that perhaps there is some merit to using such simulations when considering what the central planners ought to do.
Economics is a big departure from typical applications of reinforcement learning, but one that may give us insight into what our future robot overlords may be like. Will they bring about a dystopian nightmare or will the ice cold rationality of the AI mind lead us into an economic utopia?
How It Works
The Economy
Two different economies are modeled. The first is a simple labor-based economy in which each agent has some inherent skill level that determines how much they can earn. This is effectively their hourly rate, and combined with the number of hours worked it determines their income. In this scenario, agents can choose how much to work, and thus control their income. They are penalized with an income tax, determined by the social planner, and the total income tax collected by the tax man is evenly redistributed to all the agents in the simulation.
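The mechanics of this one step economy are simple enough to sketch in a few lines. What follows is my own minimal illustration, not the paper’s code: the function name `one_step_economy`, the flat 20% rate, and the skill and hours values are all made up, and the tax is applied as a single rate on total income to keep things short.

```python
def one_step_economy(skills, hours, tax_rate):
    """Post-tax, post-redistribution income for each agent.

    skills   -- hourly rate per agent (hypothetical values)
    hours    -- hours each agent chooses to work
    tax_rate -- function mapping income to a tax rate (an assumption:
                one rate on total income, not marginal brackets)
    """
    incomes = [s * h for s, h in zip(skills, hours)]
    taxes = [tax_rate(z) * z for z in incomes]
    rebate = sum(taxes) / len(incomes)  # collected tax is split evenly
    return [z - t + rebate for z, t in zip(incomes, taxes)]


# Two agents, same hours, different skill, flat 20% tax (all hypothetical).
final = one_step_economy([10, 20], [5, 5], lambda z: 0.2)
print(final)
```

Total income is unchanged by the tax. It only moves from the higher earner to the lower one, which is why the interesting question is how the schedule changes the hours agents choose to work.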
Of course, this is a simplistic test case but it does show that the AI economist can recover analytic solutions from classical economic theory. This is shown in figure 3 from the paper.
Here we can see two important facets of the results. First, the AI economist is able to match the inverse income weighted utility (IIWU) of the Saez formula (the best known analytical solution). Since I had to look up what the IIWU is, I’ll summarize it here: each agent’s utility is the difference between how much it earned and how much effort it had to exert to get that income, and, as the name says, those utilities are weighted by the inverse of income when they are combined. The idea is that we want to incentivize highly productive labor; high IIWU means the agents performed highly productive labor.
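Taking the name at face value, one way to compute an inverse income weighted utility is to average the agents’ utilities with weights proportional to one over income. The sketch below assumes exactly that, with weights normalized to sum to one; the function name and the numbers are hypothetical, and the paper’s exact normalization may differ.

```python
def inverse_income_weighted_utility(incomes, utilities):
    """Average of utilities, weighted by 1 / income.

    Assumes every income is strictly positive. Weights are normalized
    to sum to one (an assumption made for this illustration).
    """
    weights = [1.0 / z for z in incomes]
    total = sum(weights)
    return sum(w / total * u for w, u in zip(weights, utilities))


# Hypothetical agents: the lower earner gets twice the weight.
iiwu = inverse_income_weighted_utility([50, 100], [40, 70])
print(iiwu)
```

With these weights the lower earner’s utility counts for two thirds of the total, so a policy can’t score well by only making the top earner happy.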
In panel B, we see something that probably comes off as shocking to most of you. Optimal tax rates penalize low earners. In the United States, as is usual, we do precisely the opposite of what is optimal. We penalize the higher earners, with a “progressive” (rather, regressive) tax schedule.
The core idea here is that we want to penalize inefficient workers so that they can become more efficient. Each agent will work to minimize their tax burden, and will thus strive to either work more hours or become more productive to maximize their utility.
Now, this is only a simple economy that ignores the overwhelming majority of real world variables that impact real economic outcomes. Take this with a heaping grain of salt. Perhaps an entire salt mine.
Since the simple one step economy is, well, so simple, the authors don’t stop here; they go on to develop a more intricate economy that is based around both production and labor. This new environment is called Gather-Trade-Build.
Gather-Trade-Build is a two-dimensional grid world where agents have more options than simply working. They can move around their grid world, gather resources (either stone or wood), trade amongst themselves using a fiat currency, or engage in building a house (which produces income).
The world itself can be laid out in a couple different configurations. It can be set up to have four quadrants, each with a different distribution of agents and resources, or it can be comprised of two completely sealed off halves. These design choices have significant impacts on post tax income equality across the various taxation scenarios. For our discussion, we’re going to focus on the outcomes in the four quadrant scenario.
The Outcome
First, let’s take a look at how the agents fared during the free market simulations.
Figure 2 from the paper shows the results over time for four different agents, each given different gifts in life. Maybe the agent with build skill 11 is really handsome, and build skill 22 is really ugly. I don’t know, but I’m hoping that at least the lowest skilled agent didn’t get shafted with a totally rotten hand in life.
In any event, we see precisely what one would expect in a perfect free market. The most skilled agent realizes it can grind and get that paper, while reaping huge rewards. Over time it accumulates the most coin and utility. It follows a strategy of building and then trading with other agents to acquire the resources it needs to build more; it doesn’t waste time gathering its own resources.
If you check out the flow chart on the right panel, you can see how resources flowed between agents. What I haven’t shown is the initial configuration for the environment, where we see that the orange agent spawns in a quadrant with only stone, and the purple agent spawns in a quadrant with only wood. Hence they each specialize in the material they have on hand; this isn’t some emergent specialization, it’s just a consequence of geography.
All this goes to show that the economy and agents act as one would expect in a totally free market. Agents learn to specialize, and those agents with the most talent rise to the top.
So what happens when we introduce a social planner and some tax strategies into the mix? Let’s take a look at figure 5 from the paper.
Here we can see the comparison of tax policies in the 4 agent open quadrant grid world. What jumps out right away are the tax brackets, shown in panel a. The AI economist penalizes the lowest earning workers the most. Intuitively, this is to provide a strong incentive to reach the middle income brackets where the tax burden drops significantly. This isn’t a monotonic relationship between income and tax rate, however. There is some strange up and down behavior as we go from low to middle income. I think this is just machine logic where it’s optimizing for some objective, without regard for how a human may interpret the strategy.
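To make the bracket discussion concrete, here is a small sketch of how tax owed could be computed under a bracketed schedule whose rates go up and down. It assumes marginal brackets (each rate applies only to the slice of income inside its bracket), and the cutoffs and rates are numbers I made up to mimic the high-then-low shape. They are not the values from figure 5.

```python
def bracket_tax(income, cutoffs, rates):
    """Tax owed under marginal brackets.

    cutoffs -- ascending lower bound of each bracket, starting at 0
    rates   -- marginal rate applied inside each bracket
    """
    tax = 0.0
    for i, rate in enumerate(rates):
        lower = cutoffs[i]
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else float("inf")
        if income > lower:
            # Only the slice of income inside this bracket is taxed at `rate`.
            tax += rate * (min(income, upper) - lower)
    return tax


# Hypothetical, non-monotonic schedule: steep at the bottom, then a dip.
CUTOFFS = [0, 10, 40, 90]
RATES = [0.5, 0.1, 0.3, 0.2]

for income in (5, 20, 40, 100):
    owed = bracket_tax(income, CUTOFFS, RATES)
    print(income, owed, owed / income)  # income, tax owed, average rate
```

Under this made up schedule the average rate falls from 50% at an income of 5 to 20% at an income of 40, which is the kind of incentive to climb out of the bottom bracket described above.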
In panel b we can see how the income distribution measures up. This chart can be a little confusing, but the idea here is that we want to focus on the area under the curve. This gives us an idea of the expectation value of income for an agent. We can see that the middle class has the largest area under the curve, meaning that the middle class is doing really well. Of course, there is still some poverty, though the area under those curves is quite small, meaning that relatively few agents live in it. This just goes to show that even in the “best” systems, the poor will always be among us. It’s unfortunate, but it’s a reality in any world in which there is a distribution of skill among economic actors.
Looking at c and d, we again see what we would expect. In all systems, the least skilled workers earn the least, while the highest skilled earn the most. Consequently, the highest skilled agents end up paying the lion’s share of the taxes and transferring their wealth to the less rich agents.
On the bottom we see how training time affects results. Interestingly, more training results in larger penalties to the poorest agents and some inversions in “tax brackets” towards the lower end of the spectrum. Income frequencies don’t change much, however.
When looking at wealth transfer, we see that more training results in a greater wealth transfer across the board, and incomes don’t vary much as a function of build skill and training.
Next we can take a look at some complex emergent phenomena that we wouldn’t otherwise see in a simplified simulation.
Let’s start with specialization. Looking at the two leftmost panels, we see that the lowest skilled agents derive almost no income from building, and that’s consistent across tax strategies. Conversely, the highest skilled workers derive huge incomes from building and lose income from trading. Thus, workers learn to specialize in behaviors that are consistent with their talents.
Interaction effects are quite fascinating. Here the authors looked at how the income for the two lowest skilled agents changes when the middle tax bracket is varied. There should be no dependence, according to classical thinking, because the lowest skilled workers aren’t directly affected by this bracket. However, we see a few things that jump out immediately.
First of all, the free market is best for the lowest skilled workers. This is because they have no tax, so whatever they earn, they keep. They have both a higher floor (outlier notwithstanding) and higher ceiling to their earnings potential. The AI economist can occasionally produce reasonably high incomes for its least fortunate agents, but underperforms the free market.
Second, we do see a dependence of the income for the poorest workers on the tax rates of the second highest bracket. The more we penalize the upper middle class, the less income the poor are able to earn. This is because the highest earners have less income available to trade, driving down both the frequency and average price per trade. It would appear that the much maligned “job creator myth” wasn’t such a myth after all.
Finally, we observe some clear evidence of tax gaming strategies. In the top right, we can see that the agents learn to stagger their earnings across years to minimize tax burden where appropriate. The overall effect (bottom) is a significant underpaying of taxes (evidenced by the points under the dashed curve).
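Why would staggering earnings help? A toy calculation shows the arithmetic. This is only a sketch: it assumes marginal brackets, and the cutoffs and rates are invented to have a steep bottom bracket followed by a cheap one. Under that kind of schedule, earning nothing one period and double the next can cost less tax than earning the same total at a steady pace.

```python
def bracket_tax(income, cutoffs, rates):
    """Marginal-bracket tax; cutoffs are ascending lower bounds starting at 0."""
    uppers = cutoffs[1:] + [float("inf")]
    return sum(
        rate * (min(income, hi) - lo)
        for lo, hi, rate in zip(cutoffs, uppers, rates)
        if income > lo
    )


# Hypothetical schedule: 50% up to 10, then 10% up to 40, then 30%.
CUTOFFS = [0, 10, 40]
RATES = [0.5, 0.1, 0.3]

# Same total income of 20 over two tax periods, earned two different ways.
steady = bracket_tax(10, CUTOFFS, RATES) + bracket_tax(10, CUTOFFS, RATES)
staggered = bracket_tax(0, CUTOFFS, RATES) + bracket_tax(20, CUTOFFS, RATES)

print("steady:", steady, "staggered:", staggered)
```

The steady earner pays the steep bottom rate on every unit of income in both periods, while the staggered earner only pays it once and pushes the rest into the cheap bracket. The trick only works where marginal rates fall as income rises, which is why the up and down brackets invite it.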
Conclusions
What can we learn from all this?
First, AI can be used to generate novel insights into how we should (perhaps) run our economy. Current tax schedules in the west produce suboptimal results when trying to optimize for equality, relative to some more creative solutions.
Second, worse than being suboptimal, our system (that is to say, progressive tax schemes) is probably counterproductive. It doesn’t take a genius to see that we have a problem with systemic multi generational poverty, and this is generally a result of people reacting to the incentives around them.
Third, some people aren’t going to like our future robot overlords. The poor are not going to like being whipped into submission by high tax rates. But hey, get good.
My own comments on the shortcomings of this research are the following:
- It fails to capture the effects of monetary policy (read: inflation)
- Worker skills in the real world aren’t fixed
- It ignores tribal preferences among workers
And of course, there are many more criticisms to be raised. However, these aren’t so much shortcomings of the work as issues for further study, and I’m curious how the AI would deal with these problems.