Abstract: |
When the current demand shock is observable and the discount factor is high,
Q-learning agents predominantly learn to implement symmetric rigid pricing,
i.e., they charge constant prices across demand states. Under this pricing
pattern, supra-competitive profits can still be obtained and are sustained
through collusive strategies that effectively punish deviations. This shows
that Q-learning agents can successfully overcome the stronger incentives to
deviate during positive demand shocks, and consequently algorithmic collusion
persists under observable demand shocks. In contrast, with a medium
discount factor, Q-learning agents learn that maintaining high prices during
positive demand shocks is not incentive compatible and instead proactively
charge lower prices to reduce the temptation to deviate, while maintaining
relatively high prices during negative demand shocks. As a
result, the countercyclical pricing pattern becomes predominant, aligning with
the theoretical prediction of Rotemberg and Saloner (1986). These findings
highlight how Q-learning algorithms can both adapt their pricing strategies
and develop tacit collusion in response to complex market conditions. |