The term intentional stance was coined by philosopher Daniel Dennett to describe interpreting an entity’s behaviour in terms of mental properties. In philosophical terms, adopting the intentional stance involves attributing beliefs and desires to an agent, given its place in the world and its overall purpose. A rational agent acts to further its desires in light of its beliefs. This also means that if an agent’s desires and beliefs are known, one can predict what it will do.
The same principle can be applied to machine learning. Machine learning systems are usually trained to optimise a loss or reward function. In turn, these objectives induce incentives to bring about events that help optimise them. For example, the reward function in the ATARI game Pong creates an incentive to move the paddle towards the ball, and the loss in a variational autoencoder creates an incentive to form relevant high-level abstractions.
These incentives depend not only on the objectives but also on the environment. For example, an event that contributes to an objective in one environment can lead to failure in another, or be useless in a completely different situation. In short, incentives can be hard to predict.
That is why it is important to describe both an agent’s objectives and how the agent interacts with its environment in order to understand, and correct, its incentives. To this end, researchers often use causal influence diagrams (CIDs), graphical models with special decision, utility, and chance nodes. In a CID, all arrows encode causal relationships, providing a flexible and precise tool for describing agent objectives and agent-environment interactions simultaneously.
Causal influence diagrams
A causal influence diagram consists of a directed acyclic graph over a finite set of nodes: square decision nodes represent agent decisions, diamond utility nodes represent the agent’s optimisation objective, and the remaining (round) chance nodes behave as in a Bayesian network. CIDs are closely related to Bayesian networks and causal graphs.
The diagram below is an example of a CID for a one-step Markov decision process (MDP). Here S1 is a random variable representing the state at time 1; A1 is the agent’s action; S2 is the state at time 2; and R2 is the reward. The action A1 is modelled with a decision node, R2 is modelled as a utility node, and the two states S1 and S2 are ordinary chance nodes.
S1 and A1 influence S2, as shown by the causal links in the diagram, and S2 in turn determines R2.
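As a concrete sketch, the one-step MDP CID above can be written down as a small Python data structure. The node names follow the text; the particular transition, reward, and policy used for sampling are illustrative assumptions, not part of the original example.

```python
# A minimal sketch of the one-step MDP CID: S1 -> A1, S1 -> S2, A1 -> S2, S2 -> R2.
import random

# Node types: 'chance', 'decision', 'utility'.
nodes = {"S1": "chance", "A1": "decision", "S2": "chance", "R2": "utility"}

# Directed edges encode causal influence; the edge S1 -> A1 is an
# information link: the agent observes S1 before choosing A1.
edges = [("S1", "A1"), ("S1", "S2"), ("A1", "S2"), ("S2", "R2")]

def parents(node):
    """Return the causal parents of a node in the diagram."""
    return [u for (u, v) in edges if v == node]

def sample_trajectory(policy):
    """Sample the diagram top-to-bottom under a given decision rule.

    The transition and reward below are assumed for illustration only.
    """
    s1 = random.choice([0, 1])     # chance node S1, uniform prior
    a1 = policy(s1)                # decision node A1 conditions on S1
    s2 = (s1 + a1) % 2             # chance node S2 (deterministic here)
    r2 = 1 if s2 == 1 else 0       # utility node R2
    return {"S1": s1, "A1": a1, "S2": s2, "R2": r2}

# A policy that exploits the information link: always steer S2 to 1.
traj = sample_trajectory(policy=lambda s1: 1 - s1)
```

Under this (assumed) transition, the policy that conditions on S1 attains the maximum reward on every trajectory, which is exactly the kind of incentive the information link S1 → A1 makes possible.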
Overall, a CID specifies agent decisions, agent objectives, causal relationships in the environment, and the agent’s information constraints. These pieces of information are essential for determining an agent’s incentives: achieving an objective depends on its causal relation to other aspects of the environment, while an agent’s optimisation is restricted by the information it has access to. Generally, the qualitative information expressed by a CID suffices to determine important aspects of incentives, and it is almost impossible to infer incentives from less information than a CID expresses, making CIDs natural representations for many types of incentive analysis. Since CIDs build on well-researched topics such as causality and influence diagrams, they also let researchers leverage the deep thinking already done in those fields.
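The role of information constraints can be made concrete with a small, self-contained sketch. Reusing the one-step MDP from the diagram (with an assumed, illustrative transition and reward), we compare the best achievable expected utility when the policy may condition on S1 against the best achievable when it may not. A strict gap means observing S1 is valuable, i.e. the agent has an incentive to obtain and respond to that observation.

```python
# Value-of-information sketch for the one-step MDP CID.
# The transition and reward are assumed for illustration.
states = [0, 1]    # possible values of S1, uniform prior
actions = [0, 1]   # possible values of A1

def utility(s1, a1):
    s2 = (s1 + a1) % 2           # assumed environment transition
    return 1 if s2 == 1 else 0   # reward R2

# With the information link S1 -> A1, the policy may condition on S1:
# pick the best action separately for each state.
best_informed = sum(max(utility(s1, a) for a in actions)
                    for s1 in states) / len(states)

# Without the link, one action must be chosen for all states.
best_blind = max(sum(utility(s1, a) for s1 in states) / len(states)
                 for a in actions)

# best_informed exceeds best_blind, so the observation is strictly
# valuable and the agent is incentivised to use it.
```

The same comparison, read off the graph rather than computed, is what graphical criteria for observation incentives formalise.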
Limitations of CID
- The graphical criteria can overestimate the presence of observation or intervention incentives: not every probability distribution compatible with the graph actually induces an incentive, even when the graph permits one.
- The CID of the agent-environment interaction must be known before these methods can be applied.
- CIDs, and graphical models in general, are not well suited to modelling structural changes, such as cases where the outcome of an earlier node determines part of the graph itself.
- CIDs assume that agents follow causal decision theory, since no information flows back from the decision nodes. However, not all agents reason causally in this way, and a different theory of incentives could perhaps be developed for agents that reason in non-causal ways.