## Actions of Monoids and Automata

In theoretical computer science there are a lot of related concepts of automata, each with a slight variation in their definition. Most of them can be unified to one conceptual framework using monoid actions:

## Actions of Monoids and Transition Functions

Given a category \(\mathcal{C}\), an object \(X\) in \(\mathcal{C}\) and a monoid \(M\), an action of \(M\) on \(X\) is a monoid homomorphism \(\phi: M \to \text{End}_\mathcal{C}(X)\) with reverse function composition. In other words, each element \(m \in M\) determines a map \(\phi(m) : X \to X\) in \(\mathcal{C}\). Allowing \(\mathcal{C}\) to vary will allow us to vary the type of transition functions we allow.

If we slightly tilt our head, we can view this as a transition function on a state space \(X\), given inputs in \(M\). Requiring the action to be a monoid homomorphism gives us a really handy property: When we have two consecutive inputs, we can concatenate the two transition functions and arrive at the same transition we get from feeding the entire input into the action. In formulas:

\[\forall m, m' \in M. \phi(m \cdot m') = \phi (m') \circ \phi (m)\]

An important special case arises when we restrict the monoid to be the free monoid \(\Sigma^*\) generated by an alphabet or character set \(\Sigma\). That way we can derive the simplest form of automaton: If we consider an action of the free monoid \(\Sigma^*\) for an alphabet \(\Sigma\) in \(\mathcal{Set}\) on a set \(X\), we recover the type of deterministic transition functions (modulo currying), \(\Sigma^* \times X \to X.\) If \(X\) is finite, these are transition functions for finite deterministic automata.

But why monoids in the first place? Considering monoids and monoid homomorphisms, we can capture the essence of non reversible dynamic processes; a lot of computation will fall into this category so this might be the most interesting case for computer science. But of course this is amenable to generalization: By using group actions on the automorphism group the resulting processes must be reversible. Topological group actions are useful in algebratic topology and Lie group actions find various applications in physics (e.g. general relativity).

## Effectful Transition Functions

There are a lot of automatons out there that have effectful transition functions, e.g. non-deterministic automata, probabilistic automata, stack automata and mealy machines. If you are familiar with Haskell, you probably know that most effects can be modeled using monads. A monad \(T: \mathcal{C} \to \mathcal{C}\) gives rise to a Kleisli category \(\mathcal{C}_T\) in which the objects are the objects of \(\mathcal{C}\) and the arrows \(X \to Y\) correspond to arrows \(X \to TY\) in \(\mathcal{C}\); composition of these arrows is called Kleisli composition, as modelled by `>=>`

in Haskell, and corresponds to executing the actions encapsulated by \(T\) in sequence.

We can define a monad \(N : \mathcal{Set} \to \mathcal{Set}\) to model non deterministic computation by

\[ N(X) = \mathcal{P}(X)\\ N(f)(S) = \bigcup_{x \in S} f(x)\\ \eta_N(x) = \{ x \}\\ \mu_N(x) = \bigcup x \]

Monoid actions \(M \to \text{End}_{\mathcal{Set}_N}(X)\) in the Kleisli category for \(N\) exactly model non deterministic transition functions. In Haskell, one could use the list `[]`

monad or one of the monads in `logict`

to achieve a similar result.

A similar feat can be accomplished using a probabilistic monad and yields something akin to a Markov model of some sort. Also stack machines and Turing machines can be recovered with a refined state monad that carries a stack or a tape respectively. Using a writer monad we get a mealy machine.

This allows for a nice intuition: We can choose a basis category \(\mathcal{C}\) to denote the type of functions we consider as computable (every function, primitive recursive functions, functions on finite sets etc.) and add extra features to that category by considering the Kleisli category on \(\mathcal{C}\) for a particular monad.

## Start States

Most formulations of automata include a starting state and a set of substates. We can generalize this notion to fit into our framework. Throughout this section, let \(\mathcal{C}\) be a category with a terminal object \(1\), \(X\) an object in \(\mathcal{C}\) and \(M\) a monoid. Note that when a category \(\mathcal{C}\) has a terminal object, its Kleisli category might not neccessarily (is this true? E.g. for `IO`

in Hask, we have several inhabitants of `A -> IO ()`

).

In usual automata theory, a starting state \(q_0\) is an arbitrary element \(q_0 \in Q\), where \(Q\) is the set of states. To model this generally, first observe that this is equivalent to a map \(\phi_0: 1 \to Q\) in \(\mathcal{Set}\). Generalizing, a starting state is just a map \(1 \to X\) in \(\mathcal{C}\).

Note that this allows us to choose a starting state with an effect in a Kleisli category: Using a probabilistic monad this formalizes the notion of a starting distribution, a monad that models non determinism lets us begin at multiple states at once, etc.

## Accept States

Accept states usually are a subset \(F \subseteq Q\) of the state set. Subsets in a more general setting are called subobjects and are modelled by classes of monomorphisms: In a category \(\mathcal{C}\) a subobject is a monomorphism \(A \to T\) up to isomorphism in the slice category \(\mathcal{C}/T\). In the special case of \(\mathcal{Set}\), these subobject maps are the canonical inclusions for subsets. In our setting, we consider subobjects \(F \to X\) of the state object \(X\).

When \(\mathcal{C}\) is an (elementary) topos, we get the characteristic map \(X \to \Omega\) of the subobject \(F \to X\). In \(\mathcal{Set}\) this would be a function \(X \to 2\) which checks whether a state \(x \in X\) is in \(F\). This captures the computational meaning of checking for an accepting state, so it might be worthwhile to use this notion even when not in a topos by the definition: Accept states are a map \(X \to 2\).

This definition, again, allows us to incorporate effects: In a probabilistic setting, a map \(X \to 2\) is a Bernoulli distribution for each state in \(X\) that determines whether a process ends at that point or doesn’t. This might provide a worthwhile generalization of the geometric distribution which then models the special case when there is only one state in \(X\). In the case of stack machines, we can also check the state of the stack, like in the usual requirement that the stack must be empty in order for the process to complete. Unfortunately, it doesn’t seem to make much sense with non-determinism and writers.

## Conclusion

There are multiple questions I would like to defer at the moment, but that might be interesting to investigate: How much structure is required in a category in order to compose a transition map, an initial state and accept states into one well-behaved function \(M \to 2\) that “runs” the automaton? Maybe CCC? Do automata in this formulation form some kind of algebraic structure (like categories) on their own? Is there a compositional theory? How does Krohn-Rhodes theory relate to this formulation?

We have seen that using a bit of category theory it is possible to bring together various types of automata and machines and build a conceptual framework for all sorts of computation. This might be useful from a theoretical viewpoint (theorems for large classes of automata) as well as a very practical one in languages like Haskell that embrace category theory quite thoroughly.