In “The Basic AI Drives” Steve Omohundro has argued there is scope for predicting the goals of post-singularity entities able to modify their own software and hardware to improve their intellects. For example, systems that can alter their software or physical structure would have an incentive to make modifications that would help them achieve their goals more effectively, as humans have done over historical time. A concomitant of this, he argues, is that such beings would want to ensure that such improvements do not threaten their current goals:
So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit (Omohundro 2008).
I think this assumption of ethical self-transparency is interestingly problematic. Here’s why:
Omohundro requires that there could be internal system states of post-singularity AIs whose value content is legible to the system’s internal probes. Obviously, this assumes that the properties of a piece of hardware or software can determine the content of the system states that it orchestrates independently of the external environment in which the system is located. This property of non-environmental determination is known as “local supervenience” in the philosophy of mind literature. If local supervenience for value-content fails, any inner state could signify different values in different environments. “Clamping” machine states to current values would then entail restrictions on the situations in which the system could operate as well as on possible self-modifications.
Local supervenience might well not hold for system values. But let’s assume that it does. The problem for Omohundro is that the relevant inner determining properties are liable to be holistic. The intrinsic shape or colour of an icon representing a station on a metro map is arbitrary. There is nothing about a circle or a square or the colour blue that signifies “station”. It is only the conformity between the relations among the icons and the relations among the stations in the metro system the map represents that does this (Churchland’s 2012 account of the meaning of prototype vectors in neural networks utilizes this analogy).
The moral of this is that once we disregard system-environment relations, the only properties liable to anchor the content of a system state are its relations to other states of the system. Thus the meaning of an internal state s under some configuration of the system must depend on some inner context (like a cortical map) where s is related to lots of other states of a similar kind (Fodor and Lepore 1992).
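The metro-map point can be put as a small sketch in Python. It is an illustration under stated assumptions, not a model of any real representational system: the station names, icon labels and the helper `preserves_structure` are all hypothetical. It shows that circle icons and square icons represent the same network equally well, provided the links between icons mirror the links between stations, and that icons with the wrong links do not.

```python
# Toy illustration: an icon set represents a metro network only by virtue of
# the relations between icons, not the icons' intrinsic shape or colour.
# All names below are hypothetical and chosen for the example.

def preserves_structure(map_links, metro_links, icon_to_station):
    """True if translating each icon link yields exactly the set of station links."""
    translated = {
        frozenset(icon_to_station[icon] for icon in link) for link in map_links
    }
    return translated == {frozenset(link) for link in metro_links}

metro_links = [("North", "Central"), ("Central", "South")]

# Two maps using different icon shapes but the same pattern of links.
circle_links = [("circle1", "circle2"), ("circle2", "circle3")]
circle_key = {"circle1": "North", "circle2": "Central", "circle3": "South"}

square_links = [("square1", "square2"), ("square2", "square3")]
square_key = {"square1": "North", "square2": "Central", "square3": "South"}

# A map with the same circle icons but the wrong pattern of links.
bad_links = [("circle1", "circle3"), ("circle2", "circle3")]

circles_ok = preserves_structure(circle_links, metro_links, circle_key)
squares_ok = preserves_structure(square_links, metro_links, square_key)
bad_ok = preserves_structure(bad_links, metro_links, circle_key)
```

On this toy picture, swapping circles for squares changes nothing, while rewiring the links between the same icons destroys the representation.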
But relationships between states of the self-modifying AI systems are assumed to be extremely plastic because each system will have an excellent model of its own hardware and software and the power to modify them (call this “hyperplasticity”). If these relationships are modifiable then any given state could exist in alternative configurations. These states might function like homonyms within or between languages, having very different meanings in different contexts.
Suppose that some hyperplastic AI needs to ensure that a state in one of its value circuits, s, retains the value it has under the machine’s current configuration: v. To do this it must avoid altering itself in ways that would lead to s being in an inner context in which it meant some other value (v*) or no value at all. It must clamp itself to those contexts to avoid s assuming v** or v***, etc.
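A minimal Python sketch of the homonym point follows. It assumes, purely for illustration, that the value a state signifies can be modelled as a lookup keyed on the state together with the set of states it is related to; the function `meaning`, the table `interpretation` and the labels are hypothetical and not drawn from Omohundro or Churchland.

```python
# Toy illustration: the same state token signifies different values, or none,
# depending on the inner context it is embedded in. All names are hypothetical.

def meaning(state, context, interpretation):
    """Return the value `state` signifies in `context`, or None if it has none."""
    return interpretation.get((state, frozenset(context)))

interpretation = {
    ("s", frozenset({"a", "b"})): "v",   # s under the current configuration
    ("s", frozenset({"a", "d"})): "v*",  # s after one relation is rewired
}

current = meaning("s", {"a", "b"}, interpretation)
rewired = meaning("s", {"a", "d"}, interpretation)
orphaned = meaning("s", {"x"}, interpretation)  # a context with no assigned value
```

Nothing about the token `"s"` changes across the three calls; only its relations to other states do.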
To achieve clamping, though, it needs to select possible configurations of itself in which s is paired with a context c that preserves its meaning.
The problem for the AI is that all [s + c] pairings are yet more internal system states, and any system state might assume different meanings in different contexts. To ensure that s means v in context c, it needs to have already done to [s + c] what it had been attempting with s: restrict itself to the supplementary contexts in which [s + c] leads to s having v as a value and not something else.
Now, a hyperplastic machine will always be in a position to modify any configuration that it finds itself in (for good or ill). So this problem will be replicated for any combination of states [s + c + ...] that the machine could assume within its configuration space. Each of these states will have to be repeatable in yet other contexts, etc. Since a concatenation of system states is a system state to which the principle of contextual variability applies recursively, there is no final system state for which this issue does not arise.
Clamping any arbitrary s requires that we have already clamped some undefined set of contexts for s, and this condition applies inductively for all system states. So when Omohundro envisages a machine scanning its internal states to explicate their values, he seems to be proposing an infinite task that has already been completed by a being with vast but presumably still finite computational resources.
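The shape of this regress can be sketched in Python. The sketch builds in the argument's own premise as an assumption, namely that every system state, simple or composite, needs a further context to fix its meaning; `needs_context`, `try_to_clamp` and the context labels are hypothetical names introduced for the example.

```python
# Toy illustration of the regress: clamping a state means pairing it with a
# context, but the pairing is itself a state that needs clamping in turn.

def needs_context(state):
    # Assumption taken from the argument: contextual variability applies to
    # every system state, including concatenations of states.
    return True

def try_to_clamp(state, budget):
    """Attempt to clamp `state` in at most `budget` steps.

    Returns (succeeded, steps_used, final_composite_state).
    """
    steps = 0
    while needs_context(state):
        if steps == budget:
            return False, steps, state  # resources exhausted, nothing is fixed
        state = (state, f"c{steps}")    # [state + context] is a new system state
        steps += 1
    return True, steps, state

done, steps, composite = try_to_clamp("s", budget=3)
done_large, steps_large, _ = try_to_clamp("s", budget=500)
```

There is no base case: the loop stops only because the budget does, and a larger budget yields a larger unclamped composite rather than a clamped one.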
Block, Ned. 1986. “Advertisement for a Semantics for Psychology”. Midwest Studies in Philosophy 10 (1): 615-78.
Churchland, Paul. 2012. Plato’s Camera: How the Physical Brain Captures a Landscape of Abstract Universals. MIT Press.
Omohundro, S. M. 2008. “The Basic AI Drives”. Frontiers in Artificial Intelligence and Applications 171: 483.