Volume 1, Issue 1, September 2016
artificial neural networks, DeepDream, phenoumena, sound, transcendental
We propose a framework for sonic creativity via computational methods and artificial intelligence research. We extend Norman Sieroka’s theory of sound from a 3-fold to a 4-fold hierarchy so that sound now becomes characterised in terms of the acoustic, physiological, speculative and phenomenal layers. We describe how manipulations in the proposed speculative layer can directly act on how one orients the very apprehension of sound. Black box algorithms, in particular deep neural networks as instantiated by Google’s DeepDream, are then discussed as an illustrative example that can be used to manipulate the speculative layer. We caution that DeepDream offers a warning of how easily our tools capture our phenomenal apprehensions, potentially obfuscating what is just beyond our perception in the process.
Perhaps every sound technology in human history contains within it some model or script for hearing...
Given a specific way of experiencing sound, there is a certain manner in which sound is apprehended. Or rather, there is a certain way that the listener orientates towards what is audible, and there are a number of ways of changing that orientation. Our focus is on the use of contemporary technology and computational research to change this orientation, and potentially reveal that which was not previously a part of the listener’s imagination.
Philosopher Norman Sieroka splits sound into what he dubs three manifestations: (i) acoustic (unprocessed), (ii) physiological (processed) and (iii) phenomenal (perceived) (Sieroka, 2015, p. 124). The acoustic is defined as physical and mathematical manifestations: pressure waves with a specific amplitude and frequency profile. The physiological is defined as the interfacing components that constitute an auditory apparatus (e.g., ear, brain). Finally, what is heard or presented during auditory experience – often investigated through phenomenology – is the phenomenal. We do not hear unprocessed soundwaves, nor neurophysiological activity: these are “concepts from physics and neurophysiology” in Sieroka’s view.
Our focus is on the second part of this sequence (the physiological). It is an oversimplification to define this processing step purely in terms of our own physiology. Thus, we will describe this level as a model level in order to facilitate a more generalised description. However, it is important to note that we are not trying to define a neo-functionalism nor overplay the importance of multiple realisability (i.e., that any function can be run on any system). Functionalism goes too far: if function is what is solely emphasised then one assumes that there is a trivial coupling between a function and its substrate (Buechner, 2008). For example, the mapping of mind to brain can easily collapse into either a ramified caricature of Cartesian dualism (mental irreducibility) or a physicalist reductionism, where there is an isomorphic relationship between mental states and their constitutive substrate. Our goal is to perform neither of these theoretical collapses while still being sensitive to both sets of concerns, as well as other similar concerns. We do not wish to homogenise the complexity of this topic, but it will not be discussed in detail here.
In order to initiate a framework that allows for the suggested sensitivity, we turn to Immanuel Kant’s conception of the transcendental. Sieroka’s three distinctions of sound can map onto the Kantian distinctions between object (acoustic), transcendental (model) and subject (phenomenal).(1) Our focus here is on the transcendental level.
Theoretical physicist and philosopher Gabriel Catren defines Kant’s notion of the transcendental as the structuration of a particular intentional (i.e., subject-to-object) relation. The resulting structure is conditioned such that the subject-pole delimits the possible relations to the object-pole, and not vice versa (see Fig. 1). In this case, the object-pole solely means what is aimed at by experience and the subject-pole is the vector that aims at this object-pole (Legrand, 2011). For example: when experiencing a sound the listener becomes the subject-pole, the sound heard by the listener takes the position of the object-pole, and the listener’s transcendental mediates as well as potentiates the resulting relations. The concept of the transcendental in this case enables the articulation of the interplay between a sound and any listener.
However, the transcendental, for Kant, is the conditions for the possibility of experience in its most general sense and is not specific to any particular sensory modality. It constrains experience by specifying a fixed and unchangeable parameter-space within which phenomena can be experienced. Catren names this structuration “transcendental transcendence,” and argues that Kant essentialises and privileges a particular transcendental model or frame (Catren, 2015, p. 67). Out of all possible frames (e.g., insects, animals, aliens), Kant privileges a single universal anthropomorphic schema of the world. To quote Catren:
if we grant that human experience is transcendentally framed, it does not follow that we cannot vary the... transcendental conditions of experience, that we cannot go beyond [Kantian] transcendental transcendence. A critical subject, i.e., a subject capable of performing a critical reflection on its own transcendental structure, can afford the possibility of varying, deforming, or perturbing the very a priori structure of its experience (Catren, 2015, p. 67).
Catren calls the experience of such a deformation a phenoumenodelic experience so as to emphasise that it is enacted at a level that can be differentiated from the strictly phenomenal.(2)
Catren differentiates transcendental transcendence from what he dubs a speculative transcendence by problematising Kant’s essentialisation. Speculative transcendence defines a more plastic notion of the transcendental where each transcendental delimits the possible subject- and/or object-transformations between the poles of their intentional relation (see Fig. 2). There is no single universal transcendental that applies to any experiencing subject whatsoever, contrary to Kant. Catren’s speculative transcendence allows for one to “escape” the Kantian “claustrophobia” or prohibition that states that we can never “go beyond” our human transcendental structuration – the anthropic manifold(3) consisting of physiological, neurophysiological, conceptual-categorical, imaginary, linguistic, historical-cultural-sociological, existential and so on (Catren, 2015, pp. 66–67). Instead, there are multiple, unique transcendental structurations that can be explored via phenoumenodelic variations.
Returning to sound, the Kantian notion of the transcendental affords specific ways of experiencing sound and limits ways in which both the listening subject and the sonic object can be transformed. Catren adds to this the possibility to modulate the very structuration of a sonic experience (i.e., modify the intentional subject-to-object relation itself). The ways in which something is audible multiplies as an effect of there being varied transcendentals. Using this framework, we are able to expand Sieroka’s initial ternary description to include another layer: the speculative. To speak of this speculative framework in terms of the sonic would be to speak of the means for manipulating how, rather than what, we think of sound.
Not every perceptual modification can be said to be a phenoumenodelic experience. While the former might only be a perspectival shift, the latter only occurs in the rare instance when the available modifications of an intentional relation are altered. These alterations of experience extend or reduce the intentional space of possibility. For example, octave equivalence is an identity relation specified within a transcendental structure that allows for the experience of A4 (440 Hz) as identical to A5 (880 Hz) with respect to the pitch class of A. Another example of this is when the utterance “dog” is heard as equivalent to the sound “dɒg” as well as “dɔg”.(4) In terms of these examples, phenoumenodelic variations fundamentally change one’s internal structuring of pitch or phonemes such that the described distinctions disappear or are extended to include unfamiliar elements (e.g., B4 (493 Hz) or “bɒg”). Through this action the horizon of potential phenomenal transformations changes such that new experiences become possible, while others are no longer.
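The octave-equivalence example can be made concrete with a small sketch. It assumes 12-tone equal temperament with A4 = 440 Hz as the reference and rounds each frequency to the nearest semitone; the function name `pitch_class` and the note labels are our own illustrative choices, not part of Sieroka’s or Catren’s accounts. The modulo operation plays the role of the identity relation: it folds every octave onto the same twelve classes.

```python
import math

# Assumption: 12-tone equal temperament with A4 = 440 Hz as the reference.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(frequency_hz: float, reference_hz: float = 440.0) -> str:
    """Return the nearest equal-tempered pitch class for a frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    # Signed distance from the reference A in semitones, rounded to the nearest one.
    semitones_from_a = round(12 * math.log2(frequency_hz / reference_hz))
    # A sits at index 9; the modulo folds every octave onto the same 12 classes.
    return PITCH_CLASSES[(9 + semitones_from_a) % 12]

for hz in (440.0, 880.0, 493.0):
    print(hz, pitch_class(hz))
```

Under these assumptions 440 Hz and 880 Hz fall into the same class, while 493 Hz falls into a different one. Changing the number of classes or the reference would be one crude way of altering the structure itself rather than a single percept within it.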
Neural networks are computational systems that modify their internal structure in order to learn.(5) This modification is designed to replicate the types of physical changes that occur in living systems. Deep neural networks expand on this idea by increasing the number of layers of neurons between their input and output nodes. The number of layers between the input and output nodes defines the degree of non-linearity and abstraction afforded by the model (Bengio, 2009).
If we imagine the space of possible relations for a given system, then the deep network learns to associate specific features of that space with particular inputs and outputs. This method enables the classification and identification of highly complex objects, although their complexity is usually overlooked. For example, consider a rather simple sonic object: a phoneme or word. The correct identification requires handling many variations in palate tectonics, accent, timing and so on (see Fig. 3). More complex sonic objects such as sentences, music genre and environmental noise tend to far exceed this simple identification task. This is also the case when it comes to other modalities such as vision and kinaesthetics.
The challenge of all neural networks, but especially deep nets, is that they are black box algorithms (Bengio, 2013; Coates et al., 2011; Zahavy, 2016). What this means is that no one knows what exactly the network is using to identify, classify or perform other tasks on the desired objects of interest. Even if the tasks of these black box algorithms are theoretically computable by hand, and thus analytically tractable, the dimensionality of the problem space is often so great that these tasks are epistemologically intractable for humans.
In order to ameliorate this intractability, researchers in computer science began developing what can be called transmutative tools that would allow them to explore the internal structure of these black box algorithms (Zeiler & Fergus, 2014). For example, Nguyen et al. (2014) generate “speculative” images using evolutionary algorithms that mutate a population of images and select visual patterns based on the “highest prediction value” of a trained deep net. Similarly, Dosovitskiy & Brox (2015) train their deep networks to predict the pre-image that could have produced a given set of features, “inverting” the standard training procedure. They then perturb the pre-image in an effort to understand the inner workings of the deep net. In both cases, the mechanisms by which the algorithms select their own features for recognition during training were everted and made more legible. In effect, the resulting tools transmute the unknown internal structure of these black boxes into infograms that can be read, or heard. But a black box is not always easy to hear.
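The evolutionary strategy can be illustrated with a toy sketch. It is not the procedure of Nguyen et al.: the function `prediction_value` is a hypothetical stand-in for a trained deep net’s confidence in one class, and the population size, mutation scale and number of generations are arbitrary assumptions chosen for brevity. The sketch shows only the general loop of scoring, selecting and mutating candidates.

```python
import random

# Hypothetical stand-in for a trained network's prediction value for one class.
# It peaks when every entry equals 0.5; a real deep net would be queried instead.
def prediction_value(candidate):
    return 1.0 / (1.0 + sum((x - 0.5) ** 2 for x in candidate))

def mutate(candidate, rng, scale=0.1):
    # Perturb each entry with Gaussian noise and clamp to the [0, 1] "pixel" range.
    return [min(1.0, max(0.0, x + rng.gauss(0.0, scale))) for x in candidate]

def evolve(population, generations, rng, keep=5):
    for _ in range(generations):
        # Keep the candidates the scorer is most confident about (elitism).
        survivors = sorted(population, key=prediction_value, reverse=True)[:keep]
        # Refill the population with mutated copies of the survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(len(population) - keep)]
        population = survivors + children
    return max(population, key=prediction_value)

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [[rng.random() for _ in range(8)] for _ in range(20)]
best = evolve(start, generations=50, rng=rng)
print(max(map(prediction_value, start)), prediction_value(best))
```

Because the selection criterion is the scorer’s own confidence, the surviving candidates reflect what the scorer responds to, which need not coincide with what a human observer would recognise.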
What was most salient about the images resulting from opening the black box were the anomalies (Nguyen et al., 2014). For example, the freight car image in Fig. 4 has no resemblance to what would be typically recognised as a freight car. This suggests, in light of our discussion of Catren’s work, that the underlying “transcendental framework” of this deep net (i.e., its internal black box processes) is different from that of a human transcendental. Anomalies in the image express that salient features for recognition can be entirely different in the context of the machine and of the human. In this sense, the criteria used to recognise these anomalous images within the deep net escape the Kantian frame and require a kind of speculative transcendence. This is not limited to the visual.
There are few examples of transmutative tools used within an audio context. And yet, Kereliuk et al. (2015) note that deep neural networks are increasingly being applied towards music content.(6) However, the deep neural network trained by Kereliuk et al. (2015) was easily biased such that songs, recognisable as pertaining to a specific genre, were misclassified.(7) These songs were used to “show how it is relatively easy to fool the same state-of-the-art [deep neural net]-based music content analysis system” (Kereliuk et al., 2015). In other words, the application of these black box algorithms towards music content analysis is quite fragile and much more research needs to be undertaken within the auditory domain.
The difference previously discussed between perspectival shifts and phenoumenodelic variations is useful here. The mischaracterisation of features exemplified by these content analysis deep nets expresses minute differentiations within a single transcendental – or, more succinctly, a perspectival shift. There is a substantial gap between biasing an algorithm towards a particular mischaracterisation of features and the transmutative tools that we propose to understand what might be the alternate transcendental structures of black box algorithms. It would seem possible to construct situations such that, following the anomalous examples shown in Fig. 4, we are able to explore sonorous structurations and generate what could be called sonic pre-images.(8)
In the Summer of 2015, a post entitled “Inceptionism: Going deeper into neural networks” was released on Google’s research blog (Mordvintsev et al., 2015). It popularises Google’s own research into what we have been calling transmutative tools (Szegedy et al., 2015). Their text outlines a deep net process called gradient ascent and the associated system, both of which were engineered to examine and improve black box image processing structures. Google’s blog entry describes this mechanism in some detail under the name “Inceptionism”, which later came to be known as “DeepDream”. Yet, the figures of DeepDream are easily misinterpreted. Unlike the examples above, where the visualisations correspond with the internal structuration of the underlying networks, the popularisation of the images of Google’s DeepDream lost any sense of the implicit transcendental structure that was being visualised. In this way, these images were aestheticised as images rather than infographic data visualisations. DeepDream’s images were not read, they were merely seen.
An aestheticisation of the transmutative output (i.e., a fixation on the images of DeepDream) encourages a misinterpretation of possible speculations. When images are devoid of their infographic legibility the representation of internal features is lost. Without the use of transmutative tools to preserve the infographic context, the DeepDream images can become illusive abstractions of the autonomous creativity of machines. In an extreme case, a rogue algorithm that classifies a song into any genre can be mistakenly construed as a creative listener.(9)
Our claim is that DeepDream, as an idea rather than software, has come to represent a problematic view of the capacities of deep learning in popular culture, especially in terms of creativity. DeepDream describes only what it is trained on: millions of images of animals, eyeballs, places and so on. Gradient ascent, a process used by DeepDream, recursively accentuates existing patterns in the image based on this training. It amounts to a “brute force” search technique of a highly non-linear feature space of images. By “brute force” we do not mean that the system examines every possible permutation before selecting the best result for a given set of criteria. Deep neural nets of this kind force the convergence of prior patterns and the input in a brute manner. The brutality in this approach is circumscribed within a schema of human creativity, which masks the internal mechanisms of the algorithmic process. Yet, a brutal expansion of our scope of perception via phenoumenodelic variations – potentiating greater creativity – is an alternative view of this process. In terms of what was discussed above, the misrecognition of the algorithm’s output as an object of creativity can be seen as confining a non-human transcendental (i.e., the software of DeepDream) within an anthropic manifold (i.e., the schema of creativity). We name this accommodation misanthroposchematism.
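The accentuating effect of gradient ascent can be illustrated with a minimal sketch. It is not DeepDream’s implementation, which operates on the layers of a trained convolutional network: here a single hypothetical unit with fixed weights stands in for a learned pattern, and the weights, step size and number of steps are assumptions chosen for illustration. The point is only that the weights stay fixed while the input is repeatedly adjusted towards whatever the unit already responds to.

```python
import math

# Hypothetical one-unit "feature detector": activation = tanh(w . x).
# The weights stand in for a pattern learned during training.
WEIGHTS = [0.5, -0.25, 0.75, 0.1]

def activation(x):
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))

def input_gradient(x):
    # d/dx tanh(w . x) = (1 - tanh(w . x)^2) * w
    slope = 1.0 - activation(x) ** 2
    return [slope * w for w in WEIGHTS]

def gradient_ascent(x, steps=20, step_size=0.1):
    # The weights stay fixed; only the input is nudged uphill on the activation.
    for _ in range(steps):
        grad = input_gradient(x)
        x = [xi + step_size * g for xi, g in zip(x, grad)]
    return x

start = [0.2, 0.1, -0.1, 0.0]
dreamed = gradient_ascent(start)
print(activation(start), activation(dreamed))
```

In this toy setting the adjusted input drifts towards the direction of the weights, so the “output” tells us about the detector rather than about the starting input.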
Disinceptionism is the systematic disillusionment of the aestheticisation of black box algorithms and their output. It is not our intent to limit the impact of these absorbing images (see Fig. 5); instead it is to unbox the transcendental mechanisms involved in DeepDream. Particular processes of transmutation can be constructed such that researchers and programmers can gain insight into how to further develop deep nets. The pedagogical role of Google’s inceptionism provides the user with a new means towards speculative transcendence. The deep net, by affording a different set of transcendentals, can allow for the exploration of as yet impossible phenomenologies – or what Catren calls phenoumenology. In effect, transmutative tools had to be made to train the trainer to better train the algorithm.
It is in the contrast between our (human) transcendental framework and the speculative (machine) transcendence that inchoate patterns become the focus of investigations beyond our anthroposchematic tendencies. These latter tendencies exclude, confine or delimit the noumena as imperceptible rather than allowing us to gain traction on the infra_perceptible phenoumena.(10) Following philosopher Chris Swoyer, this process of gaining salience via an alternative domain is named “surrogate reasoning” by Sieroka (2015).(11) Disinceptionism aims to increase sensitivity to possible transcendental modifications in a similar way, thereby eliciting the possibility of a phenoumenodelic experience (Catren, 2015). It is worth noting that these experiences do not necessarily require the use of technology as per the phonetic example used prior. We merely need to directly or indirectly act on how one models the world.
The situation we have described could be distilled as follows. (1) An object was speculatively registered outside the anthropic manifold. (2) It was necessary to generate a means to perceive this region. (3) Attention is (potentially mis)directed towards the codified output. (4) Disinceptionism refocuses attention on the process of transmutation, which makes things infra_perceptible (rather than imperceptible) in speculative modes.
In the first half of the article, we extended Norman Sieroka’s 3-fold distinction of sound (acoustic-physiological-phenomenal) to encompass a fourth layer: the speculative. We achieved this subtle distinction by integrating Sieroka’s physiological level with Immanuel Kant’s transcendental level and then partitioning it according to Gabriel Catren’s notion of phenoumenodelic speculative transcendence. We closed the section by highlighting some of the important differentiations between perspectival (phenomenal) shifts and transcendental (phenoumenal) shifts with some examples.
In the second half of the article, we provided a brief, intuitive introduction to the notion of neural networks, particularly in their “deep” instantiations. Through this discussion, we outlined how the internal structure of these systems is opaque to the user, which encouraged the development of techniques – transmutative tools – for increasing their transparency. Google’s DeepDream application was then introduced as an illustrative example of both potential speculative transcendence and the inherent inertia of anthroposchematism where a fundamentally alien, black box system is viewed in terms of human creativity and imagination. In an effort to accelerate out of this inertia, we encourage a disinceptionism of the user whereby creativity is achieved by exploring exactly how creativity fails in Google’s deep inception.
In future work, deep neural networks will be used to facilitate a comparative analysis of acoustic or even archaeoacoustic spaces where the algorithm is trained on a variety of built environments. The resulting system could then be used to explore the sonic pre-image produced by these environments. We propose that the audition of the transcendental structurations resulting from these soundscapes could potentiate a sonic field wherein processes of transmutation become the means for the practice of a new form of auscultation (Barcelos, 2016).(12) DeepDream is a warning of how easily our tools capture our phenomenal apprehensions, potentially obfuscating the infra_perceptible in the process.