11 March 2012

A taxonomy for field recordings

LAST MONTH I put down some ideas on classifying field recordings. In hindsight they’re not up to much, but you’ve got to start somewhere and a blog post is as good a place as any.

If you’re interested in field recording and you’ve not yet joined the Phonography group on Yahoo, then you should. I gave the blog post a puff there and was pleased to receive thoughtful replies from Robin Parmar, Udo Noll and Peter Cusack. Thanks also to Joseph Young for mentioning the Futurist composer Luigi Russolo and his classification scheme for ‘noise-sounds’.

There was scepticism in three areas. First, that it is impossible to classify sounds according to a single dimension or factor. Second, that classification schemes are possible but will inevitably reflect the priorities or interests of the person devising them. The unstated implication is that no sound classification scheme can be anything other than idiosyncratic. Third, that specific areas of sound research are likely to demand their own tailored classification schemes.

I agree with the first point. I think the second may well be true, though you can't know by how much until you try. I consider the third more relevant to metadata schemes than to a taxonomy. Below is a field recording taxonomy I've drawn up since the earlier post on classification.

Field recording taxonomy diagram

It's a taxonomy for field recordings rather than for the individual sound types you might find in a commercial sound effects library. Of course, many sound effects are field recordings; they just aren't described as such, and they usually lack metadata. In this taxonomy, single-subject recordings exist as subsets of all possible auditory scenes. Recordings are sorted by the semantic and functional aspects of the auditory scene rather than by its perceptual qualities: what's happening, or where, before what it sounds like.

The term auditory scene is borrowed from the work of the psychologist Albert Bregman, whose 1990 book Auditory Scene Analysis: The Perceptual Organization of Sound was a landmark in the science of perception, ranking alongside David Marr's 1982 book Vision. An auditory scene is much the same as Pauline Oliveros's definition of a soundscape: ‘All of the waveforms faithfully transmitted to our auditory cortex by the ear and its mechanisms.’

One aim of the taxonomy is to gain good discriminative power from a fairly small number of classification decisions, and to do so in ways that make sense to the hypothetical average listener. Many of the intermediate categories in the taxonomy are self-explanatory. Some probably aren't, and these include the distinctions between focused and diffuse recordings, between signals and incidental sounds, and between unified and contextual human sound signals.

Focused and diffuse recordings

This distinction reflects the taxonomy's goal of classifying recordings of auditory scenes rather than isolated sounds. A focused recording has a clearly defined subject consisting of one or more sound sources. These may be multiple instances of the same kind, such as the calls of ticket touts outside a concert venue. Or they may be different sound sources that are somehow interdependent or fulfil distinct roles within an overall scheme. For example, tannoy announcements, a referee's whistle and spectators' chants co-occur not by chance but because such actions are part of the performance of a football match.

Diffuse recordings have their locations as their subject. In more familiar language they're labelled as atmospheres, ambiences or soundscapes, the last in a different sense from the Oliveros definition mentioned earlier. The sounds comprising them may be correlated, in the way that seagull cries are often heard along with the sound of the sea, but they have few or no direct causal links with one another. Actions, and the sounds arising from them, are more likely to be causally related the closer together they occur, but this isn't always the case. The sounds of household activities made by occupants following their own separate agendas could make up a diffuse recording.

Signals and incidental sounds

A signal is defined here as an auditory act which alters the behaviour of other organisms, which evolved or was designed because of that effect, and which is effective because the receiver’s response has also evolved or been learned. (Echolocation stands somewhat apart from this definition because the signal is self-directed.) Incidental sounds are the by-products of an organism’s activities, such as feeding, locomotion or nest-building.

There's no sign at all that wildlife recordists need this distinction enshrined in separate terminal categories rather than simply noted in a metadata title or description field. Animal sounds in collections are typically organised by species, sometimes by habitat as well, and the great majority of recordings are of evolved signals. The distinction is more useful for classifying sounds made directly or indirectly by people.

Many sounds, particularly in urban environments, are the unintended result of human activity. It is their lack of design and function which distinguishes them from signals, not whether they are more or less informative. People may use incidental sounds as sources of information, for example a car mechanic who listens to an engine for diagnostic purposes or a doctor pressing a stethoscope to a patient’s chest. But when this happens, the sounds are better thought of as cues rather than signals.

Unified and contextual signals

Sound signals are often designed to capture attention by dominating the auditory scene. Familiar examples include fire alarms, church bells, street traders’ cries, and public oratory. Sometimes these are the results of individual actions, and at other times they exist as arrays of similar and co-ordinated signals, such as applause and the chanting of slogans. These are categorised as unified sound signals because it is easy and generally accurate to describe them in terms of single proximate goals, whether it’s to summon help, clear the way, express approval, or sell a product.

It becomes much harder to assign a single goal when a recording captures the sounds made by people taking different roles during an event. For example, a policeman stops and searches a youth in a busy street. The youth complains and swears, the policeman tells him to stop swearing, the youth's mother appears and berates the policeman, the policeman tries a conciliatory approach with her, onlookers gather and some demand to know what's going on, the policeman gets on his radio to summon assistance, and a police siren is heard faintly and grows louder.

Such exchanges often take a predictable course and may even have a ritualised aspect to them. They have boundaries in where they happen and when they begin and end. We think of them most readily as events or occasions which follow a kind of script, either formal or informal, or else have a common focus of attention, even when those who are heard have conflicting interests. Recordings of such events are categorised as contextual because they can’t be labelled according to any single goal.

