Occasional posts on subjects including field recording, London history and literature, other websites worth looking at, articles in the press, and news of sound-related events.

06 February 2012

Thoughts on classifying field recordings

RECENTLY AN EMAIL arrived from Eric Leonardson of the World Listening Project, the motivating force behind World Listening Day, which is due to be marked again on July the 18th this year.

Eric kindly asked what categories I’d recommend for classifying field recordings. This is a subject which has been nagging at me intermittently over the past couple of years. For example, the ‘Social’ high-level category in the Sound Actions section here might as well be titled ‘Miscellaneous’.

It’s little more than a ragbag collection of recordings linked only by the common feature of having several or many people making a noise at once. Eric’s question prompted me to try to think things through again, and here’s the reply I sent him (plus a bit of tidying up):

You asked what categories might be considered for the classification of field recordings. The reply could well be titled ‘Problems of classification’.

The question is more easily answered if one assumes that you are classifying recordings made by yourself, by close associates, or by recordists who have provided detailed metadata.

In such cases the intentions behind any recording will be understood, and one will know which elements of the auditory scene should be attended to in order to derive a classification. This point will be revisited later.

It is worth considering what the overall sorting or classification strategy might be, since this will need to be taken into account when devising metadata fields for the individual recordings.

If the recordings are intended for internal organisational use or for dedicated scholarship, or if the collection is not very large, then a hierarchical taxonomy may be sufficient, even without a search function. Presumably there will then be a metadata field corresponding to each taxonomic level.

But you may intend to make recordings publicly available through a website. In that case you could consider a tag-driven folksonomy, such as the one used on Soundcloud. Then the metadata scheme may need to include a tag field.

There is a good case for having a hybrid of taxonomic and folksonomic approaches. Further questions and possibilities arise from such a model, including whether tags should influence taxonomic categories, or whether taxonomic categories should influence the range of available tags, or whether the two should co-exist independently.
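As a minimal sketch of the hybrid model, here is one way a single metadata record might hold both a taxonomic path (one value per level) and a separate set of folksonomic tags. The level names, the example values and the `browse` and `tagged` helpers are hypothetical, invented for illustration. This version takes the last of the options above, letting tags and categories co-exist independently.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical taxonomic levels; a real scheme would define its own.
LEVELS = ("factor", "category", "subcategory")


@dataclass
class Recording:
    title: str
    # One value per taxonomic level, e.g. ("agency", "animal", "birds").
    taxonomy: tuple[str, ...]
    # Folksonomic tags, kept independent of the taxonomy.
    tags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if len(self.taxonomy) > len(LEVELS):
            raise ValueError("more taxonomic levels than the scheme defines")
        # Normalise tags so "Chorus" and "chorus " count as the same tag.
        self.tags = {t.strip().lower() for t in self.tags}

    def folder_path(self) -> str:
        # The taxonomy maps directly onto nested folders on a hard drive.
        return "/".join(self.taxonomy)


def browse(recordings: list[Recording], *branch: str) -> list[Recording]:
    """Taxonomic browsing: every recording under the given branch."""
    return [r for r in recordings if r.taxonomy[: len(branch)] == branch]


def tagged(recordings: list[Recording], tag: str) -> list[Recording]:
    """Folksonomic search: every recording carrying the tag, in any branch."""
    return [r for r in recordings if tag.strip().lower() in r.tags]


recs = [
    Recording("Dawn chorus", ("agency", "animal", "birds"), {"Chorus", "dawn"}),
    Recording("Market traders", ("agency", "human", "vocal"), {"chorus ", "calling"}),
]
print(recs[0].folder_path())                      # agency/animal/birds
print([r.title for r in tagged(recs, "chorus")])  # ['Dawn chorus', 'Market traders']
```

Having tags influence categories, or categories restrict the tags on offer, would mean adding a cross-check between the two fields in `__post_init__`.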

On the subject of a taxonomy, it is perhaps best to start with some thoughts on a fairly simple and pragmatic approach. However, I can only guess at your intentions, and it is these which must drive the classification system.

I will assume that you are not looking to create a sound effects library, nor will you necessarily be seeking to replicate the curatorial divisions of labour found in museums and other institutions.

If one is instead looking at how to organise a collection of field recordings, then the taxonomy might reflect differences in the salience of the three basic factors of agency, place and time or, if you prefer, what, where and when.

Agency will tend to be most salient with recordings in which either a single sound source dominates the auditory scene, or in which many sound sources of the same kind are present.

Obvious sub-divisions of agency are human, animal, and inanimate. Human and animal sounds often share the quality of being directed towards some goal, whereas the sounds of inanimate agents, better thought of simply as causes, serve no goal and are epiphenomenal, as with the sounds of wind or water.

Animal sounds lend themselves to further sub-division along biological taxonomic lines, with the functions of particular vocalisations, such as contact calls or warning sounds, perhaps left to a level of categorisation below that of an individual species.

Human sounds, more so than animal sounds, can be considered both as functional, in the sense of verbal and non-verbal sound signals, and as epiphenomenal by-products of many human activities, such as traffic, noises from building sites and so on.

It is not easy to think of a good way of classifying all functional human sound signals according to a single scheme. Perhaps classifying them by either their proximate or their ultimate goals would be easier. But either would be hard to make work in practice, since it is not clear which of the two, on balance, yields the greater number of useful categories.

For example, the proximate goal of causing the listener to move somewhere and the ultimate goal of relieving the listener of their money can both be applied to so many disparate sound signals that they cease to have much discriminative power.

A good classification system will be one in which each individual recording is well described simply by the category labels applying to it, without having to read the description field in the recording’s metadata.

In practice, some kind of fudge between proximate goals, ultimate goals and context is the likely outcome, and will be justified in retrospect by an appeal to how an idealised average listener tends to think about the world.

Intellectual fields of enquiry such as linguistics, musicology and the philosophy of language have fared rather better in sorting the phenomena they study.

For example, the distinctions in the speech act theory of John L Austin and John Searle are useful in thinking of utterances as ways of doing things with words, but it is not obvious how they can be developed to describe all possible sound signals.

Despite these problems, agency as a factor in classification allows for a high level of concordance between the perceptions of the recordist, classifier and listener.

Categorising according to place may often reflect the recordist’s goal in representing a soundscape. It is also a useful way of organising recordings in which there are many independent sound sources, to the extent that no single sound source or type of sound source can be considered the recording’s definitive subject.

Categorising according to time will be most relevant for historical recordings, in which the era a recording comes from is judged to be as interesting as its subject matter or, rather, hard to distinguish from it, an effect that grows stronger the older the recording gets.

A taxonomy has the advantage of matching how sound files may be organised into folders and sub-folders on a hard drive. It demands some precision in thinking about the contents of the recordings. The same structure can also be presented to the end-user, who can then search by scrutinising collapsible lists, categories or sound maps (which are a form of unstructured list) and so make serendipitous discoveries.

Folksonomies and tag-driven searches have the advantage of revealing similarities across domains, for example by identifying particular acoustic qualities, such as droning, buzzing or echoes. Or they could allow a comparison of recordings across animal and human categories, such as sound signals which work to elicit parental care or raise the alarm.
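To make the cross-domain idea concrete, the sketch below looks for tags whose matching recordings fall under more than one top-level branch of a taxonomy (here human, animal and inanimate), which is where a tag search adds something the taxonomy alone cannot. The sample recordings, categories, tags and function names are all hypothetical, invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (title, taxonomy path, tags). In practice these
# would come from each recording's metadata fields.
RECORDINGS = [
    ("Fledgling begging calls", ("animal", "birds"), {"begging", "parental care"}),
    ("Infant crying on a bus", ("human", "vocal"), {"crying", "parental care"}),
    ("Blackbird alarm call", ("animal", "birds"), {"alarm", "chattering"}),
    ("Car alarm", ("human", "traffic"), {"alarm", "wailing"}),
    ("Wind in telegraph wires", ("inanimate", "wind"), {"droning", "whistling"}),
    ("Bee swarm", ("animal", "insects"), {"buzzing", "droning"}),
]


def cross_domain(tag):
    """Group the titles of recordings carrying `tag` by top-level category."""
    by_branch = defaultdict(list)
    for title, path, tags in RECORDINGS:
        if tag in tags:
            by_branch[path[0]].append(title)
    return dict(by_branch)


def bridging_tags():
    """Tags whose recordings span two or more top-level branches."""
    all_tags = set().union(*(tags for _, _, tags in RECORDINGS))
    return sorted(t for t in all_tags if len(cross_domain(t)) > 1)


print(bridging_tags())  # ['alarm', 'droning', 'parental care']
```

A tag confined to one branch, such as "begging" here, tells the listener little that browsing the taxonomy would not.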

If multiple users provide input then the aggregate weighting of shared features between recordings can be measured through multivariate statistics or subjected to sorting techniques such as cluster analysis. There’s usually a nice graph in there somewhere.
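One simple way of doing this, sketched here under stated assumptions, is to count how many users applied each tag to each recording, compare recordings by the cosine similarity of those tag-weight vectors, and group them with single-linkage agglomerative clustering. The sample data, the 0.5 similarity threshold and all the names are invented for illustration, and a real project would more likely reach for an established statistics library than this hand-rolled version.

```python
from collections import Counter
from math import sqrt

# Hypothetical input: each user's tags for each recording.
USER_TAGS = {
    "rec1": [["droning", "wind"], ["droning"], ["droning", "eerie"]],
    "rec2": [["droning", "engine"], ["droning", "traffic"]],
    "rec3": [["birdsong", "dawn"], ["birdsong"], ["dawn", "chorus"]],
    "rec4": [["birdsong", "chorus"], ["chorus"]],
}


def weights(tag_lists):
    """Aggregate weighting: how many users applied each tag."""
    return Counter(tag for tags in tag_lists for tag in set(tags))


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(user_tags, threshold=0.5):
    """Single-linkage agglomerative clustering on cosine similarity.

    Repeatedly merge the two clusters whose closest members are most
    similar, stopping when no pair of clusters exceeds `threshold`.
    """
    vecs = {rec: weights(lists) for rec, lists in user_tags.items()}
    clusters = [{rec} for rec in sorted(vecs)]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vecs[a], vecs[b])
                          for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i] |= merged
    return clusters
```

With this sample data the two droning recordings end up in one cluster and the two birdsong recordings in another; raising the threshold to 0.7 leaves the birdsong pair unmerged.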

It is quite possible to combine both taxonomy and folksonomy within a single interface. On the subject of interfaces, I would like to express dismay at the prevalence of lists compiled by popularity, since the display of these is likely to cause feedback effects which canalise the choices made by site visitors when exploring. A decent interface should not need such gimmicks.