Ben Raymond, firstname.lastname@example.org
Peter Neish, @peterneish
Unstructured data such as free text, images, sounds, and other media are less commonly used in biodiversity studies than structured information such as species occurrences. Nevertheless, these sources represent a rich stream of information, some of which is not generally available through more conventional sources. This information is of value for scientific, research, and conservation management use, as well as for communication and outreach to the public and other audiences.
A relatively recent development in the GBIF portal and API is the registration of media (sound, images, or video) in occurrence records. Here, we use this facility to reconstruct the "soundscapes" of particular regions by compiling the bird and frog sounds from those regions.
This second version of GBIF Soundscape builds upon our previous entry through a redesigned user interface. Users can now build their own soundscapes by adding and removing taxa through the site. Users can also generate soundscapes based on location, type of organism (bird or frog) and season (winter or summer). We have also included additional localities.
We do the data manipulation in R, using the rgbif package to interact with GBIF. The data processing source code is available here. Note for users not familiar with R: if you get errors saying "there is no package called blah", you just need to install that package, for example with install.packages("blah").
Our aim is to find the sounds associated with the birds and frogs present in a region of interest. Ideally we could just search for occurrence records from our region of interest that also have associated media files. Unfortunately, though, most occurrence records are simple observations without associated media.
Our strategy instead is to use a general occurrence search to discover the taxa that are present in our region of interest, and then a secondary search for sound files for those same taxa (regardless of location). These media will almost certainly have been recorded at locations other than our region of interest but, in the same way that images of a species are reused across different localities, we assume that a given species makes the same sounds worldwide. Note that this will not always be the case.
Since there aren't very many occurrences of birds and frogs with sound media, the easiest approach is just to grab the lot, cache them locally, and filter them later according to other criteria. This also means that we only need to hit the GBIF servers once for this part of the processing.
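The "grab the lot and cache it" step can be sketched as follows. This is an illustrative Python sketch, not the R/rgbif code used for the project. It assumes the public GBIF occurrence search endpoint with its `mediaType`, `taxonKey`, `offset` and `limit` parameters and the `results` and `endOfRecords` fields of its paged responses. The function names, the `fetch_page` callable and the cache file are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def sound_search_url(taxon_key, offset=0, limit=300):
    """Build an occurrence-search URL restricted to records with sound media."""
    params = {
        "taxonKey": taxon_key,   # backbone key of the group of interest
        "mediaType": "Sound",    # only records that carry sound media
        "offset": offset,
        "limit": limit,          # page size (assumed here; adjust as needed)
    }
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def fetch_all(taxon_key, fetch_page, cache_file, limit=300):
    """Page through all matching records once, then reuse the local cache.

    fetch_page is any callable that takes a URL and returns the decoded JSON
    response as a dict (hypothetical; e.g. a thin wrapper around urllib).
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Cached from an earlier run: no need to hit the servers again.
        return json.loads(cache_file.read_text(encoding="utf-8"))

    records, offset = [], 0
    while True:
        page = fetch_page(sound_search_url(taxon_key, offset, limit))
        results = page.get("results", [])
        records.extend(results)
        if page.get("endOfRecords", True) or not results:
            break
        offset += limit

    cache_file.write_text(json.dumps(records), encoding="utf-8")
    return records
```

Passing the page fetcher in as an argument keeps the paging and caching logic separate from the network layer, so the same function can be exercised offline with canned responses.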
We choose a region of interest and find out which bird and frog taxa occur there, then intersect that list of taxa with the list for which we have sound media. One media item per taxon is used, and we also retrieve an image for each taxon to use in the web interface.
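The intersection step can be illustrated with a short Python sketch (again, not the project's R code). It assumes occurrence records shaped like GBIF API results, with a `speciesKey` field and a `media` list whose items have `type` and `identifier` fields. The function name is hypothetical, and taking the first usable sound per taxon is an assumption, since the selection rule is not specified here.

```python
def taxa_with_sounds(region_occurrences, sound_records):
    """Return {speciesKey: media item} for taxa found in the region
    that also have at least one sound recording."""
    # Taxa observed in the region of interest.
    region_taxa = {
        occ["speciesKey"] for occ in region_occurrences if "speciesKey" in occ
    }

    chosen = {}
    for rec in sound_records:
        key = rec.get("speciesKey")
        # Skip taxa outside the region, and taxa that already have a sound.
        if key not in region_taxa or key in chosen:
            continue
        for media in rec.get("media", []):
            if media.get("type") == "Sound" and media.get("identifier"):
                chosen[key] = media  # one media item per taxon
                break
    return chosen
```

The keys of the returned dictionary are the taxa that end up in the soundscape; taxa present in the region but without any sound media are dropped.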
For data-rich regions, we can apply additional filters according to the attributes of the data — for example, by time of year, allowing seasonally-varying results. We investigated using time of day, but found that it did not give consistent results.
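A seasonal filter of this kind might look like the Python sketch below. The month ranges used to define "summer" and "winter", and the flip for the southern hemisphere, are assumptions made for illustration and are not taken from the project. The `month`, `decimalLatitude` and `speciesKey` fields follow the shape of GBIF occurrence records, and the function names are hypothetical.

```python
# Assumed season definitions for the northern hemisphere.
SUMMER_MONTHS_NORTH = {6, 7, 8}
WINTER_MONTHS_NORTH = {12, 1, 2}


def season_of(month, latitude):
    """Return "summer", "winter", or None for months outside both seasons."""
    if month in SUMMER_MONTHS_NORTH:
        season = "summer"
    elif month in WINTER_MONTHS_NORTH:
        season = "winter"
    else:
        return None
    if latitude < 0:
        # Seasons are reversed south of the equator.
        season = "winter" if season == "summer" else "summer"
    return season


def taxa_in_season(occurrences, season):
    """Collect the species keys of taxa recorded during the given season."""
    keys = set()
    for occ in occurrences:
        month = occ.get("month")
        latitude = occ.get("decimalLatitude")
        key = occ.get("speciesKey")
        if month is None or latitude is None or key is None:
            continue  # cannot assign a season without date and position
        if season_of(month, latitude) == season:
            keys.add(key)
    return keys
```

Records without a month or latitude are skipped rather than guessed, which is one reason this kind of filter only works well for data-rich regions.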
The processed data are summarised into a JSON-formatted file to be used by the web interface, including the appropriate citation details for each image and audio file. We have used a local cache of images and sounds to avoid latency and cross-site request issues with media files.
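As an illustration of what such a summary file could contain, the Python sketch below writes one entry per taxon with its locally cached sound and image and the citation for each. Every field name and the overall layout are hypothetical; the actual JSON structure read by the web interface is not documented here.

```python
import json


def build_region_summary(region_name, extent, taxa):
    """Serialise one region's taxa, media files and citations to JSON text.

    taxa is a list of dicts with the (hypothetical) keys used below.
    """
    entries = []
    for taxon in taxa:
        entries.append({
            "scientificName": taxon["scientificName"],
            "group": taxon["group"],  # "bird" or "frog"
            "sound": {
                "file": taxon["sound_file"],          # path in the local cache
                "citation": taxon["sound_citation"],
            },
            "image": {
                "file": taxon["image_file"],          # path in the local cache
                "citation": taxon["image_citation"],
            },
        })
    summary = {"region": region_name, "extent": extent, "taxa": entries}
    return json.dumps(summary, indent=2, ensure_ascii=False)
```

Keeping the citation next to each file means the interface can always display attribution alongside the media it plays or shows.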
A set of pre-processed regions is available for the user to select, either from the map or from a drop-down list.
On selecting a region, the map zooms to the region extent (i.e. the spatial extent used to define the species list for the region). An initial soundscape is created from a random selection of the taxa present. The user can then add additional taxa by clicking on any of the taxa listed down the right-hand side, and remove taxa by clicking "remove". At any time, the user can generate a new soundscape based on all taxa, just frogs or just birds, or by season (winter or summer). Global audio controls also allow the user to play or stop all sounds.
A copy of the interface has been included in the submission. Use Firefox if running from a local copy, because Chrome doesn't allow Ajax access to local JSON files.
The sound component of video media could be used to supplement the available audio files, although we didn't do it here.
Some sound media contain human voices, which strongly detract from the overall results. We excluded such audio clips here by avoiding all media from any provider that tended to have voices present, which is rather a brutal approach! We could instead have removed the first part of each sound file, where voices are usually present. A more elegant solution would be to use digital signal processing to identify periods of unwanted sounds (e.g. human voices, traffic noise). Periods of silence could also be removed, reducing long recordings down to shorter snippets of still-relevant audio.
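None of this clean-up was implemented in the project, but the two simplest ideas (skipping the start of a recording and dropping silent stretches) can be sketched in a few lines of Python. The sketch assumes the audio has already been decoded to a list of float samples in the range -1 to 1. The function names, frame size and threshold are arbitrary illustrative choices, and it does not attempt to detect voices or traffic noise.

```python
import math


def skip_intro(samples, sample_rate, seconds):
    """Drop the first `seconds` of a recording (e.g. a spoken announcement)."""
    return samples[int(sample_rate * seconds):]


def trim_silence(samples, frame_size=1024, threshold=0.02):
    """Keep only frames whose RMS amplitude reaches the threshold."""
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square amplitude as a crude loudness measure.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```

A fixed amplitude threshold is crude: quiet calls in a noisy recording may be discarded, so the threshold would need tuning per source.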
The demonstration interface provides sounds and images for a number of pre-computed regions. We explored the option of allowing the user to select an arbitrary region of interest. However, the GBIF API does not provide a way to efficiently extract a species list for an arbitrary region, and so this is not currently a practical option.
It's worth noting that, at the time of writing, other sound media were available through GBIF, but were not returned when the mediatype was specified. This was due to the required metadata not being recorded in the source. This is being followed up by the GBIF team.