2. Material and methods
2.1. Song recordings
Song recordings were collected on the humpback whale breeding grounds in French Polynesia off the island of Mo'orea, and in northern Ecuador off the coast of Esmeraldas (figure 2a; electronic supplementary material, figure S1). Recordings were made during the austral winter breeding season (July–November) from 2016 to 2018 during opportunistic boat-based surveys and with a moored autonomous recorder. Boat-based recordings were made in French Polynesia in 2016 using an HTI 96MIN hydrophone connected to an H4N Pro Zoom recorder (WAV format, 16 bit, sampling rate 44.1 kHz). Passive acoustic recordings were collected in 2016, 2017 and 2018 using an Ocean Instruments SoundTrap STD300 (WAV format, 16 bit, sampling rate 24 kHz, duty cycle 30 min in every 120 min) in 30 m water (recorder 4 m above the sea floor) at 17°32.860′ S and 149°46.148′ W (electronic supplementary material, figure S1b). All recordings in Ecuador were boat-based and made in the ‘Bajos de Atacames’ up to 5–10 km offshore from the Esmeraldas River (0°59′54.1″ N, 79°38′37.7″ W) to Punta Galera (0°49′10.15″ N, 80°02′55.67″ W; electronic supplementary material, figure S1c). Songs were recorded using an H2a-XLR omnidirectional hydrophone (sensitivity of −180 dBV/µPa ± 4 dB, from 20 Hz to 100 kHz) and a DolphinEar/Pro omnidirectional hydrophone (sensitivity 15 Hz to 20 000 Hz ± 3 dB) connected to a TASCAM DR-40 recorder (WAV format, 16 bit, sampling rate 44.1 kHz).
The data represent a snapshot of the song sung in each of these populations each year but are broadly representative of each population due to the strong song matching among individuals [41,58]. The three highest quality recordings were selected from each year (representing the start, middle and end of the season, where possible) in French Polynesia and Ecuador (table 1, N = 18; see electronic supplementary material, S1). One recording (from Ecuador 2017, table 1) had multiple singers present; it was uncertain whether the same individual remained in the foreground throughout, so the recording was subdivided into sections in which the song could be followed consistently. This resulted in four sections of song, which were treated as ‘individual singers’. Two of these sections were excluded from song comparisons because they were less than 10 min long and thus might not be representative of a full song. As a result, a total of 21 singers from 18 recordings were transcribed, with 19 singers in total being included in song comparison analyses (table 1).
Table 1.
Song recordings from two South Pacific breeding populations, French Polynesia and Ecuador. A total of 18 song recordings were analysed between 2016 and 2018. The number of song cycles (no repetition of a theme but allowing repetition of phrase variants/types if consecutive), the song type (song type 1 = blue, song type 2 = green, song type 3 = orange, song type 4 = grey) and the sequence of themes sung (*theme descriptions are in electronic supplementary material, table S2) are noted per singer. H = hybrid singer combining themes from two song types [60].
2.2. Song transcription and unit classification
All songs were analysed as spectrograms in Raven Pro 1.6.1 (fast Fourier transform 2048, Hann window, 50% overlap, showing 0–5 kHz and 20 s increments). Song transcription was conducted at the unit level by a human classifier (J.N.S.) following Garland et al. [40,41,57,71]. Each unit was classified based on its aural and visual characteristics such as frequency range, duration and contour. To ensure unit classifications were consistent and repeatable, a subset of units was measured for 11 acoustic parameters following previous humpback unit classification analyses [42,71,72] (see electronic supplementary material, S1, for further information). The subset selected for measurement (n = 859) comprised all units from one high-quality example of each phrase type present for each location and year, plus any rare unit type not included in those selected phrases [72]. A random forest analysis was run in R (v. 3.5.3) [73] using the randomForest package [74] (mtry = 6, 1000 trees), which resulted in an out-of-bag error rate of 27.47%, indicating an adequate level of agreement between quantitative and qualitative classification of unit types.
2.3. Assigning unit strings to themes
Units were transcribed into phrases. Repeating unit strings (phrases) were grouped into themes following the typical theme classification rule of ‘similar sounds in similar positions’ [37,66], and given a number (e.g. 1, 2, 3). Variations of the same theme that shared the same or a similar structure, but with slightly different units (e.g. ‘squeaks’ versus ‘high squeaks’; electronic supplementary material, table S2), or added/deleted units, were allocated letters (e.g. a, b, c) to indicate this variation (referred to as a ‘phrase type’, e.g. 7a). To ensure the qualitative assignment of phrases to themes was robust and repeatable for all themes regardless of the location or year they were recorded, a Levenshtein distance similarity index (LSI) analysis was undertaken, following previous studies [40,57,58,71,72]. The Levenshtein distance is a metric that quantifies the similarity between sequences, allowing those that are most similar to be grouped or clustered, and has frequently been used to analyse the similarity of humpback whale song [51,72,74–77]. The LSI produces a proportion of similarity between any two strings of data by counting the number of changes (additions, deletions or substitutions) needed to convert one string into the other, standardized for string length [40,66,71,78,79]. The LSI analysis was conducted in R using custom-written code (package leven, available at http://github.com/ellengarland/leven) to compare all phrase strings (n = 1457). The LSI theme similarity matrix was hierarchically clustered and visualized as a dendrogram using average-linkage (UPGMA) clustering to validate the qualitative assignment of phrases to themes (both within and between all years and populations). The cophenetic correlation coefficient (CCC) was also calculated to ensure the clustering method chosen provided the best representation of the connections within the data (considered ‘good’ if CCC > 0.8) [71,72,80].
The CCC using average-linkage clustering was 0.88, indicating that our theme assignments were robust and a good representation of the connections within the data. Once theme assignments were confirmed, a single set median unit string was calculated per theme for each location/year combination [40,78]. The calculation sums all similarity scores within the theme and selects the string with the highest score (i.e. similarity) to act as the most representative unit string for each theme, per location and year (electronic supplementary material, table S1).
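The LSI and set median calculations described above can be illustrated with a minimal Python sketch. It is not the leven R package used in the study, and it rests on two assumptions: that the Levenshtein distance is standardized by the length of the longer of the two strings, and that ties for the set median are broken by taking the first string encountered. The function names and the unit labels in the example phrases are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of unit additions, deletions or substitutions
    needed to convert string a into string b."""
    previous = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, unit_a in enumerate(a, start=1):
        current = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            current.append(min(
                previous[j] + 1,         # delete unit_a
                current[j - 1] + 1,      # add unit_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def lsi(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity between 0 and 1 (1 = identical strings).
    Assumption: the distance is divided by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def set_median(strings: List[Sequence[str]]) -> Sequence[str]:
    """Return the string with the highest summed similarity to all strings
    in the set (assumption: the first such string wins a tie)."""
    totals = [sum(lsi(s, t) for t in strings) for s in strings]
    return strings[totals.index(max(totals))]


# Hypothetical phrases from one theme, written as lists of unit labels.
phrases = [
    ["moan", "squeak", "squeak", "grunt"],
    ["moan", "high squeak", "high squeak", "grunt"],
    ["moan", "squeak", "grunt"],
    ["moan", "squeak", "squeak", "grunt"],
]

print(lsi(phrases[0], phrases[1]))  # two substitutions over four units
print(set_median(phrases))
```

Treating each unit label as a single symbol, rather than comparing the labels character by character, keeps a substitution of ‘squeak’ for ‘high squeak’ at a cost of one change, which matches the unit-level transcription described in section 2.2.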