mir_eval Documentation¶
mir_eval is a Python library which provides a transparent, standardized, and straightforward way to evaluate Music Information Retrieval systems.
If you use mir_eval in a research project, please cite the following paper:
C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, “mir_eval: A Transparent Implementation of Common MIR Metrics”, Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014.
Installing mir_eval¶
The simplest way to install mir_eval is by using pip, which will also install the required dependencies if needed. To install mir_eval using pip, simply run

pip install mir_eval

Alternatively, you can install mir_eval from source by first installing the dependencies and then running

python setup.py install

from the source directory.
If you don’t use Python and want to get started as quickly as possible, you might consider using Anaconda, which makes it easy to install a Python environment which can run mir_eval.
Using mir_eval¶
Once you’ve installed mir_eval (see Installing mir_eval), you can import it in your Python code as follows:

import mir_eval

From here, you will typically either load in data and call the evaluate() function from the appropriate submodule like so:
reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Scores will be a dict containing scores for all of the metrics
# implemented in mir_eval.beat. The keys are metric names
# and values are the scores achieved
scores = mir_eval.beat.evaluate(reference_beats, estimated_beats)
or you’ll load in the data, do some preprocessing, and call specific metric functions from the appropriate submodule like so:
reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Crop out beats before 5s, a common preprocessing step
reference_beats = mir_eval.beat.trim_beats(reference_beats)
estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
# Compute the F-measure metric and store it in f_measure
f_measure = mir_eval.beat.f_measure(reference_beats, estimated_beats)
The documentation for each metric function, found in the mir_eval section below, contains further usage information.
Alternatively, you can use the evaluator scripts, which allow you to run evaluation from the command line without writing any code. These scripts are available here:
mir_eval¶
The structure of the mir_eval Python module is as follows: Each MIR task for which evaluation metrics are included in mir_eval is given its own submodule, and each metric is defined as a separate function in each submodule. Every metric function includes detailed documentation, example usage, input validation, and references to the original paper which defined the metric (see the subsections below). The task submodules also all contain a function evaluate(), which takes as input reference and estimated annotations and returns a dictionary of scores for all of the metrics implemented (for casual users, this is the place to start). Finally, each task submodule also includes functions for common data pre-processing steps.
mir_eval also includes the following additional submodules:
- mir_eval.io, which contains convenience functions for loading in task-specific data from common file formats
- mir_eval.util, which includes miscellaneous functionality shared across the submodules
- mir_eval.sonify, which implements some simple methods for synthesizing annotations of various formats for “evaluation by ear”
- mir_eval.display, which provides functions for plotting annotations for various tasks
The following subsections document each submodule.
mir_eval.alignment¶
Alignment models are given a sequence of events along with a piece of audio, and then return a sequence of timestamps, with one timestamp for each event, indicating the position of this event in the audio. The events are listed in order of occurrence in the audio, so the output timestamps have to be monotonically increasing. Evaluation typically involves comparing the predicted and ground truth timestamps on a pair-wise basis, e.g. taking the median absolute error in seconds.
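The pair-wise comparison described above can be sketched in a few lines of NumPy. This is a standalone illustration, not the library's implementation, and the function name `absolute_error_sketch` is hypothetical:

```python
import numpy as np

def absolute_error_sketch(reference_timestamps, estimated_timestamps):
    """Median and average absolute error between paired timestamps (seconds)."""
    deviations = np.abs(np.asarray(reference_timestamps)
                        - np.asarray(estimated_timestamps))
    return np.median(deviations), np.mean(deviations)

mae, aae = absolute_error_sketch([1.0, 2.0, 3.0], [1.1, 2.0, 3.4])
# deviations are [0.1, 0.0, 0.4] -> median 0.1, mean 0.5/3
```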
Conventions¶
Timestamps should be provided in the form of a 1-dimensional array of onset times in seconds in increasing order.
Metrics¶
- mir_eval.alignment.absolute_error(): Median absolute error and average absolute error
- mir_eval.alignment.percentage_correct(): Percentage of correct timestamps, where a timestamp is counted as correct if it lies within a certain tolerance window around the ground truth timestamp
- mir_eval.alignment.percentage_correct_segments(): Percentage of correct segments: percentage of overlap between predicted segments and ground truth segments, where segments are defined by (start time, end time) pairs
- mir_eval.alignment.karaoke_perceptual_metric(): Metric based on human synchronicity perception as measured in the paper “User-centered evaluation of lyrics to audio alignment”, N. Lizé-Masclef, A. Vaglio, M. Moussallam, ISMIR 2021
References¶
1. N. Lizé-Masclef, A. Vaglio, M. Moussallam. “User-centered evaluation of lyrics to audio alignment”, International Society for Music Information Retrieval (ISMIR) Conference, 2021.
2. M. Mauch, H. Fujihara, M. Goto. “Lyrics-to-audio alignment and phrase-level segmentation using incomplete internet-style chord annotations”, Proceedings of the Sound and Music Computing Conference (SMC), 2010.
3. G. Dzhambazov. “Knowledge-Based Probabilistic Modeling For Tracking Lyrics In Music Audio Signals”, PhD Thesis, 2017.
4. H. Fujihara, M. Goto, J. Ogata, H. Okuno. “LyricSynchronizer: Automatic synchronization system between musical audio signals and lyrics”, IEEE Journal of Selected Topics in Signal Processing, Vol. 5, No. 6, 2011.
- mir_eval.alignment.validate(reference_timestamps: numpy.ndarray, estimated_timestamps: numpy.ndarray)¶
Checks that the input annotations to a metric look like valid onset time arrays, and throws helpful errors if not.
- Parameters
- reference_timestamps : np.ndarray
reference timestamp locations, in seconds
- estimated_timestamps : np.ndarray
estimated timestamp locations, in seconds
- mir_eval.alignment.absolute_error(reference_timestamps, estimated_timestamps)¶
Compute the absolute deviations between estimated and reference timestamps, then return the median and average over all events.
- Parameters
- reference_timestamps : np.ndarray
reference timestamps, in seconds
- estimated_timestamps : np.ndarray
estimated timestamps, in seconds
- Returns
- mae : float
Median absolute error
- aae : float
Average absolute error
Examples
>>> reference_timestamps = mir_eval.io.load_events('reference.txt')
>>> estimated_timestamps = mir_eval.io.load_events('estimated.txt')
>>> mae, aae = mir_eval.alignment.absolute_error(reference_timestamps, estimated_timestamps)
- mir_eval.alignment.percentage_correct(reference_timestamps, estimated_timestamps, window=0.3)¶
Compute the percentage of correctly predicted timestamps. A timestamp is predicted correctly if its position doesn’t deviate more than the window parameter from the ground truth timestamp.
- Parameters
- reference_timestamps : np.ndarray
reference timestamps, in seconds
- estimated_timestamps : np.ndarray
estimated timestamps, in seconds
- window : float
Window size, in seconds (Default value = 0.3)
- Returns
- pc : float
Percentage of correct timestamps
Examples
>>> reference_timestamps = mir_eval.io.load_events('reference.txt')
>>> estimated_timestamps = mir_eval.io.load_events('estimated.txt')
>>> pc = mir_eval.alignment.percentage_correct(reference_timestamps, estimated_timestamps, window=0.2)
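The tolerance-window rule above amounts to counting deviations inside the window. A minimal sketch (the function name `percentage_correct_sketch` is hypothetical, and the boundary treated as inclusive is an assumption):

```python
import numpy as np

def percentage_correct_sketch(reference_timestamps, estimated_timestamps, window=0.3):
    """Fraction of events whose estimated timestamp lies within `window`
    seconds of the corresponding reference timestamp."""
    deviations = np.abs(np.asarray(reference_timestamps)
                        - np.asarray(estimated_timestamps))
    return float(np.mean(deviations <= window))

pc = percentage_correct_sketch([1.0, 2.0, 3.0, 4.0], [1.1, 2.5, 3.0, 4.2], window=0.3)
# deviations [0.1, 0.5, 0.0, 0.2] -> 3 of 4 within the window -> 0.75
```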
- mir_eval.alignment.percentage_correct_segments(reference_timestamps, estimated_timestamps, duration: Optional[float] = None)¶
Calculates the percentage of correct segments (PCS) metric.
It constructs segments from the reference and estimated timestamp vectors separately and calculates the percentage of overlap between corresponding segments relative to the total duration.
WARNING: This metric behaves differently depending on whether “duration” is given!
If duration is not given (default case), the computation follows the MIREX lyrics alignment challenge 2020. For a timestamp vector with entries (t1, t2, … tN), segments with the following (start, end) boundaries are created: (t1, t2), … (tN-1, tN). After the segments are created, the overlap between the reference and estimated segments is determined and divided by the total duration, which is the distance between the first and last timestamp in the reference.
If duration is given, the segment boundaries are instead (0, t1), (t1, t2), … (tN, duration). The overlap is computed in the same way, but then divided by the duration parameter given to this function. This method follows the original paper [#fujihara2011] more closely, where the metric was proposed. As a result, this variant of the metric punishes cases where the first estimated timestamp is too early or the last estimated timestamp is too late, whereas the MIREX variant does not. On the other hand, the MIREX metric is invariant to how long the eventless beginning and end parts of the audio are, which might be a desirable property.
- Parameters
- reference_timestamps : np.ndarray
reference timestamps, in seconds
- estimated_timestamps : np.ndarray
estimated timestamps, in seconds
- duration : float
Optional. Total duration of audio (seconds). WARNING: The metric is computed differently depending on whether this is provided - see the documentation above!
- Returns
- pcs : float
Percentage of time where ground truth and predicted segments overlap
Examples
>>> reference_timestamps = mir_eval.io.load_events('reference.txt')
>>> estimated_timestamps = mir_eval.io.load_events('estimated.txt')
>>> pcs = mir_eval.alignment.percentage_correct_segments(reference_timestamps, estimated_timestamps)
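The MIREX-style variant described above (no duration given) can be sketched directly from its definition; `pcs_mirex_sketch` is a hypothetical name for this illustration, not the library function:

```python
import numpy as np

def pcs_mirex_sketch(reference_timestamps, estimated_timestamps):
    """Percentage-of-correct-segments sketch, MIREX-2020 style: segments are
    (t_i, t_{i+1}); the summed overlap of corresponding reference and
    estimated segments is divided by the reference span."""
    ref = np.asarray(reference_timestamps, dtype=float)
    est = np.asarray(estimated_timestamps, dtype=float)
    overlap = 0.0
    for i in range(min(len(ref), len(est)) - 1):
        # Intersection length of segment i in the two sequences
        overlap += max(0.0, min(ref[i + 1], est[i + 1]) - max(ref[i], est[i]))
    return overlap / (ref[-1] - ref[0])

pcs = pcs_mirex_sketch([0.0, 1.0, 2.0], [0.0, 1.5, 2.0])
# overlaps: 1.0 for (0,1)x(0,1.5) and 0.5 for (1,2)x(1.5,2) -> 1.5/2 = 0.75
```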
- mir_eval.alignment.karaoke_perceptual_metric(reference_timestamps, estimated_timestamps)¶
Metric based on human synchronicity perception as measured in the paper “User-centered evaluation of lyrics to audio alignment” [#lizemasclef2021]
The parameters of this function were tuned on data collected through a Karaoke-like user experiment. It reflects human judgment of how “synchronous” lyrics and audio stimuli are perceived in that setup. Beware that this metric is non-symmetrical, and by construction it is also not equal to 1 at 0.
- Parameters
- reference_timestamps : np.ndarray
reference timestamps, in seconds
- estimated_timestamps : np.ndarray
estimated timestamps, in seconds
- Returns
- perceptual_score : float
Perceptual score, averaged over all timestamps
Examples
>>> reference_timestamps = mir_eval.io.load_events('reference.txt')
>>> estimated_timestamps = mir_eval.io.load_events('estimated.txt')
>>> score = mir_eval.alignment.karaoke_perceptual_metric(reference_timestamps, estimated_timestamps)
- mir_eval.alignment.evaluate(reference_timestamps, estimated_timestamps, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
Examples
>>> reference_timestamps = mir_eval.io.load_events('reference.txt')
>>> estimated_timestamps = mir_eval.io.load_events('estimated.txt')
>>> duration = max(np.max(reference_timestamps), np.max(estimated_timestamps)) + 10
>>> scores = mir_eval.alignment.evaluate(reference_timestamps, estimated_timestamps, duration=duration)
- Parameters
- reference_timestamps : np.ndarray
reference timestamp locations, in seconds
- estimated_timestamps : np.ndarray
estimated timestamp locations, in seconds
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
mir_eval.beat¶
The aim of a beat detection algorithm is to report the times at which a typical human listener might tap their foot to a piece of music. As a result, most metrics for evaluating the performance of beat tracking systems involve computing the error between the estimated beat times and some reference list of beat locations. Many metrics additionally compare the beat sequences at different metric levels in order to deal with the ambiguity of tempo.
- Based on the methods described in:
Matthew E. P. Davies, Norberto Degara, and Mark D. Plumbley. “Evaluation Methods for Musical Audio Beat Tracking Algorithms”, Queen Mary University of London Technical Report C4DM-TR-09-06 London, United Kingdom, 8 October 2009.
- See also the Beat Evaluation Toolbox:
Conventions¶
Beat times should be provided in the form of a 1-dimensional array of beat times in seconds in increasing order. Typically, any beats which occur before 5s are ignored; this can be accomplished using mir_eval.beat.trim_beats().
Metrics¶
- mir_eval.beat.f_measure(): The F-measure of the beat sequence, where an estimated beat is considered correct if it is sufficiently close to a reference beat
- mir_eval.beat.cemgil(): Cemgil’s score, which computes the sum of Gaussian errors for each beat
- mir_eval.beat.goto(): Goto’s score, a binary score which is 1 when at least 25% of the estimated beat sequence closely matches the reference beat sequence
- mir_eval.beat.p_score(): McKinney’s P-score, which computes the cross-correlation of the estimated and reference beat sequences represented as impulse trains
- mir_eval.beat.continuity(): Continuity-based scores which compute the proportion of the beat sequence which is continuously correct
- mir_eval.beat.information_gain(): The Information Gain of a normalized beat error histogram over a uniform distribution
- mir_eval.beat.trim_beats(beats, min_beat_time=5.0)¶
Removes beats before min_beat_time. A common preprocessing step.
- Parameters
- beats : np.ndarray
Array of beat times in seconds.
- min_beat_time : float
Minimum beat time to allow (Default value = 5.0)
- Returns
- beats_trimmed : np.ndarray
Trimmed beat array.
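The trimming step is a one-line boolean mask. A sketch under the assumption that the cutoff boundary is inclusive (`trim_beats_sketch` is a hypothetical name, not the library function):

```python
import numpy as np

def trim_beats_sketch(beats, min_beat_time=5.0):
    """Keep only beats at or after min_beat_time, as in the common
    preprocessing step described above."""
    beats = np.asarray(beats, dtype=float)
    return beats[beats >= min_beat_time]

trimmed = trim_beats_sketch([0.5, 3.0, 5.0, 6.2, 8.9])
# -> array([5. , 6.2, 8.9])
```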
- mir_eval.beat.validate(reference_beats, estimated_beats)¶
Checks that the input annotations to a metric look like valid beat time arrays, and throws helpful errors if not.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
estimated beat times, in seconds
- mir_eval.beat.f_measure(reference_beats, estimated_beats, f_measure_threshold=0.07)¶
Compute the F-measure of correctly vs incorrectly predicted beats. “Correctness” is determined over a small window.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
estimated beat times, in seconds
- f_measure_threshold : float
Window size, in seconds (Default value = 0.07)
- Returns
- f_score : float
The computed F-measure score
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> f_measure = mir_eval.beat.f_measure(reference_beats, estimated_beats)
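The windowed F-measure can be sketched with a greedy one-to-one pairing. Note this is a simplification: the library computes a proper matching between the sequences, and `beat_f_measure_sketch` is a hypothetical name:

```python
def beat_f_measure_sketch(reference_beats, estimated_beats, threshold=0.07):
    """F-measure sketch: greedily pair each reference beat with at most one
    unused estimated beat within `threshold` seconds."""
    est = list(estimated_beats)
    used = [False] * len(est)
    hits = 0
    for r in reference_beats:
        for j, e in enumerate(est):
            if not used[j] and abs(r - e) <= threshold:
                used[j] = True
                hits += 1
                break
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(reference_beats)
    return 2 * precision * recall / (precision + recall)

f = beat_f_measure_sketch([1.0, 2.0, 3.0, 4.0], [1.02, 2.5, 3.0])
# 2 hits -> precision 2/3, recall 1/2 -> F = 4/7
```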
- mir_eval.beat.cemgil(reference_beats, estimated_beats, cemgil_sigma=0.04)¶
Cemgil’s score, which computes a Gaussian error for each estimated beat. Compares against the original beat times and all metrical variations.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
query beat times, in seconds
- cemgil_sigma : float
Sigma parameter of Gaussian error windows (Default value = 0.04)
- Returns
- cemgil_score : float
Cemgil’s score for the original reference beats
- cemgil_max : float
The best Cemgil score for all metrical variations
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> cemgil_score, cemgil_max = mir_eval.beat.cemgil(reference_beats, estimated_beats)
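The Gaussian-error idea behind Cemgil's score can be sketched as follows. The normalization by the mean of the two sequence lengths is an assumption about the exact convention, and `cemgil_sketch` is a hypothetical name:

```python
import numpy as np

def cemgil_sketch(reference_beats, estimated_beats, sigma=0.04):
    """Cemgil-style accuracy sketch: each reference beat contributes a
    Gaussian function of the distance to its nearest estimated beat."""
    ref = np.asarray(reference_beats, dtype=float)
    est = np.asarray(estimated_beats, dtype=float)
    # Nearest-estimate error for each reference beat
    errors = np.array([np.min(np.abs(est - r)) for r in ref])
    accuracy = np.sum(np.exp(-errors ** 2 / (2 * sigma ** 2)))
    # Normalization by the mean sequence length is an assumption here
    return accuracy / (0.5 * (len(ref) + len(est)))

score = cemgil_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# perfect agreement -> 1.0
```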
- mir_eval.beat.goto(reference_beats, estimated_beats, goto_threshold=0.35, goto_mu=0.2, goto_sigma=0.2)¶
Calculate Goto’s score, a binary 1 or 0 depending on some specific heuristic criteria
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
query beat times, in seconds
- goto_threshold : float
Threshold of beat error for a beat to be “correct” (Default value = 0.35)
- goto_mu : float
The mean of the beat errors in the continuously correct track must be less than this (Default value = 0.2)
- goto_sigma : float
The std of the beat errors in the continuously correct track must be less than this (Default value = 0.2)
- Returns
- goto_score : float
Either 1.0 or 0.0 if some specific criteria are met
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> goto_score = mir_eval.beat.goto(reference_beats, estimated_beats)
- mir_eval.beat.p_score(reference_beats, estimated_beats, p_score_threshold=0.2)¶
Get McKinney’s P-score, based on the cross-correlation of the reference and estimated beats.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
query beat times, in seconds
- p_score_threshold : float
Window size will be p_score_threshold*np.median(inter_annotation_intervals) (Default value = 0.2)
- Returns
- correlation : float
McKinney’s P-score
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> p_score = mir_eval.beat.p_score(reference_beats, estimated_beats)
- mir_eval.beat.continuity(reference_beats, estimated_beats, continuity_phase_threshold=0.175, continuity_period_threshold=0.175)¶
Get metrics based on how much of the estimated beat sequence is continually correct.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
query beat times, in seconds
- continuity_phase_threshold : float
Allowable ratio of how far the estimated beat can be from the reference beat (Default value = 0.175)
- continuity_period_threshold : float
Allowable distance between the inter-beat-interval and the inter-annotation-interval (Default value = 0.175)
- Returns
- CMLc : float
Correct metric level, continuous accuracy
- CMLt : float
Correct metric level, total accuracy (continuity not required)
- AMLc : float
Any metric level, continuous accuracy
- AMLt : float
Any metric level, total accuracy (continuity not required)
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> CMLc, CMLt, AMLc, AMLt = mir_eval.beat.continuity(reference_beats, estimated_beats)
- mir_eval.beat.information_gain(reference_beats, estimated_beats, bins=41)¶
Get the information gain: the K-L divergence of the beat error histogram relative to a uniform histogram.
- Parameters
- reference_beats : np.ndarray
reference beat times, in seconds
- estimated_beats : np.ndarray
query beat times, in seconds
- bins : int
Number of bins in the beat error histogram (Default value = 41)
- Returns
- information_gain_score : float
Entropy of beat error histogram
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> information_gain = mir_eval.beat.information_gain(reference_beats, estimated_beats)
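The K-L-divergence-from-uniform step can be sketched once the normalized beat errors (in [-0.5, 0.5], relative to the inter-annotation intervals) are in hand; computing those errors is omitted here, and `information_gain_sketch` is a hypothetical name:

```python
import numpy as np

def information_gain_sketch(normalized_beat_errors, bins=41):
    """Sketch: KL divergence of the normalized beat-error histogram from a
    uniform histogram, which equals log2(bins) minus the histogram entropy."""
    counts, _ = np.histogram(normalized_beat_errors, bins=bins, range=(-0.5, 0.5))
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log(0) taken as 0
    entropy = -np.sum(p * np.log2(p))
    return np.log2(bins) - entropy

# Errors all piled in one bin carry maximal information:
ig = information_gain_sketch(np.zeros(100), bins=41)
# -> log2(41), since a single-bin histogram has entropy 0
```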
- mir_eval.beat.evaluate(reference_beats, estimated_beats, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- reference_beats : np.ndarray
Reference beat times, in seconds
- estimated_beats : np.ndarray
Query beat times, in seconds
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> scores = mir_eval.beat.evaluate(reference_beats, estimated_beats)
mir_eval.chord¶
Chord estimation algorithms produce a list of intervals and labels which denote the chord being played over each timespan. They are evaluated by comparing the estimated chord labels to some reference, usually using a mapping to a chord subalphabet (e.g. minor and major chords only, all triads, etc.). There is no single ‘right’ way to compare two sequences of chord labels. Embracing this reality, every conventional comparison rule is provided. Comparisons are made over the different components of each chord (e.g. G:maj(6)/5): the root (G), the root-invariant active semitones as determined by the quality shorthand (maj) and scale degrees (6), and the bass interval (5). This submodule provides functions both for comparing sequences of chord labels according to some chord subalphabet mapping and for using these comparisons to score a sequence of estimated chords against a reference.
Conventions¶
A sequence of chord labels is represented as a list of strings, where each label is the chord name based on the syntax of [5]. Reference and estimated chord label sequences should be of the same length for comparison functions. When converting the chord string into its constituent parts:
- Pitch class counting starts at C, e.g. C:0, D:2, E:4, F:5, etc.
- Scale degree is represented as a string of the diatonic interval, relative to the root note, e.g. ‘b6’, ‘#5’, or ‘7’
- Bass intervals are represented as strings
- Chord bitmaps are positional binary vectors indicating active pitch classes and may be absolute or relative depending on context in the code.
If no chord is present at a given point in time, it should have the label ‘N’, which is defined in the variable mir_eval.chord.NO_CHORD.
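The pitch class convention (counting semitones from C=0, with '#' and 'b' accidentals) can be illustrated with a small lookup; this sketch and the name `pitch_class_sketch` are hypothetical, not the library's implementation:

```python
# Semitone offsets of the natural notes, counting from C = 0
NATURALS = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def pitch_class_sketch(name):
    """Map a spelled pitch class like 'C#' or 'Gbb' to its semitone number."""
    semitone = NATURALS[name[0]]
    semitone += name.count('#') - name.count('b')
    return semitone % 12

# pitch_class_sketch('C') -> 0, 'C#' -> 1, 'Gbb' -> 5
```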
Metrics¶
- mir_eval.chord.root(): Only compares the root of the chords.
- mir_eval.chord.majmin(): Only compares major, minor, and “no chord” labels.
- mir_eval.chord.majmin_inv(): Compares major/minor chords, with inversions. The bass note must exist in the triad.
- mir_eval.chord.mirex(): An estimated chord is considered correct if it shares at least three pitch classes in common with the reference chord.
- mir_eval.chord.thirds(): Chords are compared at the level of major or minor thirds (root and third). For example, both (‘A:7’, ‘A:maj’) and (‘A:min’, ‘A:dim’) are equivalent, as the third is major and minor in quality, respectively.
- mir_eval.chord.thirds_inv(): Same as above, with inversions (bass relationships).
- mir_eval.chord.triads(): Chords are considered at the level of triads (major, minor, augmented, diminished, suspended), meaning that, in addition to the root, the quality is only considered through the #5th scale degree (for augmented chords). For example, (‘A:7’, ‘A:maj’) are equivalent, while (‘A:min’, ‘A:dim’) and (‘A:aug’, ‘A:maj’) are not.
- mir_eval.chord.triads_inv(): Same as above, with inversions (bass relationships).
- mir_eval.chord.tetrads(): Chords are considered at the level of the entire quality in closed voicing, i.e. spanning only a single octave; extended chords (9’s, 11’s and 13’s) are rolled into a single octave with any upper voices included as extensions. For example, (‘A:7’, ‘A:9’) are equivalent but (‘A:7’, ‘A:maj7’) are not.
- mir_eval.chord.tetrads_inv(): Same as above, with inversions (bass relationships).
- mir_eval.chord.sevenths(): Compares according to MIREX “sevenths” rules; that is, only major, major seventh, seventh, minor, minor seventh and no chord labels are compared.
- mir_eval.chord.sevenths_inv(): Same as above, with inversions (bass relationships).
- mir_eval.chord.overseg(): Computes the level of over-segmentation between estimated and reference intervals.
- mir_eval.chord.underseg(): Computes the level of under-segmentation between estimated and reference intervals.
- mir_eval.chord.seg(): Computes the minimum of over- and under-segmentation between estimated and reference intervals.
References¶
- exception mir_eval.chord.InvalidChordException(message='', chord_label=None)¶
Bases: Exception
Exception class for suspect / invalid chord labels
- mir_eval.chord.pitch_class_to_semitone(pitch_class)¶
Convert a pitch class to semitone.
- Parameters
- pitch_class : str
Spelling of a given pitch class, e.g. ‘C#’, ‘Gbb’
- Returns
- semitone : int
Semitone value of the pitch class.
- mir_eval.chord.scale_degree_to_semitone(scale_degree)¶
Convert a scale degree to semitone.
- Parameters
- scale_degree : str
Spelling of a relative scale degree, e.g. ‘b3’, ‘7’, ‘#5’
- Returns
- semitone : int
Relative semitone of the scale degree, wrapped to a single octave
- Raises
- InvalidChordException if scale_degree is invalid.
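The degree-to-semitone convention can be sketched with a diatonic lookup adjusted by accidentals. This handles only degrees ‘1’–‘7’ for illustration (degrees like ‘9’ or ‘13’ are omitted), and `scale_degree_sketch` is a hypothetical name:

```python
# Semitone offsets of the diatonic degrees relative to the root
DEGREE_SEMITONES = {'1': 0, '2': 2, '3': 4, '4': 5, '5': 7, '6': 9, '7': 11}

def scale_degree_sketch(scale_degree):
    """Map a spelled degree like 'b3' or '#5' to a semitone in one octave."""
    flats = scale_degree.count('b')
    sharps = scale_degree.count('#')
    base = DEGREE_SEMITONES[scale_degree.lstrip('b#')]
    return (base + sharps - flats) % 12

# scale_degree_sketch('b3') -> 3, '#5' -> 8, '7' -> 11
```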
- mir_eval.chord.scale_degree_to_bitmap(scale_degree, modulo=False, length=12)¶
Create a bitmap representation of a scale degree.
Note that values in the bitmap may be negative, indicating that the semitone is to be removed.
- Parameters
- scale_degree : str
Spelling of a relative scale degree, e.g. ‘b3’, ‘7’, ‘#5’
- modulo : bool, default=False
If a scale degree exceeds the length of the bit-vector, modulo the scale degree back into the bit-vector; otherwise it is discarded.
- length : int, default=12
Length of the bit-vector to produce
- Returns
- bitmap : np.ndarray, in [-1, 0, 1], len=`length`
Bitmap representation of this scale degree.
- mir_eval.chord.quality_to_bitmap(quality)¶
Return the bitmap for a given quality.
- Parameters
- quality : str
Chord quality name.
- Returns
- bitmap : np.ndarray
Bitmap representation of this quality (12-dim).
- mir_eval.chord.reduce_extended_quality(quality)¶
Map an extended chord quality to a simpler one, moving upper voices to a set of scale degree extensions.
- Parameters
- quality : str
Extended chord quality to reduce.
- Returns
- base_quality : str
New chord quality.
- extensions : set
Scale degree extensions for the quality.
- mir_eval.chord.validate_chord_label(chord_label)¶
Test for well-formedness of a chord label.
- Parameters
- chord_label : str
Chord label to validate.
- mir_eval.chord.split(chord_label, reduce_extended_chords=False)¶
- Parse a chord label into its four constituent parts:
root
quality shorthand
scale degrees
bass
- Note: Chords lacking quality AND interval information are major.
If a quality is specified, it is returned.
If an interval is specified WITHOUT a quality, the quality field is empty.
Some examples:
'C' -> ['C', 'maj', {}, '1']
'G#:min(*b3,*5)/5' -> ['G#', 'min', {'*b3', '*5'}, '5']
'A:(3)/6' -> ['A', '', {'3'}, '6']
- Parameters
- chord_label : str
A chord label.
- reduce_extended_chords : bool
Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)
- Returns
- chord_parts : list
Split version of the chord label.
- mir_eval.chord.join(chord_root, quality='', extensions=None, bass='')¶
Join the parts of a chord into a complete chord label.
- Parameters
- chord_root : str
Root pitch class of the chord, e.g. ‘C’, ‘Eb’
- quality : str
Quality of the chord, e.g. ‘maj’, ‘hdim7’ (Default value = ‘’)
- extensions : list
Any added or absent scale degrees for this chord, e.g. [‘4’, ‘*3’] (Default value = None)
- bass : str
Scale degree of the bass note, e.g. ‘5’. (Default value = ‘’)
- Returns
- chord_label : str
A complete chord label.
- mir_eval.chord.encode(chord_label, reduce_extended_chords=False, strict_bass_intervals=False)¶
Translate a chord label to numerical representations for evaluation.
- Parameters
- chord_label : str
Chord label to encode.
- reduce_extended_chords : bool
Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)
- strict_bass_intervals : bool
Whether to require that the bass scale degree is present in the chord. (Default value = False)
- Returns
- root_number : int
Absolute semitone of the chord’s root.
- semitone_bitmap : np.ndarray, dtype=int
12-dim vector of relative semitones in the chord spelling.
- bass_number : int
Relative semitone of the chord’s bass note, e.g. 0=root, 7=fifth, etc.
- mir_eval.chord.encode_many(chord_labels, reduce_extended_chords=False)¶
Translate a set of chord labels to numerical representations for sane evaluation.
- Parameters
- chord_labels : list
Set of chord labels to encode.
- reduce_extended_chords : bool
Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)
- Returns
- root_number : np.ndarray, dtype=int
Absolute semitone of each chord’s root.
- interval_bitmap : np.ndarray, dtype=int
12-dim vector of relative semitones in each chord quality.
- bass_number : np.ndarray, dtype=int
Relative semitones of each chord’s bass notes.
- mir_eval.chord.rotate_bitmap_to_root(bitmap, chord_root)¶
Circularly shift a relative bitmap to its absolute pitch classes.
For clarity, the best explanation is an example. Given ‘G:Maj’, the root and quality map are as follows:
root=7
quality=[1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # Relative chord shape
After rotating to the root, the resulting bitmap becomes:
abs_quality = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]  # G, B, and D
- Parameters
- bitmap : np.ndarray, shape=(12,)
Bitmap of active notes, relative to the given root.
- chord_root : int
Absolute pitch class number.
- Returns
- bitmap : np.ndarray, shape=(12,)
Absolute bitmap of active pitch classes.
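The circular shift in the G:Maj example can be sketched with np.roll. This is an illustration of the idea, not necessarily the library's exact implementation:

```python
import numpy as np

# Relative major-triad bitmap (root, major third, perfect fifth)
quality = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
root = 7  # G, counting semitones from C = 0

# Rotate the relative shape to absolute pitch classes
absolute = np.roll(quality, root)
# active pitch classes end up at {2 (D), 7 (G), 11 (B)}
```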
- mir_eval.chord.rotate_bitmaps_to_roots(bitmaps, roots)¶
Circularly shift relative bitmaps to absolute pitch classes.
See rotate_bitmap_to_root() for more information.
- Parameters
- bitmaps : np.ndarray, shape=(N, 12)
Bitmaps of active notes, relative to the given roots.
- roots : np.ndarray, shape=(N,)
Absolute pitch class numbers.
- Returns
- bitmaps : np.ndarray, shape=(N, 12)
Absolute bitmaps of active pitch classes.
- mir_eval.chord.validate(reference_labels, estimated_labels)¶
Checks that the input annotations to a comparison function look like valid chord labels.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- mir_eval.chord.weighted_accuracy(comparisons, weights)¶
Compute the weighted accuracy of a list of chord comparisons.
- Parameters
- comparisons : np.ndarray
List of chord comparison scores, in [0, 1] or -1
- weights : np.ndarray
Weights (not necessarily normalized) for each comparison. This can be a list of interval durations
- Returns
- score : float
Weighted accuracy
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> # Here, we're using the "thirds" function to compare labels
>>> # but any of the comparison functions would work.
>>> comparisons = mir_eval.chord.thirds(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.thirds(reference_labels, estimated_labels)¶
Compare chords along root & third relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.thirds(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.thirds_inv(reference_labels, estimated_labels)¶
Score chords along root, third, & bass relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.thirds_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.triads(reference_labels, estimated_labels)¶
Compare chords along triad (root & quality to #5) relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.triads(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.triads_inv(reference_labels, estimated_labels)¶
Score chords along triad (root, quality to #5, & bass) relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.triads_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.tetrads(reference_labels, estimated_labels)¶
Compare chords along tetrad (root & full quality) relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.tetrads(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.tetrads_inv(reference_labels, estimated_labels)¶
Compare chords along tetrad (root, full quality, & bass) relationships.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.tetrads_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.root(reference_labels, estimated_labels)¶
Compare chords according to roots.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.root(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.mirex(reference_labels, estimated_labels)¶
Compare chords along MIREX rules.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0]
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.mirex(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.majmin(reference_labels, estimated_labels)¶
Compare chords along major-minor rules. Chords with qualities outside Major/minor/no-chord are ignored.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.majmin(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.majmin_inv(reference_labels, estimated_labels)¶
Compare chords along major-minor rules, with inversions. Chords with qualities outside Major/minor/no-chord are ignored, and the bass note must exist in the triad (bass in [1, 3, 5]).
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.majmin_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.sevenths(reference_labels, estimated_labels)¶
Compare chords along MIREX ‘sevenths’ rules. Chords with qualities outside [maj, maj7, 7, min, min7, N] are ignored.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.sevenths(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.sevenths_inv(reference_labels, estimated_labels)¶
Compare chords along MIREX ‘sevenths’ rules, with inversions. Chords with qualities outside [maj, maj7, 7, min, min7, N] are ignored.
- Parameters
- reference_labels : list, len=n
Reference chord labels to score against.
- estimated_labels : list, len=n
Estimated chord labels to score against.
- Returns
- comparison_scores : np.ndarray, shape=(n,), dtype=float
Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.sevenths_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
- mir_eval.chord.directional_hamming_distance(reference_intervals, estimated_intervals)¶
Compute the directional hamming distance between reference and estimated intervals as defined by [5] and used for the MIREX ‘OverSeg’, ‘UnderSeg’ and ‘MeanSeg’ measures.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2), dtype=float
Reference chord intervals to score against.
- estimated_intervals : np.ndarray, shape=(m, 2), dtype=float
Estimated chord intervals to score against.
- Returns
- directional hamming distance : float
Directional hamming distance between reference intervals and estimated intervals.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> overseg = 1 - mir_eval.chord.directional_hamming_distance(
...     ref_intervals, est_intervals)
>>> underseg = 1 - mir_eval.chord.directional_hamming_distance(
...     est_intervals, ref_intervals)
>>> seg = min(overseg, underseg)
- mir_eval.chord.overseg(reference_intervals, estimated_intervals)¶
Compute the MIREX ‘OverSeg’ score.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2), dtype=float
Reference chord intervals to score against.
- estimated_intervals : np.ndarray, shape=(m, 2), dtype=float
Estimated chord intervals to score against.
- Returns
- oversegmentation score : float
Comparison score, in [0.0, 1.0], where 1.0 means no oversegmentation.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.overseg(ref_intervals, est_intervals)
- mir_eval.chord.underseg(reference_intervals, estimated_intervals)¶
Compute the MIREX ‘UnderSeg’ score.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2), dtype=float
Reference chord intervals to score against.
- estimated_intervals : np.ndarray, shape=(m, 2), dtype=float
Estimated chord intervals to score against.
- Returns
- undersegmentation score : float
Comparison score, in [0.0, 1.0], where 1.0 means no undersegmentation.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.underseg(ref_intervals, est_intervals)
- mir_eval.chord.seg(reference_intervals, estimated_intervals)¶
Compute the MIREX ‘MeanSeg’ score.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2), dtype=float
Reference chord intervals to score against.
- estimated_intervals : np.ndarray, shape=(m, 2), dtype=float
Estimated chord intervals to score against.
- Returns
- segmentation score : float
Comparison score, in [0.0, 1.0], where 1.0 means perfect segmentation.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.seg(ref_intervals, est_intervals)
- mir_eval.chord.merge_chord_intervals(intervals, labels)¶
Merge consecutive chord intervals if they represent the same chord.
- Parameters
- intervals : np.ndarray, shape=(n, 2), dtype=float
Chord intervals to be merged, in the format returned by mir_eval.io.load_labeled_intervals().
- labels : list, shape=(n,)
Chord labels to be merged, in the format returned by mir_eval.io.load_labeled_intervals().
- Returns
- merged_ivs : np.ndarray, shape=(k, 2), dtype=float
Merged chord intervals, k <= n
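The merging logic can be sketched in plain Python/numpy (a hypothetical re-implementation for illustration, not the library function, which returns only the merged intervals): consecutive intervals that carry the same label collapse into one.

```python
import numpy as np

def merge_chord_intervals_sketch(intervals, labels):
    """Collapse consecutive intervals that carry the same chord label."""
    intervals = np.asarray(intervals, dtype=float)
    merged, merged_labels = [], []
    for interval, label in zip(intervals, labels):
        if merged_labels and merged_labels[-1] == label:
            merged[-1][1] = interval[1]  # extend the previous interval
        else:
            merged.append(list(interval))
            merged_labels.append(label)
    return np.array(merged), merged_labels

ivs = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
labs = ['C:maj', 'C:maj', 'G:maj']
merged_ivs, merged_labs = merge_chord_intervals_sketch(ivs, labs)
print(merged_ivs)  # two intervals remain: [0, 2] labeled C:maj, [2, 3] labeled G:maj
```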
- mir_eval.chord.evaluate(ref_intervals, ref_labels, est_intervals, est_labels, **kwargs)¶
Computes weighted accuracy for all comparison functions for the given reference and estimated annotations.
- Parameters
- ref_intervals : np.ndarray, shape=(n, 2)
Reference chord intervals, in the format returned by mir_eval.io.load_labeled_intervals().
- ref_labels : list, shape=(n,)
Reference chord labels, in the format returned by mir_eval.io.load_labeled_intervals().
- est_intervals : np.ndarray, shape=(m, 2)
Estimated chord intervals, in the format returned by mir_eval.io.load_labeled_intervals().
- est_labels : list, shape=(m,)
Estimated chord labels, in the format returned by mir_eval.io.load_labeled_intervals().
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> scores = mir_eval.chord.evaluate(ref_intervals, ref_labels,
...                                  est_intervals, est_labels)
mir_eval.melody¶
Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. For evaluation, an estimated pitch series is evaluated against a reference based on whether the voicing (melody present or not) and the pitch is correct (within some tolerance).
- For a detailed explanation of the measures please refer to:
J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
- and:
G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S. Streich, and B. Ong. “Melody transcription from music audio: Approaches and evaluation”, IEEE Transactions on Audio, Speech, and Language Processing, 15(4):1247-1256, 2007.
For an explanation of the generalized measures (using non-binary voicings), please refer to:
R. Bittner and J. Bosch, “Generalized Metrics for Single-F0 Estimation Evaluation”, International Society for Music Information Retrieval Conference (ISMIR), 2019.
Conventions¶
Melody annotations are assumed to be given in the format of a 1d array of frequency values which are accompanied by a 1d array of times denoting when each frequency value occurs. In a reference melody time series, a frequency value of 0 denotes “unvoiced”. In an estimated melody time series, unvoiced frames can be indicated either by 0 Hz or by a negative Hz value; negative values represent the algorithm’s pitch estimate for frames it has determined to be unvoiced, in case they are in fact voiced.
Metrics are computed using a sequence of reference and estimated pitches in
cents and voicing arrays, both of which are sampled to the same
timebase. The function mir_eval.melody.to_cent_voicing()
can be used to
convert a sequence of estimated and reference times and frequency values in Hz
to voicing arrays and frequency arrays in the format required by the
metric functions. By default, the convention is to resample the estimated
melody time series to the reference melody time series’ timebase.
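As a concrete illustration of these conventions (a plain-numpy sketch, not a mir_eval call): a frame is voiced when its frequency is positive, and a negative estimated frequency still carries a usable pitch value (its absolute value) for a frame the algorithm judged unvoiced.

```python
import numpy as np

# Reference: 0 Hz means unvoiced
ref_freq = np.array([0.0, 220.0, 440.0, 0.0])
ref_voicing = ref_freq > 0

# Estimate: negative values mean "judged unvoiced, but here is my pitch guess"
est_freq = np.array([-220.0, 225.0, 0.0, 442.0])
est_voicing = est_freq > 0
est_pitch = np.abs(est_freq)  # pitch estimates survive for unvoiced frames

print(ref_voicing)  # [False  True  True False]
print(est_voicing)  # [False  True False  True]
```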
Metrics¶
mir_eval.melody.voicing_measures(): Voicing measures, including the recall rate (proportion of frames labeled as melody frames in the reference that are estimated as melody frames) and the false alarm rate (proportion of frames labeled as non-melody in the reference that are mistakenly estimated as melody frames)
mir_eval.melody.raw_pitch_accuracy(): Raw Pitch Accuracy, which computes the proportion of melody frames in the reference for which the frequency is considered correct (i.e. within half a semitone of the reference frequency)
mir_eval.melody.raw_chroma_accuracy(): Raw Chroma Accuracy, where the estimated and reference frequency sequences are mapped onto a single octave before computing the raw pitch accuracy
mir_eval.melody.overall_accuracy(): Overall Accuracy, which computes the proportion of all frames correctly estimated by the algorithm, including whether non-melody frames were labeled by the algorithm as non-melody
- mir_eval.melody.validate_voicing(ref_voicing, est_voicing)¶
Checks that voicing inputs to a metric are in the correct format.
- Parameters
- ref_voicing : np.ndarray
Reference voicing array
- est_voicing : np.ndarray
Estimated voicing array
- mir_eval.melody.validate(ref_voicing, ref_cent, est_voicing, est_cent)¶
Checks that voicing and frequency arrays are well-formed. To be used in conjunction with
mir_eval.melody.validate_voicing()
- Parameters
- ref_voicing : np.ndarray
Reference voicing array
- ref_cent : np.ndarray
Reference pitch sequence in cents
- est_voicing : np.ndarray
Estimated voicing array
- est_cent : np.ndarray
Estimated pitch sequence in cents
- mir_eval.melody.hz2cents(freq_hz, base_frequency=10.0)¶
Convert an array of frequency values in Hz to cents. 0 values are left in place.
- Parameters
- freq_hz : np.ndarray
Array of frequencies in Hz.
- base_frequency : float
Base frequency for conversion. (Default value = 10.0)
- Returns
- freq_cent : np.ndarray
Array of frequencies in cents, relative to base_frequency
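The conversion itself is the standard log-frequency formula, 1200 · log2(f / base). A numpy sketch mirroring the documented behavior (an illustrative re-implementation with a hypothetical name, not the library function), including leaving 0 values in place:

```python
import numpy as np

def hz2cents_sketch(freq_hz, base_frequency=10.0):
    """1200 * log2(f / base), leaving zero (unvoiced) values at zero."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    freq_cent = np.zeros_like(freq_hz)
    voiced = freq_hz > 0
    freq_cent[voiced] = 1200.0 * np.log2(freq_hz[voiced] / base_frequency)
    return freq_cent

# 10 Hz is the base (0 cents); one octave up (20 Hz) is 1200 cents
print(hz2cents_sketch([0.0, 10.0, 20.0]))  # [   0.    0. 1200.]
```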
- mir_eval.melody.freq_to_voicing(frequencies, voicing=None)¶
Convert from an array of frequency values to a frequency array plus a voiced/unvoiced array.
- Parameters
- frequencies : np.ndarray
Array of frequencies. A frequency <= 0 indicates “unvoiced”.
- voicing : np.ndarray
Array of voicing values. (Default value = None) If None, voicing is inferred from frequencies: frames with frequency <= 0.0 are considered “unvoiced”, frames with frequency > 0.0 are considered “voiced”. If specified, voicing is used as the voicing array, but frequencies with value 0 are forced to have 0 voicing; voicing inferred from negative frequency values is ignored.
- Returns
- frequencies : np.ndarray
Array of frequencies, all >= 0.
- voiced : np.ndarray
Array of voicings between 0 and 1, same length as frequencies, which indicates voiced or unvoiced
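The documented behavior can be sketched in numpy (a hypothetical re-implementation for illustration, not the library function):

```python
import numpy as np

def freq_to_voicing_sketch(frequencies, voicing=None):
    """Split a signed frequency array into (non-negative freqs, voicing)."""
    frequencies = np.asarray(frequencies, dtype=float)
    if voicing is None:
        # Infer voicing: positive frequency means voiced
        voicing = (frequencies > 0).astype(float)
    else:
        voicing = np.array(voicing, dtype=float)  # copy before editing
        voicing[frequencies == 0] = 0.0  # zero frequency forces zero voicing
    return np.abs(frequencies), voicing

freqs, voiced = freq_to_voicing_sketch([440.0, -220.0, 0.0])
print(freqs)   # [440. 220.   0.]
print(voiced)  # [1. 0. 0.]
```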
- mir_eval.melody.constant_hop_timebase(hop, end_time)¶
Generates a time series from 0 to end_time with times spaced hop apart.
- Parameters
- hop : float
Spacing of samples in the time series
- end_time : float
Time series will span [0, end_time]
- Returns
- times : np.ndarray
Generated timebase
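The generated timebase is roughly equivalent to the following numpy sketch (an illustration with a hypothetical name, not the library function; edge handling at exactly end_time may differ):

```python
import numpy as np

def constant_hop_timebase_sketch(hop, end_time):
    """Times 0, hop, 2*hop, ... covering [0, end_time]."""
    # Number of samples, inclusive of t=0 and any sample at/before end_time
    n_samples = int(np.floor(end_time / hop)) + 1
    return np.arange(n_samples) * hop

print(constant_hop_timebase_sketch(0.25, 1.0))  # [0.   0.25 0.5  0.75 1.  ]
```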
- mir_eval.melody.resample_melody_series(times, frequencies, voicing, times_new, kind='linear')¶
Resamples frequency and voicing time series to a new timescale. Maintains any zero (“unvoiced”) values in frequencies.
If times and times_new are equivalent, no resampling will be performed.
- Parameters
- times : np.ndarray
Times of each frequency value
- frequencies : np.ndarray
Array of frequency values, >= 0
- voicing : np.ndarray
Array which indicates voiced or unvoiced. This array may be binary or have continuous values between 0 and 1.
- times_new : np.ndarray
Times to resample frequency and voicing sequences to
- kind : str
kind parameter to pass to scipy.interpolate.interp1d. (Default value = ‘linear’)
- Returns
- frequencies_resampled : np.ndarray
Frequency array resampled to new timebase
- voicing_resampled : np.ndarray
Voicing array resampled to new timebase
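For intuition, the core of linear resampling can be sketched with np.interp (a simplified, hypothetical illustration only: the library function additionally preserves exact zero, i.e. “unvoiced”, frequency values across resampling boundaries and supports other interpolation kinds, which this sketch omits):

```python
import numpy as np

def resample_melody_series_sketch(times, frequencies, voicing, times_new):
    """Linearly resample frequency and voicing time series (simplified)."""
    freq_new = np.interp(times_new, times, frequencies)
    voicing_new = np.interp(times_new, times, voicing)
    return freq_new, voicing_new

times = np.array([0.0, 0.1, 0.2])
freqs = np.array([100.0, 110.0, 120.0])
voicing = np.array([1.0, 1.0, 1.0])
freq_new, v_new = resample_melody_series_sketch(
    times, freqs, voicing, np.array([0.05, 0.15]))
print(freq_new)  # approximately [105. 115.]
```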
- mir_eval.melody.to_cent_voicing(ref_time, ref_freq, est_time, est_freq, est_voicing=None, ref_reward=None, base_frequency=10.0, hop=None, kind='linear')¶
Converts reference and estimated time/frequency (Hz) annotations to sampled frequency (cent)/voicing arrays.
A zero frequency indicates “unvoiced”. If est_voicing is not provided, a negative frequency indicates “predicted as unvoiced, but if it is voiced, this is the frequency estimate”. If est_voicing is provided, negative frequency values are ignored and the voicing from est_voicing is used directly.
- Parameters
- ref_time : np.ndarray
Time of each reference frequency value
- ref_freq : np.ndarray
Array of reference frequency values
- est_time : np.ndarray
Time of each estimated frequency value
- est_freq : np.ndarray
Array of estimated frequency values
- est_voicing : np.ndarray
Estimated voicing confidence. Default None, which means the voicing is inferred from est_freq: frames with frequency <= 0.0 are considered “unvoiced”, frames with frequency > 0.0 are considered “voiced”
- ref_reward : np.ndarray
Reference voicing reward. Default None, which means all frames are weighted equally.
- base_frequency : float
Base frequency in Hz for conversion to cents (Default value = 10.)
- hop : float
Hop size, in seconds, to resample to; default None, which means use ref_time
- kind : str
kind parameter to pass to scipy.interpolate.interp1d. (Default value = ‘linear’)
- Returns
- ref_voicing : np.ndarray
Resampled reference voicing array
- ref_cent : np.ndarray
Resampled reference frequency (cent) array
- est_voicing : np.ndarray
Resampled estimated voicing array
- est_cent : np.ndarray
Resampled estimated frequency (cent) array
- mir_eval.melody.voicing_recall(ref_voicing, est_voicing)¶
Compute the voicing recall given two voicing indicator sequences, one as reference (truth) and the other as the estimate (prediction). The sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference boolean voicing array
- est_voicing : np.ndarray
Estimated boolean voicing array
- Returns
- vx_recall : float
Voicing recall rate, the fraction of voiced frames in ref indicated as voiced in est
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> recall = mir_eval.melody.voicing_recall(ref_v, est_v)
- mir_eval.melody.voicing_false_alarm(ref_voicing, est_voicing)¶
Compute the voicing false alarm rate given two voicing indicator sequences, one as reference (truth) and the other as the estimate (prediction). The sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference boolean voicing array
- est_voicing : np.ndarray
Estimated boolean voicing array
- Returns
- vx_false_alarm : float
Voicing false alarm rate, the fraction of unvoiced frames in ref indicated as voiced in est
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> false_alarm = mir_eval.melody.voicing_false_alarm(ref_v, est_v)
- mir_eval.melody.voicing_measures(ref_voicing, est_voicing)¶
Compute the voicing recall and false alarm rates given two voicing indicator sequences, one as reference (truth) and the other as the estimate (prediction). The sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference boolean voicing array
- est_voicing : np.ndarray
Estimated boolean voicing array
- Returns
- vx_recall : float
Voicing recall rate, the fraction of voiced frames in ref indicated as voiced in est
- vx_false_alarm : float
Voicing false alarm rate, the fraction of unvoiced frames in ref indicated as voiced in est
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> recall, false_alarm = mir_eval.melody.voicing_measures(ref_v,
...                                                        est_v)
- mir_eval.melody.raw_pitch_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)¶
Compute the raw pitch accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated as a ‘reference reward’, as in (Bittner & Bosch, 2019)
- ref_cent : np.ndarray
Reference pitch sequence in cents
- est_voicing : np.ndarray
Estimated voicing array
- est_cent : np.ndarray
Estimated pitch sequence in cents
- cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be considered correct (Default value = 50)
- Returns
- raw_pitch : float
Raw pitch accuracy, the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency value (within cent_tolerance cents).
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> raw_pitch = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c,
...                                                est_v, est_c)
- mir_eval.melody.raw_chroma_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)¶
Compute the raw chroma accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated as a ‘reference reward’, as in (Bittner & Bosch, 2019)
- ref_cent : np.ndarray
Reference pitch sequence in cents
- est_voicing : np.ndarray
Estimated voicing array
- est_cent : np.ndarray
Estimated pitch sequence in cents
- cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be considered correct (Default value = 50)
- Returns
- raw_chroma : float
Raw chroma accuracy, the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency value (within cent_tolerance cents), ignoring octave errors
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> raw_chroma = mir_eval.melody.raw_chroma_accuracy(ref_v, ref_c,
...                                                  est_v, est_c)
- mir_eval.melody.overall_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)¶
Compute the overall accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.
- Parameters
- ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated as a ‘reference reward’, as in (Bittner & Bosch, 2019)
- ref_cent : np.ndarray
Reference pitch sequence in cents
- est_voicing : np.ndarray
Estimated voicing array
- est_cent : np.ndarray
Estimated pitch sequence in cents
- cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be considered correct (Default value = 50)
- Returns
- overall_accuracy : float
Overall accuracy, the total fraction of frames estimated correctly, including both the voicing decision and, for voiced frames, a correct frequency value (within cent_tolerance cents).
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> overall_accuracy = mir_eval.melody.overall_accuracy(ref_v, ref_c,
...                                                     est_v, est_c)
- mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq, est_voicing=None, ref_reward=None, **kwargs)¶
Evaluate two melody (predominant f0) transcriptions, where the first is treated as the reference (ground truth) and the second as the estimate to be evaluated (prediction).
- Parameters
- ref_timenp.ndarray
Time of each reference frequency value
- ref_freqnp.ndarray
Array of reference frequency values
- est_timenp.ndarray
Time of each estimated frequency value
- est_freqnp.ndarray
Array of estimated frequency values
- est_voicingnp.ndarray
Estimate voicing confidence. Default None, which means the voicing is inferred from est_freq:
frames with frequency <= 0.0 are considered “unvoiced”
frames with frequency > 0.0 are considered “voiced”
- ref_rewardnp.ndarray
Reference pitch estimation reward. Default None, which means all frames are weighted equally.
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scoresdict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
References
- 6
J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
- 7
G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S. Streich, and B. Ong. “Melody transcription from music audio: Approaches and evaluation”, IEEE Transactions on Audio, Speech, and Language Processing, 15(4):1247-1256, 2007.
- 8
R. Bittner and J. Bosch, “Generalized Metrics for Single-F0 Estimation Evaluation”, International Society for Music Information Retrieval Conference (ISMIR), 2019.
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> scores = mir_eval.melody.evaluate(ref_time, ref_freq,
...                                   est_time, est_freq)
mir_eval.multipitch
¶
The goal of multiple f0 (multipitch) estimation and tracking is to identify all of the active fundamental frequencies in each time frame in a complex music signal.
Conventions¶
Multipitch estimates are represented by a timebase and a corresponding list of arrays of frequency estimates. Frequency estimates may have any number of frequency values, including 0 (represented by an empty array). Time values are in units of seconds and frequency estimates are in units of Hz.
The timebase of the estimate time series should ideally match the timebase of the reference time series. If this is not the case, the estimate time series is resampled using nearest-neighbor interpolation to match the reference. Time values in the estimate time series that are outside of the range of the reference time series are given null (empty array) frequencies.
By default, a frequency is “correct” if it is within 0.5 semitones of a reference frequency. Frequency values are compared by first mapping them to log-2 semitone space, where the distance between semitones is constant. Chroma-wrapped frequency values are computed by taking the log-2 frequency values modulo 12 to map them down to a single octave. A chroma-wrapped frequency estimate is correct if its single-octave value is within 0.5 semitones of the single-octave reference frequency.
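As an illustration (this is not mir_eval code, and the frequencies below are hypothetical), the comparison rule described above can be sketched with plain NumPy. Note how the octave error is wrong under the raw comparison but correct after chroma wrapping:

```python
import numpy as np

def hz_to_semitones(freqs, ref_hz=440.0):
    """Map frequencies in Hz to continuous semitone values relative to ref_hz."""
    return 12.0 * np.log2(np.asarray(freqs, dtype=float) / ref_hz)

# Reference and estimated frequencies for one frame, in Hz
ref = hz_to_semitones([220.0, 330.0])        # A3, E4
est = hz_to_semitones([221.0, 660.0])        # slightly sharp A3, and E5 (octave error)

# Raw comparison: an estimate is correct if it is within 0.5 semitones
# of some reference frequency
raw_hits = np.abs(est[:, None] - ref[None, :]) < 0.5

# Chroma comparison: wrap both to a single octave, then use circular distance
ref_chroma = np.mod(ref, 12)
est_chroma = np.mod(est, 12)
diff = np.abs(est_chroma[:, None] - ref_chroma[None, :])
chroma_hits = np.minimum(diff, 12 - diff) < 0.5

print(raw_hits.any(axis=1))     # the octave error fails the raw test
print(chroma_hits.any(axis=1))  # but passes after chroma wrapping
```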
Metrics¶
mir_eval.multipitch.metrics()
: Precision, Recall, Accuracy, Substitution, Miss, False Alarm, and Total Error scores based both on raw frequency values and values mapped to a single octave (chroma).
References¶
- 9
G. E. Poliner, and D. P. W. Ellis, “A Discriminative Model for Polyphonic Piano Transcription”, EURASIP Journal on Advances in Signal Processing, 2007(1):154-163, Jan. 2007.
- 10
Bay, M., Ehmann, A. F., & Downie, J. S. (2009). Evaluation of Multiple-F0 Estimation and Tracking Systems. In ISMIR (pp. 315-320).
- mir_eval.multipitch.validate(ref_time, ref_freqs, est_time, est_freqs)¶
Checks that the time and frequency inputs are well-formed.
- Parameters
- ref_timenp.ndarray
reference time stamps in seconds
- ref_freqslist of np.ndarray
reference frequencies in Hz
- est_timenp.ndarray
estimate time stamps in seconds
- est_freqslist of np.ndarray
estimated frequencies in Hz
- mir_eval.multipitch.resample_multipitch(times, frequencies, target_times)¶
Resamples multipitch time series to a new timescale. Values in target_times outside the range of times return no pitch estimate.
- Parameters
- timesnp.ndarray
Array of time stamps
- frequencieslist of np.ndarray
List of np.ndarrays of frequency values
- target_timesnp.ndarray
Array of target time stamps
- Returns
- frequencies_resampledlist of numpy arrays
List of frequency arrays resampled to the new timebase
- mir_eval.multipitch.frequencies_to_midi(frequencies, ref_frequency=440.0)¶
Converts frequencies to continuous MIDI values.
- Parameters
- frequencieslist of np.ndarray
Original frequency values
- ref_frequencyfloat
reference frequency in Hz.
- Returns
- frequencies_midilist of np.ndarray
Continuous MIDI frequency values.
- mir_eval.multipitch.midi_to_chroma(frequencies_midi)¶
Wrap MIDI frequencies to a single octave (chroma).
- Parameters
- frequencies_midilist of np.ndarray
Continuous MIDI note frequency values.
- Returns
- frequencies_chromalist of np.ndarray
MIDI values wrapped to one octave.
- mir_eval.multipitch.compute_num_freqs(frequencies)¶
Computes the number of frequencies for each time point.
- Parameters
- frequencieslist of np.ndarray
Frequency values
- Returns
- num_freqsnp.ndarray
Number of frequencies at each time point.
- mir_eval.multipitch.compute_num_true_positives(ref_freqs, est_freqs, window=0.5, chroma=False)¶
Compute the number of true positives in an estimate given a reference. A frequency is correct if it is within window semitones of a reference frequency (by default 0.5 semitones, i.e. a quarter tone).
- Parameters
- ref_freqslist of np.ndarray
reference frequencies (MIDI)
- est_freqslist of np.ndarray
estimated frequencies (MIDI)
- windowfloat
Window size, in semitones
- chromabool
If True, computes distances modulo n. If True, ref_freqs and est_freqs should be wrapped modulo n.
- Returns
- true_positivesnp.ndarray
Array the same length as ref_freqs containing the number of true positives.
- mir_eval.multipitch.compute_accuracy(true_positives, n_ref, n_est)¶
Compute accuracy metrics.
- Parameters
- true_positivesnp.ndarray
Array containing the number of true positives at each time point.
- n_refnp.ndarray
Array containing the number of reference frequencies at each time point.
- n_estnp.ndarray
Array containing the number of estimate frequencies at each time point.
- Returns
- precisionfloat
sum(true_positives)/sum(n_est)
- recallfloat
sum(true_positives)/sum(n_ref)
- accfloat
sum(true_positives)/sum(n_est + n_ref - true_positives)
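For concreteness, the formulas above can be reproduced with plain NumPy. The per-frame counts below are invented for illustration; this is a sketch, not mir_eval's implementation:

```python
import numpy as np

# Hypothetical per-frame counts for three analysis frames
true_positives = np.array([2, 1, 0])
n_ref = np.array([2, 2, 1])   # number of reference frequencies per frame
n_est = np.array([3, 1, 0])   # number of estimated frequencies per frame

# "Macro"-style metrics: sum the counts over time, then take the ratios
precision = true_positives.sum() / n_est.sum()                            # 3/4
recall = true_positives.sum() / n_ref.sum()                               # 3/5
accuracy = true_positives.sum() / (n_est + n_ref - true_positives).sum()  # 3/6

print(precision, recall, accuracy)
```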
- mir_eval.multipitch.compute_err_score(true_positives, n_ref, n_est)¶
Compute error score metrics.
- Parameters
- true_positivesnp.ndarray
Array containing the number of true positives at each time point.
- n_refnp.ndarray
Array containing the number of reference frequencies at each time point.
- n_estnp.ndarray
Array containing the number of estimate frequencies at each time point.
- Returns
- e_subfloat
Substitution error
- e_missfloat
Miss error
- e_fafloat
False alarm error
- e_totfloat
Total error
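A sketch of the error scores following the substitution/miss/false-alarm definitions of Poliner & Ellis (reference 9 above), where each term is normalized by the total number of reference frequencies. The counts are hypothetical and this is for illustration, not mir_eval's implementation:

```python
import numpy as np

# Hypothetical per-frame counts for three analysis frames
true_positives = np.array([2, 1, 0])
n_ref = np.array([2, 2, 1])   # reference frequencies per frame
n_est = np.array([3, 1, 0])   # estimated frequencies per frame

# Substitutions: returned frequencies that could have matched but didn't
e_sub = (np.minimum(n_ref, n_est) - true_positives).sum() / n_ref.sum()
# Misses: reference frequencies with no estimate at all in that frame
e_miss = np.maximum(n_ref - n_est, 0).sum() / n_ref.sum()
# False alarms: extra estimates beyond the number of references
e_fa = np.maximum(n_est - n_ref, 0).sum() / n_ref.sum()
# Total error is the sum of the three components
e_tot = e_sub + e_miss + e_fa

print(e_sub, e_miss, e_fa, e_tot)
```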
- mir_eval.multipitch.metrics(ref_time, ref_freqs, est_time, est_freqs, **kwargs)¶
Compute multipitch metrics. All metrics are computed at the ‘macro’ level such that the frame true positive/false positive/false negative rates are summed across time and the metrics are computed on the combined values.
- Parameters
- ref_timenp.ndarray
Time of each reference frequency value
- ref_freqslist of np.ndarray
List of np.ndarrays of reference frequency values
- est_timenp.ndarray
Time of each estimated frequency value
- est_freqslist of np.ndarray
List of np.ndarrays of estimate frequency values
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- precisionfloat
Precision (TP/(TP + FP))
- recallfloat
Recall (TP/(TP + FN))
- accuracyfloat
Accuracy (TP/(TP + FP + FN))
- e_subfloat
Substitution error
- e_missfloat
Miss error
- e_fafloat
False alarm error
- e_totfloat
Total error
- precision_chromafloat
Chroma precision
- recall_chromafloat
Chroma recall
- accuracy_chromafloat
Chroma accuracy
- e_sub_chromafloat
Chroma substitution error
- e_miss_chromafloat
Chroma miss error
- e_fa_chromafloat
Chroma false alarm error
- e_tot_chromafloat
Chroma total error
Examples
>>> ref_time, ref_freqs = mir_eval.io.load_ragged_time_series(
...     'reference.txt')
>>> est_time, est_freqs = mir_eval.io.load_ragged_time_series(
...     'estimated.txt')
>>> metrics_tuple = mir_eval.multipitch.metrics(
...     ref_time, ref_freqs, est_time, est_freqs)
- mir_eval.multipitch.evaluate(ref_time, ref_freqs, est_time, est_freqs, **kwargs)¶
Evaluate two multipitch (multi-f0) transcriptions, where the first is treated as the reference (ground truth) and the second as the estimate to be evaluated (prediction).
- Parameters
- ref_timenp.ndarray
Time of each reference frequency value
- ref_freqslist of np.ndarray
List of np.ndarrays of reference frequency values
- est_timenp.ndarray
Time of each estimated frequency value
- est_freqslist of np.ndarray
List of np.ndarrays of estimate frequency values
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scoresdict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> ref_time, ref_freq = mir_eval.io.load_ragged_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_ragged_time_series('est.txt')
>>> scores = mir_eval.multipitch.evaluate(ref_time, ref_freq,
...                                       est_time, est_freq)
mir_eval.onset
¶
The goal of an onset detection algorithm is to automatically determine when notes are played in a piece of music. The primary method used to evaluate onset detectors is to first determine which estimated onsets are “correct”, where correctness is defined as being within a small window of a reference onset.
Based in part on this script:
Conventions¶
Onsets should be provided in the form of a 1-dimensional array of onset times in seconds in increasing order.
Metrics¶
mir_eval.onset.f_measure()
: Precision, Recall, and F-measure scores based on the number of estimated onsets which are sufficiently close to reference onsets.
- mir_eval.onset.validate(reference_onsets, estimated_onsets)¶
Checks that the input annotations to a metric look like valid onset time arrays, and throws helpful errors if not.
- Parameters
- reference_onsetsnp.ndarray
reference onset locations, in seconds
- estimated_onsetsnp.ndarray
estimated onset locations, in seconds
- mir_eval.onset.f_measure(reference_onsets, estimated_onsets, window=0.05)¶
Compute the F-measure of correctly vs incorrectly predicted onsets. “Correctness” is determined over a small window.
- Parameters
- reference_onsetsnp.ndarray
reference onset locations, in seconds
- estimated_onsetsnp.ndarray
estimated onset locations, in seconds
- windowfloat
Window size, in seconds (Default value = .05)
- Returns
- f_measurefloat
2*precision*recall/(precision + recall)
- precisionfloat
(# true positives)/(# true positives + # false positives)
- recallfloat
(# true positives)/(# true positives + # false negatives)
Examples
>>> reference_onsets = mir_eval.io.load_events('reference.txt')
>>> estimated_onsets = mir_eval.io.load_events('estimated.txt')
>>> F, P, R = mir_eval.onset.f_measure(reference_onsets,
...                                    estimated_onsets)
- mir_eval.onset.evaluate(reference_onsets, estimated_onsets, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- reference_onsetsnp.ndarray
reference onset locations, in seconds
- estimated_onsetsnp.ndarray
estimated onset locations, in seconds
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scoresdict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> reference_onsets = mir_eval.io.load_events('reference.txt')
>>> estimated_onsets = mir_eval.io.load_events('estimated.txt')
>>> scores = mir_eval.onset.evaluate(reference_onsets,
...                                  estimated_onsets)
mir_eval.pattern
¶
Pattern discovery involves the identification of musical patterns (i.e. short fragments or melodic ideas that repeat at least twice) both from audio and symbolic representations. The metrics used to evaluate pattern discovery systems attempt to quantify the ability of the algorithm to not only determine the present patterns in a piece, but also to find all of their occurrences.
- Based on the methods described here:
T. Collins. MIREX task: Discovery of repeated themes & sections. http://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_&_Sections, 2013.
Conventions¶
The input format can be automatically generated by calling mir_eval.io.load_patterns(). This format is a list of a list of tuples. The first list collects patterns, each of which is a list of occurrences, and each occurrence is a list of MIDI onset tuples of (onset_time, midi_note).
A pattern is a list of occurrences. The first occurrence must be the prototype of that pattern (i.e. the most representative of all the occurrences). An occurrence is a list of tuples containing the onset time and the MIDI note number.
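A minimal hand-built example of this nested structure (the onset times and MIDI notes below are hypothetical, but the shape matches what mir_eval.io.load_patterns() returns):

```python
# Each pattern is a list of occurrences; each occurrence is a list of
# (onset_time, midi_note) tuples. The first occurrence is the prototype.
pattern_1 = [
    [(0.0, 60), (0.5, 62), (1.0, 64)],   # prototype occurrence
    [(8.0, 60), (8.5, 62), (9.0, 64)],   # same figure repeated 8 seconds later
]
pattern_2 = [
    [(2.0, 67), (2.5, 69)],
    [(10.0, 67), (10.5, 69)],
    [(18.0, 67), (18.5, 69)],
]

# The top-level list collects all discovered patterns
patterns = [pattern_1, pattern_2]

print(len(patterns), len(patterns[0]), len(patterns[1]))
```

A structure like this can be passed directly as the reference or estimate argument of the pattern metrics below.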
Metrics¶
mir_eval.pattern.standard_FPR()
: Strict metric that looks for possibly transposed patterns of exact length. This is the only metric that considers transposed patterns.

mir_eval.pattern.establishment_FPR()
: Evaluates the number of patterns that were successfully identified by the estimated results, no matter how many occurrences were found. In other words, this metric captures whether the algorithm successfully established that a pattern repeats at least twice, and that this pattern is also found in the reference annotation.

mir_eval.pattern.occurrence_FPR()
: Evaluates how well an estimation identifies all the occurrences of the found patterns, independently of how many patterns have been discovered. This metric has a threshold parameter that indicates how similar two occurrences must be in order to be considered equal. In MIREX, this evaluation is run twice, with thresholds .75 and .5.

mir_eval.pattern.three_layer_FPR()
: Aims to evaluate the general similarity between the reference and the estimations, combining both the establishment of patterns and the retrieval of their occurrences in a single F1 score.

mir_eval.pattern.first_n_three_layer_P()
: Computes the three-layer precision for the first N patterns only, in order to measure the ability of the algorithm to sort the identified patterns by relevance.

mir_eval.pattern.first_n_target_proportion_R()
: Computes the target proportion recall for the first N patterns only, in order to measure the ability of the algorithm to sort the identified patterns by relevance.
- mir_eval.pattern.validate(reference_patterns, estimated_patterns)¶
Checks that the input annotations to a metric look like valid pattern lists, and throws helpful errors if not.
- Parameters
- reference_patternslist
The reference patterns using the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- Returns
- mir_eval.pattern.standard_FPR(reference_patterns, estimated_patterns, tol=1e-05)¶
Standard F1 Score, Precision and Recall.
This metric checks if the prototype patterns of the reference match possible translated patterns in the prototype patterns of the estimations. Since the sizes of these prototypes must be equal, this metric is quite restrictive and it tended to be 0 in most of the 2013 MIREX results.
- Parameters
- reference_patternslist
The reference patterns using the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- tolfloat
Tolerance level when comparing reference against estimation. Default parameter is the one found in the original MATLAB code by Tom Collins used for MIREX 2013. (Default value = 1e-5)
- Returns
- f_measurefloat
The standard F1 Score
- precisionfloat
The standard Precision
- recallfloat
The standard Recall
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.standard_FPR(ref_patterns, est_patterns)
- mir_eval.pattern.establishment_FPR(reference_patterns, estimated_patterns, similarity_metric='cardinality_score')¶
Establishment F1 Score, Precision and Recall.
- Parameters
- reference_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- similarity_metricstr
A string representing the metric to be used when computing the similarity matrix. Accepted values:
“cardinality_score”: Count of the intersection between occurrences.
(Default value = “cardinality_score”)
- Returns
- f_measurefloat
The establishment F1 Score
- precisionfloat
The establishment Precision
- recallfloat
The establishment Recall
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.establishment_FPR(ref_patterns,
...                                              est_patterns)
- mir_eval.pattern.occurrence_FPR(reference_patterns, estimated_patterns, thres=0.75, similarity_metric='cardinality_score')¶
Occurrence F1 Score, Precision and Recall.
- Parameters
- reference_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- thresfloat
How similar two occurrences must be in order to be considered equal (Default value = .75)
- similarity_metricstr
A string representing the metric to be used when computing the similarity matrix. Accepted values:
“cardinality_score”: Count of the intersection between occurrences.
(Default value = “cardinality_score”)
- Returns
- f_measurefloat
The occurrence F1 Score
- precisionfloat
The occurrence Precision
- recallfloat
The occurrence Recall
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.occurrence_FPR(ref_patterns,
...                                           est_patterns)
- mir_eval.pattern.three_layer_FPR(reference_patterns, estimated_patterns)¶
Three Layer F1 Score, Precision and Recall. As described by Meredith.
- Parameters
- reference_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- Returns
- f_measurefloat
The three-layer F1 Score
- precisionfloat
The three-layer Precision
- recallfloat
The three-layer Recall
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.three_layer_FPR(ref_patterns,
...                                            est_patterns)
- mir_eval.pattern.first_n_three_layer_P(reference_patterns, estimated_patterns, n=5)¶
First n three-layer precision.
This metric is basically the same as the three-layer FPR, but it is only applied to the first n estimated patterns and it only returns the precision. In MIREX, and typically in practice, n = 5.
- Parameters
- reference_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- nint
Number of patterns to consider from the estimated results, in the order they appear in the matrix (Default value = 5)
- Returns
- precisionfloat
The first n three-layer Precision
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> P = mir_eval.pattern.first_n_three_layer_P(ref_patterns,
...                                            est_patterns, n=5)
- mir_eval.pattern.first_n_target_proportion_R(reference_patterns, estimated_patterns, n=5)¶
First n target proportion establishment recall metric.
This metric is similar to the establishment FPR score, but it only takes into account the first n estimated patterns and it only outputs the Recall value.
- Parameters
- reference_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- estimated_patternslist
The estimated patterns in the same format
- nint
Number of patterns to consider from the estimated results, in the order they appear in the matrix. (Default value = 5)
- Returns
- recallfloat
The first n target proportion Recall.
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> R = mir_eval.pattern.first_n_target_proportion_R(
...     ref_patterns, est_patterns, n=5)
- mir_eval.pattern.evaluate(ref_patterns, est_patterns, **kwargs)¶
Load data and perform the evaluation.
- Parameters
- ref_patternslist
The reference patterns in the format returned by
mir_eval.io.load_patterns()
- est_patternslist
The estimated patterns in the same format
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scoresdict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> scores = mir_eval.pattern.evaluate(ref_patterns, est_patterns)
mir_eval.segment
¶
Evaluation criteria for structural segmentation fall into two categories: boundary annotation and structural annotation. Boundary annotation is the task of predicting the times at which structural changes occur, such as when a verse transitions to a refrain. Metrics for boundary annotation compare estimated segment boundaries to reference boundaries. Structural annotation is the task of assigning labels to detected segments. The estimated labels may be arbitrary strings, such as A, B, or C, and they need not describe functional concepts. Metrics for structural annotation are similar to those used for clustering data.
Conventions¶
Both boundary and structural annotation metrics require two-dimensional arrays with two columns, one for boundary start times and one for boundary end times. Structural annotation further requires lists of reference and estimated segment labels, each of which must have a length equal to the number of rows in the corresponding array of boundary intervals. In both tasks, we assume that annotations express a partitioning of the track into intervals. The function mir_eval.util.adjust_intervals() can be used to pad or crop the segment boundaries to span the duration of the entire track.
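For illustration, a hand-built annotation in this format (the times and labels below are hypothetical). The checks mirror the partitioning convention described above:

```python
import numpy as np

# Segment annotation: one row per segment, columns are (start, end) in seconds
intervals = np.array([[0.0, 10.0],
                      [10.0, 25.0],
                      [25.0, 40.0]])
labels = ['A', 'B', 'A']   # one label per interval row

# The labels list must align with the interval rows...
assert len(labels) == intervals.shape[0]
# ...and the intervals should partition the track: each segment starts
# exactly where the previous one ends
assert np.allclose(intervals[1:, 0], intervals[:-1, 1])
```

An annotation like this can then be passed through mir_eval.util.adjust_intervals() to pad or crop it to the full track duration before computing metrics.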
Metrics¶
mir_eval.segment.detection()
: An estimated boundary is considered correct if it falls within a window around a reference boundary 11

mir_eval.segment.deviation()
: Computes the median absolute time difference from a reference boundary to its nearest estimated boundary, and vice versa 11

mir_eval.segment.pairwise()
: For classifying pairs of sampled time instants as belonging to the same structural component 12

mir_eval.segment.rand_index()
: Clusters reference and estimated annotations and compares them by the Rand Index

mir_eval.segment.ari()
: Computes the Rand index, adjusted for chance

mir_eval.segment.nce()
: Interprets sampled reference and estimated labels as samples of random variables Y_R, Y_E from which the conditional entropy of Y_R given Y_E (Under-Segmentation) and Y_E given Y_R (Over-Segmentation) are estimated 13

mir_eval.segment.mutual_information()
: Computes the standard, normalized, and adjusted mutual information of sampled reference and estimated segments

mir_eval.segment.vmeasure()
: Computes the V-Measure, which is similar to the conditional entropy metrics, but uses the marginal distributions as normalization rather than the maximum entropy distribution 14
References¶
- 11(1,2)
Turnbull, D., Lanckriet, G. R., Pampalk, E., & Goto, M. A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting. In ISMIR (pp. 51-54).
- 12
Levy, M., & Sandler, M. Structural segmentation of musical audio by constrained clustering. IEEE transactions on audio, speech, and language processing, 16(2), 318-326.
- 13
Lukashevich, H. M. Towards Quantitative Measures of Evaluating Song Segmentation. In ISMIR (pp. 375-380).
- 14
Rosenberg, A., & Hirschberg, J. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In EMNLP-CoNLL (Vol. 7, pp. 410-420).
- mir_eval.segment.validate_boundary(reference_intervals, estimated_intervals, trim)¶
Checks that the input annotations to a segment boundary estimation metric (i.e. one that only takes in segment intervals) look like valid segment times, and throws helpful errors if not.
- Parameters
- reference_intervalsnp.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- estimated_intervalsnp.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- trimbool
will the start and end events be trimmed?
- mir_eval.segment.validate_structure(reference_intervals, reference_labels, estimated_intervals, estimated_labels)¶
Checks that the input annotations to a structure estimation metric (i.e. one that takes in both segment boundaries and their labels) look like valid segment times and labels, and throws helpful errors if not.
- Parameters
- reference_intervalsnp.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
.- reference_labelslist, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.- estimated_intervalsnp.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
.- estimated_labelslist, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.
- mir_eval.segment.detection(reference_intervals, estimated_intervals, window=0.5, beta=1.0, trim=False)¶
Boundary detection hit-rate.
A hit is counted whenever a reference boundary is within window of an estimated boundary. Note that each boundary is matched at most once: this is achieved by computing the size of a maximal matching between reference and estimated boundary points, subject to the window constraint.
- Parameters
- reference_intervalsnp.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- estimated_intervalsnp.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- windowfloat > 0
size of the window of ‘correctness’ around reference boundaries (in seconds) (Default value = 0.5)
- betafloat > 0
weighting constant for F-measure. (Default value = 1.0)
- trimboolean
if True, the first and last boundary times are ignored. Typically, these denote start (0) and end-markers. (Default value = False)
- Returns
- precisionfloat
precision of estimated predictions
- recallfloat
recall of reference boundaries
- f_measurefloat
F-measure (weighted harmonic mean of precision and recall)
Examples
>>> ref_intervals, _ = mir_eval.io.load_labeled_intervals('ref.lab')
>>> est_intervals, _ = mir_eval.io.load_labeled_intervals('est.lab')
>>> # With 0.5s windowing
>>> P05, R05, F05 = mir_eval.segment.detection(ref_intervals,
...                                            est_intervals,
...                                            window=0.5)
>>> # With 3s windowing
>>> P3, R3, F3 = mir_eval.segment.detection(ref_intervals,
...                                         est_intervals,
...                                         window=3)
>>> # Ignoring hits for the beginning and end of track
>>> P, R, F = mir_eval.segment.detection(ref_intervals,
...                                      est_intervals,
...                                      window=0.5,
...                                      trim=True)
- mir_eval.segment.deviation(reference_intervals, estimated_intervals, trim=False)¶
Compute the median deviations between reference and estimated boundary times.
- Parameters
- reference_intervalsnp.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- estimated_intervalsnp.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- trimboolean
if True, the first and last intervals are ignored. Typically, these denote start (0.0) and end-of-track markers. (Default value = False)
- Returns
- reference_to_estimatedfloat
median time from each reference boundary to the closest estimated boundary
- estimated_to_referencefloat
median time from each estimated boundary to the closest reference boundary
Examples
>>> ref_intervals, _ = mir_eval.io.load_labeled_intervals('ref.lab')
>>> est_intervals, _ = mir_eval.io.load_labeled_intervals('est.lab')
>>> r_to_e, e_to_r = mir_eval.segment.deviation(ref_intervals,
...                                             est_intervals)
- mir_eval.segment.pairwise(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)¶
Frame-clustering segmentation evaluation by pair-wise agreement.
- Parameters
- reference_intervalsnp.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
.- reference_labelslist, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.- estimated_intervalsnp.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
.- estimated_labelslist, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.- frame_sizefloat > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- betafloat > 0
beta value for F-measure (Default value = 1.0)
- Returns
- precisionfloat > 0
Precision of detecting whether frames belong in the same cluster
- recallfloat > 0
Recall of detecting whether frames belong in the same cluster
- ffloat > 0
F-measure of detecting whether frames belong in the same cluster
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> precision, recall, f = mir_eval.segment.pairwise(ref_intervals,
...                                                  ref_labels,
...                                                  est_intervals,
...                                                  est_labels)
- mir_eval.segment.rand_index(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)¶
(Non-adjusted) Rand index.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- reference_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- frame_size : float > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- beta : float > 0
beta value for F-measure (Default value = 1.0)
- Returns
- rand_index : float > 0
Rand index
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> rand_index = mir_eval.segment.rand_index(ref_intervals,
...                                          ref_labels,
...                                          est_intervals,
...                                          est_labels)
- mir_eval.segment.ari(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1)¶
Adjusted Rand Index (ARI) for frame clustering segmentation evaluation.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- reference_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- frame_size : float > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- Returns
- ari_score : float > 0
Adjusted Rand index between segmentations.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> ari_score = mir_eval.segment.ari(ref_intervals, ref_labels,
...                                  est_intervals, est_labels)
- mir_eval.segment.mutual_information(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1)¶
Frame-clustering segmentation: mutual information metrics.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- reference_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- frame_size : float > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- Returns
- MI : float > 0
Mutual information between segmentations
- AMI : float
Adjusted mutual information between segmentations.
- NMI : float > 0
Normalized mutual information between segmentations
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> mi, ami, nmi = mir_eval.segment.mutual_information(ref_intervals,
...                                                    ref_labels,
...                                                    est_intervals,
...                                                    est_labels)
- mir_eval.segment.nce(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0, marginal=False)¶
Frame-clustering segmentation: normalized conditional entropy
Computes cross-entropy of cluster assignment, normalized by the max-entropy.
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- reference_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- frame_size : float > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- beta : float > 0
beta for F-measure (Default value = 1.0)
- marginal : bool
If False, normalize conditional entropy by uniform entropy. If True, normalize conditional entropy by the marginal entropy. (Default value = False)
- Returns
- S_over
Over-clustering score:
For marginal=False,
1 - H(y_est | y_ref) / log(|y_est|)
For marginal=True,
1 - H(y_est | y_ref) / H(y_est)
If |y_est|==1, then S_over will be 0.
- S_under
Under-clustering score:
For marginal=False,
1 - H(y_ref | y_est) / log(|y_ref|)
For marginal=True,
1 - H(y_ref | y_est) / H(y_ref)
If |y_ref|==1, then S_under will be 0.
- S_F
F-measure for (S_over, S_under)
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> S_over, S_under, S_F = mir_eval.segment.nce(ref_intervals,
...                                             ref_labels,
...                                             est_intervals,
...                                             est_labels)
- mir_eval.segment.vmeasure(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)¶
Frame-clustering segmentation: v-measure
Computes cross-entropy of cluster assignment, normalized by the marginal-entropy.
This is equivalent to nce(…, marginal=True).
- Parameters
- reference_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- reference_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- estimated_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- frame_size : float > 0
length (in seconds) of frames for clustering (Default value = 0.1)
- beta : float > 0
beta for F-measure (Default value = 1.0)
- Returns
- V_precision
Over-clustering score:
1 - H(y_est | y_ref) / H(y_est)
If |y_est|==1, then V_precision will be 0.
- V_recall
Under-clustering score:
1 - H(y_ref | y_est) / H(y_ref)
If |y_ref|==1, then V_recall will be 0.
- V_F
F-measure for (V_precision, V_recall)
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> V_precision, V_recall, V_F = mir_eval.segment.vmeasure(ref_intervals,
...                                                        ref_labels,
...                                                        est_intervals,
...                                                        est_labels)
- mir_eval.segment.evaluate(ref_intervals, ref_labels, est_intervals, est_labels, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- ref_intervals : np.ndarray, shape=(n, 2)
reference segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- ref_labels : list, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- est_intervals : np.ndarray, shape=(m, 2)
estimated segment intervals, in the format returned by
mir_eval.io.load_labeled_intervals()
- est_labels : list, shape=(m,)
estimated segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> scores = mir_eval.segment.evaluate(ref_intervals, ref_labels,
...                                    est_intervals, est_labels)
mir_eval.hierarchy
¶
Evaluation criteria for hierarchical structure analysis.
Hierarchical structure analysis seeks to annotate a track with a nested
decomposition of the temporal elements of the piece, effectively providing
a kind of “parse tree” of the composition. Unlike the flat segmentation
metrics defined in mir_eval.segment
, which can only encode one level of
analysis, hierarchical annotations expose the relationships between short
segments and the larger compositional elements to which they belong.
Conventions¶
Annotations are assumed to take the form of an ordered list of segmentations.
As in the mir_eval.segment
metrics, each segmentation itself consists of
an n-by-2 array of interval times, so that the i
th segment spans time
intervals[i, 0]
to intervals[i, 1]
.
Hierarchical annotations are ordered by increasing specificity, so that the first segmentation should contain the fewest segments, and the last segmentation contains the most.
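A minimal sketch of these conventions, using invented interval values for a hypothetical 60-second track:

```python
import numpy as np

# Hypothetical two-level hierarchy for a 60-second track:
# the first (coarsest) level has 2 segments, the second has 4.
coarse = np.array([[0.0, 30.0], [30.0, 60.0]])
fine = np.array([[0.0, 15.0], [15.0, 30.0],
                 [30.0, 45.0], [45.0, 60.0]])

# Levels are ordered from least to most specific
intervals_hier = [coarse, fine]

# Every level should start at 0 and span the same total duration
assert all(level[0, 0] == 0.0 for level in intervals_hier)
assert all(level[-1, 1] == 60.0 for level in intervals_hier)
```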
Metrics¶
mir_eval.hierarchy.tmeasure(): Precision, recall, and F-measure of triplet-based frame accuracy for boundary detection.
mir_eval.hierarchy.lmeasure(): Precision, recall, and F-measure of triplet-based frame accuracy for segment labeling.
References¶
- 15
Brian McFee, Oriol Nieto, and Juan P. Bello. “Hierarchical evaluation of segment boundary detection”, International Society for Music Information Retrieval (ISMIR) conference, 2015.
- 16
Brian McFee, Oriol Nieto, Morwaread Farbood, and Juan P. Bello. “Evaluating hierarchical structure in music annotations”, Frontiers in Psychology, 2017.
- mir_eval.hierarchy.validate_hier_intervals(intervals_hier)¶
Validate a hierarchical segment annotation.
- Parameters
- intervals_hier : ordered list of segmentations
- Raises
- ValueError
If any segmentation does not span the full duration of the top-level segmentation.
If any segmentation does not start at 0.
- mir_eval.hierarchy.tmeasure(reference_intervals_hier, estimated_intervals_hier, transitive=False, window=15.0, frame_size=0.1, beta=1.0)¶
Computes the tree measures for hierarchical segment annotations.
- Parameters
- reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.
- estimated_intervals_hier : list of ndarray
Like reference_intervals_hier but for the estimated annotation
- transitive : bool
whether to compute the t-measures using transitivity or not.
- window : float > 0
size of the window (in seconds). For each query frame q, result frames are only counted within q +- window.
- frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than the window.
- beta : float > 0
beta parameter for the F-measure.
- Returns
- t_precision : number [0, 1]
T-measure Precision
- t_recall : number [0, 1]
T-measure Recall
- t_measure : number [0, 1]
F-beta measure for (t_precision, t_recall)
- Raises
- ValueError
If either of the input hierarchies is inconsistent
If the input hierarchies have different time durations
If frame_size > window or frame_size <= 0
- mir_eval.hierarchy.lmeasure(reference_intervals_hier, reference_labels_hier, estimated_intervals_hier, estimated_labels_hier, frame_size=0.1, beta=1.0)¶
Computes the L-measures for hierarchical segment annotations.
- Parameters
- reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.
- reference_labels_hier : list of list of str
reference_labels_hier[i] contains the segment labels for the i-th layer of the annotations
- estimated_intervals_hier : list of ndarray
- estimated_labels_hier : list of list of str
Like reference_intervals_hier and reference_labels_hier but for the estimated annotation
- frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than the window.
- beta : float > 0
beta parameter for the F-measure.
- Returns
- l_precision : number [0, 1]
L-measure Precision
- l_recall : number [0, 1]
L-measure Recall
- l_measure : number [0, 1]
F-beta measure for (l_precision, l_recall)
- Raises
- ValueError
If either of the input hierarchies is inconsistent
If the input hierarchies have different time durations
If frame_size > window or frame_size <= 0
- mir_eval.hierarchy.evaluate(ref_intervals_hier, ref_labels_hier, est_intervals_hier, est_labels_hier, **kwargs)¶
Compute all hierarchical structure metrics for the given reference and estimated annotations.
- Parameters
- ref_intervals_hier : list of list-like
- ref_labels_hier : list of list of str
- est_intervals_hier : list of list-like
- est_labels_hier : list of list of str
Hierarchical annotations are encoded as an ordered list of segmentations. Each segmentation itself is a list (or list-like) of intervals (*_intervals_hier) and a list of lists of labels (*_labels_hier).
- kwargs
additional keyword arguments to the evaluation metrics.
- Returns
- scores : OrderedDict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
T-measures are computed in both the “full” (transitive=True) and “reduced” (transitive=False) modes.
- Raises
- ValueError
Thrown when the provided annotations are not valid.
Examples
A toy example with two two-layer annotations
>>> ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> est_i = [[[0, 45], [45, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> ref_l = [['A', 'B'], ['a', 'b', 'a', 'c']]
>>> est_l = [['A', 'B'], ['a', 'a', 'b', 'b']]
>>> scores = mir_eval.hierarchy.evaluate(ref_i, ref_l, est_i, est_l)
>>> dict(scores)
{'T-Measure full': 0.94822745804853459,
 'T-Measure reduced': 0.8732458222764804,
 'T-Precision full': 0.96569179094693058,
 'T-Precision reduced': 0.89939075137018787,
 'T-Recall full': 0.93138358189386117,
 'T-Recall reduced': 0.84857799953694923}
A more realistic example, using SALAMI pre-parsed annotations
>>> def load_salami(filename):
...     "load SALAMI event format as labeled intervals"
...     events, labels = mir_eval.io.load_labeled_events(filename)
...     intervals = mir_eval.util.boundaries_to_intervals(events)[0]
...     return intervals, labels[:len(intervals)]
>>> ref_files = ['data/10/parsed/textfile1_uppercase.txt',
...              'data/10/parsed/textfile1_lowercase.txt']
>>> est_files = ['data/10/parsed/textfile2_uppercase.txt',
...              'data/10/parsed/textfile2_lowercase.txt']
>>> ref = [load_salami(fname) for fname in ref_files]
>>> ref_int = [seg[0] for seg in ref]
>>> ref_lab = [seg[1] for seg in ref]
>>> est = [load_salami(fname) for fname in est_files]
>>> est_int = [seg[0] for seg in est]
>>> est_lab = [seg[1] for seg in est]
>>> scores = mir_eval.hierarchy.evaluate(ref_int, ref_lab,
...                                      est_int, est_lab)
>>> dict(scores)
{'T-Measure full': 0.66029225561405358,
 'T-Measure reduced': 0.62001868041578034,
 'T-Precision full': 0.66844764668949885,
 'T-Precision reduced': 0.63252297209957919,
 'T-Recall full': 0.6523334654992341,
 'T-Recall reduced': 0.60799919710921635}
mir_eval.separation
¶
Source separation algorithms attempt to extract recordings of individual sources from a recording of a mixture of sources. Evaluation methods for source separation compare the extracted sources from reference sources and attempt to measure the perceptual quality of the separation.
- See also the bss_eval MATLAB toolbox:
Conventions¶
An audio signal is expected to be in the format of a 1-dimensional array where the entries are the samples of the audio signal. When providing a group of estimated or reference sources, they should be provided in a 2-dimensional array, where the first dimension corresponds to the source number and the second corresponds to the samples.
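For instance, the expected layout can be built with NumPy (toy signals, invented purely for illustration):

```python
import numpy as np

# Two hypothetical mono sources, five samples each
source_a = np.array([0.1, 0.2, 0.3, 0.2, 0.1])
source_b = np.array([0.0, -0.1, 0.1, -0.1, 0.0])

# Stack them so axis 0 indexes the source and axis 1 the samples
reference_sources = np.vstack([source_a, source_b])
assert reference_sources.shape == (2, 5)  # (nsrc, nsampl)
```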
Metrics¶
mir_eval.separation.bss_eval_sources(): Computes the bss_eval_sources metrics from bss_eval, which optionally optimally match the estimated sources to the reference sources and measure the distortion and artifacts present in the estimated sources as well as the interference between them.
mir_eval.separation.bss_eval_sources_framewise(): Computes the bss_eval_sources metrics on a frame-by-frame basis.
mir_eval.separation.bss_eval_images(): Computes the bss_eval_images metrics from bss_eval, which include the metrics in mir_eval.separation.bss_eval_sources() plus the image to spatial distortion ratio.
mir_eval.separation.bss_eval_images_framewise(): Computes the bss_eval_images metrics on a frame-by-frame basis.
References¶
- mir_eval.separation.validate(reference_sources, estimated_sources)¶
Checks that the input data to a metric are valid, and throws helpful errors if not.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing true sources
- estimated_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing estimated sources
- mir_eval.separation.bss_eval_sources(reference_sources, estimated_sources, compute_permutation=True)¶
Ordering and measurement of the separation quality for estimated source signals in terms of filtered true source, interference and artifacts.
The decomposition allows a time-invariant filter distortion of length 512, as described in Section III.B of [17].
Passing False for compute_permutation will improve the computational performance of the evaluation; however, it is not always appropriate and is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_sources.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing true sources (must have same shape as estimated_sources)
- estimated_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing estimated sources (must have same shape as reference_sources)
- compute_permutation : bool, optional
compute permutation of estimate/source combinations (True by default)
- Returns
- sdr : np.ndarray, shape=(nsrc,)
vector of Signal to Distortion Ratios (SDR)
- sir : np.ndarray, shape=(nsrc,)
vector of Source to Interference Ratios (SIR)
- sar : np.ndarray, shape=(nsrc,)
vector of Sources to Artifacts Ratios (SAR)
- perm : np.ndarray, shape=(nsrc,)
vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be [0, 1, ..., nsrc-1] if compute_permutation is False.
References
- 18
Emmanuel Vincent, Shoko Araki, Fabian J. Theis, Guido Nolte, Pau Bofill, Hiroshi Sawada, Alexey Ozerov, B. Vikrham Gowreesunker, Dominik Lutter and Ngoc Q.K. Duong, “The Signal Separation Evaluation Campaign (2007-2010): Achievements and remaining challenges”, Signal Processing, 92, pp. 1928-1936, 2012.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources(reference_sources,
...                                               estimated_sources)
- mir_eval.separation.bss_eval_sources_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)¶
Framewise computation of bss_eval_sources
Please be aware that this function does not compute permutations (by default) on the possible relations between reference_sources and estimated_sources due to the dangers of a changing permutation. Therefore (by default), it assumes that reference_sources[i] corresponds to estimated_sources[i]. To enable computing permutations please set compute_permutation to True and check that the returned perm is identical for all windows.
NOTE: if reference_sources and estimated_sources would be evaluated using only a single window or are shorter than the window length, the result of mir_eval.separation.bss_eval_sources() called on reference_sources and estimated_sources (with the compute_permutation parameter passed to mir_eval.separation.bss_eval_sources()) is returned.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing true sources (must have the same shape as estimated_sources)
- estimated_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing estimated sources (must have the same shape as reference_sources)
- window : int, optional
Window length for framewise evaluation (default value is 30s at a sample rate of 44.1kHz)
- hop : int, optional
Hop size for framewise evaluation (default value is 15s at a sample rate of 44.1kHz)
- compute_permutation : bool, optional
compute permutation of estimate/source combinations for all windows (False by default)
- Returns
- sdr : np.ndarray, shape=(nsrc, nframes)
vector of Signal to Distortion Ratios (SDR)
- sir : np.ndarray, shape=(nsrc, nframes)
vector of Source to Interference Ratios (SIR)
- sar : np.ndarray, shape=(nsrc, nframes)
vector of Sources to Artifacts Ratios (SAR)
- perm : np.ndarray, shape=(nsrc, nframes)
vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be range(nsrc) for all windows if compute_permutation is False
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources_framewise(
...     reference_sources, estimated_sources)
- mir_eval.separation.bss_eval_images(reference_sources, estimated_sources, compute_permutation=True)¶
Implementation of the bss_eval_images function from the BSS_EVAL Matlab toolbox.
Ordering and measurement of the separation quality for estimated source signals in terms of filtered true source, interference and artifacts. This method also provides the ISR measure.
The decomposition allows a time-invariant filter distortion of length 512, as described in Section III.B of [17].
Passing False for compute_permutation will improve the computational performance of the evaluation; however, it is not always appropriate and is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_images.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl, nchan)
matrix containing true sources
- estimated_sources : np.ndarray, shape=(nsrc, nsampl, nchan)
matrix containing estimated sources
- compute_permutation : bool, optional
compute permutation of estimate/source combinations (True by default)
- Returns
- sdr : np.ndarray, shape=(nsrc,)
vector of Signal to Distortion Ratios (SDR)
- isr : np.ndarray, shape=(nsrc,)
vector of source Image to Spatial distortion Ratios (ISR)
- sir : np.ndarray, shape=(nsrc,)
vector of Source to Interference Ratios (SIR)
- sar : np.ndarray, shape=(nsrc,)
vector of Sources to Artifacts Ratios (SAR)
- perm : np.ndarray, shape=(nsrc,)
vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be [0, 1, ..., nsrc-1] if compute_permutation is False.
References
- 19
Emmanuel Vincent, Shoko Araki, Fabian J. Theis, Guido Nolte, Pau Bofill, Hiroshi Sawada, Alexey Ozerov, B. Vikrham Gowreesunker, Dominik Lutter and Ngoc Q.K. Duong, “The Signal Separation Evaluation Campaign (2007-2010): Achievements and remaining challenges”, Signal Processing, 92, pp. 1928-1936, 2012.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images(reference_sources,
...                                              estimated_sources)
- mir_eval.separation.bss_eval_images_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)¶
Framewise computation of bss_eval_images
Please be aware that this function does not compute permutations (by default) on the possible relations between reference_sources and estimated_sources due to the dangers of a changing permutation. Therefore (by default), it assumes that reference_sources[i] corresponds to estimated_sources[i]. To enable computing permutations please set compute_permutation to True and check that the returned perm is identical for all windows.
NOTE: if reference_sources and estimated_sources would be evaluated using only a single window or are shorter than the window length, the result of bss_eval_images called on reference_sources and estimated_sources (with the compute_permutation parameter passed to bss_eval_images) is returned.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl, nchan)
matrix containing true sources (must have the same shape as estimated_sources)
- estimated_sources : np.ndarray, shape=(nsrc, nsampl, nchan)
matrix containing estimated sources (must have the same shape as reference_sources)
- window : int
Window length for framewise evaluation
- hop : int
Hop size for framewise evaluation
- compute_permutation : bool, optional
compute permutation of estimate/source combinations for all windows (False by default)
- Returns
- sdr : np.ndarray, shape=(nsrc, nframes)
vector of Signal to Distortion Ratios (SDR)
- isr : np.ndarray, shape=(nsrc, nframes)
vector of source Image to Spatial distortion Ratios (ISR)
- sir : np.ndarray, shape=(nsrc, nframes)
vector of Source to Interference Ratios (SIR)
- sar : np.ndarray, shape=(nsrc, nframes)
vector of Sources to Artifacts Ratios (SAR)
- perm : np.ndarray, shape=(nsrc, nframes)
vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be range(nsrc) for all windows if compute_permutation is False
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images_framewise(
...     reference_sources, estimated_sources, window, hop)
- mir_eval.separation.evaluate(reference_sources, estimated_sources, **kwargs)¶
Compute all metrics for the given reference and estimated signals.
NOTE: This will always compute mir_eval.separation.bss_eval_images() for any valid input and will additionally compute mir_eval.separation.bss_eval_sources() for valid input with fewer than 3 dimensions.
- Parameters
- reference_sources : np.ndarray, shape=(nsrc, nsampl[, nchan])
matrix containing true sources
- estimated_sources : np.ndarray, shape=(nsrc, nsampl[, nchan])
matrix containing estimated sources
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> scores = mir_eval.separation.evaluate(reference_sources,
...                                       estimated_sources)
mir_eval.tempo
¶
The goal of a tempo estimation algorithm is to automatically detect the tempo of a piece of music, measured in beats per minute (BPM).
See http://www.music-ir.org/mirex/wiki/2014:Audio_Tempo_Estimation for a description of the task and evaluation criteria.
Conventions¶
Reference and estimated tempi should be positive, and provided in ascending order as a numpy array of length 2.
The weighting value from the reference must be a float in the range [0, 1].
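A sketch of a valid annotation under these conventions (the tempo values are invented; the 8% tolerance shown below previews the default tol used by mir_eval.tempo.detection()):

```python
import numpy as np

reference_tempi = np.array([60.0, 120.0])  # ascending, in BPM
reference_weight = 0.5                     # float in [0, 1]
estimated_tempi = np.array([61.0, 119.0])

# Hit criterion used by the detection metric: within tol * ref_t
tol = 0.08
hits = np.abs(estimated_tempi - reference_tempi) <= tol * reference_tempi
p_score = reference_weight * hits[0] + (1 - reference_weight) * hits[1]
```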
Metrics¶
mir_eval.tempo.detection()
: Relative error, hits, and weighted precision of tempo estimation.
- mir_eval.tempo.validate_tempi(tempi, reference=True)¶
Checks that there are two non-negative tempi. For a reference value, at least one tempo has to be greater than zero.
- Parameters
- tempi : np.ndarray
length-2 array of tempo, in bpm
- reference : bool
indicates a reference value
- mir_eval.tempo.validate(reference_tempi, reference_weight, estimated_tempi)¶
Checks that the input annotations to a metric look like valid tempo annotations.
- Parameters
- reference_tempi : np.ndarray
reference tempo values, in bpm
- reference_weight : float
perceptual weight of slow vs fast in reference
- estimated_tempi : np.ndarray
estimated tempo values, in bpm
- mir_eval.tempo.detection(reference_tempi, reference_weight, estimated_tempi, tol=0.08)¶
Compute the tempo detection accuracy metric.
- Parameters
- reference_tempi : np.ndarray, shape=(2,)
Two non-negative reference tempi
- reference_weight : float > 0
The relative strength of reference_tempi[0] vs reference_tempi[1].
- estimated_tempi : np.ndarray, shape=(2,)
Two non-negative estimated tempi.
- tol : float in [0, 1]
The maximum allowable deviation from a reference tempo to count as a hit:
|est_t - ref_t| <= tol * ref_t
(Default value = 0.08)
- Returns
- p_score : float in [0, 1]
Weighted average of recalls:
reference_weight * hits[0] + (1 - reference_weight) * hits[1]
- one_correct : bool
True if at least one reference tempo was correctly estimated
- both_correct : bool
True if both reference tempi were correctly estimated
- Raises
- ValueError
If the input tempi are ill-formed
If the reference weight is not in the range [0, 1]
If tol < 0 or tol > 1.
- mir_eval.tempo.evaluate(reference_tempi, reference_weight, estimated_tempi, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- reference_tempi : np.ndarray, shape=(2,)
Two non-negative reference tempi
- reference_weight : float > 0
The relative strength of reference_tempi[0] vs reference_tempi[1].
- estimated_tempi : np.ndarray, shape=(2,)
Two non-negative estimated tempi.
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
mir_eval.transcription
¶
The aim of a transcription algorithm is to produce a symbolic representation of a recorded piece of music in the form of a set of discrete notes. There are different ways to represent notes symbolically. Here we use the piano-roll convention, meaning each note has a start time, a duration (or end time), and a single, constant, pitch value. Pitch values can be quantized (e.g. to a semitone grid tuned to 440 Hz), but do not have to be. Also, the transcription can contain the notes of a single instrument or voice (for example the melody), or the notes of all instruments/voices in the recording. This module is instrument agnostic: all notes in the estimate are compared against all notes in the reference.
There are many metrics for evaluating transcription algorithms. Here we limit ourselves to the most simple and commonly used: given two sets of notes, we count how many estimated notes match the reference, and how many do not. Based on these counts we compute the precision, recall, f-measure and overlap ratio of the estimate given the reference. The default criteria for considering two notes to be a match are adopted from the MIREX Multiple fundamental frequency estimation and tracking, Note Tracking subtask (task 2):
“This subtask is evaluated in two different ways. In the first setup, a returned note is assumed correct if its onset is within +-50ms of a reference note and its F0 is within +- quarter tone of the corresponding reference note, ignoring the returned offset values. In the second setup, on top of the above requirements, a correct returned note is required to have an offset value within 20% of the reference note’s duration around the reference note’s offset, or within 50ms, whichever is larger.”
In short, we compute precision, recall, f-measure and overlap ratio, once without taking offsets into account, and a second time with them.
For further details see Salamon, 2013 (page 186), and references therein:
Salamon, J. (2013). Melody Extraction from Polyphonic Music Signals. Ph.D. thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.
IMPORTANT NOTE: the evaluation code in mir_eval contains several important differences with respect to the code used in MIREX 2015 for the Note Tracking subtask on the Su dataset (henceforth “MIREX”):

mir_eval uses bipartite graph matching to find the optimal pairing of reference notes to estimated notes. MIREX uses a greedy matching algorithm, which can produce sub-optimal note matching. This will result in mir_eval’s metrics being slightly higher compared to MIREX.

MIREX rounds down the onset and offset times of each note to 2 decimal points using new_time = 0.01 * floor(time*100). mir_eval rounds down the note onset and offset times to 4 decimal points. This will bring our metrics down a notch compared to the MIREX results.

In the MIREX wiki, the criterion for matching offsets is that they must be within 0.2 * ref_duration or 0.05 seconds from each other, whichever is greater (i.e. offset_dif <= max(0.2 * ref_duration, 0.05)). The MIREX code however only uses a threshold of 0.2 * ref_duration, without the 0.05 second minimum. Since mir_eval does include this minimum, it might produce slightly higher results compared to MIREX.
This means that differences 1 and 3 bring mir_eval’s metrics up compared to MIREX, whilst 2 brings them down. Based on internal testing, the overall effect of these three differences is that the Precision, Recall and F-measure returned by mir_eval will be higher compared to MIREX by about 1%-2%.
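The rounding discrepancy in difference 2 can be illustrated with a minimal sketch of the two truncation rules (this is an illustration only, not the evaluation code itself):

```python
import math

time = 1.23456789  # an arbitrary note onset time in seconds

# MIREX: round down to 2 decimal places
mirex_time = 0.01 * math.floor(time * 100)

# mir_eval: round down to 4 decimal places
mir_eval_time = 0.0001 * math.floor(time * 10000)

print(mirex_time, mir_eval_time)  # approximately 1.23 vs 1.2345
```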
Finally, note that different evaluation scripts have been used for the Multi-F0 Note Tracking task in MIREX over the years. In particular, some scripts used < for matching onsets, offsets, and pitch values, whilst others used <= for these checks. mir_eval provides both options: by default the latter (<=) is used, but you can set strict=True when calling mir_eval.transcription.precision_recall_f1_overlap(), in which case < will be used. The default value (strict=False) is the same as that used in MIREX 2015 for the Note Tracking subtask on the Su dataset.
Conventions¶
Notes should be provided in the form of an interval array and a pitch array. The interval array contains two columns, one for note onsets and the second for note offsets (each row represents a single note). The pitch array contains one column with the corresponding note pitch values (one value per note), represented by their fundamental frequency (f0) in Hertz.
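For example, a three-note transcription can be packed as follows (a minimal numpy sketch; the note values are arbitrary):

```python
import numpy as np

# Each row of the interval array is one note: (onset, offset) in seconds
ref_intervals = np.array([[0.0, 0.5],
                          [0.5, 1.0],
                          [1.0, 1.8]])
# One fundamental frequency (Hz) per note, in the same order
ref_pitches = np.array([440.0, 493.88, 523.25])

assert ref_intervals.shape == (3, 2)
assert ref_pitches.shape == (3,)
```

Arrays in this form are what mir_eval.io.load_valued_intervals() returns and what the metric functions below expect.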
Metrics¶
mir_eval.transcription.precision_recall_f1_overlap(): The precision, recall, F-measure, and Average Overlap Ratio of the note transcription, where an estimated note is considered correct if its pitch, onset and (optionally) offset are sufficiently close to a reference note.

mir_eval.transcription.onset_precision_recall_f1(): The precision, recall and F-measure of the note transcription, where an estimated note is considered correct if its onset is sufficiently close to a reference note’s onset. That is, these metrics are computed taking only note onsets into account, meaning two notes could be matched even if they have very different pitch values.

mir_eval.transcription.offset_precision_recall_f1(): The precision, recall and F-measure of the note transcription, where an estimated note is considered correct if its offset is sufficiently close to a reference note’s offset. That is, these metrics are computed taking only note offsets into account, meaning two notes could be matched even if they have very different pitch values.
- mir_eval.transcription.validate(ref_intervals, ref_pitches, est_intervals, est_pitches)¶
Checks that the input annotations to a metric look like time intervals and a pitch list, and throws helpful errors if not.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- mir_eval.transcription.validate_intervals(ref_intervals, est_intervals)¶
Checks that the input annotations to a metric look like time intervals, and throws helpful errors if not.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- mir_eval.transcription.match_note_offsets(ref_intervals, est_intervals, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False)¶
Compute a maximum matching between reference and estimated notes, only taking note offsets into account.
Given two note sequences represented by ref_intervals and est_intervals (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that the offset of reference note i is within offset_tolerance of the offset of estimated note j, where offset_tolerance is equal to offset_ratio times the reference note’s duration, i.e. offset_ratio * ref_duration[i] where ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the resulting offset_tolerance is less than offset_min_tolerance (50 ms by default) then offset_min_tolerance is used instead.

Every reference note is matched against at most one estimated note.

Note there are separate functions match_note_onsets() and match_notes() for matching notes based on onsets only or based on onset, offset, and pitch, respectively. This is because the rules for matching note onsets and matching note offsets are different.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- offset_ratio : float > 0
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or 0.05 (50 ms), whichever is greater.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined.
- strict : bool
If strict=False (the default), threshold checks for offset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- Returns
- matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
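The offset tolerance rule described above can be sketched as follows (an illustrative helper, not part of the library API):

```python
def offset_tolerance(ref_onset, ref_offset,
                     offset_ratio=0.2, offset_min_tolerance=0.05):
    # The tolerance is a fraction of the reference note's duration,
    # floored at offset_min_tolerance (50 ms by default).
    return max(offset_ratio * (ref_offset - ref_onset), offset_min_tolerance)

# A 1 s note gets a 0.2 s tolerance; a 0.1 s note falls back to the 50 ms floor
assert offset_tolerance(0.0, 1.0) == 0.2
assert offset_tolerance(0.0, 0.1) == 0.05
```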
- mir_eval.transcription.match_note_onsets(ref_intervals, est_intervals, onset_tolerance=0.05, strict=False)¶
Compute a maximum matching between reference and estimated notes, only taking note onsets into account.
Given two note sequences represented by ref_intervals and est_intervals (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that the onset of reference note i is within onset_tolerance of the onset of estimated note j.

Every reference note is matched against at most one estimated note.

Note there are separate functions match_note_offsets() and match_notes() for matching notes based on offsets only or based on onset, offset, and pitch, respectively. This is because the rules for matching note onsets and matching note offsets are different.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- strict : bool
If strict=False (the default), threshold checks for onset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- Returns
- matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
- mir_eval.transcription.match_notes(ref_intervals, ref_pitches, est_intervals, est_pitches, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False)¶
Compute a maximum matching between reference and estimated notes, subject to onset, pitch and (optionally) offset constraints.
Given two note sequences represented by ref_intervals, ref_pitches, est_intervals and est_pitches (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that:

The onset of reference note i is within onset_tolerance of the onset of estimated note j.

The pitch of reference note i is within pitch_tolerance of the pitch of estimated note j.

If offset_ratio is not None, the offset of reference note i is within offset_tolerance of the offset of estimated note j, where offset_tolerance is equal to offset_ratio times the reference note’s duration, i.e. offset_ratio * ref_duration[i] where ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the resulting offset_tolerance is less than 0.05 (50 ms), 0.05 is used instead.

If offset_ratio is None, note offsets are ignored, and only criteria 1 and 2 are taken into consideration.

Every reference note is matched against at most one estimated note.

This is useful for computing precision/recall metrics for note transcription.

Note there are separate functions match_note_onsets() and match_note_offsets() for matching notes based on onsets only or based on offsets only, respectively.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- pitch_tolerance : float > 0
The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).
- offset_ratio : float > 0 or None
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
- strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- Returns
- matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
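The pitch_tolerance is expressed in cents. The standard conversion from a frequency ratio to cents can be sketched as follows (an illustrative helper, not the library’s internal code):

```python
import math

def cents_diff(ref_hz, est_hz):
    # Pitch difference in cents: 100 cents = 1 semitone, 1200 cents = 1 octave
    return 1200.0 * math.log2(est_hz / ref_hz)

# 446 Hz is about 23 cents above A4 (440 Hz): within the default 50-cent tolerance
assert abs(cents_diff(440.0, 446.0)) <= 50.0
# A#4 (about 466.16 Hz) is a full semitone (about 100 cents) away: no match
assert abs(cents_diff(440.0, 466.16)) > 50.0
```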
- mir_eval.transcription.precision_recall_f1_overlap(ref_intervals, ref_pitches, est_intervals, est_pitches, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, beta=1.0)¶
Compute the Precision, Recall and F-measure of correctly vs. incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see average_overlap_ratio()). “Correctness” is determined based on note onset, pitch and (optionally) offset: an estimated note is assumed correct if its onset is within +-50ms of a reference note and its pitch (F0) is within +- quarter tone (50 cents) of the corresponding reference note. If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within 20% (by default, adjustable via the offset_ratio parameter) of the reference note’s duration around the reference note’s offset, or within offset_min_tolerance (50 ms by default), whichever is larger.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- pitch_tolerance : float > 0
The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).
- offset_ratio : float > 0 or None
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
- strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- beta : float > 0
Weighting factor for f-measure (default value = 1.0).
- Returns
- precision : float
The computed precision score
- recall : float
The computed recall score
- f_measure : float
The computed F-measure score
- avg_overlap_ratio : float
The computed Average Overlap Ratio score
Examples
>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (precision, recall, f_measure,
...  avg_overlap_ratio) = mir_eval.transcription.precision_recall_f1_overlap(
...     ref_intervals, ref_pitches, est_intervals, est_pitches)
>>> (precision_no_offset, recall_no_offset, f_measure_no_offset,
...  avg_overlap_ratio_no_offset) = (
...     mir_eval.transcription.precision_recall_f1_overlap(
...         ref_intervals, ref_pitches, est_intervals, est_pitches,
...         offset_ratio=None))
- mir_eval.transcription.average_overlap_ratio(ref_intervals, est_intervals, matching)¶
Compute the Average Overlap Ratio between a reference and estimated note transcription. Given a reference and corresponding estimated note, their overlap ratio (OR) is defined as the ratio between the duration of the time segment in which the two notes overlap and the time segment spanned by the two notes combined (earliest onset to latest offset):
>>> OR = ((min(ref_offset, est_offset) - max(ref_onset, est_onset)) /
...       (max(ref_offset, est_offset) - min(ref_onset, est_onset)))
The Average Overlap Ratio (AOR) is given by the mean OR computed over all matching reference and estimated notes. The metric goes from 0 (worst) to 1 (best).
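For instance, under the definition above, a reference note spanning 0.0–1.0 s matched to an estimated note spanning 0.1–0.9 s has an overlap ratio of 0.8 (a worked sketch with made-up times):

```python
ref_onset, ref_offset = 0.0, 1.0
est_onset, est_offset = 0.1, 0.9

overlap = min(ref_offset, est_offset) - max(ref_onset, est_onset)  # 0.8 s shared
span = max(ref_offset, est_offset) - min(ref_onset, est_onset)     # 1.0 s combined
OR = overlap / span
assert abs(OR - 0.8) < 1e-9
```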
Note: this function assumes the matching of reference and estimated notes (see match_notes()) has already been performed and is provided by the matching parameter. Furthermore, it is highly recommended to validate the intervals (see validate_intervals()) before calling this function, otherwise it is possible (though unlikely) for this function to attempt a divide-by-zero operation.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
- Returns
- avg_overlap_ratio : float
The computed Average Overlap Ratio score
- mir_eval.transcription.onset_precision_recall_f1(ref_intervals, est_intervals, onset_tolerance=0.05, strict=False, beta=1.0)¶
Compute the Precision, Recall and F-measure of note onsets: an estimated onset is considered correct if it is within +-50ms of a reference onset. Note that this metric completely ignores note offset and note pitch. This means an estimated onset will be considered correct if it matches a reference onset, even if the onsets come from notes with completely different pitches (i.e. notes that would not match with match_notes()).

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- strict : bool
If strict=False (the default), threshold checks for onset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- beta : float > 0
Weighting factor for f-measure (default value = 1.0).
- Returns
- precision : float
The computed precision score
- recall : float
The computed recall score
- f_measure : float
The computed F-measure score
Examples
>>> ref_intervals, _ = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, _ = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (onset_precision,
...  onset_recall,
...  onset_f_measure) = mir_eval.transcription.onset_precision_recall_f1(
...     ref_intervals, est_intervals)
- mir_eval.transcription.offset_precision_recall_f1(ref_intervals, est_intervals, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, beta=1.0)¶
Compute the Precision, Recall and F-measure of note offsets: an estimated offset is considered correct if it is within +-50ms (or 20% of the reference note’s duration, whichever is greater) of a reference offset. Note that this metric completely ignores note onsets and note pitch. This means an estimated offset will be considered correct if it matches a reference offset, even if the offsets come from notes with completely different pitches (i.e. notes that would not match with match_notes()).

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- offset_ratio : float > 0 or None
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined.
- strict : bool
If strict=False (the default), threshold checks for offset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- beta : float > 0
Weighting factor for f-measure (default value = 1.0).
- Returns
- precision : float
The computed precision score
- recall : float
The computed recall score
- f_measure : float
The computed F-measure score
The computed F-measure score
Examples
>>> ref_intervals, _ = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, _ = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (offset_precision,
...  offset_recall,
...  offset_f_measure) = mir_eval.transcription.offset_precision_recall_f1(
...     ref_intervals, est_intervals)
- mir_eval.transcription.evaluate(ref_intervals, ref_pitches, est_intervals, est_pitches, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals(
...     'estimate.txt')
>>> scores = mir_eval.transcription.evaluate(ref_intervals, ref_pitches,
...                                          est_intervals, est_pitches)
mir_eval.transcription_velocity
¶
Transcription evaluation, as defined in mir_eval.transcription, does not take into account the velocities of reference and estimated notes. This submodule implements a variant of mir_eval.transcription.precision_recall_f1_overlap() which additionally considers note velocity when determining whether a note is correctly transcribed. This is done by defining a new function mir_eval.transcription_velocity.match_notes() which first calls mir_eval.transcription.match_notes() to get a note matching based on onset, offset, and pitch. Then, we follow the evaluation procedure described in [20] to test whether an estimated note should be considered correct:
Reference velocities are re-scaled to the range [0, 1].
A linear regression is performed to estimate global scale and offset parameters which minimize the L2 distance between matched estimated and (rescaled) reference notes.
The scale and offset parameters are used to rescale estimated velocities.
An estimated/reference note pair which has been matched according to the onset, offset, and pitch is further only considered correct if the rescaled velocities are within a predefined threshold, defaulting to 0.1.
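The four steps above can be sketched in numpy (a minimal illustration using made-up matched velocity pairs, not the library’s implementation):

```python
import numpy as np

# Hypothetical velocities of note pairs already matched on onset/offset/pitch
ref_vel = np.array([30.0, 60.0, 90.0, 120.0])  # reference MIDI velocities
est_vel = np.array([25.0, 55.0, 85.0, 115.0])  # estimated MIDI velocities

# 1. Re-scale reference velocities to [0, 1]
ref_scaled = ref_vel / 127.0

# 2. Least-squares fit of a global scale and offset mapping est -> ref_scaled
scale, offset = np.polyfit(est_vel, ref_scaled, 1)

# 3. Rescale the estimated velocities
est_rescaled = scale * est_vel + offset

# 4. A matched pair is only considered correct if within the tolerance (0.1)
correct = np.abs(est_rescaled - ref_scaled) <= 0.1
```

Here the estimated velocities are an exact linear function of the reference ones, so after rescaling every pair falls within the tolerance.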
mir_eval.transcription_velocity.match_notes() is used to define a new variant mir_eval.transcription_velocity.precision_recall_f1_overlap() which considers velocity.
Conventions¶
This submodule follows the conventions of mir_eval.transcription
and
additionally requires velocities to be provided as MIDI velocities in the range
[0, 127].
Metrics¶
mir_eval.transcription_velocity.precision_recall_f1_overlap(): The precision, recall, F-measure, and Average Overlap Ratio of the note transcription, where an estimated note is considered correct if its pitch, onset, velocity and (optionally) offset are sufficiently close to a reference note.
References¶
- [20]
Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, and Douglas Eck, “Onsets and Frames: Dual-Objective Piano Transcription”, Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018.
- mir_eval.transcription_velocity.validate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities)¶
Checks that the input annotations have valid time intervals, pitches, and velocities, and throws helpful errors if not.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
- mir_eval.transcription_velocity.match_notes(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1)¶
Match notes, taking note velocity into consideration.
This function first calls mir_eval.transcription.match_notes() to match notes according to the supplied intervals, pitches, onset, offset, and pitch tolerances. The velocities of the matched notes are then used to estimate a slope and intercept which can rescale the estimated velocities so that they are as close as possible (in the L2 sense) to their matched reference velocities. Velocities are then normalized to the range [0, 1]. An estimated note is then further only considered correct if its velocity is within velocity_tolerance of its matched (according to pitch and timing) reference note.

- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- pitch_tolerance : float > 0
The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).
- offset_ratio : float > 0 or None
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
- strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- velocity_tolerance : float > 0
Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.
- Returns
- matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
- mir_eval.transcription_velocity.precision_recall_f1_overlap(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1, beta=1.0)¶
Compute the Precision, Recall and F-measure of correctly vs. incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see mir_eval.transcription.average_overlap_ratio()). “Correctness” is determined based on note onset, velocity, pitch and (optionally) offset. An estimated note is considered correct if:

Its onset is within onset_tolerance (default +-50ms) of a reference note.

Its pitch (F0) is within +/- pitch_tolerance (default one quarter tone, 50 cents) of the corresponding reference note.

Its velocity, after normalizing reference velocities to the range [0, 1] and globally rescaling estimated velocities to minimize the L2 distance between matched reference notes, is within velocity_tolerance (default 0.1) of the corresponding reference note.

If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within offset_ratio (default 20%) of the reference note’s duration around the reference note’s offset, or within offset_min_tolerance (default 50 ms), whichever is larger.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
- onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).
- pitch_tolerance : float > 0
The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).
- offset_ratio : float > 0 or None
The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.
- offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
- strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
- velocity_tolerance : float > 0
Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.
- beta : float > 0
Weighting factor for f-measure (default value = 1.0).
- Returns
- precision : float
The computed precision score
- recall : float
The computed recall score
- f_measure : float
The computed F-measure score
- avg_overlap_ratio : float
The computed Average Overlap Ratio score
- mir_eval.transcription_velocity.evaluate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- ref_intervals : np.ndarray, shape=(n,2)
Array of reference notes time intervals (onset and offset times)
- ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
- ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
- est_intervals : np.ndarray, shape=(m,2)
Array of estimated notes time intervals (onset and offset times)
- est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
- est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scores : dict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
mir_eval.key
¶
Key Detection involves determining the underlying key (distribution of notes and note transitions) in a piece of music. Key detection algorithms are evaluated by comparing their estimated key to a ground-truth reference key and reporting a score according to the relationship of the keys.
Conventions¶
Keys are represented as strings of the form '(key) (mode)', e.g. 'C# major' or 'Fb minor'. The case of the key is ignored. Note that certain key strings are equivalent, e.g. 'C# major' and 'Db major'. The mode may only be specified as either 'major' or 'minor'; no other mode strings will be accepted.
Metrics¶
mir_eval.key.weighted_score()
: Heuristic scoring of the relation of two keys.
- mir_eval.key.validate_key(key)¶
Checks that a key is well-formatted, e.g. in the form
'C# major'
. The key can be ‘X’ if it is not possible to categorize the key, and the mode can be ‘other’ if it cannot be categorized as major or minor.- Parameters
- keystr
Key to verify
- mir_eval.key.validate(reference_key, estimated_key)¶
Checks that the input annotations to a metric are valid key strings and throws helpful errors if not.
- Parameters
- reference_keystr
Reference key string.
- estimated_keystr
Estimated key string.
- mir_eval.key.split_key_string(key)¶
Splits a key string (of the form, e.g. 'C# major') into a tuple of (key, mode), where key is an integer representing the semitone distance from C.- Parameters
- keystr
String representing a key.
- Returns
- keyint
Number of semitones above C.
- modestr
String representing the mode.
- mir_eval.key.weighted_score(reference_key, estimated_key)¶
Computes a heuristic score which is weighted according to the relationship of the reference and estimated key, as follows:
Relationship
Score
Same key and mode
1.0
Estimated key is a perfect fifth above reference key
0.5
Relative major/minor (same key signature)
0.3
Parallel major/minor (same key)
0.2
Other
0.0
- Parameters
- reference_keystr
Reference key string.
- estimated_keystr
Estimated key string.
- Returns
- scorefloat
Score representing how closely related the keys are.
Examples
>>> ref_key = mir_eval.io.load_key('ref.txt')
>>> est_key = mir_eval.io.load_key('est.txt')
>>> score = mir_eval.key.weighted_score(ref_key, est_key)
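The scoring table above can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: keys are represented here as hypothetical (tonic, mode) tuples (tonic = semitones above C) rather than the key strings the function actually accepts.

```python
def weighted_score(ref, est):
    # ref and est are (tonic, mode) pairs, tonic in 0..11 semitones above C
    (ref_tonic, ref_mode), (est_tonic, est_mode) = ref, est
    if ref == est:
        return 1.0  # same key and mode
    if est_mode == ref_mode and est_tonic == (ref_tonic + 7) % 12:
        return 0.5  # estimate is a perfect fifth above the reference
    if ref_mode == 'major' and est_mode == 'minor' \
            and est_tonic == (ref_tonic + 9) % 12:
        return 0.3  # relative minor of the reference (same key signature)
    if ref_mode == 'minor' and est_mode == 'major' \
            and est_tonic == (ref_tonic + 3) % 12:
        return 0.3  # relative major of the reference (same key signature)
    if est_tonic == ref_tonic and est_mode != ref_mode:
        return 0.2  # parallel major/minor (same tonic)
    return 0.0

print(weighted_score((0, 'major'), (9, 'minor')))  # C major vs. A minor -> 0.3
```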
- mir_eval.key.evaluate(reference_key, estimated_key, **kwargs)¶
Compute all metrics for the given reference and estimated annotations.
- Parameters
- reference_keystr
Reference key string.
- estimated_keystr
Estimated key string.
- kwargs
Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.
- Returns
- scoresdict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
Examples
>>> ref_key = mir_eval.io.load_key('reference.txt')
>>> est_key = mir_eval.io.load_key('estimated.txt')
>>> scores = mir_eval.key.evaluate(ref_key, est_key)
mir_eval.util
¶
This submodule collects useful functionality required across the task submodules, such as preprocessing, validation, and common computations.
- mir_eval.util.index_labels(labels, case_sensitive=False)¶
Convert a list of string identifiers into numerical indices.
- Parameters
- labelslist of strings, shape=(n,)
A list of annotations, e.g., segment or chord labels from an annotation file.
- case_sensitivebool
Set to True to enable case-sensitive label indexing (Default value = False)
- Returns
- indiceslist, shape=(n,)
Numerical representation of
labels
- index_to_labeldict
Mapping to convert numerical indices back to labels.
labels[i] == index_to_label[indices[i]]
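A minimal sketch of the round trip this function performs, indexing labels in order of first appearance (the library's ordering may differ; the documented invariant holds either way):

```python
labels = ['verse', 'chorus', 'verse', 'bridge']

label_to_index = {}   # label -> integer index, first-appearance order
index_to_label = {}   # integer index -> label
indices = []
for lab in labels:
    if lab not in label_to_index:
        index_to_label[len(label_to_index)] = lab
        label_to_index[lab] = len(label_to_index)
    indices.append(label_to_index[lab])

print(indices)  # [0, 1, 0, 2]
# the documented invariant: labels[i] == index_to_label[indices[i]]
```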
- mir_eval.util.generate_labels(items, prefix='__')¶
Given an array of items (e.g. events, intervals), create a synthetic label for each event of the form ‘(label prefix)(item number)’
- Parameters
- itemslist-like
A list or array of events or intervals
- prefixstr
This prefix will be prepended to all synthetically generated labels (Default value = ‘__’)
- Returns
- labelslist of str
Synthetically generated labels
- mir_eval.util.intervals_to_samples(intervals, labels, offset=0, sample_size=0.1, fill_value=None)¶
Convert an array of labeled time intervals to annotated samples.
- Parameters
- intervalsnp.ndarray, shape=(n, d)
An array of time intervals, as returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
. The i th interval spans time intervals[i, 0] to intervals[i, 1]
.- labelslist, shape=(n,)
The annotation for each interval
- offsetfloat > 0
Phase offset of the sampled time grid (in seconds) (Default value = 0)
- sample_sizefloat > 0
duration of each sample to be generated (in seconds) (Default value = 0.1)
- fill_valuetype(labels[0])
Object to use as the label for out-of-range time points. (Default value = None)
- Returns
- sample_timeslist
list of sample times
- sample_labelslist
array of labels for each generated sample
Notes
Intervals will be rounded down to the nearest multiple of
sample_size
.
- mir_eval.util.interpolate_intervals(intervals, labels, time_points, fill_value=None)¶
Assign labels to a set of points in time given a set of intervals.
Time points that do not lie within an interval are mapped to fill_value.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
An array of time intervals, as returned by
mir_eval.io.load_intervals()
. The i th interval spans time intervals[i, 0] to intervals[i, 1]. Intervals are assumed to be disjoint.
- labelslist, shape=(n,)
The annotation for each interval
- time_pointsarray_like, shape=(m,)
Points in time to assign labels. These must be in non-decreasing order.
- fill_valuetype(labels[0])
Object to use as the label for out-of-range time points. (Default value = None)
- Returns
- aligned_labelslist
Labels corresponding to the given time points.
- Raises
- ValueError
If time_points is not in non-decreasing order.
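The lookup this function performs can be sketched with a brute-force scan (the library's implementation is presumably vectorized; half-open interval membership is an assumption here):

```python
import numpy as np

intervals = np.array([[0.0, 1.0], [1.0, 2.0], [3.0, 4.0]])
labels = ['a', 'b', 'c']
time_points = [0.5, 1.5, 2.5, 3.5]   # must be non-decreasing

aligned = []
for t in time_points:
    label = None                      # fill_value for uncovered points
    for (start, end), lab in zip(intervals, labels):
        if start <= t < end:          # half-open membership test
            label = lab
            break
    aligned.append(label)

print(aligned)  # ['a', 'b', None, 'c'] -- t=2.5 falls in the gap
```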
- mir_eval.util.sort_labeled_intervals(intervals, labels=None)¶
Sort intervals, and optionally, their corresponding labels according to start time.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
The input intervals
- labelslist, optional
Labels for each interval
- Returns
- intervals_sorted or (intervals_sorted, labels_sorted)
Labels are only returned if provided as input
- mir_eval.util.f_measure(precision, recall, beta=1.0)¶
Compute the f-measure from precision and recall scores.
- Parameters
- precisionfloat in (0, 1]
Precision
- recallfloat in (0, 1]
Recall
- betafloat > 0
Weighting factor for f-measure (Default value = 1.0)
- Returns
- f_measurefloat
The weighted f-measure
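The weighted f-measure is the standard F-beta formula; a minimal reimplementation for illustration:

```python
def f_measure(precision, recall, beta=1.0):
    # Weighted harmonic mean: F_beta = (1 + beta^2) P R / (beta^2 P + R)
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(f_measure(0.8, 0.6))            # plain F1
print(f_measure(0.8, 0.6, beta=2.0))  # beta > 1 weights recall more heavily
```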
- mir_eval.util.intervals_to_boundaries(intervals, q=5)¶
Convert interval times into boundaries.
- Parameters
- intervalsnp.ndarray, shape=(n_events, 2)
Array of interval start and end-times
- qint
Number of decimals to round to. (Default value = 5)
- Returns
- boundariesnp.ndarray
Interval boundary times, including the end of the final interval
- mir_eval.util.boundaries_to_intervals(boundaries)¶
Convert an array of event times into intervals
- Parameters
- boundarieslist-like
List-like of event times. These are assumed to be unique timestamps in ascending order.
- Returns
- intervalsnp.ndarray, shape=(n_intervals, 2)
Start and end time for each interval
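The relationship between boundaries and intervals is simple to state directly in NumPy (a sketch of the documented behavior, not the library's code):

```python
import numpy as np

boundaries = np.array([0.0, 1.5, 3.0, 5.0])
# adjacent boundary pairs become interval rows
intervals = np.hstack([boundaries[:-1, None], boundaries[1:, None]])
print(intervals.tolist())  # [[0.0, 1.5], [1.5, 3.0], [3.0, 5.0]]

# and back: the start times plus the end of the final interval
recovered = np.concatenate([intervals[:, 0], intervals[-1:, 1]])
print(recovered.tolist())  # [0.0, 1.5, 3.0, 5.0]
```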
- mir_eval.util.adjust_intervals(intervals, labels=None, t_min=0.0, t_max=None, start_label='__T_MIN', end_label='__T_MAX')¶
Adjust a list of time intervals to span the range
[t_min, t_max]
. Any intervals lying completely outside the specified range will be removed.
Any intervals lying partially outside the specified range will be cropped.
If the specified range exceeds the span of the provided data in either direction, additional intervals will be appended. If an interval is appended at the beginning, it will be given the label
start_label
; if an interval is appended at the end, it will be given the label end_label
.- Parameters
- intervalsnp.ndarray, shape=(n_events, 2)
Array of interval start and end-times
- labelslist, len=n_events or None
List of labels (Default value = None)
- t_minfloat or None
Minimum interval start time. (Default value = 0.0)
- t_maxfloat or None
Maximum interval end time. (Default value = None)
- start_labelstr or float or int
Label to give any intervals appended at the beginning (Default value = ‘__T_MIN’)
- end_labelstr or float or int
Label to give any intervals appended at the end (Default value = ‘__T_MAX’)
- Returns
- new_intervalsnp.ndarray
Intervals spanning
[t_min, t_max]
- new_labelslist
List of labels for
new_intervals
- mir_eval.util.adjust_events(events, labels=None, t_min=0.0, t_max=None, label_prefix='__')¶
Adjust the given list of event times to span the range
[t_min, t_max]
. Any event times outside of the specified range will be removed.
If the times do not span
[t_min, t_max]
, additional events will be added with the prefix label_prefix
.- Parameters
- eventsnp.ndarray
Array of event times (seconds)
- labelslist or None
List of labels (Default value = None)
- t_minfloat or None
Minimum valid event time. (Default value = 0.0)
- t_maxfloat or None
Maximum valid event time. (Default value = None)
- label_prefixstr
Prefix string to use for synthetic labels (Default value = ‘__’)
- Returns
- new_timesnp.ndarray
Event times corrected to the given range.
- mir_eval.util.intersect_files(flist1, flist2)¶
Return the intersection of two sets of filepaths, based on the file name (after the final ‘/’) and ignoring the file extension.
- Parameters
- flist1list
first list of filepaths
- flist2list
second list of filepaths
- Returns
- sublist1list
subset of filepaths with matching stems from
flist1
- sublist2list
corresponding filepaths from
flist2
Examples
>>> flist1 = ['/a/b/abc.lab', '/c/d/123.lab', '/e/f/xyz.lab']
>>> flist2 = ['/g/h/xyz.npy', '/i/j/123.txt', '/k/l/456.lab']
>>> sublist1, sublist2 = mir_eval.util.intersect_files(flist1, flist2)
>>> print(sublist1)
['/e/f/xyz.lab', '/c/d/123.lab']
>>> print(sublist2)
['/g/h/xyz.npy', '/i/j/123.txt']
- mir_eval.util.merge_labeled_intervals(x_intervals, x_labels, y_intervals, y_labels)¶
Merge the time intervals of two sequences.
- Parameters
- x_intervalsnp.ndarray
Array of interval times (seconds)
- x_labelslist or None
List of labels
- y_intervalsnp.ndarray
Array of interval times (seconds)
- y_labelslist or None
List of labels
- Returns
- new_intervalsnp.ndarray
New interval times of the merged sequences.
- new_x_labelslist
New labels for the sequence
x
- new_y_labelslist
New labels for the sequence
y
- mir_eval.util.match_events(ref, est, window, distance=None)¶
Compute a maximum matching between reference and estimated event times, subject to a window constraint.
Given two lists of event times ref and est, we seek the largest set of correspondences (ref[i], est[j]) such that distance(ref[i], est[j]) <= window, and each ref[i] and est[j] is matched at most once. This is useful for computing precision/recall metrics in beat tracking, onset detection, and segmentation.
- Parameters
- refnp.ndarray, shape=(n,)
Array of reference values
- estnp.ndarray, shape=(m,)
Array of estimated values
- windowfloat > 0
Size of the window.
- distancefunction
function that computes the outer distance of ref and est. By default uses
|ref[i] - est[j]|
- Returns
- matchinglist of tuples
A list of matched reference and estimated event indices.
matching[i] == (i, j) where ref[i] matches est[j].
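The window constraint determines which pairs are even eligible to match; match_events then selects the largest one-to-one matching among them. A sketch of the eligibility step on hypothetical data:

```python
import numpy as np

ref = np.array([1.0, 2.0, 3.0])
est = np.array([1.02, 2.5, 3.04])
window = 0.07

# candidate pairs allowed by the window constraint, using the default
# distance |ref[i] - est[j]|
candidates = [(i, j)
              for i in range(len(ref))
              for j in range(len(est))
              if abs(ref[i] - est[j]) <= window]

print(candidates)  # [(0, 0), (2, 2)] -- est[1] is too far from every reference
```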
- mir_eval.util.validate_intervals(intervals)¶
Checks that an (n, 2) interval ndarray is well-formed, and raises errors if not.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
Array of interval start/end locations.
- mir_eval.util.validate_events(events, max_time=30000.0)¶
Checks that a 1-d event location ndarray is well-formed, and raises errors if not.
- Parameters
- eventsnp.ndarray, shape=(n,)
Array of event times
- max_timefloat
If an event is found above this time, a ValueError will be raised. (Default value = 30000.)
- mir_eval.util.validate_frequencies(frequencies, max_freq, min_freq, allow_negatives=False)¶
Checks that a 1-d frequency ndarray is well-formed, and raises errors if not.
- Parameters
- frequenciesnp.ndarray, shape=(n,)
Array of frequency values
- max_freqfloat
If a frequency is found above this pitch, a ValueError will be raised. (Default value = 5000.)
- min_freqfloat
If a frequency is found below this pitch, a ValueError will be raised. (Default value = 20.)
- allow_negativesbool
Whether or not to allow negative frequency values.
- mir_eval.util.has_kwargs(function)¶
Determine whether a function has **kwargs.
- Parameters
- functioncallable
The function to test
- Returns
- True if function accepts arbitrary keyword arguments.
- False otherwise.
- mir_eval.util.filter_kwargs(_function, *args, **kwargs)¶
Given a function and args and keyword args to pass to it, call the function but using only the keyword arguments which it accepts. This is equivalent to redefining the function with an additional **kwargs to accept slop keyword args.
If the target function already accepts **kwargs parameters, no filtering is performed.
- Parameters
- _functioncallable
Function to call. Can take in any number of args or kwargs
- mir_eval.util.intervals_to_durations(intervals)¶
Converts an array of n intervals to their n durations.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
An array of time intervals, as returned by
mir_eval.io.load_intervals()
. The i th interval spans time intervals[i, 0] to intervals[i, 1].
- Returns
- durationsnp.ndarray, shape=(n,)
Array of the duration of each interval.
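The computation amounts to a single NumPy column difference (a sketch of the documented behavior):

```python
import numpy as np

intervals = np.array([[0.0, 1.5], [1.5, 3.0], [4.0, 4.25]])
durations = intervals[:, 1] - intervals[:, 0]  # end minus start, per row
print(durations.tolist())  # [1.5, 1.5, 0.25]
```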
- mir_eval.util.hz_to_midi(freqs)¶
Convert Hz to MIDI numbers
- Parameters
- freqsnumber or ndarray
Frequency/frequencies in Hz
- Returns
- midinumber or ndarray
MIDI note numbers corresponding to input frequencies. Note that these may be fractional.
- mir_eval.util.midi_to_hz(midi)¶
Convert MIDI numbers to Hz
- Parameters
- midinumber or ndarray
MIDI notes
- Returns
- freqsnumber or ndarray
Frequency/frequencies in Hz corresponding to midi
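Both conversions follow the standard equal-tempered mapping with A4 = 440 Hz = MIDI note 69; scalar sketches for illustration (the library versions also accept ndarrays):

```python
import math

def hz_to_midi(freq):
    # MIDI note number (possibly fractional) from frequency in Hz
    return 12.0 * math.log2(freq / 440.0) + 69.0

def midi_to_hz(midi):
    # inverse mapping: frequency in Hz from MIDI note number
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

print(hz_to_midi(440.0))          # 69.0
print(round(midi_to_hz(60), 2))   # 261.63 (middle C)
```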
mir_eval.io
¶
Functions for loading in annotations from files in different formats.
- mir_eval.io.load_delimited(filename, converters, delimiter='\\s+', comment='#')¶
Utility function for loading in data from an annotation file where columns are delimited. The number of columns is inferred from the length of the provided converters list.
- Parameters
- filenamestr
Path to the annotation file
- converterslist of functions
Each entry in column
n
of the file will be cast by the function converters[n]
.- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- columnstuple of lists
Each list in this tuple corresponds to values in one of the columns in the file.
Examples
>>> # Load in a one-column list of event times (floats)
>>> load_delimited('events.txt', [float])
>>> # Load in a list of labeled events, separated by commas
>>> load_delimited('labeled_events.csv', [float, str], ',')
- mir_eval.io.load_events(filename, delimiter='\\s+', comment='#')¶
Import time-stamp events from an annotation file. The file should consist of a single column of numeric values corresponding to the event times. This is primarily useful for processing events which lack duration, such as beats or onsets.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- event_timesnp.ndarray
array of event times (float)
- mir_eval.io.load_labeled_events(filename, delimiter='\\s+', comment='#')¶
Import labeled time-stamp events from an annotation file. The file should consist of two columns; the first having numeric values corresponding to the event times and the second having string labels for each event. This is primarily useful for processing labeled events which lack duration, such as beats with metric beat number or onsets with an instrument label.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- event_timesnp.ndarray
array of event times (float)
- labelslist of str
list of labels
- mir_eval.io.load_intervals(filename, delimiter='\\s+', comment='#')¶
Import intervals from an annotation file. The file should consist of two columns of numeric values corresponding to start and end time of each interval. This is primarily useful for processing events which span a duration, such as segmentation, chords, or instrument activation.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- intervalsnp.ndarray, shape=(n_events, 2)
array of event start and end times
- mir_eval.io.load_labeled_intervals(filename, delimiter='\\s+', comment='#')¶
Import labeled intervals from an annotation file. The file should consist of three columns: Two consisting of numeric values corresponding to start and end time of each interval and a third corresponding to the label of each interval. This is primarily useful for processing events which span a duration, such as segmentation, chords, or instrument activation.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- intervalsnp.ndarray, shape=(n_events, 2)
array of event start and end time
- labelslist of str
list of labels
- mir_eval.io.load_time_series(filename, delimiter='\\s+', comment='#')¶
Import a time series from an annotation file. The file should consist of two columns of numeric values corresponding to the time and value of each sample of the time series.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- timesnp.ndarray
array of timestamps (float)
- valuesnp.ndarray
array of corresponding numeric values (float)
- mir_eval.io.load_patterns(filename)¶
Loads the patterns contained in the filename and puts them into a list of patterns, each pattern being a list of occurrences, and each occurrence being a list of (onset, midi) pairs.
The input file must be formatted as described in MIREX 2013: http://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections
- Parameters
- filenamestr
The input file path containing the patterns of a given piece using the MIREX 2013 format.
- Returns
- pattern_listlist
The list of patterns, containing all their occurrences, using the following format:
onset_midi = (onset_time, midi_number)
occurrence = [onset_midi1, ..., onset_midiO]
pattern = [occurrence1, ..., occurrenceM]
pattern_list = [pattern1, ..., patternN]
where N is the number of patterns, M[i] is the number of occurrences of the i th pattern, and O[j] is the number of onsets in the j th occurrence. E.g.:
occ1 = [(0.5, 67.0), (1.0, 67.0), (1.5, 67.0), (2.0, 64.0)]
occ2 = [(4.5, 65.0), (5.0, 65.0), (5.5, 65.0), (6.0, 62.0)]
pattern1 = [occ1, occ2]
occ1 = [(10.5, 67.0), (11.0, 67.0), (11.5, 67.0), (12.0, 64.0),
        (12.5, 69.0), (13.0, 69.0), (13.5, 69.0), (14.0, 67.0),
        (14.5, 76.0), (15.0, 76.0), (15.5, 76.0), (16.0, 72.0)]
occ2 = [(18.5, 67.0), (19.0, 67.0), (19.5, 67.0), (20.0, 62.0),
        (20.5, 69.0), (21.0, 69.0), (21.5, 69.0), (22.0, 67.0),
        (22.5, 77.0), (23.0, 77.0), (23.5, 77.0), (24.0, 74.0)]
pattern2 = [occ1, occ2]
pattern_list = [pattern1, pattern2]
- mir_eval.io.load_wav(path, mono=True)¶
Loads a .wav file as a numpy array using
scipy.io.wavfile
.- Parameters
- pathstr
Path to a .wav file
- monobool
If the provided .wav has more than one channel, it will be converted to mono if
mono=True
. (Default value = True)
- Returns
- audio_datanp.ndarray
Array of audio samples, normalized to the range [-1., 1.]
- fsint
Sampling rate of the audio data
- mir_eval.io.load_valued_intervals(filename, delimiter='\\s+', comment='#')¶
Import valued intervals from an annotation file. The file should consist of three columns: Two consisting of numeric values corresponding to start and end time of each interval and a third, also of numeric values, corresponding to the value of each interval. This is primarily useful for processing events which span a duration and have a numeric value, such as piano-roll notes which have an onset, offset, and a pitch value.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- intervalsnp.ndarray, shape=(n_events, 2)
Array of event start and end times
- valuesnp.ndarray, shape=(n_events,)
Array of values
- mir_eval.io.load_key(filename, delimiter='\\s+', comment='#')¶
Load key labels from an annotation file. The file should consist of two string columns: One denoting the key scale degree (semitone), and the other denoting the mode (major or minor). The file should contain only one row.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- keystr
Key label, in the form
'(key) (mode)'
- mir_eval.io.load_tempo(filename, delimiter='\\s+', comment='#')¶
Load tempo estimates from an annotation file in MIREX format. The file should consist of three numeric columns: the first two correspond to tempo estimates (in beats-per-minute), and the third denotes the relative confidence of the first value compared to the second (in the range [0, 1]). The file should contain only one row.
- Parameters
- filenamestr
Path to the annotation file
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- tempinp.ndarray, non-negative
The two tempo estimates
- weightfloat [0, 1]
The relative importance of
tempi[0]
compared to tempi[1]
- mir_eval.io.load_ragged_time_series(filename, dtype=<class 'float'>, delimiter='\\s+', header=False, comment='#')¶
Utility function for loading in data from a delimited time series annotation file with a variable number of columns. Assumes that column 0 contains time stamps and columns 1 through n contain values. n may be variable from time stamp to time stamp.
- Parameters
- filenamestr
Path to the annotation file
- dtypefunction
Data type to apply to values columns.
- delimiterstr
Separator regular expression. By default, lines will be split by any amount of whitespace.
- headerbool
Indicates whether a header row is present or not. By default, assumes no header is present.
- commentstr or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Setting to None disables comments.
- Returns
- timesnp.ndarray
array of timestamps (float)
- valueslist of np.ndarray
list of arrays of corresponding values
Examples
>>> # Load a ragged list of tab-delimited multi-f0 midi notes
>>> times, vals = load_ragged_time_series('multif0.txt', dtype=int, delimiter='\t')
>>> # Load a ragged list of space-delimited multi-f0 values with a header
>>> times, vals = load_ragged_time_series('labeled_events.csv', header=True)
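The parsing idea is straightforward: column 0 is the timestamp and any remaining columns are values, with a per-row count. A simplified sketch (the hypothetical load_ragged below omits the real loader's regex delimiters and header handling):

```python
import io
import numpy as np

def load_ragged(fileobj, dtype=float, comment='#'):
    # column 0 is a timestamp; remaining (variable-count) columns are values
    times, values = [], []
    for line in fileobj:
        if not line.strip() or line.lstrip().startswith(comment):
            continue  # skip blank lines and comments
        fields = line.split()
        times.append(float(fields[0]))
        values.append(np.array([dtype(v) for v in fields[1:]]))
    return np.array(times), values

data = io.StringIO("# time f0s\n0.0 220.0\n0.1 220.0 330.0\n0.2\n")
times, vals = load_ragged(data)
print(times.tolist())    # [0.0, 0.1, 0.2]
print(vals[1].tolist())  # [220.0, 330.0] -- two active f0s at t=0.1
```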
mir_eval.sonify
¶
Methods which sonify annotations for “evaluation by ear”. All functions return a raw signal at the specified sampling rate.
- mir_eval.sonify.clicks(times, fs, click=None, length=None)¶
Returns a signal with the signal ‘click’ placed at each specified time
- Parameters
- timesnp.ndarray
times to place clicks, in seconds
- fsint
desired sampling rate of the output signal
- clicknp.ndarray
click signal, defaults to a 1 kHz blip
- lengthint
desired number of samples in the output signal, defaults to
times.max()*fs + click.shape[0] + 1
- Returns
- click_signalnp.ndarray
Synthesized click signal
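The placement logic can be sketched in a few lines: allocate a buffer of the documented default length and add a copy of the click at each event time (the click waveform here is a crude stand-in, not the library's 1 kHz blip):

```python
import numpy as np

fs = 8000
times = np.array([0.1, 0.5])   # click positions in seconds
click = np.ones(80)            # stand-in click: 10 ms of ones

length = int(times.max() * fs) + len(click) + 1  # the documented default
signal = np.zeros(length)
for t in times:
    start = int(t * fs)
    signal[start:start + len(click)] += click    # place a click at each time

print(signal[800], signal[0])  # 1.0 at a click onset, 0.0 elsewhere
```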
- mir_eval.sonify.time_frequency(gram, frequencies, times, fs, function=<ufunc 'sin'>, length=None, n_dec=1)¶
Reverse synthesis of a time-frequency representation of a signal
- Parameters
- gramnp.ndarray
gram[n, m]
is the magnitude of frequencies[n] from times[m] to times[m + 1]
Non-positive magnitudes are interpreted as silence.
- frequenciesnp.ndarray
array of size
gram.shape[0]
denoting the frequency of each row of gram- timesnp.ndarray, shape=
(gram.shape[1],)
or(gram.shape[1], 2)
Either the start time of each column in the gram, or the time interval corresponding to each column.
- fsint
desired sampling rate of the output signal
- functionfunction
function to use to synthesize notes, should be 2\pi-periodic
- lengthint
desired number of samples in the output signal, defaults to
times[-1]*fs
- n_decint
the number of decimals used to approximate each sonified frequency. Defaults to 1 decimal place. Higher precision will be slower.
- Returns
- outputnp.ndarray
synthesized version of the piano roll
- mir_eval.sonify.pitch_contour(times, frequencies, fs, amplitudes=None, function=<ufunc 'sin'>, length=None, kind='linear')¶
Sonify a pitch contour.
- Parameters
- timesnp.ndarray
time indices for each frequency measurement, in seconds
- frequenciesnp.ndarray
frequency measurements, in Hz. Non-positive measurements will be interpreted as un-voiced samples.
- fsint
desired sampling rate of the output signal
- amplitudesnp.ndarray
amplitude measurements, non-negative. Defaults to
np.ones((length,))
- functionfunction
function to use to synthesize notes, should be 2\pi-periodic
- lengthint
desired number of samples in the output signal, defaults to
max(times)*fs
- kindstr
Interpolation mode for the frequency and amplitude values. See:
scipy.interpolate.interp1d
for valid settings.
- Returns
- outputnp.ndarray
synthesized version of the pitch contour
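The core of contour sonification is interpolating the sparse frequency measurements onto the sample grid and integrating frequency to phase so the glide stays continuous; a minimal sketch on hypothetical data (voicing, amplitudes, and other options omitted):

```python
import numpy as np

fs = 8000
times = np.array([0.0, 0.25, 0.5])
freqs = np.array([220.0, 330.0, 220.0])   # a rise-and-fall pitch glide

t = np.arange(int(fs * 0.5)) / fs
f_t = np.interp(t, times, freqs)          # 'linear'-style interpolation
# integrate frequency to phase so the changing pitch is click-free
phase = 2 * np.pi * np.cumsum(f_t) / fs
output = np.sin(phase)

print(output.shape)  # (4000,)
```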
- mir_eval.sonify.chroma(chromagram, times, fs, **kwargs)¶
Reverse synthesis of a chromagram (semitone matrix)
- Parameters
- chromagramnp.ndarray, shape=(12, times.shape[0])
Chromagram matrix, where each row represents a semitone [C->Bb] i.e.,
chromagram[3, j]
is the magnitude of D# from times[j] to times[j + 1]
- times: np.ndarray, shape=(chromagram.shape[1],) or (chromagram.shape[1], 2)
Either the start time of each column in the chromagram, or the time interval corresponding to each column.
- fsint
Sampling rate to synthesize audio data at
- kwargs
Additional keyword arguments to pass to
mir_eval.sonify.time_frequency()
- Returns
- outputnp.ndarray
Synthesized chromagram
- mir_eval.sonify.chords(chord_labels, intervals, fs, **kwargs)¶
Synthesizes chord labels
- Parameters
- chord_labelslist of str
List of chord label strings.
- intervalsnp.ndarray, shape=(len(chord_labels), 2)
Start and end times of each chord label
- fsint
Sampling rate to synthesize at
- kwargs
Additional keyword arguments to pass to
mir_eval.sonify.time_frequency()
- Returns
- outputnp.ndarray
Synthesized chord labels
mir_eval.display
¶
Display functions
- mir_eval.display.segments(intervals, labels, base=None, height=None, text=False, text_kw=None, ax=None, **kwargs)¶
Plot a segmentation as a set of disjoint rectangles.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- labelslist, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.- basenumber
The vertical position of the base of the rectangles. By default, this will be the bottom of the plot.
- heightnumber
The height of the rectangles. By default, this will be the top of the plot (minus
base
).- textbool
If true, each segment’s label is displayed in its upper-left corner
- text_kwdict
If
text == True
, the properties of the text object can be specified here. See matplotlib.pyplot.Text
for valid parameters- axmatplotlib.pyplot.axes
An axis handle on which to draw the segmentation. If none is provided, a new set of axes is created.
- kwargs
Additional keyword arguments to pass to
matplotlib.patches.Rectangle
.
- Returns
- axmatplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.labeled_intervals(intervals, labels, label_set=None, base=None, height=None, extend_labels=True, ax=None, tick=True, **kwargs)¶
Plot labeled intervals with each label on its own row.
- Parameters
- intervalsnp.ndarray, shape=(n, 2)
segment intervals, in the format returned by
mir_eval.io.load_intervals()
ormir_eval.io.load_labeled_intervals()
.- labelslist, shape=(n,)
reference segment labels, in the format returned by
mir_eval.io.load_labeled_intervals()
.- label_setlist
An (ordered) list of labels to determine the plotting order. If not provided, the labels will be inferred from
ax.get_yticklabels()
. If no yticklabels exist, then the sorted set of unique values in labels
is taken as the label set.- basenp.ndarray, shape=(n,), optional
Vertical positions of each label. By default, labels are positioned at integers
np.arange(len(labels))
.- heightscalar or np.ndarray, shape=(n,), optional
Height for each label. If scalar, the same value is applied to all labels. By default, each label has
height=1
.- extend_labelsbool
If
False
, only values of labels that also exist in label_set will be shown. If
True
, all labels are shown, with those in labels but not in label_set appended to the top of the plot. A horizontal line is drawn to indicate the separation between values in or out of label_set
.- axmatplotlib.pyplot.axes
An axis handle on which to draw the intervals. If none is provided, a new set of axes is created.
- tickbool
If
True
, sets tick positions and labels on the y-axis.- kwargs
Additional keyword arguments to pass to matplotlib.collection.BrokenBarHCollection.
- Returns
- axmatplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- class mir_eval.display.IntervalFormatter(base, ticks)¶
Bases:
matplotlib.ticker.Formatter
Ticker formatter for labeled interval plots.
- Parameters
- basearray-like of int
The base positions of each label
- ticksarray-like of string
The labels for the ticks
- Attributes
- axis
Methods
__call__
(x[, pos])Return the format for tick value x at position pos.
fix_minus
(s)Some classes may want to replace a hyphen for minus with the proper unicode symbol (U+2212) for typographical correctness. This is a helper method to perform such a replacement when it is enabled via :rc:`axes.unicode_minus`.
format_data
(value)Return the full string representation of the value with the position unspecified.
format_data_short
(value)Return a short string version of the tick value.
format_ticks
(values)Return the tick labels for all the ticks at once.
set_locs
(locs)Set the locations of the ticks.
create_dummy_axis
get_offset
set_axis
set_bounds
set_data_interval
set_view_interval
- mir_eval.display.hierarchy(intervals_hier, labels_hier, levels=None, ax=None, **kwargs)¶
Plot a hierarchical segmentation
- Parameters
- intervals_hier : list of np.ndarray
A list of segmentation intervals. Each element should be an n-by-2 array of segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals(). Segmentations should be ordered by increasing specificity.
- labels_hier : list of list-like
A list of segmentation labels. Each element should be a list of labels for the corresponding element in intervals_hier.
- levels : list of string
Each element levels[i] is a label for the i-th segmentation. This is used in the legend to denote the levels in a segment hierarchy.
- kwargs
Additional keyword arguments to labeled_intervals.
- Returns
- ax : matplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.events(times, labels=None, base=None, height=None, ax=None, text_kw=None, **kwargs)¶
Plot event times as a set of vertical lines
- Parameters
- times : np.ndarray, shape=(n,)
Event times, in the format returned by mir_eval.io.load_events() or mir_eval.io.load_labeled_events().
- labels : list, shape=(n,), optional
Event labels, in the format returned by mir_eval.io.load_labeled_events().
- base : number
The vertical position of the base of the line. By default, this will be the bottom of the plot.
- height : number
The height of the lines. By default, this will be the top of the plot (minus base).
- ax : matplotlib.pyplot.axes
An axis handle on which to draw the segmentation. If none is provided, a new set of axes is created.
- text_kw : dict
If labels is provided, the properties of the text objects can be specified here. See matplotlib.pyplot.Text for valid parameters.
- kwargs
Additional keyword arguments to pass to matplotlib.pyplot.vlines.
- Returns
- ax : matplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.pitch(times, frequencies, midi=False, unvoiced=False, ax=None, **kwargs)¶
Visualize pitch contours
- Parameters
- times : np.ndarray, shape=(n,)
Sample times of frequencies
- frequencies : np.ndarray, shape=(n,)
Frequencies (in Hz) of the pitch contours. Voicing is indicated by sign (positive for voiced, non-positive for non-voiced).
- midi : bool
If True, plot on a MIDI-numbered vertical axis. Otherwise, plot on a linear frequency axis.
- unvoiced : bool
If True, unvoiced pitch contours are plotted and indicated by transparency. Otherwise, unvoiced pitch contours are omitted from the display.
- ax : matplotlib.pyplot.axes
An axis handle on which to draw the pitch contours. If none is provided, a new set of axes is created.
- kwargs
Additional keyword arguments to matplotlib.pyplot.plot.
- Returns
- ax : matplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.multipitch(times, frequencies, midi=False, unvoiced=False, ax=None, **kwargs)¶
Visualize multiple f0 measurements
- Parameters
- times : np.ndarray, shape=(n,)
Sample times of frequencies
- frequencies : list of np.ndarray
Frequencies (in Hz) of the pitch measurements. Voicing is indicated by sign (positive for voiced, non-positive for non-voiced). times and frequencies should be in the format produced by mir_eval.io.load_ragged_time_series().
- midi : bool
If True, plot on a MIDI-numbered vertical axis. Otherwise, plot on a linear frequency axis.
- unvoiced : bool
If True, unvoiced pitches are plotted and indicated by transparency. Otherwise, unvoiced pitches are omitted from the display.
- ax : matplotlib.pyplot.axes
An axis handle on which to draw the pitch contours. If none is provided, a new set of axes is created.
- kwargs
Additional keyword arguments to plt.scatter.
- Returns
- ax : matplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.piano_roll(intervals, pitches=None, midi=None, ax=None, **kwargs)¶
Plot a quantized piano roll as intervals
- Parameters
- intervals : np.ndarray, shape=(n, 2)
Timing intervals for notes
- pitches : np.ndarray, shape=(n,), optional
Pitches of notes (in Hz).
- midi : np.ndarray, shape=(n,), optional
Pitches of notes (in MIDI numbers). At least one of pitches or midi must be provided.
- ax : matplotlib.pyplot.axes
An axis handle on which to draw the intervals. If none is provided, a new set of axes is created.
- kwargs
Additional keyword arguments to labeled_intervals().
- Returns
- ax : matplotlib.pyplot.axes._subplots.AxesSubplot
A handle to the (possibly constructed) plot axes
- mir_eval.display.separation(sources, fs=22050, labels=None, alpha=0.75, ax=None, **kwargs)¶
Source-separation visualization
- Parameters
- sources : np.ndarray, shape=(nsrc, nsampl)
A list of waveform buffers corresponding to each source
- fs : number > 0
The sampling rate
- labels : list of strings
An optional list of descriptors corresponding to each source
- alpha : float in [0, 1]
Maximum alpha (opacity) of spectrogram values.
- ax : matplotlib.pyplot.axes
An axis handle on which to draw the spectrograms. If none is provided, a new set of axes is created.
- kwargs
Additional keyword arguments to scipy.signal.spectrogram.
- Returns
- ax
The axis handle for this plot
- mir_eval.display.ticker_notes(ax=None)¶
Set the y-axis of the given axes to MIDI notes
- Parameters
- ax : matplotlib.pyplot.axes
The axes handle to apply the ticker. By default, uses the current axes handle.
- mir_eval.display.ticker_pitch(ax=None)¶
Set the y-axis of the given axes to MIDI frequencies
- Parameters
- ax : matplotlib.pyplot.axes
The axes handle to apply the ticker. By default, uses the current axes handle.