Subset of annotations for FSD 1st release #26

Closed
edufonseca opened this issue May 17, 2017 · 3 comments

Comments

@edufonseca
Contributor

edufonseca commented May 17, 2017

After defining the mapping used to create lists of candidate samples for the sound categories (#25), we should decide on a subset of all mapped annotations on which to concentrate the validations in the initial validation task for the FSD 1st release. This is related to the design of the subsets (#24): the chosen subset of annotations should meet the requirements of the data subsets that we want to provide.

A simple example:
Suppose we want the Medium subset of the FSD 1st release to have ~100k annotations with rater agreement. When validating an annotation, raters may answer NP (Not Present) or U (Unsure) with some probability. This means that we should select more than 100k annotations as a starting point.
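A minimal sketch of the arithmetic in this example, assuming each validated annotation independently ends up as NP or U with fixed rates. The function name and the rates are hypothetical, and the sketch ignores the extra cost of requiring agreement between several raters.

```python
import math


def annotations_to_validate(target_present: int, p_not_present: float, p_unsure: float) -> int:
    """Estimate how many candidate annotations to send for validation so that,
    on average, `target_present` of them are rated Present.

    Assumes (hypothetically) that each validation independently yields NP with
    probability `p_not_present` and U with probability `p_unsure`.
    """
    p_present = 1.0 - p_not_present - p_unsure
    if p_present <= 0:
        raise ValueError("probability of a Present rating must be positive")
    # Divide the target by the expected fraction of Present ratings, rounding up.
    return math.ceil(target_present / p_present)


# Hypothetical rates: 20% NP and 10% U.
print(annotations_to_validate(100_000, p_not_present=0.2, p_unsure=0.1))
```

With these made-up rates, roughly 143k annotations would need to be selected to end up with ~100k Present ones.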

@edufonseca edufonseca changed the title Define subset of annotations for FSD 1st release Subset of annotations for FSD 1st release May 17, 2017
@edufonseca
Contributor Author

We could do this in two ways:

  1. assuming a constant factor for all categories, e.g., if we want to have 10 (Present) audio clips per category, we select 15 clips (annotations) to be validated
  2. considering that the mapping may be better for certain categories, e.g., for a well-mapped category 13 audio clips may be enough, while for a badly mapped one we might need 20
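The two options could be sketched as follows. The function names, the 1.5 factor (taken from the 10 → 15 example) and the per-category precision values are hypothetical. The precision stands for the expected fraction of a category's candidates that raters confirm as Present, which might, for instance, be estimated from a small pilot validation.

```python
import math


def clips_to_validate_constant(target_present: int, factor: float = 1.5) -> int:
    # Option 1: the same oversampling factor for every category (10 -> 15).
    return math.ceil(target_present * factor)


def clips_to_validate_per_category(target_present: int, est_precision: float) -> int:
    # Option 2: scale by a (hypothetical) per-category estimate of how well the
    # mapping works, i.e. the expected fraction of candidates rated Present.
    if not 0 < est_precision <= 1:
        raise ValueError("est_precision must be in (0, 1]")
    return math.ceil(target_present / est_precision)


# Hypothetical precision estimates for two categories.
precision = {"well_mapped": 0.77, "badly_mapped": 0.5}
for category, p in precision.items():
    print(category, clips_to_validate_per_category(10, p))
```

Option 2 spends fewer validations on well-mapped categories (13 instead of 15 here) at the price of needing a precision estimate per category.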

@xavierfav
Contributor

xavierfav commented Jul 11, 2017

Prioritizing categories according to their number of ground-truth annotations should lead us to a balanced dataset (#38).
We also chose to omit the categories that have too few annotation candidates.

I think this is enough to get a good subset.
However, annotations are not prioritized based on sound "quality" (#23).
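One possible reading of this prioritization, as a sketch: drop categories with too few candidates, then put categories with the fewest ground-truth annotations first so that validation effort flows toward under-represented categories. The function name, the threshold and the category counts below are hypothetical.

```python
from typing import Dict, List


def prioritize_categories(
    candidates: Dict[str, int],
    ground_truth: Dict[str, int],
    min_candidates: int,
) -> List[str]:
    # Omit categories with too few annotation candidates (hypothetical threshold).
    eligible = [c for c, n in candidates.items() if n >= min_candidates]
    # Fewest ground-truth annotations first; ties broken by name for determinism.
    return sorted(eligible, key=lambda c: (ground_truth.get(c, 0), c))


# Hypothetical counts per category.
candidates = {"Bark": 500, "Cowbell": 8, "Guitar": 1200}
ground_truth = {"Bark": 40, "Guitar": 5}
print(prioritize_categories(candidates, ground_truth, min_candidates=20))
```

As noted above, this orders categories only; it says nothing about the sound "quality" of individual annotations.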

@xavierfav xavierfav removed the FSD1.0 label Jan 23, 2018
@xavierfav xavierfav added this to the FSD 1st release milestone Jan 23, 2018
@xavierfav
Contributor

We decided not to create any subset, but rather to prioritize some categories and annotations.
Some of the requirements are already functional (category prioritization), and this will be improved in #133.


2 participants