Distributing labelling budget

Formalizing the problem

If $C$ is a set of classes (labels, or shop departments in our example) and $X$ is the space of all possible products and product groups, then our goal is to find a classifier function $f: X \to C$. This sounds like a multiclass classification problem, although we are not necessarily going to use machine learning to solve it.

Obviously, we have bent reality a bit by assuming that:

each product can be found in only one department (otherwise this would be multi-label classification),
and all products can be found in the grocery store under consideration.

We have the budget to manually label only $B$ of the products $P$, where $B \in \mathbb{N}$ and $B \ll |P|$.

Products are organized in categories that make up a taxonomy $T$. More formally, a taxonomy is a tree whose leaves stand for products and whose inner nodes are product categories.
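A taxonomy like this can be held as a simple parent map. The sketch below uses a hypothetical toy example (the node names and the `products_and_categories` helper are made up for illustration). It recovers the products as the leaves and the categories as the inner nodes.

```python
# Hypothetical toy taxonomy T: each node maps to its parent; the root maps to None.
taxonomy = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "yogurt": "dairy",
    "bread": "bakery",
    "croissant": "bakery",
    "eggs": "food",
}


def products_and_categories(parent_map):
    """Split the nodes of a tree into leaves (products) and inner nodes (categories)."""
    # A node is an inner node exactly when some other node names it as its parent.
    categories = {parent for parent in parent_map.values() if parent is not None}
    products = set(parent_map) - categories
    return products, categories


products, categories = products_and_categories(taxonomy)
print(sorted(products))    # ['bread', 'croissant', 'eggs', 'milk', 'yogurt']
print(sorted(categories))  # ['bakery', 'dairy', 'food']
```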

Labelling in iterations

My idea is to label products iteratively.

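The first iteration is somewhat special:

Sample a set $S_1$ of $B_1$ products from all products $P$.
Label $S_1$ manually.
Predict labels for the remaining products ($P \setminus S_1$) based on the labels for $S_1$ and the relations in the taxonomy $T$.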
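The text leaves open how exactly labels are predicted from the taxonomy. The sketch below assumes one hypothetical rule; it is not the actual implementation. An unlabelled product takes as candidates all manual labels found under its closest ancestor category that contains at least one manually labelled product. `predict_candidates` and the toy data are illustrative names.

```python
def ancestors(node, parent_map):
    """Yield the ancestors of `node`, from its parent up to the root."""
    parent = parent_map[node]
    while parent is not None:
        yield parent
        parent = parent_map[parent]


def predict_candidates(products, manual_labels, parent_map):
    """Hypothetical rule: inherit the manual labels found under the closest
    ancestor category that has at least one manually labelled product."""
    # For every category, collect the manual labels of the products beneath it.
    labels_below = {}
    for product, label in manual_labels.items():
        for category in ancestors(product, parent_map):
            labels_below.setdefault(category, set()).add(label)

    candidates = {}
    for product in products:
        if product in manual_labels:
            continue  # already labelled manually, nothing to predict
        candidates[product] = set()  # stays empty if no relative was labelled
        for category in ancestors(product, parent_map):
            if category in labels_below:
                candidates[product] = set(labels_below[category])
                break
    return candidates


taxonomy = {"food": None, "dairy": "food", "bakery": "food", "milk": "dairy",
            "yogurt": "dairy", "bread": "bakery", "croissant": "bakery", "eggs": "food"}
products = {"milk", "yogurt", "bread", "croissant", "eggs"}
manual = {"milk": "Dairy", "bread": "Bakery"}  # the first manually labelled sample
candidates = predict_candidates(products, manual, taxonomy)
for product in sorted(candidates):
    print(product, sorted(candidates[product]))
# croissant ['Bakery']
# eggs ['Bakery', 'Dairy']
# yogurt ['Dairy']
```

Under this rule, `eggs` sits directly under `food`, so it sees both the dairy and the bakery label and becomes ambiguous.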

There will be products ($U_1$) with only one matching label and products ($A_1$) with ambiguous predictions, i.e., multiple label candidates, that require manual clarification. In each subsequent iteration $i = 2, 3, \dots$ we gradually clarify those ambiguous predictions:
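Given candidate sets from any prediction rule, splitting the products into the unambiguous and ambiguous groups takes one line each. This sketch uses a hypothetical `partition` helper. It assumes that a product with no candidate at all is also ambiguous, because the text only mentions products with multiple candidates.

```python
def partition(candidates):
    """Split predicted candidate sets into unambiguous and ambiguous products."""
    # Exactly one candidate: accept the predicted label.
    unambiguous = {p: next(iter(c)) for p, c in candidates.items() if len(c) == 1}
    # Several candidates (or, by assumption, none): needs manual clarification.
    ambiguous = {p: c for p, c in candidates.items() if len(c) != 1}
    return unambiguous, ambiguous


unambiguous, ambiguous = partition(
    {"yogurt": {"Dairy"}, "croissant": {"Bakery"}, "eggs": {"Dairy", "Bakery"}}
)
print(unambiguous)        # {'yogurt': 'Dairy', 'croissant': 'Bakery'}
print(sorted(ambiguous))  # ['eggs']
```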

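Sample a set $S_i$ of $B_i$ products from those with ambiguous predictions ($A_{i-1}$).
Label $S_i$ manually.
Predict labels for the products without manual labels ($P \setminus L_i$, where $L_i$ is the set of products labelled manually in iterations $1, \dots, i$) based on all manual labels collected so far and the relations in $T$. This again yields unambiguous ($U_i$) and ambiguous ($A_i$) predictions.

Repeat until there are no products with ambiguous predictions ($A_i = \emptyset$) or the whole budget has been consumed ($\sum_i B_i = B$). The final labelling comes from both the manual labels (for $L_i$) and the unambiguous predictions (for $U_i$).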
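The sketch below puts the iterations together as a minimal simulation, under several assumptions made for illustration. It reuses the hypothetical closest-labelled-ancestor rule for predictions and samples uniformly at random. A lookup in made-up ground truth (`annotate`) stands in for the human annotator. Setting `batch=1` gives the one-product-per-iteration variant.

```python
import random


def ancestors(node, parent_map):
    """Yield the ancestors of `node`, from its parent up to the root."""
    parent = parent_map[node]
    while parent is not None:
        yield parent
        parent = parent_map[parent]


def predict(products, manual, parent_map):
    """Hypothetical rule: inherit labels from the closest ancestor that has
    at least one manually labelled product beneath it."""
    below = {}
    for product, label in manual.items():
        for category in ancestors(product, parent_map):
            below.setdefault(category, set()).add(label)
    return {
        p: next((set(below[c]) for c in ancestors(p, parent_map) if c in below), set())
        for p in set(products) - set(manual)
    }


def label_iteratively(products, parent_map, annotate, budget, first_batch, batch, seed=0):
    """Return (final labels, products still ambiguous when the loop stopped)."""
    rng = random.Random(seed)
    manual, unambiguous = {}, {}
    pool = sorted(products)  # iteration 1 samples from all products
    k = min(first_batch, budget)
    while k > 0 and pool:  # stop when nothing is ambiguous or the budget is spent
        for p in rng.sample(pool, min(k, len(pool))):
            manual[p] = annotate(p)  # manual labelling step
        candidates = predict(products, manual, parent_map)
        unambiguous = {p: next(iter(c)) for p, c in candidates.items() if len(c) == 1}
        pool = sorted(p for p, c in candidates.items() if len(c) != 1)
        k = min(batch, budget - len(manual))  # remaining budget caps the batch
    return {**unambiguous, **manual}, pool


taxonomy = {"food": None, "dairy": "food", "bakery": "food", "milk": "dairy",
            "yogurt": "dairy", "bread": "bakery", "croissant": "bakery", "eggs": "food"}
truth = {"milk": "Dairy", "yogurt": "Dairy", "bread": "Bakery",
         "croissant": "Bakery", "eggs": "Dairy"}  # made-up ground truth
labels, still_ambiguous = label_iteratively(
    set(truth), taxonomy, truth.__getitem__, budget=3, first_batch=2, batch=1
)
print(labels, still_ambiguous)
```

With a small budget the loop may stop early because every remaining prediction looks unambiguous. Whether those predictions are right depends entirely on which products were sampled first.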

Considerations

We have one annotator, and predicting the next labels costs nothing compared to manual labelling, so given a budget $B$ we can run $B$ iterations, selecting only one product for manual labelling in each.

The problem then boils down to: which product should we select next?

The one that resolves the most conflicts, i.e. the product with the highest number of label candidates?
Or, on the contrary:

"In contrast, for an ambiguous instance which falls near the boundary of categories, even those reliable workers will still disagree with each other and generate inconsistent labels. For those ambiguous instances, we are facing a challenging decision problem on how much budget that we should spend on them. On one hand, it is worth to collect more labels to boost the accuracy of the aggregate label. On the other hand, since our goal is to maximize the overall labeling accuracy, when the budget is limited, we should simply put those few highly ambiguous instances aside to save budget for labeling less difficult instances."

Quote from Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling
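The two options can be written as competing selection rules over the ambiguous products. In the sketch below (hypothetical helper names), each ambiguous product maps to its set of candidate labels. One rule picks the most conflicted product; the other follows the quoted advice and picks the least conflicted one. Ties are broken alphabetically so the result is deterministic.

```python
def most_conflicted(ambiguous):
    """Pick the product with the most label candidates (resolve big conflicts first)."""
    # max() returns the first maximal item, so sorting makes ties deterministic.
    return max(sorted(ambiguous), key=lambda p: len(ambiguous[p]))


def least_conflicted(ambiguous):
    """Pick the product with the fewest candidates (set hard cases aside)."""
    return min(sorted(ambiguous), key=lambda p: len(ambiguous[p]))


ambiguous = {
    "eggs": {"Dairy", "Bakery"},
    "honey": {"Dairy", "Bakery", "Pantry"},
    "jam": {"Bakery", "Pantry"},
}
print(most_conflicted(ambiguous))   # honey
print(least_conflicted(ambiguous))  # eggs
```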

Technically, the number of conflicting label candidates could be included in a weight function used to pick the next product.
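One way to fold the number of candidates into a weight function is a randomized selection that samples products with probability proportional to $|\text{candidates}|^\alpha$; this is an assumption, not a prescribed method. The exponent `alpha` is a hypothetical knob. With $\alpha > 0$ it favours conflicted products, with $\alpha < 0$ it sets them aside as the quote suggests, and $\alpha = 0$ gives uniform sampling.

```python
import random


def pick_weighted(ambiguous, alpha=1.0, rng=None):
    """Sample one ambiguous product with weight |candidates| ** alpha."""
    rng = rng or random.Random()
    products = sorted(ambiguous)  # fixed order so a seeded rng is reproducible
    # max(..., 1) guards against 0 ** negative for products without candidates.
    weights = [max(len(ambiguous[p]), 1) ** alpha for p in products]
    return rng.choices(products, weights=weights, k=1)[0]


ambiguous = {"eggs": {"Dairy", "Bakery"}, "honey": {"Dairy", "Bakery", "Pantry"}}
print(pick_weighted(ambiguous, alpha=2.0, rng=random.Random(42)))
```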

However, if in the first iteration we select only one item to label, then all remaining items will get the same label (which might be incorrect). In the beginning we want to find items that cover, ideally, all label classes.
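One simple hedge against this is to seed the first iteration with one product from each category at a chosen depth of the taxonomy. This is an assumption, not something the text prescribes. The hope is that different branches map to different labels, but it does not guarantee that every class is hit. `seed_sample` and `depth` are hypothetical names.

```python
import random


def seed_sample(parent_map, depth=1, rng=None):
    """Pick one product (leaf) under each category `depth` levels below the root."""
    rng = rng or random.Random()
    categories = {p for p in parent_map.values() if p is not None}
    leaves = sorted(n for n in parent_map if n not in categories)
    groups = {}
    for leaf in leaves:
        path = [leaf]  # leaf, parent, ..., root
        while parent_map[path[-1]] is not None:
            path.append(parent_map[path[-1]])
        # A leaf no deeper than `depth` ends up in a group of its own.
        key = path[-1 - depth] if len(path) > depth else leaf
        groups.setdefault(key, []).append(leaf)
    return sorted(rng.choice(members) for members in groups.values())


taxonomy = {"food": None, "dairy": "food", "bakery": "food", "milk": "dairy",
            "yogurt": "dairy", "bread": "bakery", "croissant": "bakery", "eggs": "food"}
# One dairy product, one bakery product, and eggs (its own branch).
print(seed_sample(taxonomy, rng=random.Random(0)))
```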

Also, conflicts are good because they lead to clarification.

TODO:

Experiment with the current implementation to build intuition about budget allocation, the number of iterations, and when conflicts are good.
If we don't have ground-truth labels, how do we know what an optimal labelling is, and how do we measure it? How did the authors evaluate that in the paper?
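For experiments where ground truth is available (e.g. a simulation with made-up labels), one plausible way to score a labelling is to report coverage and accuracy separately. Products left ambiguous are then neither right nor wrong. The metric names below are assumptions, and this covers only the case with ground truth.

```python
def evaluate(final_labels, truth):
    """Score a labelling against ground truth (`truth` must be non-empty)."""
    covered = [p for p in truth if p in final_labels]
    correct = sum(final_labels[p] == truth[p] for p in covered)
    return {
        "coverage": len(covered) / len(truth),  # share of products that got a label
        "accuracy_on_covered": correct / len(covered) if covered else 0.0,
        "accuracy_overall": correct / len(truth),  # unlabelled products count as wrong
    }


truth = {"milk": "Dairy", "yogurt": "Dairy", "bread": "Bakery", "eggs": "Dairy"}
final = {"milk": "Dairy", "yogurt": "Dairy", "bread": "Dairy"}  # eggs left ambiguous
print(evaluate(final, truth))
# {'coverage': 0.75, 'accuracy_on_covered': 0.6666666666666666, 'accuracy_overall': 0.5}
```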
