If $L$ is a set of classes (labels or shop departments in our example), and $P$ is a space of all possible products and product groups, then our goal is to find a classifier function $f\colon P \to L$. This sounds like a multiclass classification problem, although we are not necessarily going to use machine learning to solve it.
Obviously, we have bent reality a bit by assuming that:
- each product can be found in only one department (otherwise it would be multi-label classification),
- all products can be found in the considered grocery store.
We have the budget to manually label only $B$ of $N$ products, where $N$ is the total number of products and $B \ll N$.
Products are organized in categories that make up a taxonomy $T$. More formally, the taxonomy $T$ is a tree whose leaves stand for products and whose inner nodes are product categories.
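A minimal sketch of how such a taxonomy could be stored, assuming a simple child-to-parent mapping. The `Taxonomy` class, its methods, and the toy grocery tree are hypothetical, chosen only for illustration:

```python
from __future__ import annotations

from collections import defaultdict


class Taxonomy:
    """A tree whose leaves are products and whose inner nodes are categories.

    Stored as a hypothetical child -> parent mapping; the root maps to None.
    """

    def __init__(self, parent: dict[str, str | None]) -> None:
        self.parent = parent
        self.children: dict[str, list[str]] = defaultdict(list)
        for node, par in parent.items():
            if par is not None:
                self.children[par].append(node)

    def is_leaf(self, node: str) -> bool:
        # Leaves have no children; they stand for products.
        return node not in self.children

    def products(self) -> list[str]:
        return [n for n in self.parent if self.is_leaf(n)]

    def categories(self) -> list[str]:
        return [n for n in self.parent if not self.is_leaf(n)]

    def ancestors(self, node: str) -> list[str]:
        # Categories on the path from the node up to the root, nearest first.
        path = []
        current = self.parent[node]
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return path


# Toy grocery taxonomy: two top-level categories with a few products each.
tax = Taxonomy({
    "root": None,
    "dairy": "root", "fruit": "root",
    "milk": "dairy", "yogurt": "dairy", "cheese": "dairy",
    "apple": "fruit", "banana": "fruit",
})
print(sorted(tax.products()))  # ['apple', 'banana', 'cheese', 'milk', 'yogurt']
print(tax.ancestors("milk"))   # ['dairy', 'root']
```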
My idea is to label products iteratively.
The first iteration is somewhat special:
- Sample $b_1$ products ($S_1 \subseteq P$).
- Label $S_1$ manually.
- Predict labels for the remaining products ($P \setminus S_1$) based on the labels for $S_1$ and the relations in $T$.
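The notes do not fix how predictions are derived from $T$. As one possible rule (an assumption, not necessarily the one used in the current implementation), an unlabelled product could take as candidates all manual labels found under its nearest ancestor category that contains any manually labelled product. A sketch with hypothetical function names and toy data, which also splits the result into unambiguous and ambiguous predictions:

```python
from __future__ import annotations

from collections import defaultdict


def predict(parent: dict[str, str | None], manual: dict[str, str]) -> dict[str, set[str]]:
    """Label candidates for every product that has no manual label.

    Hypothetical rule: climb from the product towards the root and stop at the
    first category whose subtree contains a manually labelled product; the
    candidates are all manual labels in that subtree.
    """
    categories = {par for par in parent.values() if par is not None}
    # For every category, collect the manual labels found anywhere below it.
    labels_below: dict[str, set[str]] = defaultdict(set)
    for product, label in manual.items():
        node = parent[product]
        while node is not None:
            labels_below[node].add(label)
            node = parent[node]

    candidates: dict[str, set[str]] = {}
    for product in parent:
        if product in categories or product in manual:
            continue  # skip inner nodes and already labelled products
        node = parent[product]
        while node is not None and not labels_below[node]:
            node = parent[node]
        # With no manual labels at all, the candidate set stays empty.
        candidates[product] = set(labels_below[node]) if node is not None else set()
    return candidates


def split(candidates: dict[str, set[str]]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Separate unambiguous predictions from ambiguous ones."""
    unambiguous = {p: next(iter(c)) for p, c in candidates.items() if len(c) == 1}
    ambiguous = {p: c for p, c in candidates.items() if len(c) > 1}
    return unambiguous, ambiguous


parent = {
    "root": None,
    "dairy": "root", "fruit": "root",
    "milk": "dairy", "yogurt": "dairy", "cheese": "dairy",
    "apple": "fruit", "banana": "fruit",
}
# S_1: three manually labelled products (cheese is sold in a different department).
manual = {"milk": "Dairy", "cheese": "Deli", "apple": "Produce"}
unambiguous, ambiguous = split(predict(parent, manual))
print(unambiguous)                 # {'banana': 'Produce'}
print(sorted(ambiguous["yogurt"]))  # ['Dairy', 'Deli']
```

Yogurt becomes ambiguous because its category mixes two departments, which is exactly the kind of conflict that later iterations have to clarify.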
There will be products ($U_1$) with only one matching label and products ($A_1$) with ambiguous predictions, i.e., multiple label candidates, that require manual clarification. In subsequent iterations $i = 2, 3, \dots$ we will gradually clarify those ambiguous predictions:
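- Sample $b_i$ products $S_i \subseteq A_{i-1}$.
- Label $S_i$ manually.
- Predict labels for the products without manual labels ($P \setminus (S_1 \cup \dots \cup S_i)$) based on all manual labels collected so far (i.e., labels for $S_1 \cup \dots \cup S_i$) and the relations in $T$. This again splits them into unambiguous ($U_i$) and ambiguous ($A_i$) products.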
Repeat until there are no products with ambiguous predictions ($A_i = \emptyset$) or you have consumed the whole budget ($b_1 + \dots + b_i = B$). The final labelling will come from both manual labels (for $S_1 \cup \dots \cup S_i$) and unambiguous predictions (for $U_i$).
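The whole procedure can be sketched as a loop. Everything here is an assumption for illustration: `annotate` is a hypothetical stand-in for the human annotator, products are sampled uniformly at random (first from all products, later only from the ambiguous ones), and `predict` applies an assumed rule where an unlabelled product's candidates are the manual labels under its nearest ancestor category that contains any labelled product.

```python
from __future__ import annotations

import random
from collections import defaultdict
from typing import Callable


def predict(parent: dict[str, str | None], manual: dict[str, str]) -> dict[str, set[str]]:
    # Assumed rule: candidates are the manual labels under the nearest
    # ancestor category that contains any manually labelled product.
    categories = {par for par in parent.values() if par is not None}
    labels_below: dict[str, set[str]] = defaultdict(set)
    for product, label in manual.items():
        node = parent[product]
        while node is not None:
            labels_below[node].add(label)
            node = parent[node]
    candidates: dict[str, set[str]] = {}
    for product in parent:
        if product in categories or product in manual:
            continue
        node = parent[product]
        while node is not None and not labels_below[node]:
            node = parent[node]
        candidates[product] = set(labels_below[node]) if node is not None else set()
    return candidates


def label_iteratively(
    parent: dict[str, str | None],
    annotate: Callable[[str], str],
    budget: int,
    first_batch: int,
    batch: int = 1,
    seed: int = 0,
) -> dict[str, str]:
    """Return the final labelling: manual labels plus unambiguous predictions."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    rng = random.Random(seed)
    categories = {par for par in parent.values() if par is not None}
    products = [n for n in parent if n not in categories]

    # First iteration: b_1 products sampled from all products.
    first = rng.sample(products, min(first_batch, budget, len(products)))
    manual = {p: annotate(p) for p in first}

    while True:
        candidates = predict(parent, manual)
        # Anything without exactly one candidate still needs clarification.
        ambiguous = sorted(p for p, c in candidates.items() if len(c) != 1)
        remaining = budget - len(manual)
        if not ambiguous or remaining <= 0:
            break
        # Later iterations: b_i products sampled from the ambiguous ones only.
        for p in rng.sample(ambiguous, min(batch, remaining, len(ambiguous))):
            manual[p] = annotate(p)

    unambiguous = {p: next(iter(c)) for p, c in candidates.items() if len(c) == 1}
    return {**unambiguous, **manual}


parent = {
    "root": None,
    "dairy": "root", "fruit": "root",
    "milk": "dairy", "yogurt": "dairy", "cheese": "dairy",
    "apple": "fruit", "banana": "fruit",
}
# Simulated annotator backed by known labels (toy data).
truth = {"milk": "Dairy", "yogurt": "Dairy", "cheese": "Deli", "apple": "Produce", "banana": "Produce"}
# batch=1 corresponds to selecting one product per iteration.
labels = label_iteratively(parent, truth.__getitem__, budget=4, first_batch=2)
print(labels)
```

Products that are still ambiguous when the budget runs out are simply left out of the returned labelling. Note that unambiguous predictions are not guaranteed to be correct under this rule: if no cheese is sampled, it inherits the "Dairy" label from its category.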
We have one annotator and predicting the next label costs nothing (compared to the cost of manual labelling), so given a budget $B$ we can run $B$ iterations, selecting only one product for manual labelling in each ($b_i = 1$).
The problem then boils down to one question: which product should we select next?
- The one that resolves the most conflicts, i.e., the product with the highest number of label candidates?
- Or, on the contrary:
"In contrast, for an ambiguous instance which falls near the boundary of categories, even those reliable workers will still disagree with each other and generate inconsistent labels. For those ambiguous instances, we are facing a challenging decision problem on how much budget that we should spend on them. On one hand, it is worth to collect more labels to boost the accuracy of the aggregate label. On the other hand, since our goal is to maximize the overall labeling accuracy, when the budget is limited, we should simply put those few highly ambiguous instances aside to save budget for labeling less difficult instances."
Quoted from "Statistical Decision Making for Optimal Budget Allocation in Crowd Labeling".
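The two options can be written as simple selection heuristics over the candidate sets. A sketch, assuming candidates are kept as `{product: {label, ...}}`; the function name, strategy names, and example data are hypothetical, and neither heuristic is the decision policy proposed in the quoted paper:

```python
from __future__ import annotations


def select_next(candidates: dict[str, set[str]], strategy: str = "most") -> str | None:
    """Pick the next product to label manually, or None if nothing is ambiguous.

    "most":   resolve the biggest conflict first (most label candidates).
    "fewest": set highly ambiguous products aside (fewest candidates, but > 1).
    Ties are broken alphabetically to keep the choice deterministic.
    """
    ambiguous = {p: c for p, c in candidates.items() if len(c) > 1}
    if not ambiguous:
        return None
    ordered = sorted(ambiguous)
    if strategy == "most":
        return max(ordered, key=lambda p: len(ambiguous[p]))
    if strategy == "fewest":
        return min(ordered, key=lambda p: len(ambiguous[p]))
    raise ValueError(f"unknown strategy: {strategy!r}")


candidates = {
    "yogurt": {"Dairy", "Deli"},
    "juice": {"Drinks", "Dairy", "Produce"},
    "bread": {"Bakery"},
}
print(select_next(candidates, "most"))    # juice
print(select_next(candidates, "fewest"))  # yogurt
```

Alphabetical tie-breaking is only for reproducibility; random tie-breaking would serve equally well.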
Technically, the number of conflicting labels could be included in the weight function.
However, if we select only one item to label in the first iteration, then all remaining items will get the same label (which might be incorrect). In the beginning, we want to find items matching as many label classes as possible.
Also, conflicts are good because they lead to clarification.
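One way to make the first sample cover many classes is to spread $S_1$ round-robin over the top-level categories of $T$. This assumes that shop departments roughly follow those categories, which the taxonomy does not guarantee. A sketch with a hypothetical function name and toy data:

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_first_sample(parent: dict[str, str | None], k: int, seed: int = 0) -> list[str]:
    """Sample up to k products, spread round-robin over top-level categories."""
    rng = random.Random(seed)
    root = next(n for n, p in parent.items() if p is None)
    categories = {p for p in parent.values() if p is not None}

    # Group products by the top-level category (child of the root) above them.
    groups: dict[str, list[str]] = defaultdict(list)
    for node in parent:
        if node in categories or node == root:
            continue
        top = node
        while parent[top] != root:
            top = parent[top]
        groups[top].append(node)

    queues = [groups[t] for t in sorted(groups)]
    for queue in queues:
        rng.shuffle(queue)  # random order within each top-level category

    sample: list[str] = []
    while len(sample) < k and any(queues):
        for queue in queues:
            if queue and len(sample) < k:
                sample.append(queue.pop())
    return sample


parent = {
    "root": None,
    "dairy": "root", "fruit": "root",
    "milk": "dairy", "yogurt": "dairy", "cheese": "dairy",
    "apple": "fruit", "banana": "fruit",
}
print(stratified_first_sample(parent, 2))  # one dairy product and one fruit product
```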
TODO:
- Experiment with the current implementation to get some intuition about budget allocation, the number of iterations, and when conflicts are good.
- If we don't have ground-truth labels, how do we know what an optimal labelling is, and how do we measure it? How did they evaluate that in the paper?