
Cross entropy stopping condition for training. #684

Closed · Tilps opened this issue Jan 17, 2019 · 2 comments

Comments

Tilps (Contributor) commented Jan 17, 2019

Rather than using a fixed 800 nodes, or using pruning to do early stopping (which, based on its selection point, favors certain distribution shapes), I wonder if we could use a distribution cross-entropy approach.
Some earlier analysis showed that the cross-entropy delta rate peaks at about 800 nodes and has diminishing returns afterwards, but that was averaged over many games. If we sample the visit distribution every x nodes, we can calculate an average cross-entropy change per node over the last x nodes, but specific to the current position, which may have a longer period of information gain than the average.

Then we can set a minimum and maximum node visit range, and in between we stop if the average cross-entropy change per node drops below a configurable value y.
We would need some simulations to find the trade-off between speed and how low a value of y we can go, but perhaps once we have draw agreements in place, a bit of the time saved per game can be spent here.
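
A minimal sketch of such a stopping rule, assuming the search exposes per-move visit counts at the root. One plausible reading of "cross entropy per node change" is the KL divergence between consecutive sampled distributions (H(p, q) minus H(p, p)) divided by the sample interval; that interpretation, and all names and defaults below, are illustrative assumptions, not lc0's implementation:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0)
    for moves that have not been visited yet."""
    return float(-np.sum(p * np.log(q + eps)))

def should_stop_early(prev_visits, curr_visits, sample_interval,
                      threshold_y, total_nodes,
                      min_nodes=200, max_nodes=800):
    """Decide whether to stop the search at the current sample point.

    prev_visits / curr_visits are per-move root visit counts taken
    sample_interval nodes apart. All parameters and defaults here
    are hypothetical, for illustration only.
    """
    if total_nodes < min_nodes:
        return False   # always spend at least min_nodes
    if total_nodes >= max_nodes:
        return True    # hard cap, as in the fixed-node scheme
    p = prev_visits / prev_visits.sum()
    q = curr_visits / curr_visits.sum()
    # H(p, q) - H(p, p) is the KL divergence of the old distribution
    # from the new one, i.e. how far the visit distribution moved
    # over the last interval.
    delta = cross_entropy(p, q) - cross_entropy(p, p)
    return delta / sample_interval < threshold_y
```

In use, the caller would snapshot the root visit counts every x playouts and pass consecutive snapshots in; the min/max bounds keep the rule from stopping on a noisy early distribution or running unbounded.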

Naphthalin (Contributor) commented

#721 was merged a while ago, this issue can be closed I guess :)

Tilps (Contributor, Author) commented May 1, 2020

yeap, done.

Tilps closed this as completed May 1, 2020