Incorrect priors get loaded intermittently (Mac/OpenCL?) #811

Closed
Mardak opened this issue Mar 21, 2019 · 7 comments

@Mardak
Contributor

Mardak commented Mar 21, 2019

I was checking on t40's progress with current master daf933e, using the drawn position from #237 and forcing the top-prior move to be picked for each side with --fpu-value=100:
[Screenshot: screen shot 2018-08-06 at 3 16 11 pm]

./lc0 -w 41620 --verbose-move-stats --fpu-value=100 --minibatch-size=1
position startpos moves e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6 b1c3 d8c7 c1e3 a7a6 d1d2 g8f6 e1c1 f8e7 f2f3 b7b5 g2g4 c6d4 e3d4 c8b7 c1b1 e8g8 f1d3 b5b4 c3a4 d7d5 e4e5 f6d7 d2e3 b7c6 a4b6 d7b6 d4b6 c7d7 f3f4 c6b5 b6d4 b5d3 e3d3 a6a5 f4f5 d7c7 f5f6 e7c5 d4c5 c7c5 h2h4 a5a4 g4g5 a4a3 d3d4 c5c7 b2b3 f8c8 h1h2 c7c3 h4h5 c3g3 h2e2 g3f3 d4d2 c8c7 e2f2 f3g3 f2e2 a8b8 d2d4 g3f3 e2h2 f3g3 h2e2 g3h3 g5g6 f7g6 h5g6 h7g6 d1g1 h3f3 e2h2 f3e4 d4e4 d5e4 g1g6 b8b5 c2c4 b5e5 b1c2 e5f5 f6g7 c7g7 g6e6 g7g1 e6e4 f5f3 e4h4 f3c3 c2d2 g1a1 h4h8
go nodes 100

… pv g8f7 h2h7 f7g6 h7h6 g6g5 h6h5 g5g4 h5h4 g4g5 h4h5 g5g4 h5h4 g4g5 h4h5
g8g7  (156 ) N:      43 (+ 0) (P: 46.58%) (Q:  0.02514) (D:  0.488) (U: 0.30885) (Q+U:  0.33399) (V:  0.0669) 
g8f7  (155 ) N:      51 (+ 1) (P: 53.42%) (Q:  0.02866) (D:  0.392) (U: 0.29403) (Q+U:  0.32270) (V:  0.0302) 

However, one time out of ~100 attempts (restarting lc0 each time), it doesn't lead to a draw:

… pv g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5g4 h5h4 g4f5 c4c5 a1a2 d2d1 c3b3 h8h5 f5e6 h5h6 e6d7 h6h7 d7c6 h4h6 c6b5 h7b7 b5c5
g8g7  (156 ) N:      24 (+ 0) (P: 46.58%) (Q:  0.10247) (D:  0.000) (U: 0.48200) (Q+U:  0.58446) (V:  0.0669) 
g8f7  (155 ) N:      50 (+ 2) (P: 53.42%) (Q:  0.45652) (D:  0.040) (U: 0.26072) (Q+U:  0.71724) (V:  0.0302)

Then, with the same lc0 instance, I added some more nodes to see if the result would change, then jumped ahead to the position where white played the wrong move c4c5 (instead of continuing the perpetual check with the rook on the h-file):

position startpos moves e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6 b1c3 d8c7 c1e3 a7a6 d1d2 g8f6 e1c1 f8e7 f2f3 b7b5 g2g4 c6d4 e3d4 c8b7 c1b1 e8g8 f1d3 b5b4 c3a4 d7d5 e4e5 f6d7 d2e3 b7c6 a4b6 d7b6 d4b6 c7d7 f3f4 c6b5 b6d4 b5d3 e3d3 a6a5 f4f5 d7c7 f5f6 e7c5 d4c5 c7c5 h2h4 a5a4 g4g5 a4a3 d3d4 c5c7 b2b3 f8c8 h1h2 c7c3 h4h5 c3g3 h2e2 g3f3 d4d2 c8c7 e2f2 f3g3 f2e2 a8b8 d2d4 g3f3 e2h2 f3g3 h2e2 g3h3 g5g6 f7g6 h5g6 h7g6 d1g1 h3f3 e2h2 f3e4 d4e4 d5e4 g1g6 b8b5 c2c4 b5e5 b1c2 e5f5 f6g7 c7g7 g6e6 g7g1 e6e4 f5f3 e4h4 f3c3 c2d2 g1a1 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5g4 h5h4 g4f5
go nodes 1

h4h5  (888 ) N:       0 (+ 0) (P:  9.78%) (Q: -39.75361) (D:  0.000) (U: 9.39260) (Q+U: -30.36101) (V:  -.----) 
c4c5  (727 ) N:     965 (+ 0) (P: 15.03%) (Q: -0.98330) (D:  0.000) (U: 0.01494) (Q+U: -0.96836) (V: -0.1075) 

For some reason, the highest-prior move there was indeed c4c5… But then I quit lc0 and, in a fresh instance, jumped straight to that "wrong move" position:

c4c5  (727 ) N:       0 (+ 0) (P:  2.78%) (Q:  0.00205) (D:  0.000) (U: 0.08341) (Q+U:  0.08546) (V:  -.----) 
h4h5  (888 ) N:       0 (+ 0) (P: 38.00%) (Q:  0.00205) (D:  0.000) (U: 1.14005) (Q+U:  1.14211) (V:  -.----) 

The correct priors for h4h5 and c4c5 do get loaded…
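
To see why a wrongly loaded prior matters this much, here is a minimal sketch of a PUCT-style exploration term. It is a sketch under assumptions, not lc0's actual code: the constant 3.0 is only inferred from the ratio U/P in the fresh-instance stats above, and `exploration_term` and the parent visit count of 1 are hypothetical choices made for illustration.

```python
import math

# Assumption: inferred from U / P in the quoted stats, not confirmed in this thread.
CPUCT = 3.0


def exploration_term(prior, parent_visits, child_visits, cpuct=CPUCT):
    """PUCT-style U = cpuct * P * sqrt(N_parent) / (1 + N_child)."""
    return cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)


# Priors reported by the fresh lc0 instance after `go nodes 1`.
# Both children are unvisited, and parent_visits=1 is an assumption.
fresh_priors = {"c4c5": 0.0278, "h4h5": 0.3800}

for move, prior in fresh_priors.items():
    u = exploration_term(prior, parent_visits=1, child_visits=0)
    print(f"{move}: P={prior:.4f} U={u:.4f}")
```

Under these assumptions the computed values come out close to the reported U of 0.08341 and 1.14005. Since U scales linearly with P for an unvisited move, a corrupted prior directly changes which move the search explores first.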

I then tried to reproduce the issue again while writing up this comment. Again, maybe 1 out of 100 attempts resulted in a not-drawn position, but this time because of a different move, d2e2:

position startpos moves e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6 b1c3 d8c7 c1e3 a7a6 d1d2 g8f6 e1c1 f8e7 f2f3 b7b5 g2g4 c6d4 e3d4 c8b7 c1b1 e8g8 f1d3 b5b4 c3a4 d7d5 e4e5 f6d7 d2e3 b7c6 a4b6 d7b6 d4b6 c7d7 f3f4 c6b5 b6d4 b5d3 e3d3 a6a5 f4f5 d7c7 f5f6 e7c5 d4c5 c7c5 h2h4 a5a4 g4g5 a4a3 d3d4 c5c7 b2b3 f8c8 h1h2 c7c3 h4h5 c3g3 h2e2 g3f3 d4d2 c8c7 e2f2 f3g3 f2e2 a8b8 d2d4 g3f3 e2h2 f3g3 h2e2 g3h3 g5g6 f7g6 h5g6 h7g6 d1g1 h3f3 e2h2 f3e4 d4e4 d5e4 g1g6 b8b5 c2c4 b5e5 b1c2 e5f5 f6g7 c7g7 g6e6 g7g1 e6e4 f5f3 e4h4 f3c3 c2d2 g1a1 h4h8
go nodes 100

… pv g8g7 h2h7 g7g6 h7h6 g6g5 h6h5 g5g4 d2e2 a1a2 e2d1 c3d3 d1c1 a2a1 c1c2 d3c3 c2d2 a1a2 d2d1 c3d3 d1c1 a2a1
g8f7  (155 ) N:      23 (+ 0) (P: 53.42%) (Q:  0.20912) (D:  0.000) (U: 0.57184) (Q+U:  0.78096) (V:  0.0302) 
g8g7  (156 ) N:      50 (+ 1) (P: 46.58%) (Q:  0.52688) (D:  0.000) (U: 0.23015) (Q+U:  0.75703) (V:  0.0669) 

position startpos moves e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6 b1c3 d8c7 c1e3 a7a6 d1d2 g8f6 e1c1 f8e7 f2f3 b7b5 g2g4 c6d4 e3d4 c8b7 c1b1 e8g8 f1d3 b5b4 c3a4 d7d5 e4e5 f6d7 d2e3 b7c6 a4b6 d7b6 d4b6 c7d7 f3f4 c6b5 b6d4 b5d3 e3d3 a6a5 f4f5 d7c7 f5f6 e7c5 d4c5 c7c5 h2h4 a5a4 g4g5 a4a3 d3d4 c5c7 b2b3 f8c8 h1h2 c7c3 h4h5 c3g3 h2e2 g3f3 d4d2 c8c7 e2f2 f3g3 f2e2 a8b8 d2d4 g3f3 e2h2 f3g3 h2e2 g3h3 g5g6 f7g6 h5g6 h7g6 d1g1 h3f3 e2h2 f3e4 d4e4 d5e4 g1g6 b8b5 c2c4 b5e5 b1c2 e5f5 f6g7 c7g7 g6e6 g7g1 e6e4 f5f3 e4h4 f3c3 c2d2 g1a1 h4h8 g8g7 h2h7 g7g6 h7h6 g6g5 h6h5 g5g4
go nodes 1

h5h4  (1124) N:       0 (+ 0) (P:  2.20%) (Q: -47.60742) (D:  0.000) (U: 0.33640) (Q+U: -47.27101) (V:  -.----) 
d2e2  (282 ) N:      26 (+ 1) (P: 22.11%) (Q: -0.64373) (D:  0.000) (U: 0.12091) (Q+U: -0.52282) (V: -0.2743) 

Then, after restarting lc0:

d2e2  (282 ) N:       0 (+ 0) (P:  3.06%) (Q: -0.13879) (D:  0.000) (U: 0.09178) (Q+U: -0.04700) (V:  -.----) 
h5h4  (1124) N:       0 (+ 0) (P: 39.14%) (Q: -0.13879) (D:  0.000) (U: 1.17411) (Q+U:  1.03533) (V:  -.----) 
@Mardak
Contributor Author

Mardak commented Mar 21, 2019

Oh. KillerDucky suggested using the check backend from #146. And indeed, with the same weights and position, it reports errors:

Creating backend [check]...
Working backend set to opencl.
Reference backend set to blas.
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
…
Creating backend [blas]...
BLAS, maximum batch size set to 256
BLAS vendor: Apple vecLib.
Apple vecLib ignores blas_cores (1) parameter.
BLAS max batch size is 256.
Check mode: check only with relative tolerance 1.0e-05, absolute tolerance 1.0e-04.
Check rate: 20%.
*** ERROR check failed for a batch of 17 policy incorrect (but value ok).
*** ERROR check failed for a batch of 8 both value and policy incorrect.
*** ERROR check failed for a batch of 44 both value and policy incorrect.
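
The "Check mode" line above reports a relative tolerance of 1.0e-05 and an absolute tolerance of 1.0e-04. As a rough sketch of what such a comparison between a working and a reference backend could look like: how lc0 actually combines the two tolerances is not shown in this thread, so this assumes a value matches if it is within either tolerance, which is the behavior of Python's `math.isclose`. The function name `outputs_match` is hypothetical, and the sample numbers are the priors quoted earlier in this issue, used purely as illustration.

```python
import math

REL_TOL = 1.0e-05  # from the "Check mode" log line
ABS_TOL = 1.0e-04  # from the "Check mode" log line


def outputs_match(working, reference, rel_tol=REL_TOL, abs_tol=ABS_TOL):
    """Return True if every pair of values agrees within either tolerance.

    Assumption: a pair matches when it is within the absolute OR the
    relative tolerance; lc0's real rule may differ.
    """
    if len(working) != len(reference):
        return False
    return all(
        math.isclose(w, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for w, r in zip(working, reference)
    )


# Illustrative only: priors for c4c5 and h4h5 from the bad run
# compared against those from the fresh run.
bad_policy = [0.1503, 0.0978]
good_policy = [0.0278, 0.3800]

print(outputs_match(good_policy, good_policy))  # identical outputs
print(outputs_match(bad_policy, good_policy))   # far outside both tolerances
```

With tolerances this tight, differences of the size seen in the priors above are far beyond ordinary floating-point noise between two backends.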

@Mardak
Contributor Author

Mardak commented Mar 21, 2019

So I did some more testing with --backend-opts="freq=1.0" on various networks, running go nodes 1000 from startpos:

11258
Check passed for a batch of 1.
Check passed for a batch of 20.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 28.
Check passed for a batch of 34.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 35.
Check passed for a batch of 32.
Check passed for a batch of 31.
Check passed for a batch of 63.
Check passed for a batch of 32.
Check passed for a batch of 37.
Check passed for a batch of 32.
Check passed for a batch of 49.
Check passed for a batch of 32.
Check passed for a batch of 37.
Check passed for a batch of 31.
Check passed for a batch of 75.
Check passed for a batch of 36.

22202
Check passed for a batch of 1.
Check passed for a batch of 20.
Check passed for a batch of 26.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 6.
Check passed for a batch of 24.
Check passed for a batch of 39.
Check passed for a batch of 23.
Check passed for a batch of 49.
Check passed for a batch of 32.
Check passed for a batch of 42.
Check passed for a batch of 56.
Check passed for a batch of 39.
Check passed for a batch of 63.
Check passed for a batch of 46.
Check passed for a batch of 42.
Check passed for a batch of 45.
Check passed for a batch of 32.
Check passed for a batch of 109.
Check passed for a batch of 62.
Check passed for a batch of 79.
Check passed for a batch of 95.
Check passed for a batch of 88.

32930
Check passed for a batch of 1.
Check passed for a batch of 20.
Check passed for a batch of 24.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 20.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 21.
Check passed for a batch of 29.
Check passed for a batch of 21.
Check passed for a batch of 30.
Check passed for a batch of 23.
Check passed for a batch of 31.
Check passed for a batch of 26.
Check passed for a batch of 28.
Check passed for a batch of 45.
Check passed for a batch of 7.
Check passed for a batch of 31.
Check passed for a batch of 45.
Check passed for a batch of 24.
Check passed for a batch of 40.
Check passed for a batch of 30.
Check passed for a batch of 29.
Check passed for a batch of 38.
Check passed for a batch of 27.
Check passed for a batch of 39.
Check passed for a batch of 40.
Check passed for a batch of 39.
Check passed for a batch of 29.
Check passed for a batch of 33.
Check passed for a batch of 31.
Check passed for a batch of 46.
Check passed for a batch of 27.
Check passed for a batch of 28.
*** ERROR check failed for a batch of 40 both value and policy incorrect.
Check passed for a batch of 30.
Check passed for a batch of 66.

36092
Check passed for a batch of 1.
Check passed for a batch of 20.
Check passed for a batch of 20.
Check passed for a batch of 30.
Check passed for a batch of 28.
Check passed for a batch of 26.
Check passed for a batch of 25.
Check passed for a batch of 32.
Check passed for a batch of 26.
Check passed for a batch of 32.
Check passed for a batch of 27.
Check passed for a batch of 32.
Check passed for a batch of 25.
Check passed for a batch of 32.
Check passed for a batch of 23.
Check passed for a batch of 32.
Check passed for a batch of 22.
Check passed for a batch of 32.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 30.
Check passed for a batch of 31.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 31.
Check passed for a batch of 31.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 36.
Check passed for a batch of 31.
*** ERROR check failed for a batch of 49 both value and policy incorrect.
Check passed for a batch of 34.
Check passed for a batch of 32.
Check passed for a batch of 43.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 32.
Check passed for a batch of 33.
Check passed for a batch of 41.
Check passed for a batch of 37.
Check passed for a batch of 32.

37082
Check passed for a batch of 1.
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 20 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 17 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 30 both value and policy incorrect.
*** ERROR check failed for a batch of 29 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 27 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 23 both value and policy incorrect.
*** ERROR check failed for a batch of 25 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 38 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 30 both value and policy incorrect.
*** ERROR check failed for a batch of 38 both value and policy incorrect.
*** ERROR check failed for a batch of 63 both value and policy incorrect.
*** ERROR check failed for a batch of 76 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 96 both value and policy incorrect.
*** ERROR check failed for a batch of 49 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 65 both value and policy incorrect.
*** ERROR check failed for a batch of 242 both value and policy incorrect.
*** ERROR check failed for a batch of 46 both value and policy incorrect.

41620
Check passed for a batch of 1.
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 29 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 28 both value and policy incorrect.
*** ERROR check failed for a batch of 94 both value and policy incorrect.
*** ERROR check failed for a batch of 29 both value and policy incorrect.
*** ERROR check failed for a batch of 28 both value and policy incorrect.
*** ERROR check failed for a batch of 91 both value and policy incorrect.
*** ERROR check failed for a batch of 21 both value and policy incorrect.
*** ERROR check failed for a batch of 22 both value and policy incorrect.
*** ERROR check failed for a batch of 30 both value and policy incorrect.
*** ERROR check failed for a batch of 39 both value and policy incorrect.
*** ERROR check failed for a batch of 27 both value and policy incorrect.
*** ERROR check failed for a batch of 29 both value and policy incorrect.
*** ERROR check failed for a batch of 80 both value and policy incorrect.
*** ERROR check failed for a batch of 68 both value and policy incorrect.
*** ERROR check failed for a batch of 90 both value and policy incorrect.
*** ERROR check failed for a batch of 113 both value and policy incorrect.
*** ERROR check failed for a batch of 110 both value and policy incorrect.
*** ERROR check failed for a batch of 115 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
Segmentation fault: 11

50680
Check passed for a batch of 1.
Check passed for a batch of 20.
Check passed for a batch of 20.
Check passed for a batch of 30.
Check passed for a batch of 25.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 15.
Check passed for a batch of 27.
Check passed for a batch of 32.
Check passed for a batch of 32.
Check passed for a batch of 26.
Check passed for a batch of 20.
Check passed for a batch of 24.
Check passed for a batch of 32.
Check passed for a batch of 31.
Check passed for a batch of 24.
Check passed for a batch of 31.
Check passed for a batch of 28.
Check passed for a batch of 31.
Check passed for a batch of 29.
Check passed for a batch of 32.
Check passed for a batch of 28.
Check passed for a batch of 32.
Check passed for a batch of 28.
Check passed for a batch of 32.
Check passed for a batch of 30.
Check passed for a batch of 26.
Check passed for a batch of 10.
Check passed for a batch of 4.
Check passed for a batch of 46.
Check passed for a batch of 5.
Check passed for a batch of 31.
Check passed for a batch of 30.
Check passed for a batch of 26.
Check passed for a batch of 36.
Check passed for a batch of 21.
Check passed for a batch of 39.
Check passed for a batch of 23.
Check passed for a batch of 24.
Check passed for a batch of 36.
Check passed for a batch of 30.
Check passed for a batch of 29.
Check passed for a batch of 32.
Check passed for a batch of 43.
Check passed for a batch of 30.
Check passed for a batch of 43.
Check passed for a batch of 31.
Check passed for a batch of 43.
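
For anyone comparing several networks this way, a small script can tally the check results instead of reading the logs by eye. This is only a sketch: the regular expressions are written against the log lines pasted in this thread, and the `tally` helper and the sample text are hypothetical.

```python
import re
from collections import Counter

# Patterns match the check-backend lines as they appear in the logs above.
PASSED = re.compile(r"^Check passed for a batch of (\d+)\.")
FAILED = re.compile(r"^\*\*\* ERROR check failed for a batch of (\d+) (.+?)\.?$")


def tally(log_text):
    """Count passed checks and each kind of failure in a check-backend log."""
    counts = Counter()
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if PASSED.match(line):
            counts["passed"] += 1
            continue
        failed = FAILED.match(line)
        if failed:
            # Key by the failure description, e.g. "both value and policy incorrect".
            counts[failed.group(2)] += 1
    return counts


# A short sample in the same format as the logs in this comment.
sample = """\
Check passed for a batch of 1.
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 29 both value and policy incorrect.
Segmentation fault: 11
"""

for kind, count in tally(sample).items():
    print(f"{kind}: {count}")
```

Lines that are neither a pass nor a failure, such as the segmentation fault, are ignored by the tally.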

@Tilps
Contributor

Tilps commented Mar 21, 2019

There are known bad OpenCL drivers out there. IIRC that was the original reason the check backend existed in lczero: to force such contributors to crash out if their OpenCL started doing bad things.

But you mentioned on Discord that it seems most problematic with the 80-channel policy head nets (T40/T37), so it could potentially be a real bug in the OpenCL implementation, or maybe the very large fully connected layer after the 80-channel policy head just triggers bugs in your OpenCL drivers.

@killerducky
Contributor

I cannot reproduce this on my Ubuntu system, so our implementation is probably OK. I'm guessing it's a Mac OpenCL driver bug. We never found out what the bug was for LZGo.

@Mardak
Contributor Author

Mardak commented Mar 22, 2019

Yeah, LZGo always had Mac OpenCL issues, and gcp says there's not much we can do, especially with Apple deprecating OpenCL in 10.14 Mojave in favor of their own Metal: leela-zero/leela-zero#1517 (comment).

Not sure if this needs to be highlighted more for people using t40 on Macs, but it seems this problem happens to be avoided with t50+.

For reference if people are running into problems:

macOS 10.14.2
Platform version: OpenCL 1.2 (Oct 29 2018 21:43:16)
Device name:    AMD Radeon Pro 560X Compute Engine
Device driver:  1.2 (Nov 12 2018 21:13:18)

macOS 10.14.3
Platform version: OpenCL 1.2 (Oct 31 2018 21:59:22)
Device name:    AMD Radeon Pro 560X Compute Engine
Device driver:  1.2 (Dec 20 2018 21:22:54)

Mardak closed this as completed Mar 22, 2019
@Mardak
Contributor Author

Mardak commented Mar 22, 2019

For reference, 37001 and 40001 both trigger errors, so this is not something that appears as the weights develop:

37001
Check passed for a batch of 1.
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 20 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 50 both value and policy incorrect.
*** ERROR check failed for a batch of 35 both value and policy incorrect.

40001
Check passed for a batch of 1.
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
*** ERROR check failed for a batch of 32 policy incorrect (but value ok).
*** ERROR check failed for a batch of 27 policy incorrect (but value ok).
*** ERROR check failed for a batch of 25 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 29 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 31 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.
*** ERROR check failed for a batch of 32 both value and policy incorrect.

@gsobala
Contributor

gsobala commented Mar 27, 2019

Here is an example of the opencl backend failing on macOS with a batch_size of 4 and NN 41585. (Net 11258 works fine with this tuning and batch_size.)

lc0 --backend=opencl --backend-opts=batch_size=4
       _
|   _ | |
|_ |_ |_| v0.22.0-dev built Mar 26 2019
go nodes 2000
Found pb network file: ./41585.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 4.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Feb 22 2019 20:16:07)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:      0
Device name:    Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
Device type:    CPU
Device vendor:  Intel
Device driver:  1.1
Device speed:   3200 MHZ
Device cores:   16 CU
Device score:   512
Device ID:      1
Device name:    AMD Radeon Pro Vega 56 Compute Engine
Device type:    GPU
Device vendor:  AMD
Device driver:  1.2 (Mar 11 2019 21:14:50)
Device speed:   786 MHZ
Device cores:   56 CU
Device score:   1112
Selected platform: Apple
Selected device: AMD Radeon Pro Vega 56 Compute Engine
with OpenCL 1.2 capability.
Loaded existing SGEMM tuning for batch size 4.
Wavefront/Warp size: 64

Max workgroup size: 256
Max workgroup dimensions: 256 256 256
info depth 1 seldepth 2 time 89 nodes 5 score cp 46 hashfull 0 nps 56 tbhits 0 pv e2e4 e7e5
info depth 2 seldepth 3 time 163 nodes 9 score cp 43 hashfull 0 nps 55 tbhits 0 pv e2e4 e7e5 g1f3
info depth 2 seldepth 4 time 237 nodes 14 score cp -85 hashfull 0 nps 59 tbhits 0 pv d2d4 g8f6 c2c4 e7e6
info depth 3 seldepth 5 time 310 nodes 17 score cp 159 hashfull 0 nps 54 tbhits 0 pv e2e4 e7e5 g1f3 b8c6 f3g5
info depth 3 seldepth 5 time 411 nodes 20 score cp 196 hashfull 0 nps 48 tbhits 0 pv g1f3 d7d5 d2d4 g8f6 c1d2 f6g4
info depth 3 seldepth 6 time 446 nodes 21 score cp 196 hashfull 0 nps 47 tbhits 0 pv g1f3 d7d5 d2d4 g8f6 c1d2 f6g4
info depth 5 seldepth 6 time 460 nodes 63 score cp -984 hashfull 0 nps 136 tbhits 0 pv e2e4 e7e5 g1f3 b8c6 f3g5 c6d4
info depth 5 seldepth 7 time 732 nodes 96 score cp 236 hashfull 1 nps 131 tbhits 0 pv d2d4 g8f6 c2c4 e7e6 h2h3 f8b4 d1d2
info depth 4 seldepth 7 time 1971 nodes 429 score cp -59 hashfull 4 nps 217 tbhits 0 pv d2d4 c7c6 c2c4 d7d6 g2g4 c8e6 g4g5
info depth 4 seldepth 8 time 1978 nodes 445 score cp -59 hashfull 4 nps 224 tbhits 0 pv d2d4 c7c6 c2c4 d7d6 g2g4 c8e6 g4g5
info depth 5 seldepth 9 time 2185 nodes 510 score cp -59 hashfull 4 nps 233 tbhits 0 pv d2d4 c7c6 c2c4 d7d6 g2g4 c8e6 g4g5
info depth 4 seldepth 9 time 2458 nodes 641 score cp -59 hashfull 5 nps 260 tbhits 0 pv d2d4 c7c6 c2c4 d7d6 g2g4 c8e6 g4g5
info depth 5 seldepth 9 time 2580 nodes 664 score cp 105 hashfull 5 nps 257 tbhits 0 pv c2c4 e7e5 b1c3 b7b6 h2h3 c8b7 g2g3
info depth 4 seldepth 9 time 3081 nodes 879 score cp 60 hashfull 6 nps 285 tbhits 0 pv c2c4 e7e5 b1c3 e8e7 d1c2 e7d6
info depth 5 seldepth 9 time 3193 nodes 929 score cp 123 hashfull 7 nps 290 tbhits 0 pv f2f4 d7d5 g2g3 b7b5 a2a3
Segmentation fault: 11

The tuning was:

0;XgemmBatched;256;64;256;16; -DKWG=32 -DKWI=2 -DMDIMA=8 -DMDIMC=8 -DMWG=32 -DNDIMB=16 -DNDIMC=16 -DNWG=32 -DSA=1 -DSB=1 -DSTRM=0 -DSTRN=0 -DVWM=4 -DVWN=2;OpenCL: AMD AMD Radeon Pro Vega 56 Compute Engine @ 786MHz

The crash log is:

Thread 5 Crashed:
0   lc0                           	0x000000010a2c15f2 lczero::Edge_Iterator<false>::GetOrSpawnNode(lczero::Node*, std::__1::unique_ptr<lczero::Node, std::__1::default_delete<lczero::Node> >*) + 34
1   lc0                           	0x000000010a2caf47 lczero::SearchWorker::PickNodeToExtend(int) + 311
2   lc0                           	0x000000010a2ca93e lczero::SearchWorker::GatherMinibatch() + 110
3   lc0                           	0x000000010a2ca6f2 lczero::SearchWorker::ExecuteOneIteration() + 82
4   lc0                           	0x000000010a2d1508 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, lczero::Search::StartThreads(unsigned long)::$_4> >(void*) + 152
5   libsystem_pthread.dylib       	0x00007fff728fa2eb _pthread_body + 126
6   libsystem_pthread.dylib       	0x00007fff728fd249 _pthread_start + 66
7   libsystem_pthread.dylib       	0x00007fff728f940d thread_start + 13
