This repository has been archived by the owner on Oct 15, 2020. It is now read-only.
+1 to a warning and an argument in a TBD export function. We should probably do the same in the readers and use the threads scheduler by default, since I think splitting data frames into arrays causes a lot of worker communication.
I ran a conversion of a 45 gig plink file as described over in https://github.com/pystatgen/sgkit/issues/48
Overall, it worked great, and I was super impressed by how smooth the process was. However, I found the processor utilisation a bit less than I would have expected. Here's the output from /usr/bin/time:
I ran this on a server with 40 threads, and I would have expected the process to basically max out all of them. Instead, usage rarely went over about 300% - it feels like there was a lot of lock contention or something. I don't think IO was the problem - it was running off spinning disk, so you might expect the random nature of the IO to hurt it, but I kept an eye on atop, and the disk didn't seem to be a bottleneck.
It's not particularly important to get into this now I think - there's not a big difference between this taking 6 hours and 1 hour right now. It'll be something to keep an eye on at some point though.
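To put the 300% figure in context, here is a rough sketch of the arithmetic behind the CPU percentage that /usr/bin/time reports: (user + system time) / elapsed time, which can then be divided by the thread count to see how much of the machine was actually in use. The function name and the timing values below are made up for illustration (picked to land on roughly 300% over a 6 hour run); they are not the real output from this conversion.

```python
def cpu_utilisation(user_s, sys_s, elapsed_s, n_threads):
    """Return (CPU percent as /usr/bin/time reports it, fraction of total capacity).

    user_s, sys_s and elapsed_s are in seconds; n_threads is the number of
    hardware threads available on the machine.
    """
    if elapsed_s <= 0 or n_threads <= 0:
        raise ValueError("elapsed_s and n_threads must be positive")
    # 100% here means one hardware thread kept fully busy.
    percent = 100.0 * (user_s + sys_s) / elapsed_s
    # Share of the whole machine, where 1.0 means every thread was busy.
    fraction = percent / (100.0 * n_threads)
    return percent, fraction


# Hypothetical timings, NOT taken from the actual run.
percent, fraction = cpu_utilisation(
    user_s=60000.0, sys_s=4800.0, elapsed_s=21600.0, n_threads=40
)
print(f"{percent:.0f}% CPU, {fraction:.1%} of capacity")
```

With numbers like these, 300% on a 40 thread box works out to well under a tenth of the available capacity, which is why it looked low to me.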