Hey you! 🫵😎 Follow the notebook in this repo to go from XML files containing data extracted from papers to a dataset, which can then be rephrased by whatever model you choose, on either the Slurm or Runai job scheduler, to generate a LOT of data!
Moreover, a subdirectory contains a .ipynb notebook that tries out a few ideas based on the paper "Fractal Patterns May Illuminate the Success of Next-Token Prediction", so feel free to play around with it!
Everything is in the notebook in this repo, so just download it and get started!