pdyban/driverchallenge

Driver Telematics Analysis

Driver Telematics Analysis is a Kaggle challenge. For more details, see the challenge page. Besides solving a machine learning problem, we want to learn how to use git and scikit-learn.

Submissions can be generated by running the scripts in the scripts directory, using the repository root as the working directory. Features implement a common interface and are stored in the features package. Utilities such as plotting and I/O are part of the utils package. Working notes are stored as IPython notebooks in the notebooks directory.

The repository is now closed; the project is finished.

Result

My participation is now over. Together with scigor, we placed 613th out of 1528, which puts us right around the top-40% mark. As reported by other participants, 77% accuracy is pretty much all you could achieve without trip matching and sophisticated model ensembling.

Status board

Not the best achievement ever, but the competition taught us a lot. We learned to use IPython notebooks, mastered git with branching and many troublesome merge conflicts, developed an object-oriented framework for evaluating different models, got acquainted with scikit-learn and matplotlib, and used parallelization, NumPy persistence, zipping and CSV I/O, all thanks to one challenge.

Features

In the end, we based our classification model on the following features:

  1. AccelerationFeature(10, 31, True, np.median)
  2. AccelerationFeature(30, 51, True, np.median)
  3. AccelerationFeature(50, 71, True, np.median)
  4. AccelerationFeature(5, 130, True, np.median)
  5. AccelerationFeature(10, 31, True, np.mean)
  6. AccelerationFeature(30, 51, True, np.mean)
  7. AccelerationFeature(50, 71, True, np.mean)
  8. AccelerationFeature(5, 130, True, np.mean)
  9. AccelerationFeature(10, 31, False, np.median)
  10. AccelerationFeature(30, 51, False, np.median)
  11. AccelerationFeature(50, 71, False, np.median)
  12. AccelerationFeature(5, 130, False, np.median)
  13. AccelerationFeature(10, 31, False, np.mean)
  14. AccelerationFeature(30, 51, False, np.mean)
  15. AccelerationFeature(50, 71, False, np.mean)
  16. AccelerationFeature(5, 130, False, np.mean)
  17. AngleFeature(0, np.mean)
  18. AngleFeature(1, np.mean)
  19. SpeedPercentileFeature(5)
  20. SpeedPercentileFeature(95)
  21. AccelerationPercentileFeature(5)
  22. AccelerationPercentileFeature(95)
  23. TripLengthFeature()
  24. AccelerationFeature(10, 31, True, np.mean, False)
  25. AccelerationFeature(30, 51, True, np.mean, False)
  26. AccelerationFeature(50, 71, True, np.mean, False)
  27. AccelerationPercentileFeature(1)
  28. AccelerationPercentileFeature(10)
  29. AccelerationPercentileFeature(25)
  30. AccelerationPercentileFeature(50)
  31. AccelerationPercentileFeature(75)
  32. AccelerationPercentileFeature(90)
  33. AccelerationPercentileFeature(99)
  34. AnglePercentileFeature(1)
  35. AnglePercentileFeature(5)
  36. AnglePercentileFeature(10)
  37. AnglePercentileFeature(25)
  38. AnglePercentileFeature(50)
  39. AnglePercentileFeature(75)
  40. AnglePercentileFeature(90)
  41. AnglePercentileFeature(95)
  42. AnglePercentileFeature(99)
  43. SpeedPercentileFeature(1)
  44. SpeedPercentileFeature(10)
  45. SpeedPercentileFeature(25)
  46. SpeedPercentileFeature(50)
  47. SpeedPercentileFeature(75)
  48. SpeedPercentileFeature(90)
  49. SpeedPercentileFeature(99)

For the feature code, see the features module. The compiled features are not stored in the git repository, but they can easily be compiled locally using this script.
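As a rough illustration of what "features implementing a common interface" can look like, here is a minimal sketch. It is not the repository's code: the `Feature` base class, the `compute` method name, and `build_feature_matrix` are hypothetical, and it assumes each trip is an `(n, 2)` NumPy array of x/y positions sampled once per second. The class names `SpeedPercentileFeature` and `TripLengthFeature` are borrowed from the list above, but their internals here are guesses (for example, trip length is taken to mean travelled distance).

```python
import numpy as np


class Feature:
    """Hypothetical common interface: every feature maps a trip to one number."""

    def compute(self, trip):
        raise NotImplementedError


def step_lengths(trip):
    # Distance covered between consecutive samples. With one sample per
    # second (an assumption), this is also the speed per second.
    steps = np.diff(np.asarray(trip, dtype=float), axis=0)
    return np.hypot(steps[:, 0], steps[:, 1])


class SpeedPercentileFeature(Feature):
    def __init__(self, percentile):
        self.percentile = percentile

    def compute(self, trip):
        return float(np.percentile(step_lengths(trip), self.percentile))


class TripLengthFeature(Feature):
    def compute(self, trip):
        # Assumed meaning: total travelled distance.
        return float(step_lengths(trip).sum())


def build_feature_matrix(trips, features):
    """One row per trip, one column per feature."""
    return np.array([[f.compute(t) for f in features] for t in trips])


trip = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
matrix = build_feature_matrix([trip], [SpeedPercentileFeature(50), TripLengthFeature()])
print(matrix)
```

Because every feature exposes the same method, adding a new one to the model only means appending another object to the list.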

Models

We evaluated a number of different approaches to classification and settled on a GradientBoosting model, which we evaluated with cross-validation. For our models, see scripts.
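The following sketch shows one way gradient boosting and cross-validation can be combined in scikit-learn. It is an illustration under assumptions, not the scripts from this repository: the helper `evaluate_driver`, the labelling scheme (one driver's trips as positives, other drivers' trips as negatives), the ROC AUC scoring, and the synthetic data are all made up for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


def evaluate_driver(driver_features, other_features, folds=5, seed=0):
    """Cross-validated ROC AUC for separating one driver's trips from others'."""
    X = np.vstack([driver_features, other_features])
    y = np.concatenate([np.ones(len(driver_features)),
                        np.zeros(len(other_features))])
    model = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    # For classifiers, an integer cv uses stratified folds, so each fold
    # contains both classes.
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc")


# Synthetic stand-in for real feature matrices (rows = trips, columns = features).
rng = np.random.default_rng(0)
driver = rng.normal(1.0, 1.0, size=(100, 4))
others = rng.normal(0.0, 1.0, size=(100, 4))

scores = evaluate_driver(driver, others)
print(scores.mean())
```

The mean of the fold scores gives an offline estimate of how well the model separates the two groups, which is useful when the number of leaderboard submissions is limited.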

Todos -> all done:

  • compute best RDP (Ramer-Douglas-Peucker) epsilon value -> dismissed, RDP is far too expensive
  • create script that reduces trips using RDP and stores them as *.npy -> completed
  • analyze article by Olariu -> completed
  • use sklearn's cross-validation -> completed
  • understand how to measure score offline (maybe use cross-validation's built-in score) -> completed
  • compute more features: -> completed
    • more percentiles
    • angle features
    • use speed w/o interpolation
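For readers unfamiliar with the RDP item above, here is a minimal sketch of Ramer-Douglas-Peucker simplification with the result stored as a `.npy` file. It is not the script from this repository: the function names `rdp` and `save_reduced`, the `(n, 2)` trip layout, and the epsilon value are assumptions made for the example. The version below is iterative so that long trips do not hit Python's recursion limit.

```python
import numpy as np


def rdp(points, epsilon):
    """Simplify an (n, 2) polyline, keeping points farther than epsilon
    from the chord of the segment they lie in."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n == 0:
        return points
    keep = np.zeros(n, dtype=bool)
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        first, last = stack.pop()
        if last - first < 2:
            continue  # no interior points to test
        start = points[first]
        chord = points[last] - start
        chord_len = np.hypot(chord[0], chord[1])
        rel = points[first + 1:last] - start
        if chord_len == 0.0:
            # Degenerate chord: fall back to distance from the start point.
            dists = np.hypot(rel[:, 0], rel[:, 1])
        else:
            # Perpendicular distance via the 2D cross product.
            dists = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / chord_len
        idx = int(np.argmax(dists))
        if dists[idx] > epsilon:
            split = first + 1 + idx
            keep[split] = True
            stack.append((first, split))
            stack.append((split, last))
    return points[keep]


def save_reduced(trip, path, epsilon):
    """Reduce a trip and persist it with NumPy; path should end in .npy."""
    reduced = rdp(trip, epsilon)
    np.save(path, reduced)
    return reduced


trip = np.array([[0, 0], [1, 0.01], [2, 0], [3, 5], [4, 0]])
reduced = rdp(trip, epsilon=0.5)
print(len(trip), "->", len(reduced))
```

A larger epsilon removes more points, so the choice trades file size and speed against how faithfully turns are preserved.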

Useful Links

Scientific papers
