Driver Telematics Analysis is a Kaggle challenge; see the challenge page for details. Besides solving a machine learning problem, we wanted to learn how to use git and scikit-learn.
Submissions can be generated by running scripts from the `scripts` directory, using the repository root as the working directory. Features implement a common interface and live in the `features` package. Utilities such as plotting and I/O are part of the `utils` package. Working notes are stored as IPython notebooks in the `notebooks` directory.
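The common feature interface is not spelled out in this README; the following is a hypothetical sketch of what it might look like, using `TripLengthFeature` (which does appear in the feature list below) as an example. The `compute` method name and the trip representation are assumptions.

```python
import numpy as np


class Feature:
    """Hypothetical sketch of the common feature interface.

    Each feature maps a trip -- assumed here to be an (n, 2) array of
    x/y positions sampled once per second -- to a single scalar.
    """

    def compute(self, trip):
        raise NotImplementedError


class TripLengthFeature(Feature):
    """Total distance travelled, summed over consecutive points."""

    def compute(self, trip):
        steps = np.diff(trip, axis=0)  # per-second displacement vectors
        return float(np.hypot(steps[:, 0], steps[:, 1]).sum())
```

With such an interface, a trip's feature vector is just `[f.compute(trip) for f in features]`, which keeps model code independent of any particular feature.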
The repository is now closed; the project is finished.
My participation is over. Together with scigor we finished in place 613/1528,
right at the lower end of the top 40%. As reported by other participants, roughly 77% accuracy is about the best one could achieve without trip matching and sophisticated model ensembling.
Not the best result ever, but the competition taught us a lot. We learned IPython notebooks, mastered git branching through many troublesome merge conflicts, developed an object-oriented framework for evaluating different models, got acquainted with scikit-learn and matplotlib, and employed parallelization, NumPy persistence, zipping, and CSV I/O, all thanks to one challenge.
# Features

In the end, we based our classification model on the following features:
- AccelerationFeature(10, 31, True, np.median)
- AccelerationFeature(30, 51, True, np.median)
- AccelerationFeature(50, 71, True, np.median)
- AccelerationFeature(5, 130, True, np.median)
- AccelerationFeature(10, 31, True, np.mean)
- AccelerationFeature(30, 51, True, np.mean)
- AccelerationFeature(50, 71, True, np.mean)
- AccelerationFeature(5, 130, True, np.mean)
- AccelerationFeature(10, 31, False, np.median)
- AccelerationFeature(30, 51, False, np.median)
- AccelerationFeature(50, 71, False, np.median)
- AccelerationFeature(5, 130, False, np.median)
- AccelerationFeature(10, 31, False, np.mean)
- AccelerationFeature(30, 51, False, np.mean)
- AccelerationFeature(50, 71, False, np.mean)
- AccelerationFeature(5, 130, False, np.mean)
- AngleFeature(0, np.mean)
- AngleFeature(1, np.mean)
- SpeedPercentileFeature(5)
- SpeedPercentileFeature(95)
- AccelerationPercentileFeature(5)
- AccelerationPercentileFeature(95)
- TripLengthFeature()
- AccelerationFeature(10, 31, True, np.mean, False)
- AccelerationFeature(30, 51, True, np.mean, False)
- AccelerationFeature(50, 71, True, np.mean, False)
- AccelerationPercentileFeature(1)
- AccelerationPercentileFeature(10)
- AccelerationPercentileFeature(25)
- AccelerationPercentileFeature(50)
- AccelerationPercentileFeature(75)
- AccelerationPercentileFeature(90)
- AccelerationPercentileFeature(99)
- AnglePercentileFeature(1)
- AnglePercentileFeature(5)
- AnglePercentileFeature(10)
- AnglePercentileFeature(25)
- AnglePercentileFeature(50)
- AnglePercentileFeature(75)
- AnglePercentileFeature(90)
- AnglePercentileFeature(95)
- AnglePercentileFeature(99)
- SpeedPercentileFeature(1)
- SpeedPercentileFeature(10)
- SpeedPercentileFeature(25)
- SpeedPercentileFeature(50)
- SpeedPercentileFeature(75)
- SpeedPercentileFeature(90)
- SpeedPercentileFeature(99)
For feature code, see the `features` module. The compiled features are not checked into the git repository, but can easily be compiled locally using this script.
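To give a flavour of the percentile features above, here is a hedged sketch of what `SpeedPercentileFeature` might compute; the constructor argument and trip format (x/y positions at 1 Hz, so consecutive differences approximate speed) are assumptions, not the repository's actual code.

```python
import numpy as np


class SpeedPercentileFeature:
    """Hypothetical sketch: the q-th percentile of per-second speed.

    Assumes a trip is an (n, 2) array of positions sampled at 1 Hz,
    so the norm of consecutive differences approximates speed.
    """

    def __init__(self, q):
        self.q = q

    def compute(self, trip):
        speeds = np.linalg.norm(np.diff(trip, axis=0), axis=1)
        return float(np.percentile(speeds, self.q))
```

The acceleration and angle percentile features would follow the same pattern, differencing once more for acceleration or taking angles between consecutive displacement vectors.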
We evaluated a number of different approaches to classification and settled on scikit-learn's gradient boosting, evaluated with cross-validation. For our models, see the `scripts` directory.
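The gradient-boosting-plus-cross-validation setup can be sketched as follows. The feature matrix here is synthetic stand-in data (the real one would come from the features above), and the hyperparameters are illustrative defaults, not the values used in the competition; note also that modern scikit-learn exposes `cross_val_score` from `sklearn.model_selection`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix: one row per trip,
# one column per feature; y = 1 marks trips assumed to belong to the
# target driver, y = 0 marks trips sampled from other drivers.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(1.0, 1.0, (100, 5))])
y = np.array([1] * 100 + [0] * 100)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated AUC: an offline proxy for the leaderboard score.
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Scoring with `roc_auc` matches the competition's AUC metric, which is why cross-validation gives a usable offline estimate before submitting.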
# Todo

- compute best RDP epsilon value -> dismissed, RDP is far too expensive
- create script that reduces trips using RDP and stores them as *.npy -> completed
- analyze article by Olariu -> completed
- use sklearn's cross-validation -> completed
- understand how to measure score offline (maybe use cross-validation's built-in scoring) -> completed
- compute more features -> completed:
  - more percentiles
  - angle features
  - use speed w/o interpolation