Section | Title | Contents |
---|---|---|
00 | Getting Started | Estimators, Transformers, Preprocessors, Pipelines, Model Evaluation, Parameter Searches, Next Steps |
01 | Linear Models | OLS, Ridge, Lasso, Elastic-Net, Least Angle Regression (LARS), LARS Lasso, OMP, Naive Bayes, Generalized Linear Models (GLM), Tweedie Models, Stochastic Gradient Descent (SGD), Perceptrons, Passive-Aggressive Algos, Polynomial Regression |
01a | Logistic Regression | Basics, Examples |
01b | Splines | Polynomial Regression & Basis Functions, Periodic splines |
01c | Quantile Regression | Examples, QR vs linear regression |
01d | Outliers | Robustness, RANSAC, Huber, Thiel-Sen |
02 | Discriminant Analysis | LDA, QDA, Math Foundations, Shrinkage, Estimators |
03 | Kernel Ridge Regression | KRR vs SVR |
04 | Support Vector Machines (SVMs) | Classifiers, Regressors, Scoring, Weights, Complexity, Kernels |
05 | Stochastic Gradient Descent (SGD) | Classifiers, Solvers, Regressors, Sparse Data; Complexity; Stopping/Convergence; Tips |
06 | K Nearest Neighbors (KNN) | Algos (Ball Tree, KD Tree, Brute Force), Radius-based KNN, Nearest Centroid Classifiers, Caching, Neighborhood Components Analysis (NCA) |
07 | Gaussian Processes (GPs) | Regressors, Classifiers, Kernels |
08 | Cross Decomposition | Partial Least Squares (PLS), Canonical PLS, SVD PLS, PLS Regression, Canonical Correlation Analysis (CCA) |
09 | Naive Bayes (NB) | Gaussian NB, Multinomial NB, Complement NB, Bernoulli NB, Categorical NB, Out-of-core fitting |
10 | Decision Trees (DTs) | Classifiers, Graphviz, Regressions, Multiple Outputs, Extra Trees, Complexity, Algorithms, Gini, Entropy, Misclassification, Minimal cost-complexity Pruning |
11a | Ensembles/Bagging | Methods, Random Forests, Extra Trees, Parameters, Parallel Execution, Feature Importance, Random Tree Embedding |
11b | Ensembles/Boosting | Gradient Boosting (GBs), GB Classifiers, GB Regressions, Tree Sizes, Loss Functions, Shrinkage, Subsampling, Feature Importance, Histogram Gradient Boosting (HGB), HGB - Monotonic Constraints |
11ba | Ensembles/Boosting/Adaboost | examples |
11c | Ensembles/Voting | Hard Voting, Soft Voting, Voting Regressor |
11d | Ensembles/General Stacking | Summary |
12 | Multiclass/Multioutput Problems | Label Binarization, One vs Rest (OvR), One vs One (OvO) Classification, Output Codes, Multilabel, Multioutput Classification, Classifier Chains, Multioutput Regressions, Regression Chains |
13 | Feature Selection (FS) | Removing Low-Variance Features, Univariate FS, |
14 | Semi-Supervised | Self-Training Classifier, Label Propagation, Label Spreading |
15 | Isotonic Regression | Example |
16 | Calibration Curves | Intro/Example, Cross-Validation, Metrics, Regressors |
17 | Perceptrons | Intro, Classification, Regression, Regularization, Training, Complexity, Tips |
21 | Gaussian Mixtures (GMs) | Expectation Maximization, Variational Bayes GM |
22 | Manifolds | Isomap, Locally Linear Embedding (LLE), Modified LLE, Hessian LLE, Local Tangent Space Alignment (LTSA), Multidimensional Scaling (MDS), Random Trees Embedding, Spectral Embedding, t-SNE, Neighborhood Components Analysis (NCA) |
23 | Clustering | K-Means, Voronoi Diagrams, Affinity Propagation, Mean Shift, Spectral Clustering, Agglomerative Clustering, Dendrograms, Connectivity Constraints, Distance Metrics, DBSCAN, Optics, Birch |
23a | Clustering Metrics | Rand Index, Mutual Info Score, Homogeneity, Completeness, V-Measure, Fowlkes-Mallows, Silhouette Coefficient, Calinski-Harabasz, Davies-Bouldin, Contingency Matrix, Pair Confusion Matrix |
24 | Biclustering | Spectral Co-Clustering, Spectral Bi-Clustering, Metrics |
25 | Component Analysis / Matrix Factorization | PCA, Incremental PCA, PCA w/ Random SVD, PCA w/ Sparse Data, Kernel PCA, Dimension Reduction Comparison, Truncated SVD / LSA, Dictionary Learning, Factor Analysis, Independent Component Analysis, Non-Negative Matrix Factorization (NNMF), Latent Dirichlet Allocation (LDA) |
26 | Covariance | Empirical CV, Shrunk CV, Max Likelihood Estimation (MLE), Ledoit-Wolf Shrinkage, Oracle Approximating Shrinkage, Sparse Inverse CV, aka Precision Matrix, Mahalanobis Distance |
27 | Novelties & Outliers | One-Class SVMs, Elliptic Envelope, Isolation Forest, Local Outlier Factor |
28 | Density Estimation (DE) | Histograms, Kernel DE |
29 | Restricted Boltzmann Machines (RBMs) | Intro, Training |
31 | Cross Validation (CV) | Intro, Metrics, Parameter Estimation, Pipelines, Prediction Plots, Nesting, K-Fold, Stratified K-Fold, Leave One Out, Leave P Out, Class Label CV, Grouped Data CV, Predefined Splits, Time Series Splits, Permutation Testing, Visualizations |
32 | Parameter Tuning | Grid Search, Randomized Optimization, Successive Halving, Composite Estimators & Parameter Spaces, Alternative to Brute Force, Info Criteria (AIC, BIC) |
33 | Metrics & Scoring (Intro) | scoring, make_scorer |
33a | Classification Metrics | Accuracy, Top-K Accuracy, Balanced Accuracy, Cohen's Kappa, Confusion Matrix, Classification Report, Hamming Loss, Precision, Recall, F-Measure, Precision-Recall Curve, Average Precision, Jaccard Similarity, Hinge Loss, Log Loss, Matthews Correlation Coefficient, Receiver Operating Characteristic (ROC) Curves, ROC-AUC, Detection Error Tradeoff (DET), Zero One Loss, Brier Score |
33b | Multilabel Ranking Metrics | Coverage Error, Label Ranking Avg Precision (LRAP), Label Ranking Loss, Discounted Cumulative Gain (DCG), Normalized DCG |
33c | Regression Metrics | Explained Variance, Max Error, Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Squared Log Error (MSLE), Mean Absolute Pct Error (MAPE), R^2 score, aka Coefficient of Determination , Tweedie Deviances |
33d | Dummy Metrics | Dummy Classifiers, Dummy Regressors |
34 | Viz/Validation | Validation Curve, Learning Curve |
41 | Viz/Inspection | 2D PDPs, 3D PDPs, Individual Conditional Expectation (ICE) Plot |
42 | Viz/Permutations | Permutation Feature Importance (PFI), Impurity vs Permutation Metrics |
50a | Viz/ROC Curves | ROC Curve |
50b | Viz/custom PDP Plots | Example |
50c | Vis/Classification metrics | Confusion Matrix, ROC Curve, Precision-Recall Curve |
61 | Composite Transformers | Pipelines, Caching, Regression Target xforms, Feature Unions, Column Transformers |
62a | Text Feature Extraction | Bag of Words (BoW), Sparsity, Count Vectorizer, Stop Words, Tf-Idf, Binary Markers, Text file decoding, Hashing Trick, Out-of-core Scaling, Custom Vectorizers |
62b | Image Patch Extraction | Extract from Patches, Reconstruct from Patches, Connectivity Graphs |
63 | Data Preprocessing | Scaling, Quantile Transforms, Power Maps (Box-Cox, Yeo-Johnson), Category Coding, One-Hot Coding, Quantization aka Binning, Feature Binarization |
64 | Missing Value Imputation | Univariate, Multivariate, Multiple-vs-Single, Nearest-Neighbors, Marking Imputed Values |
66 | Random Projections | Johnson-Lindenstrauss lemma, Gaussian RP, Sparse RP Empirical Validation |
67 | Kernel Approximations | Nystroem, RBF Sampler, Additive Chi-Squared Sampler, Skewed Chi-Squared Sampler, Polynomial Sampling - Tensor Sketch |
68 | Pairwise Ops | Distances vs Kernels, Cosine Similarity, Kernels |
69 | Transforming Prediction Targets | Label Binarization, Multilabel Binarization, Label Encoding |
71 | Toy Datasets | Boston, Iris, Diabetes, Digits, Linnerud, Wine, Breast Cancer, Olivetti faces, 20 newsgroups, Labeled faces, Forest covertypes, Reuters corpus, KDD, Cal housing |
73 | Artificial Data | random-nclass-data, Gaussian blobs, Gaussian quantiles, Circles, Moons, Multilabel class data, Hastie data, BiClusters, Checkerboards, Regression, Friedman1/2/3, S-Curve, Swiss Roll, Low-Rank Matrix, Sparse Coded Signal, Sparse Symmetric Positive Definite (SPD) Matrix |
74 | Other Data | Sample images, SVMlight/LibSVM formats, OpenML, pandas.io, scipy.io, numpy.routines.io, scikit-image, imageio, scipy.io.wavfile |
81 | Scaling | Out-of-core ops (BUG = TODO) |
82 | Latency | Bulk-vs-atomic ops, Latency vs Validation, Latency vs #Features, Latency vs Datatype, Latency vs Feature Extraction, Linear Algebra Libs (BLAS, LAPACK, ATLAS, OpenBLAS, MKL, vecLib) |
83 | Parallelism | JobLib, OpenMP, NumPy, Oversubscription, config switches |
90 | Persistence | Pickle, Joblib |
-
Notifications
You must be signed in to change notification settings - Fork 0
bjpcjp/scikit-learn
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Updates in progress. Jupyter workbooks will be added as time allows.
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published