Linear Discriminant Analysis

This program performs linear (predictive) discriminant analysis. It is modeled on the SAS DISCRIM and STEPDISC procedures. It produces classification functions, makes predictions from a data set, and implements a variable selection mechanism. Results can also be reported automatically, in HTML or PDF format.

Developed by AbdDia, ASKruchinina and Dataymeric as part of a project in the L3 IDS program (Bachelor in Computer Science and Statistics for Data Science) at the Université Lumière Lyon 2 (France).

Usage and explanations

A demo program (demo_program.py) is available, with the corresponding data in the data folder.

After loading your data with the pandas library and identifying the name of the categorical (target) variable, create an object of the LinearDiscriminantAnalysis class from the discriminant_analysis.py module, passing two parameters: the data set and the name of the categorical variable.

from discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(dataset, varName)
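
As an illustration of the two inputs, here is a minimal sketch that builds a small pandas DataFrame and derives the basic quantities one would expect the object to compute from it. The data and the column names (`length`, `width`, `species`) are made up for the example, and the derivations are assumptions about what the class does internally, not its actual code.

```python
import pandas as pd

# Hypothetical toy data: two numeric explanatory variables and one
# categorical target column named "species" (all names are made up).
dataset = pd.DataFrame({
    "length": [5.1, 4.9, 6.3, 6.5, 5.8, 5.0],
    "width": [3.5, 3.0, 2.9, 3.0, 2.7, 3.4],
    "species": ["a", "a", "b", "b", "b", "a"],
})
varName = "species"

# Quantities that follow directly from the data set and the target name.
class_names = sorted(dataset[varName].unique())
var_names = [c for c in dataset.columns if c != varName]
n, p, K = len(dataset), len(var_names), len(class_names)
print(n, p, K, class_names, var_names)
```

These two objects, `dataset` and `varName`, are what the constructor call above receives.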

From there, you can call the different methods to perform the calculations you need, and read the attributes of the created object to retrieve specific values.

Attributes available after the creation of the object:

dataset: the data set
classLabel: the name of the target variable
classNames: the values (class names) taken by the target variable
varNames: the names of the explanatory variables
n: the sample size
p: the number of explanatory variables
K: the number of classes
V: the total covariance matrix
Vb: the biased total covariance matrix
W: the intra-class covariance matrix
Wb: the biased intra-class covariance matrix

For example, to display the biased covariance matrix:

lda.Vb
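
The four matrices can be sketched with NumPy as follows. This assumes the usual conventions (total covariance divided by n - 1, or by n for the biased version; pooled intra-class covariance divided by n - K, or by n for the biased version). The README does not state the divisors, so treat them as assumptions; the data are made up.

```python
import numpy as np

# Made-up data: 6 observations, 2 variables, 2 classes.
X = np.array([[5.1, 3.5], [4.9, 3.0], [5.0, 3.4],
              [6.3, 2.9], [6.5, 3.0], [5.8, 2.7]])
y = np.array([0, 0, 0, 1, 1, 1])
n, K = X.shape[0], len(np.unique(y))

def scatter(Z):
    """Sum of squares and cross-products around the column means."""
    centered = Z - Z.mean(axis=0)
    return centered.T @ centered

# Total covariance: divisor n - 1 (unbiased) or n (biased).
V = scatter(X) / (n - 1)
Vb = scatter(X) / n

# Intra-class covariance: pooled scatter around each class mean,
# divisor n - K (unbiased) or n (biased).
within = sum(scatter(X[y == k]) for k in np.unique(y))
W = within / (n - K)
Wb = within / n
```

Under these conventions the biased and unbiased versions differ only by a scalar factor, which is a quick sanity check on any implementation.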

Discriminant functions

fit() produces the discriminant functions.

lda.fit()

Attributes available after calling fit():

intercept_: the intercept computed by the model
coef_: the coefficients of the model
infoFuncClassification: the values of the classification function
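
For readers unfamiliar with linear classification functions, the sketch below shows one standard way to obtain an intercept and a coefficient vector per class from the pooled intra-class covariance matrix. It is not the project's code: the function name `classification_functions`, the use of class proportions as priors, and the data are all assumptions made for illustration.

```python
import numpy as np

def classification_functions(X, y):
    """Return (classes, intercepts, coefs) of linear classification functions."""
    classes = np.unique(y)
    n, K = X.shape[0], len(classes)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled intra-class covariance matrix (divisor n - K).
    W = sum((X[y == k] - m).T @ (X[y == k] - m)
            for k, m in zip(classes, means)) / (n - K)
    # Assumption: priors are estimated from the class proportions.
    priors = np.array([np.mean(y == k) for k in classes])
    # Row k of coefs is W^{-1} mu_k.
    coefs = np.linalg.solve(W, means.T).T
    # Intercept k is -0.5 * mu_k' W^{-1} mu_k + log(prior_k).
    intercepts = -0.5 * np.sum(means * coefs, axis=1) + np.log(priors)
    return classes, intercepts, coefs

# Made-up data: 6 observations, 2 variables, 2 classes.
X = np.array([[5.1, 3.5], [4.9, 3.0], [5.0, 3.4],
              [6.3, 2.9], [6.5, 3.0], [5.8, 2.7]])
y = np.array([0, 0, 0, 1, 1, 1])
classes, intercepts, coefs = classification_functions(X, y)
print(intercepts)
print(coefs)
```

Each observation is then scored by every class function, and the class with the highest score is the predicted one.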

Prediction function

predict() makes predictions from a data set and returns the vector of predicted classes.

y_pred = lda.predict(values_a_predict)
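
Conceptually, prediction with linear classification functions amounts to scoring each observation with every class function and keeping the class with the highest score. The sketch below illustrates that rule with made-up intercepts and coefficients; `predict_classes` is a hypothetical helper, not a method of the project.

```python
import numpy as np

def predict_classes(X_new, classes, intercepts, coefs):
    """Assign each row of X_new to the class with the highest linear score."""
    scores = X_new @ coefs.T + intercepts   # shape: (observations, classes)
    return classes[np.argmax(scores, axis=1)]

# Made-up classification functions for two classes and two variables.
classes = np.array(["a", "b"])
intercepts = np.array([0.5, -2.0])
coefs = np.array([[1.0, 2.0],
                  [3.0, -1.0]])

X_new = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
y_pred = predict_classes(X_new, classes, intercepts, coefs)
print(y_pred)
```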

Confusion matrix and accuracy score

Predictions are only useful if we can measure how correct they are. The confusion matrix and the accuracy rate serve that purpose.

The function confusion_matrix() "provides an unbiased assessment of the model's performance in deployment". It takes the vector of true values of the target variable, which were not used during training, and the vector of predicted values, which makes it possible to estimate how well the model predicts. The function returns the numerical matrix along with a graph of the confusion matrix.

The function accuracy_score() computes the accuracy rate. It takes as input the true values of the target variable and the predictions, and returns the proportion of correct predictions.

lda.confusion_matrix(y_true, y_pred)
lda.accuracy_score(y_true, y_pred)

Attributes available after calling confusion_matrix() and accuracy_score():

confusionMatrix: the confusion matrix
confusionMatrixGraph: the graph of the confusion matrix
accuracy: the accuracy score
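
To make the two measures concrete, here is a small standard-library sketch of what a confusion matrix and an accuracy rate contain. The helper names `confusion_counts` and `accuracy` and the labels are made up; rows are taken to be true classes and columns predicted classes, which is an assumption about the layout.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Proportion of predictions equal to the true value."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]
labels = ["a", "b", "c"]

matrix = confusion_counts(y_true, y_pred, labels)
acc = accuracy(y_true, y_pred)
for row in matrix:
    print(row)
print(acc)
```

The accuracy is the sum of the diagonal of the matrix divided by the number of observations.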

Variable selection (Stepwise Discriminant Analysis)

The function stepdisc() performs a stepwise discriminant analysis, using either a forward or a backward approach. The stopping threshold (risk level) must be set.

lda.stepdisc(slentry, method)

Example: forward selection method

lda.stepdisc(0.01, 'forward')

Attributes available after calling stepdisc():

infoStepResults: the values computed at the last step
stepdiscSummary: summary of the variable selection approach
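
The idea behind one forward step can be sketched as follows: for each candidate variable, compute the partial Wilks' lambda obtained by adding it to the variables already selected, and convert it into an F statistic with (K - 1, n - K - q) degrees of freedom, where q is the number of variables already selected. This is a generic illustration on simulated data, not the project's implementation; the helper names are hypothetical, and the comparison of the p-value with the threshold is left out.

```python
import numpy as np

def scatter(Z):
    """Sum of squares and cross-products around the column means."""
    centered = Z - Z.mean(axis=0)
    return centered.T @ centered

def wilks_lambda(X, y, cols):
    """det(within-class scatter) / det(total scatter) on the given columns."""
    if not cols:
        return 1.0
    Z = X[:, cols]
    within = sum(scatter(Z[y == k]) for k in np.unique(y))
    return np.linalg.det(within) / np.linalg.det(scatter(Z))

def f_to_enter(X, y, selected, candidate):
    """F statistic for adding `candidate` to the `selected` columns."""
    n, K, q = X.shape[0], len(np.unique(y)), len(selected)
    partial = wilks_lambda(X, y, selected + [candidate]) / wilks_lambda(X, y, selected)
    return (n - K - q) / (K - 1) * (1 - partial) / partial

# Simulated data: only column 0 separates the three classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 3))
X[:, 0] += 3 * y

f_values = [f_to_enter(X, y, [], j) for j in range(X.shape[1])]
best = int(np.argmax(f_values))
print(best, f_values)
```

In a full forward procedure, the best candidate would enter the model only if the p-value of its F statistic falls below the chosen threshold, and the step would be repeated with the enlarged selection.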

Wilks decay curve

The function wilks_decay() computes the value of Wilks' lambda as a function of the number of selected explanatory variables and displays the Wilks' lambda decay curve. For example, with 40 explanatory variables, lambda values are computed for 1, 2, ..., 40 selected variables.

lda.wilks_decay()
lda.figWilksDecay

Attributes available after calling wilks_decay():

infoWilksDecay: list of Wilks' lambda values according to the number of selected variables
figWilksDecay: the Wilks' lambda decay curve
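
As an illustration of what such a curve contains, the sketch below computes Wilks' lambda for the first 1, 2, ..., p variables of a simulated data set. Taking the variables in column order is an assumption made here for simplicity; the README does not say in which order the project adds them. The plot itself is omitted.

```python
import numpy as np

def scatter(Z):
    """Sum of squares and cross-products around the column means."""
    centered = Z - Z.mean(axis=0)
    return centered.T @ centered

def wilks_lambda(Z, y):
    """Wilks' lambda: det(within-class scatter) / det(total scatter)."""
    within = sum(scatter(Z[y == k]) for k in np.unique(y))
    return np.linalg.det(within) / np.linalg.det(scatter(Z))

# Simulated data: 60 observations, 4 variables, 3 classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 4))
X[:, 0] += 2 * y   # strongly discriminant variable
X[:, 2] += y       # moderately discriminant variable

# Lambda for the first j columns, j = 1, ..., p.
lambdas = [wilks_lambda(X[:, :j], y) for j in range(1, X.shape[1] + 1)]
print(lambdas)
```

Lambda can only decrease or stay level as variables are added, so a flattening curve suggests that the remaining variables contribute little to the discrimination.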

Reporting

Discriminant functions (`fit()`)

Provides output similar to SAS PROC DISCRIM, in HTML or PDF format. Requires importing the project's reporting module.

from reporting import HTML, PDF

HTML().discrim_html_output(lda, 'outputFileName.html')
PDF().discrim_pdf_output(lda, 'outputFileName.pdf')

Variable selection (`stepdisc()`)

Provides output similar to SAS PROC STEPDISC, in HTML format. Requires importing the project's reporting module.

from reporting import HTML

HTML().stepdisc_html_output(lda, 'outputFileName.html')

References

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) (1st ed. 2013, corr. 7th printing 2017). Springer.
Jia Li. Linear Discriminant Analysis [Slides]. Department of Statistics, The Pennsylvania State University.
Ricco Rakotomalala. Cours Analyse Discriminante.
Ricco Rakotomalala. (2020). Analyse Discriminante Linéaire sous R.
Ricco Rakotomalala. (2020). Pratique de l’Analyse Discriminante Linéaire.
