
The Challenge of Vanishing/Exploding Gradients in Deep Neural Networks 🌋

==========================

📎 As is well known: the vanishing gradient problem tends to appear when using Tanh or Sigmoid activation functions, while the exploding gradient problem tends to appear when using the ReLU family of activation functions.

💡 What if we aggregate two activation functions? Could that balance the vanishing and exploding problems? 🤔

__________________________________________________________

For example, aggregating Tanh & Leaky ReLU. 👇

📌 AD2(z) = Tanh(z) + 0.5*Leaky_Relu(z) <-- blue line in image
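
A minimal sketch of this aggregated activation in NumPy (the names `ad2` and `leaky_relu` are my own labels; the notebook's actual implementation may differ):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Standard Leaky ReLU: identity for z >= 0, small slope alpha for z < 0
    return np.where(z >= 0, z, alpha * z)

def ad2(z, alpha=0.01):
    # Aggregated activation from the post: Tanh(z) + 0.5 * Leaky_ReLU(z)
    return np.tanh(z) + 0.5 * leaky_relu(z, alpha)
```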

__________________________________________________________

Notably, learning speed & accuracy do not decrease when using the aggregation compared to using Leaky ReLU alone. 📊

(Please check the notebook at this Colab link --> https://lnkd.in/dkhkfWJA.)

__________________________________________________________

The plot shows the difference between Leaky_ReLU alone (green) and the aggregation of the two functions described in this post (blue). 📏

It seems to reduce exploding gradients because it shrinks the activation values, doesn't it? 🤔
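
A quick check of that intuition, assuming the `ad2` / `leaky_relu` sketch above: for large positive z, the Tanh term saturates near 1, so AD2 grows with slope ≈ 0.5 instead of 1, roughly halving both the forward activations and the gradient passed back through the unit.

```python
z = np.array([1.0, 5.0, 10.0, 50.0])
print(leaky_relu(z))  # [ 1.  5. 10. 50.]
print(ad2(z))         # ~[1.26, 3.50, 6.00, 26.00]  (Tanh term saturates at 1)
```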
