Literature and Code

Last updated: 2021-06.

We review only the more advanced technologies.

Older solutions used rule-based approaches.

Deep learning was applied relatively recently to the problem of diacritization, and has gradually achieved better results than rule-based approaches.
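
To make the task concrete, here is a minimal Python sketch of how diacritized text can be turned into training pairs: the undiacritized input and, per base character, the marks to predict. It is an illustration only, not the preprocessing of any specific paper below. The helper names `strip_diacritics` and `to_labeled_pairs` are hypothetical, and the set of marks is assumed to be the eight common tashkeel code points U+064B to U+0652.

```python
# Assumption: only the eight common tashkeel marks are treated as diacritics
# (U+064B fathatan .. U+0652 sukun, which includes shadda at U+0651).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return the text with all marks in DIACRITICS removed (model input)."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def to_labeled_pairs(text: str) -> list[tuple[str, str]]:
    """Pair each base character with the marks that follow it (model target).

    A character without marks gets an empty label. A mark that appears
    before any base character is ignored.
    """
    pairs: list[tuple[str, str]] = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


if __name__ == "__main__":
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, ta, ba, each with fatha
    print(strip_diacritics(word))
    for base, marks in to_labeled_pairs(word):
        print(f"U+{ord(base):04X}", [f"U+{ord(m):04X}" for m in marks])
```

Framing the problem as one label per base character is what lets sequence models (HMM, CRF, LSTM) be applied directly. A combination such as shadda plus fatha ends up as a single two-mark label.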

References

Mishkal, Arabic text vocalization software

  • Zerrouki, T.

  • rule-based library, 2014

  • code

Automatic minimal diacritization of Arabic texts

  • Rehab Alnefaie, Aqil M. Azmi

  • 11.2017

  • MADAMIRA software

  • paper

An Approach for Arabic Diacritization

  • Ismail Hadjir, Mohamed Abbache, Fatma Zohra Belkredim

  • 06.2019

  • keywords: Hidden Markov Models, Viterbi algorithm

  • article

Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach

  • Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Younes Samih, Mohammed Attia

  • 2018

  • keywords: Conditional Random Fields, Arabic dialects…

  • paper

Arabic Text Diacritization Using Deep Neural Networks

  • Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, Mahmoud Al-Ayyoub

  • Shakkala library, tensorflow

  • 04.2019

  • keywords: Embedding, LSTM

  • paper

  • code, tensorflow

  • benchmarks&scripts

Highly Effective Arabic Diacritization using Sequence to Sequence Modeling

  • Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish

  • 06.2019

  • keywords: seq2seq(LSTM), NMT, interesting representation units, context window, voting

  • paper

Multi-components System for Automatic Arabic Diacritization

  • Hamza Abbad, Shengwu Xiong

  • 04.2020

  • keywords: LSTMs, parallel layers for Shadda and Harakat (⇒ pipeline)

  • paper

  • code, tensorflow

Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

  • Badr AlKhamissi, Muhammad N. ElNokrashy, and Mohamed Gabr

  • 12.2020

  • keywords: Cross-level attention, Encoder-Decoder (LSTM), Teacher forcing

  • paper

  • slides

  • code, pytorch

Effective Deep Learning Models for Automatic Diacritization of Arabic Text

  • Mokhtar Ali Hasan Madhfar, Ali Mustafa Qamar

  • 12.2020

  • keywords: embedding, encoder-decoder (LSTM), Highway Nets, Attention, CBHG Module

  • paper

  • code, pytorch

A Deep Belief Network Classification Approach for Automatic Diacritization of Arabic Text

  • Mohammad Aref Alshraideh, Mohammad Alshraideh and Omar Alkadi

  • 04.2021

  • keywords: DBN built from restricted Boltzmann machines (RBMs), reported as superior to LSTMs, Unicode encoding, Borderline-SMOTE

  • paper

Research ideas

Some ideas, as of 2021, raised in the recent papers above:

  • Transformer-based Encoders

  • Byte-pair-encodings

  • Improve the injected-hints method (train with partially diacritized data)

  • More Interpretable Attention Weights

  • Deep belief networks

  • More data and data processing