FreEM Norm corpus

- WARNING: This repository replaces [PARALLEL17](https://github.com/e-ditiones/PARALLEL17), which is no longer maintained.

Parallel corpus (diplomatic vs normalised) of 17th c. French texts.

For more information about FreEM corpora, cf. our website.

Corpus

The corpus is available in the corpus folder.

A detailed list of the content is available here.

Transcriptions

Transcripts are almost diplomatic. Long ſ is maintained (plaiſir and not plaisir). Ligatures that have disappeared (ſt, st, ct) are not kept, but those still used in contemporary French (œ, æ) are.
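
The ligature convention can be checked mechanically. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and it assumes the obsolete ligatures would appear as the precomposed Unicode characters U+FB05 (long s + t) and U+FB06 (st). The ct ligature has no standard Unicode code point, so it is not covered here.

```python
from __future__ import annotations

# Hypothetical helper, not shipped with FreEMnorm.
# Precomposed ligatures that the transcription convention resolves into letters.
RESOLVED_LIGATURES = {
    "\uFB05": "ſt",  # LATIN SMALL LIGATURE LONG S T
    "\uFB06": "st",  # LATIN SMALL LIGATURE ST
}


def resolve_ligatures(text: str) -> str:
    """Expand obsolete precomposed ligatures; ſ, œ and æ are left untouched."""
    for ligature, expansion in RESOLVED_LIGATURES.items():
        text = text.replace(ligature, expansion)
    return text


def find_unresolved(text: str) -> list[tuple[int, str]]:
    """Return (offset, character) pairs for ligatures that should be resolved."""
    return [(i, ch) for i, ch in enumerate(text) if ch in RESOLVED_LIGATURES]
```

Running `find_unresolved` over a transcript before submitting it is one way to spot characters that do not follow the convention; `resolve_ligatures` leaves words such as plaiſir and cœur unchanged.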

Use the normaliser

[TO DO]

Contribute

If you want to contribute, you can do so by cloning the repository and sending us a pull request, or by sending an email to simon.gabay[at]unige.ch.

Acknowledgments

Additional data and corrections have been provided by Philippe Gambette (GitHub) and Jonathan Poinhos.

Cite this repository

If you use the data:

@software{gabay_simon_2022_6481179,
  author       = {Gabay, Simon and
                  Gambette, Philippe},
  title        = {{FreEM-corpora/FreEMnorm: FreEM norm Parallel
                   (original vs. normalised) corpus for Early Modern
                   French}},
  month        = jan,
  year         = 2022,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {1.0.1},
  doi          = {10.5281/zenodo.6481179},
  url          = {https://doi.org/10.5281/zenodo.6481179}
}

You can additionally cite one of our latest publications:

@inproceedings{gabay:hal-02276150,
  TITLE = {{A Workflow For On The Fly Normalisation Of 17th c. French}},
  AUTHOR = {Gabay, Simon and Riguet, Marine and Barrault, Lo{\"i}c},
  URL = {https://hal.archives-ouvertes.fr/hal-02276150},
  BOOKTITLE = {{DH2019}},
  ADDRESS = {Utrecht, Netherlands},
  ORGANIZATION = {{ADHO}},
  YEAR = {2019},
  MONTH = Jul,
  KEYWORDS = {17th Century France ; Parallel corpus building},
  PDF = {https://hal.archives-ouvertes.fr/hal-02276150/file/DH2019_final.pdf},
  HAL_ID = {hal-02276150},
  HAL_VERSION = {v1},
}
@inproceedings{gabay:hal-02596669,
  TITLE = {{Traduction automatique pour la normalisation du fran{\c c}ais du XVII e si{\`e}cle}},
  AUTHOR = {Gabay, Simon and Barrault, Lo{\"i}c},
  URL = {https://hal.archives-ouvertes.fr/hal-02596669},
  BOOKTITLE = {{TALN 2020}},
  ADDRESS = {Nancy, France},
  ORGANIZATION = {{ATALA}},
  SERIES = {27{\`e}me Conf{\'e}rence sur le Traitement Automatique des Langues Naturelles},
  YEAR = {2020},
  MONTH = Jun,
  KEYWORDS = {Normalisation ; 17th c French ; Neural Machine Translation (NMT) ; Statistical Machine Translation (SMT) ; Digital humanities ; Humanit{\'e}s num{\'e}riques ; Fran{\c c}ais classique ; Traduction automatique neuronale ; Traduction automatique statistique},
  PDF = {https://hal.archives-ouvertes.fr/hal-02596669/file/main.pdf},
  HAL_ID = {hal-02596669},
  HAL_VERSION = {v1},
}
@inproceedings{gabay:hal-03596653,
  TITLE = {{Automatic Normalisation of Early Modern French}},
  AUTHOR = {Bawden, Rachel and Poinhos, Jonathan and Kogkitsidou, Eleni and Gambette, Philippe and Sagot, Beno{\^i}t and Gabay, Simon},
  URL = {https://hal.inria.fr/hal-03596653},
  BOOKTITLE = {{Proceedings of the 13th Language Resources and Evaluation Conference}},
  ADDRESS = {Marseille, France},
  ORGANIZATION = {{European Language Resources Association}},
  YEAR = {2022},
  MONTH = Jun,
  HAL_ID = {hal-03540226},
  HAL_VERSION = {v1},
}

Please keep me posted if you use this data!

Contact

simon.gabay[at]unige.ch

Licence

This work is licensed under a Creative Commons Attribution 4.0 International Licence.
