The CETEIcean extension for MediaWiki implements the CETEIcean library (pronounced /sɪˈti:ʃn/) to let users collaborate on TEI XML documents and present them on the wiki. CETEIcean converts your source document using HTML5 Custom Elements (CE), part of the Web Components standards, preserving much of the structure of the original document. This extension attempts to combine the best of both worlds: XSLT for the initial stage of processing XML, and JavaScript for registering CE and applying custom behaviours.
Its support for working with TEI XML is twofold: first, TEI XML documents can be created, edited and displayed in a dedicated namespace of the wiki; and second, a parser function (#cetei) can be used to embed documents, or even discrete sections of them, inline in wikitext.
Because this extension was created for the CODECS website (https://codecs.vanhamel.nl), a project published by the A. G. van Hamel Foundation for Celtic Studies, functionality at this stage is likely to be tied closely to the needs of the CODECS platform. It does not depend on this environment, however, and you’re welcome to try it out and provide feedback or patches.
This extension creates a dedicated namespace with the Cetei: namespace prefix, which is where TEI XML documents can be stored and displayed.
The section headed "Text" renders the document. The TEI Header is hidden by default, but its visibility can be toggled on and off. A message appears instead if none has been provided.
The section headed "About" is intended for metadata, i.e. information about the document. It lets you transclude wikitext content from a /doc subpage, similar to how the Scribunto extension lets you associate documentation pages with Lua modules. If you have installed Semantic MediaWiki, you can add semantic properties. Just be aware that it is up to you to prevent semantic information from becoming duplicated as a result of transclusion. The rationale is that it should be up to you whether semantic data gets attached to the document (through transclusion) or to the /doc subpage. See Customisation below for some help.
To allow others to inspect the shape of the document and maybe learn from it, the raw source code is directly exposed in the final tabbed section.
To best assist your users in editing documents, it is recommended that you have both CodeEditor and WikiEditor installed on your wiki. Here is why:
The extension hooks into CodeEditor to support XML editing with Ace. Ace is a popular embeddable code editor whose features include colour-coded syntax highlighting, tabs, automatic indentation, line numbering, code folding and syntax checking.
The toolbar at the top that comes with WikiEditor is configured and extended to serve as an aid to writing (TEI) XML. CodeEditor adds some useful features, including search/replace, soft wrap, indentation and a button to let you toggle back and forth between CodeEditor and the regular WikiEditor interface in case you have need of it.
The CETEIcean extension extends the toolbar further by introducing
- a new booklet section, "TEI XML", which lets you insert elements and code snippets into the document. Please be aware that the present arrangement is basic and only provisional. It is currently organised into a number of categories, such as "Preliminaries", "Verse" and "Dictionaries", and will be revised, re-arranged and expanded in the future.
- a button to launch the editor in full-screen mode.
Use the #cetei parser function to retrieve a document, or select excerpts from it, on a regular wiki page.
To retrieve the document, use either doc or url. The doc parameter expects the title of the wiki page you want to include.
{{#cetei:doc=Cetei:MyTEIpage
}}
Instead of doc, you can also use the url parameter, which expects a full URL that is both public and accessible to the server. This may refer to a wiki page, a document on the server, or possibly, if CORS is enabled for a particular remote source, an external document. To retrieve a wiki page in this way, use the fullurl magic word with the action=raw parameter, as shown below (note that it depends on your $wgServer setting whether you should use the prefix https).
{{#cetei:url=https://example.com/mywiki/my-tei-xml-file.xml
}}
{{#cetei:url=https:{{fullurl:Cetei:Some document|action=raw}}
}}
When either doc or url is used without a further argument, #cetei will attempt to retrieve the full document. Alternatively, you can fetch one or multiple excerpts by running a simple XPath query: add your XPath expression to a second parameter called sel, which is short for selector. The namespace prefix registered for this is ctc.
{{#cetei:doc=...
|sel=//ctc:...
}}
Example. The following retrieves the paragraph (element p) where the attribute xml:id has a value of "p2":
{{#cetei:doc=Cetei:Some document
|sel=//ctc:p[@xml:id='p2']
}}
For reasons that are specific to MediaWiki, you cannot use the pipe character (|) as the XPath union (OR) operator, but there is a simple workaround: write {{!}} instead.
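If such calls are generated programmatically, the substitution can be automated. The following is a minimal sketch in JavaScript; buildCeteiCall is a hypothetical helper, not something the extension provides, and it only assembles the wikitext string.

```javascript
// Hypothetical helper (not part of the extension): assembles a #cetei call
// and replaces every pipe in the XPath expression with the {{!}} magic word.
function buildCeteiCall(doc, xpath) {
  const safeXPath = xpath.split('|').join('{{!}}');
  return '{{#cetei:doc=' + doc + '|sel=' + safeXPath + '}}';
}

// Select two paragraphs at once with an XPath union.
const call = buildCeteiCall(
  'Cetei:Some document',
  "//ctc:p[@xml:id='p2'] | //ctc:p[@xml:id='p3']"
);
console.log(call);
// {{#cetei:doc=Cetei:Some document|sel=//ctc:p[@xml:id='p2'] {{!}} //ctc:p[@xml:id='p3']}}
```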
{{#cetei:doc=...
|break1=pb---n---23
|break2=pb---n---24
}}
In (TEI) XML, not all units are necessarily encoded through matching pairs of opening and closing tags. A new page or column may start with a self-closing tag which marks a new beginning, e.g. <pb n="23" type="page" />, and may end with the one marking the next break (when there is none, you've probably reached the end of the document). XPath is not designed for this use case: such units do not correspond to nodes in the DOM tree and are probably impossible to align with it if the position of the self-closing tag is relatively free, e.g. within or after a paragraph. This experimental feature is intended to let you extract the content between two self-closing tags. An attempt is then made to (semi-)repair the XML fragment by supplying the missing opening and closing tags, and the repaired fragment is returned rendered as HTML.
- Use the break1 and break2 parameters to identify the first and final self-closing tags.
- For each tag, use three consecutive hyphens to delimit the tag name, attribute and value, as in the example above.
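To make the notation concrete, here is a minimal sketch in JavaScript of how a specifier such as pb---n---23 could be split into its parts and used to cut out the text between two breaks. It is an illustration under stated assumptions, not the extension's implementation: the function names are hypothetical, matching is done with a simple regular expression, and the repair step that supplies missing opening and closing tags is left out.

```javascript
// Split a specifier such as "pb---n---23" into tag name, attribute and value.
function parseBreak(spec) {
  const parts = spec.split('---');
  if (parts.length !== 3) {
    throw new Error('Expected tagname---attribute---value, got: ' + spec);
  }
  const [tag, attr, value] = parts;
  return { tag, attr, value };
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern for a self-closing tag carrying the given attribute value.
function breakPattern({ tag, attr, value }) {
  return new RegExp(
    '<' + escapeRegExp(tag) + '\\s(?:[^>]*\\s)?' + escapeRegExp(attr) +
    '\\s*=\\s*["\']' + escapeRegExp(value) + '["\'][^>]*/>'
  );
}

// Return the raw text from the first break up to, but not including, the
// second one. If the second break is missing, run to the end of the document.
function sliceBetweenBreaks(xml, break1, break2) {
  const start = breakPattern(parseBreak(break1)).exec(xml);
  if (!start) return null;
  const afterStart = start.index + start[0].length;
  const end = breakPattern(parseBreak(break2)).exec(xml.slice(afterStart));
  const stop = end ? afterStart + end.index : xml.length;
  return xml.slice(start.index, stop);
}

const sample =
  '<p>one <pb n="23" type="page" />two</p><p>three <pb n="24" type="page"/>four</p>';
const fragment = sliceBetweenBreaks(sample, 'pb---n---23', 'pb---n---24');
console.log(fragment);
// <pb n="23" type="page" />two</p><p>three 
```

The result is deliberately not well-formed: it contains a closing </p> without its opening tag and an opening <p> without its closing tag, which is exactly the kind of fragment the repair step described above has to deal with.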
{{#cetei-align:
|resources=Wikipage1^^Wikipage2
|resourcesep=^^
|selectors=//text:*[@xml:id='***']^^//tr:*[@n='***']
|align=1;1
2;2
3;3
4;4
|valsep=;
}}
The special page Special:CETEIcean contains basic information about the extension and lists pages in the Cetei: namespace.
The extension comes with system messages that can be customised if so desired. See the file /i18n/en.json (English only for now). The following examples are worth mentioning explicitly:
- If you require a button or link to be added to the top right of a page in the extension namespace, i.e. on the opposite end of the tabbed headers on the left, you can add it to MediaWiki:Cetei-top-right-content (an empty div by default).
- You are free to alter MediaWiki:Cetei-edit-documentation-url to set a different URL for the button that lets you edit the /doc subpage. The parameter $1 will give you the title of that page. This can be useful if, for instance, you prefer to use Page Forms or FlexForm instead of the regular wiki editor.
- Download the files and add the folder (Ceteicean) to your /extensions directory.
- Enable the extension in your LocalSettings.php file:
wfLoadExtension( 'Ceteicean' );
define("NS_CETEI", 350);
define("NS_CETEI_TALK", 351);
Because MediaWiki does not support retrieving globals from extensions, the latter two lines may be required, for instance when you want to add the namespace to Semantic MediaWiki’s $smwgNamespacesWithSemanticLinks.
- Add configuration options if necessary (see below).
- Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
- You should be good to go.
- $wgCeteiXsl (default value: /extensions/CETEIcean/modules/ext.ctc.xsl): XSL transformations using Custom Elements.
- $wgCeteiDTD (default value: /extensions/CETEIcean/modules/ext.ctc.entities.dtd): the DTD declaration containing character entity references.
- $wgCeteiBehaviorsJsFile (default value: /extensions/CETEIcean/modules/ext.ctc.behaviors.js): JavaScript behaviours.
- $wgCeteiAllowEntitySubstitution (default value: false)
- $wgCeteiAllowUrl (default value: false)
The extension comes with a default set of JavaScript behaviours intended to add “custom styles, event handlers, and widgets ... to your TEI elements”. This set, which is defined in the file /modules/ext.ctc.behaviors.js, is still somewhat experimental and may or may not suit your own particular use case. If you want, you can opt out and point the wiki to a file with your own custom behaviours. Add the following to your LocalSettings.php, after the lines that enable the extension, and substitute the file location.
$wgCeteiBehaviorsJsFile = '/example-my-custom-behaviors.js';
In your custom file, the configuration should be assigned to a variable named configCustomBehaviors.
var configCustomBehaviors = {
"tei": {
...
}
}
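As a fuller illustration, the skeleton above might be filled in along the following lines. This is a sketch, not the extension's default set: the element names are arbitrary examples, and the array forms follow the conventions described in the upstream CETEIcean documentation. Whether every upstream form is honoured by this extension's XSLT-based pipeline is an assumption you should verify on your own wiki.

```javascript
// Illustrative custom behaviours file. The chosen elements and templates are
// examples only and are not taken from ext.ctc.behaviors.js.
var configCustomBehaviors = {
  "tei": {
    // Two-item array: markup placed before and after the element's content.
    "eg": ["<pre>", "</pre>"],
    // One-item array: a template rendered in place of the element's content.
    "lb": ["<br>"],
    // In upstream CETEIcean templates, $@target stands for the value of the
    // target attribute and $rw@target for its rewritten (resolved) form.
    "ptr": ["<a href=\"$rw@target\">$@target</a>"]
  }
};
```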
$wgCeteiDTD: relative path to the DTD file containing character entity references. For security reasons, any internal DTD that users include in the XML document itself will be automatically removed before XSL transformation.
$wgCeteiAllowEntitySubstitution: default false (boolean). Whether entity substitution is allowed. See the note on security below.
$wgCeteiAllowUrl: default false (boolean). Whether the parser function, when used with url, should be allowed to retrieve the contents of a document from a public URL accessible to the server.
TEI XML files may contain strings that are intended to encode special characters or symbols. You will recognise these units by the ampersand and semicolon on either side of the string, e.g. &ersir; for the Insular et-symbol. It is up to the XML parser to substitute them on the basis of character entity references that are defined in a so-called DTD. It can be laborious for users if they are expected to point to a DTD every time they create a new XML document. More seriously, in a wiki environment that is designed to be open to users, character substitution may be especially prone to malicious XML External Entity (XXE) injections (search the web for LIBXML_NOENT).
These issues are currently addressed in the following way:
- DTDs are ignored if they are defined or referenced directly in the document. Although the source code remains untouched, they are removed from the document before the XML parser can interpret them.
- A single DTD caters for all TEI XML documents on the wiki. CETEIcean comes with a default DTD, but site admins have the option to come up with their own definitions instead. See $wgCeteiDTD.
- Character substitution is disallowed by default, just to be on the safe side, but can be switched back on through the $wgCeteiAllowEntitySubstitution config setting (boolean).
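The first measure, removing any DTD that a user supplies in the document, can be pictured with the following minimal sketch in JavaScript. It is a simplified illustration rather than the extension's own code: stripDoctype is a hypothetical name, and the regular expression assumes that the internal subset contains no literal "]" character.

```javascript
// Simplified illustration, not the extension's code: removes a DOCTYPE
// declaration, with or without an internal subset, from an XML string.
// Assumes the internal subset contains no literal "]" character.
function stripDoctype(xml) {
  return xml.replace(/<!DOCTYPE[^\[>]*(?:\[[\s\S]*?\])?\s*>/, '');
}

// A document that tries to define an external entity pointing at a local file.
const source =
  '<?xml version="1.0"?>\n' +
  '<!DOCTYPE TEI [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>\n' +
  '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>&xxe;</text></TEI>';

// After stripping, the user-supplied entity definition is gone, so a parser
// has nothing from the document itself with which to expand &xxe;.
const cleaned = stripDoctype(source);
console.log(cleaned.includes('<!DOCTYPE')); // false
```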
- It is mandatory that the TEI element contains a namespace declaration such as xmlns="http://www.tei-c.org/ns/1.0".
- Because some content retrieved with the #cetei parser function is lazy-loaded, you cannot reuse it for new purposes in wikitext. For queries with XPath, see the ExternalData extension.
- When attempting to save especially large documents (1MB or over), you may hit the limits of processing power and memory. In part, this is due to the usual restrictions relating to $wgMaxArticleSize and the HTTP/HTTPS connector request size, but other factors may come into play, too.
- This extension was not designed for a public wiki where anyone can edit. It is currently unknown if any additional security measures would be required.
- Ace tends not to play nice with character entity definitions and produces an error warning for every instance it fails to identify. In some documents, this may throw numerous warnings saying "Entity not found", obscuring any messages that do matter. By way of a quick and dirty solution, you can hack into CodeEditor/modules/ace/xml-worker.js and suppress those warnings by commenting out the line beginning ErrorHandler.error('entity not found:'+a). Make sure to purge the cache afterwards, which you may have to attempt repeatedly because it can be stubborn.
- Syntax errors in your XML document, such as missing tags that result in malformed XML, do not fail gracefully and will prevent the document from getting saved. To help the unsuspecting user, however, the Save button will throw a dialog box if any errors or warnings were detected.
- It is possible that there are still issues relating to certain types of caching, such as parser cache. You may notice that after a page edit, the output does not represent the latest revision and that a hard refresh is required to fetch it.
- Because this extension was first written and tested with MW 1.35, which does not offer support for ES6 with ResourceLoader, the code in CETEIcean’s JS files has been transpiled to ES5 using Babel js and a polyfill for custom elements is added as a dependency.
- 0.6. Added support for ranges in #cetei-align. Further default entities added (all from iso-grk1.ent, n/N with macron, etc.). Use 'displaytitle' to sort and show results in Special:CETEIcean. Reduced sensitivity to XML errors. Fixed preview in edit mode. Deactivated syntax highlighting for exceptionally lengthy documents to prevent it from freezing the browser. With action=info, both #cetei and #cetei-align can provide self-documentation about parameters used. Styling changes. Removed 'beta' status.
- 0.5. Added syntax highlighting to "Source code" tab in Cetei namespace (highlight.js). Changed output used for wiki search to be more search-friendly (rendered all entities, added section with attribute values). Added display title to indexing through ParserOutput. Added TEI XML to search profiles. Made certain notes collapsible/expandable. Special:CETEIcean improved and linked from AdminLinks. Custom dialog in event of error or warnings. Styling changes and minor modifications.
- 0.4. Added #cetei-align parser function. Added Ace editor for use in FlexForm and Page Forms (using #cetei-ace to load JS).
- 0.3. Added an experimental feature to #cetei for breaking out a fragment between two self-closing tags, typically pb or mls/milestone, having the XML repaired and retrieving an HTML rendering. This is intended for documents in which the position of such tags is too problematic and unpredictable for XPath selection. Extended list of character entities.
- 0.2. Pre-processing now in XSLT, with continued support for 'behaviors'.
- 0.1. First release.