<!-- Creator : groff version 1.22.4 -->
<!-- CreationDate: Wed Jul 28 08:24:16 2021 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="generator" content="groff -Thtml, see www.gnu.org">
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<meta name="Content-Style" content="text/css">
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre { margin-top: 0; margin-bottom: 0; vertical-align: top }
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
</style>
<title>tlgu</title>
</head>
<body>
<h1 align="center">tlgu</h1>
<a href="#NAME">NAME</a><br>
<a href="#SYNOPSIS">SYNOPSIS</a><br>
<a href="#DESCRIPTION">DESCRIPTION</a><br>
<a href="#OPTIONS">OPTIONS</a><br>
<a href="#HISTORY AND INTENDED USE">HISTORY AND INTENDED USE</a><br>
<a href="#EXAMPLES">EXAMPLES</a><br>
<a href="#POST-PROCESSING EXAMPLES">POST-PROCESSING EXAMPLES</a><br>
<a href="#FURTHER DEVELOPMENT">FURTHER DEVELOPMENT</a><br>
<a href="#REFERENCES">REFERENCES</a><br>
<a href="#COPYRIGHT">COPYRIGHT</a><br>
<hr>
<h2>NAME
<a name="NAME"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em"><b>tlgu</b>
− convert beta code TLG and PHI CD-ROM txt files to
Unicode</p>
<h2>SYNOPSIS
<a name="SYNOPSIS"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em"><b>tlgu</b> [
<i>options</i> ] [ <i>input_file</i> ] [ <i>output_file</i>
]</p>
<h2>DESCRIPTION
<a name="DESCRIPTION"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em"><b>tlgu</b>
will convert an <i>input_file</i> from Thesaurus Linguae
Graecae (TLG) and Packard Humanities Institute (PHI)
representation to a Unicode (UTF-8) <i>output_file</i>. If
<i>input_file</i> is not specified, standard input will be
read. If <i>output_file</i> is not specified, the Unicode
text is directed to standard output. The TLG/PHI
representation consists of <b>beta-code</b> text and
<b>citation</b> information.</p>
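<p style="margin-left:11%; margin-top: 1em">Because
<b>tlgu</b> reads standard input and writes standard output
when no files are named, it can be driven from a script. The
following is a minimal Python sketch, not part of tlgu: the
helper names <i>tlgu_command</i> and <i>convert_bytes</i>
are hypothetical, and it assumes that a <b>tlgu</b> binary
is available on the PATH.</p>

```python
import subprocess


def tlgu_command(options=(), input_file=None, output_file=None, program="tlgu"):
    """Build an argument list following: tlgu [options] [input_file] [output_file]."""
    if output_file is not None and input_file is None:
        # Both file arguments are positional, so the input file must come first.
        raise ValueError("output_file requires input_file")
    command = [program, *options]
    if input_file is not None:
        command.append(input_file)
    if output_file is not None:
        command.append(output_file)
    return command


def convert_bytes(beta_bytes, options=()):
    """Pipe raw TLG/PHI bytes through tlgu (used as a filter) and return UTF-8 text.

    Assumes a tlgu executable can be found on the PATH.
    """
    result = subprocess.run(
        tlgu_command(options),
        input=beta_bytes,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


print(tlgu_command(["-r"], "DOCCAN2.TXT", "doccanu.txt"))
```

<p style="margin-left:11%; margin-top: 1em">With no file
arguments the helper returns only the program name and
options, which is the form used when <b>tlgu</b> acts as a
filter.</p>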
<h2>OPTIONS
<a name="OPTIONS"></a>
</h2>
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p style="margin-top: 1em"><b>−b</b></p></td>
<td width="8%"></td>
<td width="78%">
<p style="margin-top: 1em">inserts a form feed and citation
information (levels a, b, c, d) on every "book"
citation change. By default the program will output line
feeds only (see also <b>−p</b>).</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−p</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>observes paging instructions. By default the program
will output line feeds only.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−r</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>primarily Roman text (PHI). Some TLG texts, notably
doccan1.txt and doccan2.txt, are mostly Roman texts lacking
explicit language change codes. Setting this option will
force a change to Roman text after each citation block is
encountered.</p> </td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−v</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>highest-level reference citation is included before each
text line (v-level)</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−w</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>reference citation is included before each text line
(w-level)</p> </td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−x</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>reference citation is included before each text line
(x-level)</p> </td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−y</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>reference citation is included before each text line
(y-level)</p> </td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−z</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>lowest-level reference citation is included before each
text line (z-level).</p></td></tr>
</table>
<p style="margin-left:11%;"><b>−Z
&lt;custom_citation_format_string&gt;</b></p>
<p style="margin-left:22%;">an arbitrary combination of
citation information is included before each text line (see
also the <b>−e</b> option). For example,
"%A/%B/%x/%y/%z\t" will output
the contents of the A, B <b>citation description</b> levels,
followed by the x, y, z <b>citation reference</b> levels,
followed by a TAB character.</p>
<p style="margin-left:11%;"><b>−e
&lt;custom_blank_citation_string&gt;</b></p>
<p style="margin-left:22%;">if there is no citation
information for a citation level defined with the <b>−Z</b>
option above, a single right-hand slash is substituted by
default; you may define any string with this option, e.g.
"-" or "[NONE]" are valid inputs.</p>
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−B</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>inserts blank space (a tab) before each and every
line.</p> </td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−X</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>compact format; v, w, x citations are inserted as they
change at the beginning of each section.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−Y</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>compact format; w, x, y citations are inserted as they
change at the beginning of each section.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−N</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>no spaces; line ends and hyphens before an ID code are
removed while hyphens and spaces before page and column ends
are (still) retained.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−C</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>citation debug information is output.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−S</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>special code debug information is output.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−V</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>block processing information is output (verbose).</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−U</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>vowels with acute accent are output using the Unicode
0x0370 codes rather than the 0x1F00 ones for compatibility
with most current (as of 2020) keyboard encoders.</p></td></tr>
<tr valign="top" align="left">
<td width="11%"></td>
<td width="3%">
<p><b>−W</b></p></td>
<td width="8%"></td>
<td width="78%">
<p>each work (book) is output as a separate file in the
form output_file-xxx.txt; if an output file is not
specified, this option has no effect.</p></td></tr>
</table>
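<p style="margin-left:11%; margin-top: 1em">The two code
ranges mentioned under <b>−U</b> can be inspected with
Python's standard <i>unicodedata</i> module. This is an
illustrative sketch only and does not show how <b>tlgu</b>
implements the option. It relies on Unicode NFC
normalization, which maps the acute-accented vowels of the
0x1F00 range to their 0x0370-range equivalents; NFC is a
general normalization and may change other characters as
well. The function names are hypothetical.</p>

```python
import unicodedata

# The same visible letter, alpha with acute accent, has two code points.
alpha_oxia = "\u1f71"   # Greek Extended block (0x1F00 range)
alpha_tonos = "\u03ac"  # Greek and Coptic block (0x0370 range)


def describe(char):
    """Return the code point and the official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


def to_basic_greek_acute(text):
    """Map 0x1F00-range acute vowels to 0x0370-range ones using NFC.

    NFC is a general normalization, so it can alter other characters too.
    """
    return unicodedata.normalize("NFC", text)


print(describe(alpha_oxia))
print(describe(alpha_tonos))
print(to_basic_greek_acute(alpha_oxia) == alpha_tonos)
```

<p style="margin-left:11%; margin-top: 1em">A comparison of
this kind helps when a search for an accented word fails
because the text and the keyboard produce different code
points for the same glyph.</p>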
<h2>HISTORY AND INTENDED USE
<a name="HISTORY AND INTENDED USE"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">The purpose of
<b>tlgu</b> is to translate binary TLG/PHI-format files into
readable and editable text. It is based on an earlier
program written in 80x86 assembly language (1996) outputting
codes for a home-made font which used the prevalent Hellenic
font encodings of that time complemented by dead accent
characters - not very attractive, but readable.</p>
<p style="margin-left:11%; margin-top: 1em">Then came
Unicode and a plethora of accented character glyphs;
Polytonic fonts are already available (Cardo, Gentium,
Athena, Athenian, Porson); new fonts are being created and
older fonts are being expanded as special-use code points
are included in the Unicode definition (musical symbols,
other special symbols). A notable effort since this note was
originally drafted is that of the Greek Font Society, now
featuring a great, and expanding, selection of open
polytonic fonts.</p>
<p style="margin-left:11%; margin-top: 1em">So, at this
point in time, <b>tlgu</b> will crunch a file which has been
formatted according to the published TLG/PHI format and
produce codes for most glyphs generally available. No
attempt has been made to introduce multi-character sequences
or formatting codes (font changes). If a code has not been
defined, the program will output the respective "code
family" glyph. You may use the <b>−S</b> option
to check such codes against the published beta code
definition.</p>
<p style="margin-left:11%; margin-top: 1em">July 2005 -
Troy A. Griffitts (scribe, crosswire org) contributed the
arbitrary citation output code and added per-line processing
of the input file.</p>
<p style="margin-left:11%; margin-top: 1em">April 2006 -
Final sigma will now be output at end-of-line (!) from
free-form input text (thank you Jan).</p>
<p style="margin-left:11%; margin-top: 1em">October 2011 -
stdout is used if output_file is not specified.</p>
<p style="margin-left:11%; margin-top: 1em">November 2011 -
citations (v, w, x) at the start of section changes (e-book
option)</p>
<p style="margin-left:11%; margin-top: 1em">May 2012 - Nick
White (nick white, durham ac uk) revised the input arguments
to use tlgu as a filter; stdin is used if input_file is not
specified</p>
<p style="margin-left:11%; margin-top: 1em">May 2020 -
Alternate output codes for vowels with acute accent (-U
option)</p>
<p style="margin-left:11%; margin-top: 1em">July 2021 -
Corrections to citation code</p>
<h2>EXAMPLES
<a name="EXAMPLES"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em"><b>./tlgu -r
DOCCAN2.TXT doccanu.txt</b> Translate the TLG canon to a
Unicode text file. Note the use of the <b>-r</b> option
(this file expects Roman as the default font). <b><br>
./tlgu -x -y -z TLG1799.TXT tlg1799u.txt</b></p>
<p style="margin-left:22%;">Generate a continuous file with
the texts of grandpa Euclides. Available citations (-x -y -z)
are Book//demonstratio/line as shown in the respective
"cit" field of doccan2.txt.</p>
<p style="margin-left:11%;"><b>./tlgu -b -B TLG1799.TXT
tlg1799u.txt</b></p>
<p style="margin-left:22%;">Generate the same texts, this
time with a form feed and book citation information on the
first page of each book and a tab before each line (use with
OOo versions earlier than 1.1.4).</p>
<p style="margin-left:11%;"><b>./tlgu -C TLG1799.TXT
tlg1799u.txt</b></p>
<p style="margin-left:22%;">See how the citation
information changes within each TLG block.</p>
<p style="margin-left:11%;"><b>./tlgu -S TLG1799.TXT
tlg1799u.txt | sort > symbols1799.txt</b></p>
<p style="margin-left:22%;">Check out the symbols used in a
work. Book and x, y, z references are printed on a separate
line for each symbol. Sort / grep the output to locate
specific symbols of interest; save in a file for later
use.</p>
<p style="margin-left:11%;"><b>./tlgu -W TLG0006.TXT
tlg0006u</b></p>
<p style="margin-left:22%;">Will produce separate files for
each work, named tlg0006u-001.txt etc.</p>
<p style="margin-left:11%;"><b>./tlgu -Z
"%A/%B/%D/%c/%d/%Z/%x/%y/%z\t" -e "-"
chr0010.txt <br>
chr0010u.txt</b></p>
<p style="margin-left:22%;">Will generate a file with
citation description (A, B, D, Z) and citation reference (c,
d, x, y, z) levels, separated by "/" followed by a
TAB character and the respective text. Blank citation
elements will be filled with a single "-" e.g.
Asia/Smyrna/1222-1223 ac/IGChAs/Asia Min [Chr]/88/-/2A/7p1
[TAB] inscription text etc.</p>
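<p style="margin-left:22%; margin-top: 1em">The substitution
described above can be sketched in Python. This is a
hypothetical re-creation for illustration, not the code used
by <b>tlgu</b>: the function name <i>expand_citation</i> and
the dictionary of levels are assumptions, and the level
values are taken from the sample output line above.</p>

```python
import re


def expand_citation(fmt, levels, blank):
    """Expand %<letter> placeholders from a dictionary of citation levels.

    Levels that are missing or empty are replaced by the blank string,
    in the way the -e option is described. Hypothetical helper.
    """
    # The shell passes \t as two characters; turn it into a real TAB first.
    fmt = fmt.replace("\\t", "\t")

    def substitute(match):
        value = levels.get(match.group(1), "")
        return value if value else blank

    return re.sub(r"%([A-Za-z])", substitute, fmt)


# Values reconstructed from the sample line; the x level is absent.
levels = {
    "A": "Asia", "B": "Smyrna", "D": "1222-1223 ac",
    "c": "IGChAs", "d": "Asia Min [Chr]", "Z": "88",
    "y": "2A", "z": "7p1",
}
prefix = expand_citation(r"%A/%B/%D/%c/%d/%Z/%x/%y/%z\t", levels, "-")
print(repr(prefix))
```

<p style="margin-left:22%; margin-top: 1em">The missing x
level is replaced by the blank string, so the prefix keeps
the same number of fields on every line, which is convenient
when the output is later split on "/".</p>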
<p style="margin-left:11%;"><b>./tlgu -r -N -X LAT0448.TXT
LAT0448.xx.TXT</b></p>
<p style="margin-left:22%;">will produce a compact version
of the Gaius Iulius Caesar texts with v and x citations
printed as they change; similarly, <b>./tlgu -r -N -Y
LAT2150.TXT LAT2150.yy.TXT</b> will produce a compact
version of Zeno’s texts.</p>
<h2>POST-PROCESSING EXAMPLES
<a name="POST-PROCESSING EXAMPLES"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">I use the
OpenOffice/LibreOffice suite for most of my work. This
example shows one of many possible ways of using the search
and replace facility to create a readable version of the
Suda lexicon. <b><br>
./tlgu -B TLG4085.TXT tlg4085u.txt</b></p>
<p style="margin-left:22%;">A Unicode file with the text is
created</p>
<p style="margin-left:11%;"><b>Open the generated file with
OpenOffice/LibreOffice:</b></p>
<p style="margin-left:22%;">File | Open | Filename:
tlg4085u.txt, File Type: Text Encoded −− Press
Open</p>
<p style="margin-left:22%; margin-top: 1em">The ASCII
Filter Options window appears. Select the Unicode (UTF-8)
character set and a proper Unicode font installed in your
machine (e.g. Cardo). Press OK.</p>
<p style="margin-left:11%;"><b>Replace angle brackets with
expanded text</b></p>
<p style="margin-left:22%;">Lexicon terms are enclosed in
&lt;angle brackets&gt;. The actual beta codes indicate the
use of expanded text for emphasis. Select Edit | Find &amp;
Replace. The <b>Find &amp; Replace</b> window appears.</p>
<p style="margin-left:22%; margin-top: 1em">In the
<b>Search For</b> field, type the following expression:
<b>&lt;[^&lt;&gt;]*&gt;</b> This means "find any
characters between angle brackets, not including angle
brackets".</p>
<p style="margin-left:22%; margin-top: 1em">In the
<b>Replace With</b> field insert a single ampersand:
<b>&amp;</b> The ampersand stands for the text found, so
that we can <b>add</b> formatting information (as in this
case) or additional text to it. Press <b>More Options</b>,
<b>Format...</b> and select the <b>Position</b> tab; select
Spacing Expanded by 2.0 points. Press OK.</p>
<p style="margin-left:22%; margin-top: 1em">Check the
<b>Regular Expressions</b> box and press <b>Replace
All</b>.</p>
<p style="margin-left:22%; margin-top: 1em">You may now
replace the angle brackets with nothing.</p>
<p style="margin-left:22%; margin-top: 1em">Repeat the
above procedure for titles enclosed in {braces}. Write a
macro...</p>
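<p style="margin-left:22%; margin-top: 1em">For plain-text
post-processing outside the word processor, the same search
expression can be tried in Python. This is a minimal sketch
under stated assumptions: the function names are
hypothetical, the brace pattern is written by analogy with
the angle bracket one, and the sample line is made up. It
only extracts or unwraps the marked terms; it does not apply
any character formatting.</p>

```python
import re

# The same pattern that is typed into the Search For field.
ANGLE_TERM = re.compile(r"<[^<>]*>")
# Analogous pattern for titles in braces (an assumption by analogy).
BRACE_TITLE = re.compile(r"\{[^{}]*\}")


def find_terms(text):
    """Return the terms found between angle brackets, without the brackets."""
    return [match[1:-1] for match in ANGLE_TERM.findall(text)]


def strip_delimiters(text):
    """Drop angle brackets and braces but keep the text they enclose."""
    text = ANGLE_TERM.sub(lambda m: m.group(0)[1:-1], text)
    return BRACE_TITLE.sub(lambda m: m.group(0)[1:-1], text)


sample = "<alpha> first entry {Title} <beta> second entry"
print(find_terms(sample))
print(strip_delimiters(sample))
```

<p style="margin-left:22%; margin-top: 1em">Because the
character class excludes both bracket characters, each match
stops at the nearest closing bracket and two terms on the
same line are never merged into one.</p>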
<p style="margin-left:11%;"><b>Other useful
information</b></p>
<p style="margin-left:22%;">If you are using your
wordprocessor with a locale setting other than Hellenic
(el_GR), the following invocation with the desired character
classification may prove useful for the occasional polytonic
editing:</p>
<p style="margin-left:22%; margin-top: 1em"><b>LC_CTYPE=el_GR.UTF-8
/usr/bin/soffice</b> (or
<b>/opt/libreoffice3.4/program/soffice</b> ).</p>
<p style="margin-left:22%; margin-top: 1em">I put my
default locale and keyboard definitions in my <b>.bashrc</b>
or <b>.profile</b>:</p>
<p style="margin-left:22%; margin-top: 1em"><b>export
LC_CTYPE=el_GR.UTF-8 <br>
setxkbmap us,el ,polytonic -option grp:ctrl_shift_toggle
-option grp_led:scroll</b></p>
<p style="margin-left:22%; margin-top: 1em">This way
multi-lingual text can be entered; keyboard layout switching
is done by pressing Ctrl/Shift; alternate keyboard layout is
indicated by the Scroll Lock light on the keyboard.</p>
<h2>FURTHER DEVELOPMENT
<a name="FURTHER DEVELOPMENT"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">You may not
like the character output for a specific code. Check out the
<b>tlgcodes.h</b> file containing the special symbol and
punctuation codes and select one to suit you better. It will
probably be a while before the beta to Unicode
correspondence settles down.</p>
<p style="margin-left:11%; margin-top: 1em">Drop me a line
if you need a new feature; let me know if you find an
interesting application that others can profit from.</p>
<h2>REFERENCES
<a name="REFERENCES"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">There are
several texts describing the internal representation of
<b>PHI</b> and <b>TLG</b> text, ID data, citation data and
index files. The originator of this format is the Packard
Humanities Institute. The TLG is maintained by UCI −
see <b>www.tlg.uci.edu</b> − where you may find the
latest versions of the <b>TLG Beta Code Manual</b> and the
<b>TLG Beta Code Quick Reference Guide</b>.</p>
<p style="margin-left:11%; margin-top: 1em">Unicode
consortium (<b>www.unicode.org</b>) publications pertaining
to the codification of characters used in Hellenic
literature, scientific and musical texts.</p>
<p style="margin-left:11%; margin-top: 1em">The
OpenOffice/LibreOffice suite in its various editions
(<b>www.openoffice.org</b> - apache.org,
<b>www.libreoffice.org</b>, <b>www.neooffice.org</b>)
includes a word processor that you can use to load, process
and create new polytonic texts.</p>
<p style="margin-left:11%; margin-top: 1em">Greek Font
Society: <b>www.greekfontsociety.gr</b></p>
<h2>COPYRIGHT
<a name="COPYRIGHT"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">Copyright (C)
2004, 2005, 2011, 2013, 2020, 2021 Dimitri Marinakis (dm,
ssa gr).</p>
<p style="margin-left:11%; margin-top: 1em">This file is
part of tlgu which is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public
License (version 2) as published by the Free Software
Foundation.</p>
<p style="margin-left:11%; margin-top: 1em">tlgu is
distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.</p>
<p style="margin-left:11%; margin-top: 1em">You should have
received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
USA</p>
<hr>
</body>
</html>