Commit 7a2a2b2

Move abnormal code to be after running manually
1 parent 324cf0b commit 7a2a2b2

1 file changed: +72 -69 lines

README.md

+72-69
@@ -265,8 +265,7 @@ m100-decomment --> m100-tokenize
 | m100-sanity<br/>m100-jumps<br/>m100-decomment<br/>m100-crunch<br/>m100-tokenize | Saves even more RAM, removing whitespace | tokenize -c |
 | m100-tokenize | Abnormal code is kept as is | |

-<details><summary>Click to see more details about running these
-programs manually</summary><p>
+<details><summary>Click to see more details about running these programs manually</summary><p><ul>

 ### m100-tokenize synopsis

@@ -372,74 +371,9 @@ If you find this to be a problem, please file an issue as it is
 potentially correctable using `open_memstream()`, but hackerb9 does
 not see the need.

-</details> <!-- Running manually -->
+</ul></details> <!-- Running manually -->

-
-## Machine compatibility
-
-Across the eight Kyotronic-85 sisters, there are actually only two
-different tokenized formats: "M100 BASIC" and "N82 BASIC". This
-program (currently) works only for the former, not the latter.
-
-The three Radio-Shack portables (Models 100, 102 and 200), the Kyocera
-Kyotronic-85, and the Olivetti M10 all share the same tokenized BASIC.
-That means a single tokenized BASIC file _might_ work for any of
-those, presuming the program does not use CALL, PEEK, or POKE.
-However, the NEC family of portables -- the PC-8201, PC-8201A, and
-PC-8300 -- run N82 BASIC, which has a different tokenization format. A
-tokenized N82 BASIC file cannot run on an M100 computer and vice
-versa, even for programs which share the same ASCII BASIC source code.
-
-### Checksum differences are not a compatibility problem
-
-The .BA files generated by `tokenize` aim to be exactly the same, byte
-for byte, as the output from tokenizing on a Model 100 using `LOAD`
-and `SAVE`. There are some bytes, however, which can change and should
-be ignored when testing if two tokenized programs are identical.
-
-<details><summary>Click to read details on line number pointers...</summary><ul>
-
-A peculiar artifact of the [`.BA` file format][fileformat] is that it
-contains pointer locations offset by where the program happened to be
-in memory when it was saved. The pointers in the file are _never_ used
-as they are recalculated when the program is loaded into RAM.
-
-To account for this variance when testing, the output of this program
-is intended to be byte-for-byte identical to:
-
-1. A Model 100
-2. that has been freshly reset
-3. with no other BASIC programs on it
-4. running `LOAD "COM:88N1"` and `SAVE "FOO"` while a host computer sends the ASCII BASIC program over the serial port.
-
-While the Tandy 102, Kyotronic-85, and M10 also appear to output files
-identical to the Model 100, the Tandy 200 does not. The 200 has more
-ROM than the other Model T computers, so it stores the first BASIC
-program at a slightly different RAM location (0xA000 instead of
-0x8000). This has no effect on compatibility between machines, but it
-does change the pointer offset.
-
-Since two `.BA` files can be the identical program despite having
-different checksums, this project includes the `bacmp` program,
-described below.
-
-[fileformat]: http://fileformats.archiveteam.org/wiki/Tandy_200_BASIC_tokenized_file "Reverse engineered file format documentation"
-
-</ul></details> <!-- Line number pointers -->
-
-## Why Lex?
-
-This program is written in
-[Flex](https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf),
-a lexical analyzer, because it made implementation trivial. The
-tokenizer itself, m100-tokenize, is mostly just a table of keywords
-and the corresponding byte they should emit. Flex handles special
-cases, like quoted strings and REMarks, easily.
-
-The downside is that one must have flex installed to _modify_ the
-tokenizer. Flex is _not_ necessary to compile on a machine as flex
-generates portable C code. See the tokenize-cfiles.tar.gz in the
-github release or run `make cfiles`.
+<details><summary>Click for details on creating abnormal .BA files.</summary><ul>

 ## Abnormal code

@@ -507,6 +441,75 @@ To run this on a Model 100, download
 [GOTO10.BA](https://github.com/hackerb9/tokenize/raw/main/degenerate/GOTO10.BA)
 which was created using m100-tokenizer.

+</ul></details>
+
+
+## Machine compatibility
+
+Across the eight Kyotronic-85 sisters, there are actually only two
+different tokenized formats: "M100 BASIC" and "N82 BASIC". This
+program (currently) works only for the former, not the latter.
+
+The three Radio-Shack portables (Models 100, 102 and 200), the Kyocera
+Kyotronic-85, and the Olivetti M10 all share the same tokenized BASIC.
+That means a single tokenized BASIC file _might_ work for any of
+those, presuming the program does not use CALL, PEEK, or POKE.
+However, the NEC family of portables -- the PC-8201, PC-8201A, and
+PC-8300 -- run N82 BASIC. A tokenized N82 BASIC file cannot run on an
+M100 computer and vice versa, even for programs which share the same
+ASCII BASIC source code.
+
+### Checksum differences are not a compatibility problem
+
+The .BA files generated by `tokenize` aim to be exactly the same, byte
+for byte, as the output from tokenizing on a Model 100 using `LOAD`
+and `SAVE`. There are some bytes, however, which can change and should
+be ignored when testing if two tokenized programs are identical.
+
+<details><summary>Click to read details on line number pointers...</summary><ul>
+
+A peculiar artifact of the [`.BA` file format][fileformat] is that it
+contains pointer locations offset by where the program happened to be
+in memory when it was saved. The pointers in the file are _never_ used
+as they are recalculated when the program is loaded into RAM.
+
+To account for this variance when testing, the output of this program
+is intended to be byte-for-byte identical to:
+
+1. A Model 100
+2. that has been freshly reset
+3. with no other BASIC programs on it
+4. running `LOAD "COM:88N1"` and `SAVE "FOO"` while a host computer sends the ASCII BASIC program over the serial port.
+
+While the Tandy 102, Kyotronic-85, and M10 also appear to output files
+identical to the Model 100, the Tandy 200 does not. The 200 has more
+ROM than the other Model T computers, so it stores the first BASIC
+program at a slightly different RAM location (0xA000 instead of
+0x8000). This has no effect on compatibility between machines, but it
+does change the pointer offset.
+
+Since two `.BA` files can be the identical program despite having
+different checksums, this project includes the `bacmp` program,
+described below.
+
+[fileformat]: http://fileformats.archiveteam.org/wiki/Tandy_200_BASIC_tokenized_file "Reverse engineered file format documentation"
+
+</ul></details> <!-- Line number pointers -->
+
+## Why Lex?
+
+This program is written in
+[Flex](https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf),
+a lexical analyzer, because it made implementation trivial. The
+tokenizer itself, m100-tokenize, is mostly just a table of keywords
+and the corresponding byte they should emit. Flex handles special
+cases, like quoted strings and REMarks, easily.
+
+The downside is that one must have flex installed to _modify_ the
+tokenizer. Flex is _not_ necessary to compile on a machine as flex
+generates portable C code. See the tokenize-cfiles.tar.gz in the
+github release or run `make cfiles`.
+

 ## Miscellaneous notes

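The README text moved in this commit says that `.BA` files contain line pointers whose values depend on where the program sat in RAM. Two files can therefore hold the same program and still have different checksums, which is why `bacmp` exists. The C sketch below shows the general idea of such a comparison. It is not the `bacmp` source. It assumes the usual Microsoft BASIC line layout: a 2-byte little-endian next-line pointer, a 2-byte line number, and tokenized text ending in 0x00. It also assumes that a 0x0000 pointer, if present, marks the end of the program. The names `ba_line_length` and `ba_same_program` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only (not the bacmp source): decide whether two
 * tokenized .BA images hold the same program while ignoring the
 * load-address-dependent "next line" pointers.
 *
 * ASSUMED line layout, following the usual Microsoft BASIC convention:
 *   2 bytes  little-endian pointer to the next line (ignored here)
 *   2 bytes  little-endian line number
 *   n bytes  tokenized text, terminated by a 0x00 byte
 * A 0x0000 pointer, if present, is treated as the end of the program.
 */

/* Length of the line at p (pointer + number + text + NUL),
 * 0 at the end of the program, or -1 if the line is malformed. */
static long ba_line_length(const unsigned char *p, size_t remaining)
{
    if (remaining == 0)
        return 0;                          /* no bytes left: no more lines */
    if (remaining >= 2 && p[0] == 0 && p[1] == 0)
        return 0;                          /* assumed end-of-program marker */
    if (remaining < 5)
        return -1;                         /* too short for header + NUL */
    const unsigned char *nul = memchr(p + 4, 0, remaining - 4);
    if (nul == NULL)
        return -1;                         /* text never terminated */
    return (long)(nul - p) + 1;
}

/* Hypothetical helper: 1 if same program, 0 if different, -1 if malformed. */
int ba_same_program(const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    size_t ia = 0, ib = 0;
    for (;;) {
        long la = ba_line_length(a + ia, alen - ia);
        long lb = ba_line_length(b + ib, blen - ib);
        if (la < 0 || lb < 0)
            return -1;
        if (la == 0 || lb == 0)
            return la == lb;               /* both must end at the same line */
        /* Skip the 2 pointer bytes; compare line number, tokens and NUL. */
        if (la != lb || memcmp(a + ia + 2, b + ib + 2, (size_t)la - 2) != 0)
            return 0;
        ia += (size_t)la;
        ib += (size_t)lb;
    }
}
```

Consider two images of a one-line program that differ only in their first two bytes, because they were saved from different load addresses. They compare as the same program. Changing the line number or any token makes them differ. This sketch deliberately ignores anything after the assumed end marker. A real tool might want to report such trailing bytes instead.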