@@ -265,8 +265,7 @@ m100-decomment --> m100-tokenize
| m100-sanity<br />m100-jumps<br />m100-decomment<br />m100-crunch<br />m100-tokenize | Saves even more RAM, removing whitespace | tokenize -c |
| m100-tokenize | Abnormal code is kept as is | |

- <details><summary>Click to see more details about running these
- programs manually</summary><p>
+ <details><summary>Click to see more details about running these programs manually</summary><p><ul>

### m100-tokenize synopsis

@@ -372,74 +371,9 @@ If you find this to be a problem, please file an issue as it is
potentially correctable using `open_memstream()`, but hackerb9 does
not see the need.

- </details> <!-- Running manually -->
+ </ul></details> <!-- Running manually -->

-
- ## Machine compatibility
-
- Across the eight Kyotronic-85 sisters, there are actually only two
- different tokenized formats: "M100 BASIC" and "N82 BASIC". This
- program (currently) works only for the former, not the latter.
-
- The three Radio-Shack portables (Models 100, 102 and 200), the Kyocera
- Kyotronic-85, and the Olivetti M10 all share the same tokenized BASIC.
- That means a single tokenized BASIC file _might_ work for any of
- those, presuming the program does not use CALL, PEEK, or POKE.
- However, the NEC family of portables -- the PC-8201, PC-8201A, and
- PC-8300 -- run N82 BASIC, which has a different tokenization format. A
- tokenized N82 BASIC file cannot run on an M100 computer and vice
- versa, even for programs which share the same ASCII BASIC source code.
-
- ### Checksum differences are not a compatibility problem
-
- The .BA files generated by `tokenize` aim to be exactly the same, byte
- for byte, as the output from tokenizing on a Model 100 using `LOAD`
- and `SAVE`. There are some bytes, however, which can change and should
- be ignored when testing if two tokenized programs are identical.
-
- <details><summary>Click to read details on line number pointers...</summary><ul>
-
- A peculiar artifact of the [`.BA` file format][fileformat] is that it
- contains pointer locations offset by where the program happened to be
- in memory when it was saved. The pointers in the file are _never_ used
- as they are recalculated when the program is loaded into RAM.
-
- To account for this variance when testing, the output of this program
- is intended to be byte-for-byte identical to:
-
- 1. A Model 100
- 2. that has been freshly reset
- 3. with no other BASIC programs on it
- 4. running `LOAD "COM:88N1"` and `SAVE "FOO"` while a host computer sends the ASCII BASIC program over the serial port.
-
- While the Tandy 102, Kyotronic-85, and M10 also appear to output files
- identical to the Model 100, the Tandy 200 does not. The 200 has more
- ROM than the other Model T computers, so it stores the first BASIC
- program at a slightly different RAM location (0xA000 instead of
- 0x8000). This has no effect on compatibility between machines, but it
- does change the pointer offset.
-
- Since two `.BA` files can be the identical program despite having
- different checksums, this project includes the `bacmp` program,
- described below.
-
- [fileformat]: http://fileformats.archiveteam.org/wiki/Tandy_200_BASIC_tokenized_file "Reverse engineered file format documentation"
-
- </ul></details> <!-- Line number pointers -->
-
- ## Why Lex?
-
- This program is written in
- [Flex](https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf),
- a lexical analyzer, because it made implementation trivial. The
- tokenizer itself, m100-tokenize, is mostly just a table of keywords
- and the corresponding byte they should emit. Flex handles special
- cases, like quoted strings and REMarks, easily.
-
- The downside is that one must have flex installed to _modify_ the
- tokenizer. Flex is _not_ necessary to compile on a machine as flex
- generates portable C code. See the tokenize-cfiles.tar.gz in the
- github release or run `make cfiles`.
+ <details><summary>Click for details on creating abnormal .BA files.</summary><ul>

## Abnormal code

@@ -507,6 +441,75 @@ To run this on a Model 100, download
[GOTO10.BA](https://github.com/hackerb9/tokenize/raw/main/degenerate/GOTO10.BA)
which was created using m100-tokenizer.

+ </ul></details>
+
+
+ ## Machine compatibility
+
+ Across the eight Kyotronic-85 sisters, there are actually only two
+ different tokenized formats: "M100 BASIC" and "N82 BASIC". This
+ program (currently) works only for the former, not the latter.
+
+ The three Radio-Shack portables (Models 100, 102 and 200), the Kyocera
+ Kyotronic-85, and the Olivetti M10 all share the same tokenized BASIC.
+ That means a single tokenized BASIC file _might_ work for any of
+ those, presuming the program does not use CALL, PEEK, or POKE.
+ However, the NEC family of portables -- the PC-8201, PC-8201A, and
+ PC-8300 -- run N82 BASIC. A tokenized N82 BASIC file cannot run on an
+ M100 computer and vice versa, even for programs which share the same
+ ASCII BASIC source code.
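As a rough illustration of the CALL, PEEK, and POKE caveat above, the following Python sketch scans an ASCII BASIC listing and reports which numbered lines use those keywords outside of quoted strings and remarks. It is not part of this project; the function names are hypothetical, and it assumes that `REM` and `'` both start a remark that runs to the end of the line. A clean report does not prove a program is portable, it only rules out the three keywords named here.

```python
import re

# Keywords the text singles out as tying a program to one machine.
MACHINE_SPECIFIC = ("CALL", "PEEK", "POKE")


def strip_strings_and_remarks(line: str) -> str:
    """Return the line upper-cased, without quoted strings or trailing remarks.

    Assumption: REM or an apostrophe outside a string starts a remark
    that extends to the end of the line.
    """
    out = []
    in_string = False
    upper = line.upper()
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string      # toggle on each double quote
        elif not in_string:
            if ch == "'" or upper.startswith("REM", i):
                break                      # rest of the line is a remark
            out.append(ch)
    return "".join(out).upper()


def machine_specific_keywords(source: str) -> dict[int, list[str]]:
    """Map BASIC line numbers to the machine-specific keywords they use."""
    hits = {}
    for line in source.splitlines():
        match = re.match(r"\s*(\d+)", line)
        if not match:
            continue                       # skip text without a line number
        code = strip_strings_and_remarks(line)
        found = [kw for kw in MACHINE_SPECIFIC if kw in code]
        if found:
            hits[int(match.group(1))] = found
    return hits
```

Calling `machine_specific_keywords(open("PROG.DO").read())` on a hypothetical listing returns an empty dictionary when none of the three keywords appear in executable code.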
+
+ ### Checksum differences are not a compatibility problem
+
+ The .BA files generated by `tokenize` aim to be exactly the same, byte
+ for byte, as the output from tokenizing on a Model 100 using `LOAD`
+ and `SAVE`. There are some bytes, however, which can change and should
+ be ignored when testing if two tokenized programs are identical.
+
+ <details><summary>Click to read details on line number pointers...</summary><ul>
+
+ A peculiar artifact of the [`.BA` file format][fileformat] is that it
+ contains pointer locations offset by where the program happened to be
+ in memory when it was saved. The pointers in the file are _never_ used
+ as they are recalculated when the program is loaded into RAM.
+
+ To account for this variance when testing, the output of this program
+ is intended to be byte-for-byte identical to:
+
+ 1. A Model 100
+ 2. that has been freshly reset
+ 3. with no other BASIC programs on it
+ 4. running `LOAD "COM:88N1"` and `SAVE "FOO"` while a host computer sends the ASCII BASIC program over the serial port.
+
+ While the Tandy 102, Kyotronic-85, and M10 also appear to output files
+ identical to the Model 100, the Tandy 200 does not. The 200 has more
+ ROM than the other Model T computers, so it stores the first BASIC
+ program at a slightly different RAM location (0xA000 instead of
+ 0x8000). This has no effect on compatibility between machines, but it
+ does change the pointer offset.
+
+ Since two `.BA` files can be the identical program despite having
+ different checksums, this project includes the `bacmp` program,
+ described below.
+
+ [fileformat]: http://fileformats.archiveteam.org/wiki/Tandy_200_BASIC_tokenized_file "Reverse engineered file format documentation"
+
+ </ul></details> <!-- Line number pointers -->
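To make the pointer discussion above concrete, here is a minimal Python sketch of comparing two tokenized files while ignoring the line pointers. It is not the project's `bacmp` and makes no claim about how `bacmp` works. The record layout is an assumption: each line is taken to be a 2-byte little-endian pointer, a 2-byte little-endian line number, the token bytes, and a terminating 0x00. Treating a zero pointer or a short tail as the end of the program is likewise an assumption, and `split_lines` and `same_program` are hypothetical names.

```python
import struct


def split_lines(ba: bytes) -> list[tuple[int, int, bytes]]:
    """Split a tokenized file into (pointer, line_number, tokens) records.

    Assumed layout per line: 2-byte little-endian pointer, 2-byte
    little-endian line number, token bytes, then a 0x00 terminator.
    """
    records = []
    pos = 0
    while pos + 4 <= len(ba):
        pointer, lineno = struct.unpack_from("<HH", ba, pos)
        if pointer == 0:
            break                          # assumed end-of-program marker
        end = ba.index(0, pos + 4)         # ValueError if no terminator
        records.append((pointer, lineno, ba[pos + 4:end]))
        pos = end + 1
    return records


def same_program(a: bytes, b: bytes) -> bool:
    """Compare two files by line number and tokens, ignoring pointers."""
    def without_pointers(ba: bytes) -> list[tuple[int, bytes]]:
        return [(lineno, tokens) for _, lineno, tokens in split_lines(ba)]

    return without_pointers(a) == without_pointers(b)
```

Under these assumptions, two files saved from different RAM locations would differ byte for byte, and so in checksum, yet `same_program` would still report them as equal.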
+
+ ## Why Lex?
+
+ This program is written in
+ [Flex](https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf),
+ a lexical analyzer, because it made implementation trivial. The
+ tokenizer itself, m100-tokenize, is mostly just a table of keywords
+ and the corresponding byte they should emit. Flex handles special
+ cases, like quoted strings and REMarks, easily.
+
+ The downside is that one must have flex installed to _modify_ the
+ tokenizer. Flex is _not_ necessary to compile on a machine as flex
+ generates portable C code. See the tokenize-cfiles.tar.gz in the
+ github release or run `make cfiles`.
+

## Miscellaneous notes
