print-nonascii(1) - print lines with non-ASCII characters

SYNOPSIS

Prints lines that contain non-ASCII characters.

print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
print-nonascii -q                        [file ...]

--<mode> prints an abstract representation of non-ASCII characters; one of:
  --caret, -v ... use caret notation, as cat -v would.
  --bash ... represent non-ASCII bytes as \xhh
  --psh ... (PowerShell) represent non-ASCII Unicode characters as
            Unicode escape sequences: <backtick>u{h...}

-r, --raw ... with --<mode>, print each matching line as-is too, first.

-n, --line-number ... prefix the output lines with their line number from  
 the original file, using format "<line-number>:" - decimal line numbers,  
 no padding, no space before or after the ":"

-b, --bare ... suppress per-input-filename headers

-q ... quiet mode: produce no output; exit code 0 signals that non-ASCII
       chars. are present; exit code 100 signals that there are none.

Standard options: --help, --man, --version, --home

DESCRIPTION

Prints all input lines that contain at least one non-ASCII character.
Input can come from one or more files, or via stdin.

With multiple input files, the results are prefixed by a header identifying
the specific input file as follows; use -b to suppress it:

printf '\1###\t%s\n', FILENAME  # e.g., '\1###        test.txt'

Note the use of control character U+0001 (START OF HEADING) at the start of
the line. It will not print visibly, but can be used to more reliably filter
out the header lines later, if desired.
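As a minimal sketch of such filtering, the following helpers use only standard awk. The function names `strip_headers` and `list_headers` are hypothetical and not part of print-nonascii; they assume the header format shown above (U+0001, then `###`, a tab, and the file name).

```shell
# Hypothetical helpers, not part of print-nonascii.
# Both read combined multi-file output on stdin.

# Drop the per-file header lines, i.e. every line whose first byte
# is U+0001 (octal 001); all other lines pass through unchanged.
strip_headers() {
  LC_ALL=C awk 'substr($0, 1, 1) != "\001"'
}

# Print only the file names carried by the header lines:
# everything after the first tab of a line starting with <U+0001>###.
list_headers() {
  LC_ALL=C awk 'substr($0, 1, 4) == "\001###" {
    print substr($0, index($0, "\t") + 1)
  }'
}
```

Assuming print-nonascii is on the PATH, a call such as `print-nonascii a.txt b.txt | strip_headers` would then leave only the matching lines.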

Options include prepending the line number and using one of several abstract
notations to represent the non-ASCII characters.
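Because the -n prefix has the fixed form "<line-number>:" with no padding or spaces, an output line can be split at its first colon. A minimal sketch using POSIX parameter expansion; the helper names `line_number` and `line_text` are hypothetical:

```shell
# Hypothetical helpers, not part of print-nonascii.
# Each takes one line of "-n" output as its only argument.

# Text before the first ":" (the decimal line number).
line_number() { printf '%s\n' "${1%%:*}"; }

# Text after the first ":" (the line content; later colons are kept).
line_text() { printf '%s\n' "${1#*:}"; }
```

For the output line `1:1 M-bM-^BM-,`, `line_number` yields `1` and `line_text` yields `1 M-bM-^BM-,`.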

Note that mode --psh only works properly with non-ASCII characters that are
UTF-8-encoded; any invalid-as-UTF-8 bytes result in warnings, and the
abstract representation will not be correct.
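One way to check up front whether input is valid UTF-8, before relying on --psh, is to round-trip it through iconv. This is a sketch with standard tools rather than anything print-nonascii itself provides; the function name `is_valid_utf8` is hypothetical, and it assumes an iconv that exits with a non-zero code on invalid input sequences.

```shell
# Hypothetical helper, not part of print-nonascii.
# Succeeds (exit code 0) if the given file(s), or stdin when no file
# is given, decode cleanly as UTF-8; fails otherwise.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$@" >/dev/null 2>&1
}
```

For instance, `is_valid_utf8 notes.txt && print-nonascii --psh notes.txt` would run the --psh mode only on input that passes the check (`notes.txt` is a placeholder file name).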

Note:

  • Only works properly with TEXT (files) as input.
  • Only --caret and --bash properly represent non-ASCII bytes that stem from
    an extended ASCII encoding rather than UTF-8.
  • A UTF-8 file with a BOM causes its first line to be printed, although
    the BOM characters don't print visibly; use any of the --<mode> options
    to visualize them, e.g., with --caret, the BOM prints as M-oM-;M-?
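To tell whether a first-line match is caused only by a BOM, the first three bytes of the input can be compared against the UTF-8 BOM (EF BB BF). A minimal sketch with POSIX od and tr; the function name `has_utf8_bom` is hypothetical:

```shell
# Hypothetical helper, not part of print-nonascii.
# Succeeds (exit code 0) if the given file, or stdin when no file is
# given, starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom() {
  # od prints the first 3 bytes as hex; tr removes spaces and newlines.
  [ "$(od -An -tx1 -N3 "$@" | tr -d ' \n')" = 'efbbbf' ]
}
```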

STANDARD OPTIONS

Each standard option must be provided as the only argument; all of them
provide information only.

  • -h, --help
    Prints the contents of the synopsis chapter to stdout for quick reference.

  • --man
    Displays this manual page, which is a helpful alternative to using man,
    if the manual page isn't installed.

  • --version
    Prints version information.

  • --home
    Opens this utility's home page in the system's default web browser.

LICENSE

Copyright (c) 2017 Michael Klement (mklement0@gmail.com), released under
the MIT license.

EXAMPLES

# Note: Byte sequence \xe2\x82\xac is U+20AC, the EURO sign.

# Print lines with non-ASCII chars., using per-byte Bash hex escapes.
$ printf '1 \xe2\x82\xac\nall-ASCII line.' | print-nonascii --bash
1 \xe2\x82\xac

# Print lines with non-ASCII chars. with PowerShell Unicode escapes.
$ printf '1 \xe2\x82\xac\nall-ASCII line.' | print-nonascii --psh
1 `u{20ac}

# Print lines with non-ASCII chars. in caret notation, with line numbers:
$ printf '1 \xe2\x82\xac\nall-ASCII line.' | print-nonascii -v -n
1:1 M-bM-^BM-,
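
The exit codes of quiet mode (-q) lend themselves to scripting. The sketch below maps them to readable labels; the function name `describe_status` is hypothetical, and any exit code other than 0 or 100 is simply reported as unexpected, since this page does not document other codes.

```shell
# Hypothetical helper, not part of print-nonascii: turn the exit code
# of "print-nonascii -q" into a readable label.
describe_status() {
  case $1 in
    0)   echo 'non-ASCII characters present' ;;
    100) echo 'ASCII only' ;;
    *)   echo "unexpected exit code: $1" ;;
  esac
}

# Usage, assuming print-nonascii is installed and on the PATH
# ("notes.txt" is a placeholder file name):
#   print-nonascii -q notes.txt; describe_status "$?"
```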