compress.q

Jaskirat Rajasansir edited this page Nov 3, 2021 · 3 revisions

On-Disk Compression Functions

This library provides functions to assist with the compression of HDB splayed and partitioned tables.

Objects

.compress.defaults

Maps each supported compression type to its default compression parameters, expressed as the standard kdb+ (logical block size; algorithm; level) triple:

Symbol    Compression
`none     (0; 0; 0)
`qipc     (17; 1; 0)
`gzip     (17; 2; 7)
`snappy   (17; 3; 0)
`lz4hc    (17; 4; 9)
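Each triple follows kdb+'s compression parameters: logical block size as a power of 2 (17 means 2^17 bytes, i.e. 128 KB), algorithm (0 none, 1 kdb+ IPC, 2 gzip, 3 snappy, 4 lz4hc) and level. This is a minimal sketch that assumes .compress.defaults is a dictionary keyed by these symbols. It passes one default straight to the built-in -19! to compress a single file, or to .z.zd to set the default for subsequent writes such as set. The file paths are illustrative.

```q
/ Assumes compress.q is loaded and .compress.defaults is keyed by symbol
gz:.compress.defaults`gzip          / (17; 2; 7): 2^17-byte blocks, gzip, level 7

/ Compress one column file with the built-in -19!:
/ -19!(source; target; logicalBlockSize; algorithm; level)
-19!(`:/tmp/hdb/2021.10.29/trade/price; `:/tmp/price_gzip),gz

/ Alternatively, make these parameters the default for files written from now on
.z.zd:gz
```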

Functions

.compress.getSplayStats[splayPath]

Returns a table of compression statistics (via -21!) for every column in the specified splayed table directory.

Uncompressed columns have a null compressedLength.

q) .compress.getSplayStats `:/tmp/hdb/2021.10.29/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time   8000560          8000016            qipc         1         17               0
sym    1182897          10101304           qipc         1         17               0
price  8000560          8000016            qipc         1         17               0
vol    3500240          8000016            qipc         1         17               0

q) .compress.getSplayStats `:/tmp/hdb/2021.01.23/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time                    16                 none         0         0                0
sym                     4096               none         0         0                0
price                   16                 none         0         0                0
vol                     16                 none         0         0                0
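A per-column compression ratio can be derived from the returned table. So that the sketch runs without an HDB, it builds a stand-in table with a subset of the same column names and the qipc figures from the first example above. In practice, use the result of .compress.getSplayStats instead. A null compressedLength (an uncompressed column) yields a null ratio.

```q
/ Stand-in for: stats:.compress.getSplayStats `:/tmp/hdb/2021.10.29/trade
stats:([] column:`time`sym`price`vol;
  compressedLength:8000560 1182897 8000560 3500240;
  uncompressedLength:8000016 10101304 8000016 8000016)

/ Ratio > 1 means the column shrank; null compressedLength gives a null ratio
update ratio:uncompressedLength%compressedLength from stats

/ Whole-splay ratio, ignoring uncompressed columns
exec sum[uncompressedLength]%sum compressedLength from stats where not null compressedLength
```

In this data, the time column's ratio is just below 1. Its qipc-compressed file (8000560 bytes) is slightly larger than the raw file (8000016 bytes).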

.compress.getPartitionStats[hdbRoot; partVal]

Returns compression statistics (via -21!) for every column of every table in the specified partition of the specified HDB. The returned table has the same columns as .compress.getSplayStats, plus part and table columns.

Note that this function does not read par.txt. For a segmented HDB, pass the root of the segment that holds the partition as hdbRoot.

q) select sum uncompressedLength by part, table from .compress.getPartitionStats[`:/tmp/hdb; 2021.01.23]
part       table    | uncompressedLength
--------------------| ------------------
2021.01.23 tbl      | 4392
2021.01.23 tbl10    | 40
2021.01.23 tbl2     | 40
2021.01.23 trade    | 4144
2021.01.23 tradeComp| 4144
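Because the function does not read par.txt, one way to cover a segmented HDB is to read par.txt yourself and call the function once for each segment that holds the partition. This is a sketch that assumes par.txt contains one absolute segment path per line. The paths are illustrative.

```q
partVal:2021.01.23

/ Drop blank lines, then turn each path into a file handle
lines:read0 `:/tmp/hdb/par.txt
segs:hsym `$lines where 0<count each lines

/ Keep only segments that actually contain the partition directory
segs:segs where (`$string partVal) in/: key each segs

/ Query each segment root separately and combine the results
stats:raze .compress.getPartitionStats[;partVal] each segs
select sum compressedLength, sum uncompressedLength by part, table from stats
```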

.compress.splay[sourceSplayPath; targetSplayPath; compressType; options]

Compresses a splayed table.

  • compressType: Either a symbol (one of none, qipc, gzip, snappy, lz4hc) or a 3-element integer list (logical block size; algorithm; level) giving explicit compression parameters
  • options: A dictionary of options to modify the function's behaviour
    • recompress: If true, any compressed files will be recompressed (default is false)
    • inplace: If true, targetSplayPath can be the same as sourceSplayPath (default is false)
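The following sketches show valid argument values, based on the descriptions above. This page does not say whether the explicit list must be longs or ints. 17 2 6 (2^17-byte blocks, gzip, level 6) is a long list, so cast it with `int$ if ints turn out to be required.

```q
/ compressType as a symbol naming one of .compress.defaults
ct1:`lz4hc

/ ... or as an explicit list: 2^17-byte blocks, gzip (algorithm 2), level 6
ct2:17 2 6

/ options: an empty dictionary keeps the defaults (recompress and inplace both false)
opts1:()!()

/ Recompress files that are already compressed
opts2:enlist[`recompress]!enlist 1b

/ Recompress, allowing the target to be the same path as the source
opts3:`recompress`inplace!11b

/ Example call with the explicit list (paths illustrative)
.compress.splay[`:/tmp/hdb/2021.10.29/trade; `:/tmp/hdb/2021.10.29/tradeGzip; ct2; opts1]
```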

The function does not necessarily compress every column in the splay. It returns a table describing the operation performed on each column file; the writeMode column records what was done and why:

  • compress: The file was compressed
    • The file is uncompressed, or is compressed and the recompress option is true
  • copy: The file was copied (via the OS-specific copy command)
    • The file is either empty (0 = count), or is already compressed and the recompress option is missing or false
  • ignore: The file was ignored
    • The file would have been copied (as above), but source and target are the same path (inplace), so there was nothing to do

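The writeMode rules above can be restated as a small decision function. This is a hypothetical helper (.example.writeMode is not part of compress.q), written from this page's description rather than the library source. It treats an empty file as taking priority over the compress rule, since empty files are listed under copy.

```q
/ Hypothetical helper, NOT part of compress.q
/ Compress when the file is non-empty AND (uncompressed OR recompress requested);
/ otherwise copy, or ignore if the copy would be onto itself (inplace)
.example.writeMode:{[compressed;empty;inplace;recompress]
  doCompress:(not empty) and (not compressed) or recompress;
  $[doCompress; `compress; inplace; `ignore; `copy]
 }

.example.writeMode[0b;0b;0b;0b]   / uncompressed, non-empty file
.example.writeMode[1b;0b;0b;0b]   / already compressed, recompress not set
.example.writeMode[1b;0b;1b;0b]   / as above, but in place
```
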
q) .compress.getSplayStats `:/tmp/hdb/2021.10.29/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time                    8000016            none         0         0                0
sym                     10101304           none         0         0                0
price                   8000016            none         0         0                0
vol                     8000016            none         0         0                0

q) .compress.splay[`:/tmp/hdb/2021.10.29/trade; `:/tmp/hdb/2021.10.29/tradeComp; `lz4hc; ()!()]
...
column source                           target                               compressed inplace empty writeMode
---------------------------------------------------------------------------------------------------------------
time   :/tmp/hdb/2021.10.29/trade/time  :/tmp/hdb/2021.10.29/tradeComp/time  0          0       0     compress
sym    :/tmp/hdb/2021.10.29/trade/sym   :/tmp/hdb/2021.10.29/tradeComp/sym   0          0       0     compress
price  :/tmp/hdb/2021.10.29/trade/price :/tmp/hdb/2021.10.29/tradeComp/price 0          0       0     compress
vol    :/tmp/hdb/2021.10.29/trade/vol   :/tmp/hdb/2021.10.29/tradeComp/vol   0          0       0     compress

q) .compress.getSplayStats `:/tmp/hdb/2021.10.29/tradeComp
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time   8000560          8000016            lz4hc        4         17               9
sym    1139049          10101304           lz4hc        4         17               9
price  6728031          8000016            lz4hc        4         17               9
vol    2927839          8000016            lz4hc        4         17               9
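The options can be combined to recompress an existing compressed splay over itself. This sketch reuses the tradeComp path from the example above. With recompress true, already-compressed columns are compressed again instead of copied, and inplace allows the source and target to be the same path. Per the writeMode rules, every column should therefore report compress.

```q
path:`:/tmp/hdb/2021.10.29/tradeComp

/ Switch the lz4hc splay to gzip, writing over itself
res:.compress.splay[path; path; `gzip; `recompress`inplace!11b]

/ Check what was done, then confirm the new algorithm on disk
select column, writeMode from res
select column, compressMode, algorithm, zipLevel from .compress.getSplayStats path
```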

.compress.partition[sourceRoot; targetRoot; partVal; tbls; compressType; options]

Compresses multiple splayed tables within an HDB partition.

  • tbls: Either a list of table names to compress, or COMP_ALL to compress every table in the partition
  • options: A dictionary of options to modify the function's behaviour
    • recompress: If true, any compressed files will be recompressed (default is false)
    • inplace: If true, sourceRoot can be the same as targetRoot (default is false)
    • srcParTxt: If true, any par.txt in sourceRoot will be used to find the specified partition (default is true)
    • tgtParTxt: If true, any par.txt in targetRoot will be used to write the specified partition (default is true)

NOTE: This function neither reads nor writes the sym file in the source or target HDB. The source and target HDBs are expected to share the same sym file.
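If the target is a new HDB root, one way to meet this expectation is to copy the sym file across before loading the target. This sketch uses illustrative paths. It assumes .z.zd is not set, because a default compression would otherwise also be applied to the copied sym file.

```q
/ Copy the enumeration domain so the target's enumerated columns resolve correctly
`:/tmp/hdbComp/sym set get `:/tmp/hdb/sym
```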

The function returns the same table as .compress.splay, with part and table columns added.
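This sketch compresses one table in a partition into a separate target root. It then summarises the returned table by writeMode, for example to spot files that were copied rather than compressed. The paths and table list are illustrative.

```q
res:.compress.partition[`:/tmp/hdb; `:/tmp/hdbComp; 2021.10.29; enlist `trade; `lz4hc; ()!()]

/ Number of column files per table and write mode
select files:count i by part, table, writeMode from res

/ Files that were not compressed
select part, table, column, writeMode from res where writeMode<>`compress
```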
