Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RowBinary decoding issue when their sizes exceeds 127 bytes #248

Merged
merged 1 commit into from
Feb 25, 2025

Conversation

psantos10
Copy link
Contributor

@psantos10 psantos10 commented Feb 25, 2025

This PR fixes a "bug" in Ch.RowBinary module where the decoding of type strings fails wgen the size of the string exceeds 127 bytes.

The issue raised when working with Enum16 with more than 8 possible values. As their type strings surpass the 127 byte limit, FunctionClauseError is raised. (** (FunctionClauseError) no function clause matching in Ch.RowBinary.decode_types/3):

** (FunctionClauseError) no function clause matching in Ch.RowBinary.decode_types/3

    The following arguments were given to Ch.RowBinary.decode_types/3:

        # 1
        <<98, 108, 101, 40, 83, 116, 114, 105, 110, 103, 41, 8, 68, 97, 116, 101, 84,
          105, 109, 101, 125, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 11, 114, 101, 119, 97,
          114, 100, 95, 116, 121, 112, 101, 151, 87, 184, 103>>

        # 2
        2

        # 3
        [
          <<34, 69, 110, 117, 109, 56, 40, 39, 86, 65, 76, 73, 68, 39, 32, 61, 32, 48,
            44, 32, 39, 67, 65, 78, 67, 69, 76, 69, 68, 39, 32, 61, 32, 49, 41, 16, 78,
            117, 108, 108, 97>>,
          <<1, 69, 110, 117, 109, 49, 54, 40, 39, 66, 69, 84, 39, 32, 61, 32, 48, 44,
            32, 39, 68, 69, 80, 79, 83, 73, 84, 39, 32, 61, 32, 49, 44, 32, 39, 78, 69,
            87, 95, 65, 67, 84, 73, 86, 69, 39, 32, 61, ...>>,
          "UInt64"
        ]

    Attempted function clauses (showing 3 out of 3):

        defp decode_types(<<>>, 0, _types)
        defp decode_types(<<rest::binary>>, 0, types)
        defp decode_types(<<size, type::binary-size(size), rest::binary>> = binary, count, acc)

    (ch 0.3.1) lib/ch/row_binary.ex:589: Ch.RowBinary.decode_types/3

While investigating the strange behavior, noticed that the original implementation of decode_types/3 function, incorrectly assumes that type string sizes were always encoded as a single byte. This assumption works only for sizes up to 127 bytes. For larger type strings, like those generated by Enum16 with 9 or more values, ClickHouse employs a multi-byte varint encoding.

So fix it, based on how decode_names/4 works, replaced the single-byte size assumption with pattern matching for varint sizes, levering the existing modules varints patterns.

@ruslandoga
Copy link
Contributor

👋 @psantos10

Thank you!

@ruslandoga ruslandoga merged commit 63f23cc into plausible:master Feb 25, 2025
13 checks passed
@psantos10 psantos10 deleted the fix-decode-types-varint branch February 25, 2025 14:22
@ruslandoga
Copy link
Contributor

ruslandoga commented Feb 25, 2025

Published as v0.3.2: https://hex.pm/packages/ch/0.3.2

And thank you again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants