From 9e12c3a531dca5f9f5ec9f6206bbe0ff26e2b01b Mon Sep 17 00:00:00 2001
From: Eric Myhre
Date: Wed, 3 Mar 2021 20:43:33 +0100
Subject: [PATCH 01/17] Refactor registry system: no direct dependencies; expose standard hash.Hash; be a data carrier.

---

Dependencies:

Previously, depending on the go-multihash package would create direct dependencies on several other modules for their various hash function implementations.

This meant that instead of go-multihash being a lightweight, easy-to-accept dependency itself, it became something which would noticeably increase the size of your go.mod file, your package graph, your download sizes during development, and most concerningly, your compile output size in final products.

Now, there is a registry system (see the Register function), and the main go-multihash package *only* populates the registry with hashes that are available from the golang standard library by default. This means you gain no transitive dependencies on other libraries by importing the go-multihash package, and your binaries will not be bloated by hashers you don't use. (Your go.mod file may still show more repos; but they don't end up in your builds unless you actually refer to them).

There are now several new packages in `go-multihash/register/*`. These can be imported to register the hashes in those packages. If you want all the hashes that were previously available, just make sure to import "go-multihash/register/all" somewhere in your program. (You can register hashes, too, without making PRs to this library; these packages are just here for convenience and easy use.)

**This is a breaking change** if you used hashes not found in the golang stdlib, such as blake2 and sha3. However, to update, all you need to do is ensure the relevant `go-multihash/register/*` package is imported anywhere in your program -- an easy change.
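As a rough illustration of the registry pattern described above, here is a minimal stdlib-only sketch. It is not the code from this patch: `register`, `getHasher`, `hexDigest`, and `errNotRegistered` are stand-in names chosen for the example, and only two stdlib hashers are registered. The point is that a code with no registered factory simply yields an error, and nothing outside the standard library is linked in.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
)

// errNotRegistered is a hypothetical error for this sketch only.
var errNotRegistered = errors.New("no hasher registered for this code")

// registry maps a multihash indicator code to a factory for a hash.Hash.
var registry = make(map[uint64]func() hash.Hash)

// register stores a factory for a code. A later call with the same code
// replaces the earlier one (last call wins).
func register(code uint64, factory func() hash.Hash) {
	registry[code] = factory
}

// getHasher returns a fresh hasher for the code, or an error if none is registered.
func getHasher(code uint64) (hash.Hash, error) {
	factory, ok := registry[code]
	if !ok {
		return nil, errNotRegistered
	}
	return factory(), nil
}

// hexDigest hashes data with the hasher registered for code and returns hex.
func hexDigest(code uint64, data []byte) (string, error) {
	h, err := getHasher(code)
	if err != nil {
		return "", err
	}
	h.Write(data) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil)), nil
}

func init() {
	// Only stdlib hashers are registered by default in this sketch.
	register(0x12, sha256.New) // sha2-256
	register(0x13, sha512.New) // sha2-512
}

func main() {
	digest, err := hexDigest(0x12, []byte("foo"))
	fmt.Println(digest, err)

	// 0xb220 (blake2b-256) has no factory here, so this reports an error.
	_, err = hexDigest(0xb220, []byte("foo"))
	fmt.Println(err)
}
```

In this sketch, a package that wanted to add blake2 would call `register` from its own `init`, which is the role the `register/*` packages play in this change.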
A `go-multihash/register/all` package can be imported to get a hasher registered for all of the same multihash codes as before (but will correspondingly add the dependency weight back too, of course).

---

Standard hash.Hash:

Previously, go-multihash had its own `HashFunc` type, and only exposed hashing through the `multihash.Sum` function. The problem with this was that these definitions did not support streaming: one had to have an entire chunk of memory loaded at once, in a single contiguous byte slice, in order to hash it.

(A second, admittedly much more minor, problem with this was that one often had to write glue code to turn a `hash.Hash` into a `multihash.HashFunc`, and since most of golang uses the standard lib `hash.Hash` definition already, this was generally avoidable friction.)

Now, the Register function operates in terms of standard `hash.Hash`, and there is a `GetHasher` function which can get you a `hash.Hash`. (Okay, to be more precise, Register takes a factory function for a `hash.Hash`, and `GetHasher` calls that factory for you. You get the idea.)

Since the standard `hash.Hash` interface accepts data incrementally, it's now easy to use go-multihash in a streaming way.

The `multihash.Sum` function works the same as always.

---

Be a data carrier:

Previously, go-multihash contained checks requiring that any multihash indicator code being handled have a hash function registered for it.

This made it very difficult to use go-multihash in a "forward compatible" way (and it also caused a lot of practical bumps for this dependency-extraction refactor).

Now, go-multihash is willing to carry data, even if it doesn't know what kind of hash function would be associated with an indicator code.

(Methods that you'd expect to parse things do still parse the varints, making sure they're sanely formatted. They just don't inspect and whitelist the actual integers anymore.)

I removed the `ValidCode` predicate entirely. It doesn't seem to serve any good purpose anymore.
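To make the streaming point from the "Standard hash.Hash" section concrete, here is a small stdlib-only sketch. It does not import go-multihash: `encodeMultihash` and `sha256ChunksHex` are hypothetical helpers written for this example, and `sha256.New()` stands in for the hasher that `GetHasher(0x12)` would hand back. The data is written to the hasher piece by piece, and the code and digest length are then prefixed as unsigned varints.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"hash"
)

// encodeMultihash prefixes a digest with its indicator code and its length,
// both written as unsigned varints.
func encodeMultihash(code uint64, digest []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	out := make([]byte, 0, 2*binary.MaxVarintLen64+len(digest))
	n := binary.PutUvarint(tmp[:], code)
	out = append(out, tmp[:n]...)
	n = binary.PutUvarint(tmp[:], uint64(len(digest)))
	out = append(out, tmp[:n]...)
	return append(out, digest...)
}

// hashChunks feeds each chunk to the hasher separately, so the full input
// never has to exist as one contiguous byte slice.
func hashChunks(h hash.Hash, chunks ...string) []byte {
	for _, c := range chunks {
		h.Write([]byte(c)) // hash.Hash.Write never returns an error
	}
	return h.Sum(nil)
}

// sha256ChunksHex streams the chunks through sha2-256 (code 0x12) and
// returns the resulting multihash as hex.
func sha256ChunksHex(chunks ...string) string {
	digest := hashChunks(sha256.New(), chunks...)
	return hex.EncodeToString(encodeMultihash(0x12, digest))
}

func main() {
	// Writing "f" then "oo" gives the same multihash as writing "foo" at once.
	fmt.Println(sha256ChunksHex("f", "oo"))
	fmt.Println(sha256ChunksHex("foo"))
}
```

With a real reader (a file or network stream), the same idea applies by copying from the reader into the hasher, since `hash.Hash` embeds `io.Writer`.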
---

Other:

I have not touched the `Codes` and `Names` maps in this diff. I think we should probably review (and probably remove) these, and direct people to use the go-multicodec package instead, which has two advantages: it decouples registering an implementation from simply having a description, and it is automatically generated from the multiformats table. However, I wanted to check on feelings about this before doing the work (especially because they're somewhat entangled with a bunch of the tests in this package, making their removal somewhat nontrivial).

Most of the test files are now `package multihash_test`. This makes for some colorful diffs, but is not otherwise interesting. The reason is that the dependency separation process now requires the tests to import those `register/*` packages, and to avoid an import cycle, that means, well, `package multihash_test`.

I think there's probably more work to be done in making this library really shine. For example, in reviewing the `Encode` function, I see some allocations that look very likely to be avoidable... if the function was redesigned to be more aware of how it's likely to be used. However, I took no action on this, in part because this diff is big enough already, and in part because I think it might be reasonable to re-examine the relationship of this code to go-cid at the same time.

I dropped `TestSmallerLengthHashID`. It appeared to be testing an API that wasn't actually exported... and the nearest API that *is* exported (Sum) has a general contract of truncating a hash when given a short length, so it was overall unclear what this test should be checking. Review might be needed on this.

The situation for murmur3 is still in need of resolution. It's commented out entirely for now. Questions are noted in the diff.

There's a 'register/miniosha256' package which sets the sha2-256 implementation to a non-stdlib one.
If you don't import this package, you still get a sha2-256; it's just the stdlib one. I did not include this in the 'register/all' group. (Maybe it's faster; maybe it's not; but it's definitely not required, and I'm getting some reports it also shows weird on profiles, so I tend to think maybe one should really have to explicitly ask for this one.) --- errata.go | 35 +++ multihash.go | 17 +- multihash/main.go | 1 + multihash_test.go | 10 - register/all/multihash_all.go | 23 ++ register/blake2/multihash_blake2.go | 63 +++++ register/miniosha256/multihash_miniosha256.go | 23 ++ register/murmur3/multihash_murmur3.go | 38 +++ register/sha3/multihash_sha3.go | 62 +++++ registry.go | 68 ++++++ spec_test.go | 11 +- sum.go | 228 ++---------------- sum_test.go | 162 +++++-------- 13 files changed, 396 insertions(+), 345 deletions(-) create mode 100644 errata.go create mode 100644 register/all/multihash_all.go create mode 100644 register/blake2/multihash_blake2.go create mode 100644 register/miniosha256/multihash_miniosha256.go create mode 100644 register/murmur3/multihash_murmur3.go create mode 100644 register/sha3/multihash_sha3.go create mode 100644 registry.go diff --git a/errata.go b/errata.go new file mode 100644 index 0000000..d3065ce --- /dev/null +++ b/errata.go @@ -0,0 +1,35 @@ +package multihash + +import ( + "bytes" + "crypto/sha256" + "hash" +) + +type identityMultihash struct { + bytes.Buffer +} + +func (identityMultihash) BlockSize() int { + return 32 // A prefered block size is nonsense for the "identity" "hash". An arbitrary but unsurprising and positive nonzero number has been chosen to minimize the odds of fascinating bugs. 
+} + +func (x *identityMultihash) Size() int { + return x.Len() +} + +func (x *identityMultihash) Sum(digest []byte) []byte { + return x.Bytes() +} + +type doubleSha256 struct { + hash.Hash +} + +func (x doubleSha256) Sum(digest []byte) []byte { + intermediate := [sha256.Size]byte{} + x.Hash.Sum(intermediate[0:0]) + h2 := sha256.New() + h2.Write(intermediate[:]) + return h2.Sum(digest) +} diff --git a/multihash.go b/multihash.go index 370e259..35b3610 100644 --- a/multihash.go +++ b/multihash.go @@ -231,15 +231,11 @@ func FromB58String(s string) (m Multihash, err error) { // Cast casts a buffer onto a multihash, and returns an error // if it does not work. func Cast(buf []byte) (Multihash, error) { - dm, err := Decode(buf) + _, err := Decode(buf) if err != nil { return Multihash{}, err } - if !ValidCode(dm.Code) { - return Multihash{}, ErrUnknownCode - } - return Multihash(buf), nil } @@ -267,9 +263,8 @@ func Decode(buf []byte) (*DecodedMultihash, error) { // Encode a hash digest along with the specified function code. // Note: the length is derived from the length of the digest itself. func Encode(buf []byte, code uint64) ([]byte, error) { - if !ValidCode(code) { - return nil, ErrUnknownCode - } + // REVIEW: if we remove the strict ValidCode check, this can no longer error. Change signiture? + // REVIEW: this function always causes heap allocs... but when used, this value is almost always going to be appended to another buffer (either as part of CID creation, or etc) -- should this whole function be rethought and alternatives offered? newBuf := make([]byte, varint.UvarintSize(code)+varint.UvarintSize(uint64(len(buf)))+len(buf)) n := varint.PutUvarint(newBuf, code) @@ -285,12 +280,6 @@ func EncodeName(buf []byte, name string) ([]byte, error) { return Encode(buf, Names[name]) } -// ValidCode checks whether a multihash code is valid. 
-func ValidCode(code uint64) bool { - _, ok := Codes[code] - return ok -} - // readMultihashFromBuf reads a multihash from the given buffer, returning the // individual pieces of the multihash. // Note: the returned digest is a slice over the passed in data and should be diff --git a/multihash/main.go b/multihash/main.go index 3f874c3..343045e 100644 --- a/multihash/main.go +++ b/multihash/main.go @@ -8,6 +8,7 @@ import ( mh "github.com/multiformats/go-multihash" mhopts "github.com/multiformats/go-multihash/opts" + _ "github.com/multiformats/go-multihash/register/all" ) var usage = `usage: %s [options] [FILE] diff --git a/multihash_test.go b/multihash_test.go index a6f6563..b49bf04 100644 --- a/multihash_test.go +++ b/multihash_test.go @@ -224,16 +224,6 @@ func ExampleDecode() { // obj: sha1 0x11 20 0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33 } -func TestValidCode(t *testing.T) { - for i := uint64(0); i < 0xff; i++ { - _, ok := tCodes[i] - - if ValidCode(i) != ok { - t.Error("ValidCode incorrect for: ", i) - } - } -} - func TestCast(t *testing.T) { for _, tc := range testCases { ob, err := hex.DecodeString(tc.hex) diff --git a/register/all/multihash_all.go b/register/all/multihash_all.go new file mode 100644 index 0000000..1037913 --- /dev/null +++ b/register/all/multihash_all.go @@ -0,0 +1,23 @@ +/* + This package has no purpose except to perform registration of mulithashes. + + It is meant to be used as a side-effecting import, e.g. + + import ( + _ "github.com/multiformats/go-multihash/register/all" + ) + + This package registers many multihashes at once. + Importing it will increase the size of your dependency tree significantly. + It's recommended that you import this package if you're building some + kind of data broker application, which may need to handle many different kinds of hashes; + if you're building an application which you know only handles a specific hash, + importing this package may bloat your builds unnecessarily. 
+*/ +package all + +import ( + _ "github.com/multiformats/go-multihash/register/blake2" + _ "github.com/multiformats/go-multihash/register/murmur3" + _ "github.com/multiformats/go-multihash/register/sha3" +) diff --git a/register/blake2/multihash_blake2.go b/register/blake2/multihash_blake2.go new file mode 100644 index 0000000..258e2a5 --- /dev/null +++ b/register/blake2/multihash_blake2.go @@ -0,0 +1,63 @@ +/* + This package has no purpose except to perform registration of multihashes. + + It is meant to be used as a side-effecting import, e.g. + + import ( + _ "github.com/multiformats/go-multihash/register/blake2" + ) + + This package registers several multihashes for the blake2 family + (both the 's' and the 'b' variants, and in a variety of sizes). +*/ +package blake2 + +import ( + "hash" + + "github.com/minio/blake2b-simd" + "golang.org/x/crypto/blake2s" + + "github.com/multiformats/go-multihash" +) + +const ( + BLAKE2B_MIN = 0xb201 + BLAKE2B_MAX = 0xb240 + BLAKE2S_MIN = 0xb241 + BLAKE2S_MAX = 0xb260 +) + +func init() { + // BLAKE2S + // This package only enables support for 32byte (256 bit) blake2s. + multihash.Register(BLAKE2S_MIN+31, func() hash.Hash { h, _ := blake2s.New256(nil); return h }) + + // BLAKE2B + // There's a whole range of these. + for c := uint64(BLAKE2B_MIN); c <= BLAKE2B_MAX; c++ { + size := int(c - BLAKE2B_MIN + 1) + + // special case these lengths to avoid allocations. + switch size { + case 32: + multihash.Register(c, blake2b.New256) + continue + case 64: + multihash.Register(c, blake2b.New512) + continue + } + + // Ok, allocate away. + // (The config object here being a pointer is a tad unfortunate, + // but we manage amortize it away by making them just once anyway.) 
+ cfg := &blake2b.Config{Size: uint8(size)} + multihash.Register(c, func() hash.Hash { + hasher, err := blake2b.New(cfg) + if err != nil { + panic(err) + } + return hasher + }) + } +} diff --git a/register/miniosha256/multihash_miniosha256.go b/register/miniosha256/multihash_miniosha256.go new file mode 100644 index 0000000..6c72ff9 --- /dev/null +++ b/register/miniosha256/multihash_miniosha256.go @@ -0,0 +1,23 @@ +/* + This package has no purpose except to perform registration of multihashes. + + It is meant to be used as a side-effecting import, e.g. + + import ( + _ "github.com/multiformats/go-multihash/register/miniosha256" + ) + + This package registers alternative implementations for sha2-256, using + the github.com/minio/sha256-simd library. +*/ +package miniosha256 + +import ( + "github.com/minio/sha256-simd" + + "github.com/multiformats/go-multihash" +) + +func init() { + multihash.Register(0x12, sha256.New) +} diff --git a/register/murmur3/multihash_murmur3.go b/register/murmur3/multihash_murmur3.go new file mode 100644 index 0000000..f5b2051 --- /dev/null +++ b/register/murmur3/multihash_murmur3.go @@ -0,0 +1,38 @@ +/* + This package has no purpose except to perform registration of multihashes. + + It is meant to be used as a side-effecting import, e.g. + + import ( + _ "github.com/multiformats/go-multihash/register/murmur3" + ) + + This package registers multihashes for the murmur3 family. +*/ +package murmur3 + +// import ( +// "hash" +// +// "github.com/gxed/hashland/murmur3" +// +// "github.com/multiformats/go-multihash" +// ) + +func init() { + // REVIEW: what go-multihash has done historically is New32, but this doesn't match what the multihash table says, which is 128! Resolution needed. + // REVIEW: I have also heard that something in ipfs unixfsv1 uses a murmur hash, but that is yet different than this. Resolution needed. + // REVIEW: these bit-twiddling things *are* in fact load-bearing somehow. 
If you just return murmur3.New32 without this wrapper type, it produces different results. Resolution needed. + + // multihash.Register(0x22, func() hash.Hash { return murmur3.New32() }) + + // -or-, what previously existed: + + // number := murmur3.Sum32(data) + // bytes := make([]byte, 4) + // for i := range bytes { + // bytes[i] = byte(number & 0xff) + // number >>= 8 + // } + // return bytes, nil +} diff --git a/register/sha3/multihash_sha3.go b/register/sha3/multihash_sha3.go new file mode 100644 index 0000000..225159d --- /dev/null +++ b/register/sha3/multihash_sha3.go @@ -0,0 +1,62 @@ +/* + This package has no purpose except to perform registration of multihashes. + + It is meant to be used as a side-effecting import, e.g. + + import ( + _ "github.com/multiformats/go-multihash/register/sha3" + ) + + This package registers several multihashes for the sha3 family. + This also includes some functions known as "shake" and "keccak", + since they share much of their implementation and come in the same repos. +*/ +package sha3 + +import ( + "hash" + + "golang.org/x/crypto/sha3" + + "github.com/multiformats/go-multihash" +) + +func init() { + multihash.Register(0x14, sha3.New512) + multihash.Register(0x15, sha3.New384) + multihash.Register(0x16, sha3.New256) + multihash.Register(0x17, sha3.New224) + multihash.Register(0x18, func() hash.Hash { return shakeNormalizer{sha3.NewShake128(), 128 / 8 * 2} }) + multihash.Register(0x19, func() hash.Hash { return shakeNormalizer{sha3.NewShake256(), 256 / 8 * 2} }) + multihash.Register(0x1B, sha3.NewLegacyKeccak256) + multihash.Register(0x1D, sha3.NewLegacyKeccak512) +} + +// sha3.ShakeHash presents a somewhat odd interface, and requires a wrapper to normalize it to the usual hash.Hash interface. 
+// +// Some of the fiddly bits required by this normalization probably makes it undesirable for use in the highest performance applications; +// There's at least one extra allocation in constructing it (sha3.ShakeHash is an interface, so that's one heap escape; and there's a second heap escape when this normalizer struct gets boxed into a hash.Hash interface), +// and there's at least one extra allocation in getting a sum out of it (because reading a shake hash is a mutation (!) and the API only provides cloning as a way to escape this). +// Fun. +type shakeNormalizer struct { + sha3.ShakeHash + size int +} + +func (shakeNormalizer) BlockSize() int { + return 32 // Shake doesn't have a prefered block size, apparently. An arbitrary but unsurprising and positive nonzero number has been chosen to minimize the odds of fascinating bugs. +} + +func (x shakeNormalizer) Size() int { + return x.size +} + +func (x shakeNormalizer) Sum(digest []byte) []byte { + if len(digest) < x.size { + digest = make([]byte, x.size) + } + digest = digest[0:x.size] + h2 := x.Clone() // clone it, because reading mutates this kind of hash (!) which is not the standard contract for a Hash.Sum method. + h2.Read(digest) // not capable of underreading. See sha3.ShakeSum256 for similar usage. + return digest +} diff --git a/registry.go b/registry.go new file mode 100644 index 0000000..f4b1c7e --- /dev/null +++ b/registry.go @@ -0,0 +1,68 @@ +package multihash + +import ( + "crypto/md5" + "crypto/sha1" + "crypto/sha256" + "crypto/sha512" + "hash" +) + +// registry is a simple map which maps a multihash indicator number +// to a standard golang Hash interface. +// +// Multihash indicator numbers are reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . +// The keys used in this map must match those reservations. +// +// Hashers which are available in the golang stdlib will be registered automatically. +// Others can be added using the Register function. 
+var registry = make(map[uint64]func() hash.Hash) + +// Register adds a new hash to the set available from GetHasher and Sum. +// +// Register has a global effect and should only be used at package init time to avoid data races. +// +// The indicator code should be per the numbers reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . +// +// If Register is called with the same indicator code more than once, the last call wins. +// In practice, this means that if an application has a strong opinion about what implementation to use for a certain hash +// (e.g., perhaps they want to override the sha256 implementation to use a special hand-rolled assembly variant +// rather than the stdlib one which is registered by default), +// then this can be done by making a Register call with that effect at init time in the application's main package. +// This should have the desired effect because the root of the import tree has its init time effect last. +func Register(indicator uint64, hasherFactory func() hash.Hash) { + registry[indicator] = hasherFactory +} + +// GetHasher returns a new hash.Hash according to the indicator code number provided. +// +// The indicator code should be per the numbers reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . +// +// The actual hashers available are determined by what has been registered. +// The registry automatically contains those hashers which are available in the golang standard libraries +// (which includes md5, sha1, sha256, sha384, sha512, and the "identity" mulithash, among others). +// Other hash implementations can be made available by using the Register function. +// The 'go-mulithash/register/*' packages can also be imported to gain more common hash functions. +// +// If an error is returned, it will be ErrSumNotSupported. 
+func GetHasher(indicator uint64) (hash.Hash, error) { + factory, exists := registry[indicator] + if !exists { + return nil, ErrSumNotSupported // REVIEW: it's unfortunate that this error doesn't say what code was missing. Also "NotSupported" is a bit of a misnomer now. + } + return factory(), nil +} + +func init() { + Register(0x00, func() hash.Hash { return &identityMultihash{} }) + Register(0xd5, md5.New) + Register(0x11, sha1.New) + Register(0x12, sha256.New) + Register(0x13, sha512.New) + Register(0x1f, sha256.New224) + Register(0x20, sha512.New384) + Register(0x56, func() hash.Hash { return &doubleSha256{sha256.New()} }) +} diff --git a/spec_test.go b/spec_test.go index 08d550e..592f7be 100644 --- a/spec_test.go +++ b/spec_test.go @@ -1,4 +1,4 @@ -package multihash +package multihash_test import ( "encoding/csv" @@ -7,6 +7,9 @@ import ( "strconv" "strings" "testing" + + "github.com/multiformats/go-multihash" + _ "github.com/multiformats/go-multihash/register/all" ) func TestSpec(t *testing.T) { @@ -55,7 +58,7 @@ func TestSpec(t *testing.T) { expectedFunctions[code] = name } - for code, name := range Codes { + for code, name := range multihash.Codes { expectedName, ok := expectedFunctions[code] if !ok { t.Errorf("multihash %q (%x) not defined in the spec", name, code) @@ -104,7 +107,7 @@ func TestSpecVectors(t *testing.T) { expectedStr := testCase[3] t.Run(fmt.Sprintf("%d/%s/%s", i, function, lengthStr), func(t *testing.T) { - code, ok := Names[function] + code, ok := multihash.Names[function] if !ok { t.Skipf("skipping %s: not supported", function) return @@ -119,7 +122,7 @@ func TestSpecVectors(t *testing.T) { t.Fatal("expected the length to be a multiple of 8") } - actual, err := Sum([]byte(input), code, int(length/8)) + actual, err := multihash.Sum([]byte(input), code, int(length/8)) if err != nil { t.Fatalf("failed to hash: %s", err) } diff --git a/sum.go b/sum.go index d6bf2f9..131c532 100644 --- a/sum.go +++ b/sum.go @@ -1,17 +1,8 @@ package multihash 
import ( - "crypto/md5" - "crypto/sha1" - "crypto/sha512" "errors" "fmt" - - blake2b "github.com/minio/blake2b-simd" - sha256 "github.com/minio/sha256-simd" - murmur3 "github.com/spaolacci/murmur3" - blake2s "golang.org/x/crypto/blake2s" - sha3 "golang.org/x/crypto/sha3" ) // ErrSumNotSupported is returned when the Sum function code is not implemented @@ -19,25 +10,25 @@ var ErrSumNotSupported = errors.New("Function not implemented. Complain to lib m var ErrLenTooLarge = errors.New("requested length was too large for digest") -// HashFunc is a hash function that hashes data into digest. -// -// The length is the size the digest will be truncated to. While the hash -// function isn't responsible for truncating the digest, it may want to error if -// the length is invalid for the hash function (e.g., truncation would make the -// hash useless). -type HashFunc func(data []byte, length int) (digest []byte, err error) - -// funcTable maps multicodec values to hash functions. -var funcTable = make(map[uint64]HashFunc) - // Sum obtains the cryptographic sum of a given buffer. The length parameter // indicates the length of the resulting digest and passing a negative value // use default length values for the selected hash function. func Sum(data []byte, code uint64, length int) (Multihash, error) { - if !ValidCode(code) { - return nil, fmt.Errorf("invalid multihash code %d", code) + // Get the algorithm. + hasher, err := GetHasher(code) + if err != nil { + return nil, err } + // Feed data in. + hasher.Write(data) + + // Compute hash. + // Use a fixed size array here: should keep things on the stack. + var space [64]byte + sum := hasher.Sum(space[0:0]) + + // Deal with any truncation. 
if length < 0 { var ok bool length, ok = DefaultLengths[code] @@ -45,197 +36,14 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { return nil, fmt.Errorf("no default length for code %d", code) } } - - hashFunc, ok := funcTable[code] - if !ok { - return nil, ErrSumNotSupported - } - - d, err := hashFunc(data, length) - if err != nil { - return nil, err - } - if len(d) < length { + if len(sum) < length { return nil, ErrLenTooLarge } - if length >= 0 { - d = d[:length] - } - return Encode(d, code) -} - -func sumBlake2s32(data []byte, _ int) ([]byte, error) { - d := blake2s.Sum256(data) - return d[:], nil -} -func sumBlake2b(data []byte, size int) ([]byte, error) { - // special case these lengths to avoid allocations. - switch size { - case 32: - hash := blake2b.Sum256(data) - return hash[:], nil - case 64: - hash := blake2b.Sum512(data) - return hash[:], nil - } - - // Ok, allocate away. - hasher, err := blake2b.New(&blake2b.Config{Size: uint8(size)}) - if err != nil { - return nil, err - } - - if _, err := hasher.Write(data); err != nil { - return nil, err - } - - return hasher.Sum(nil)[:], nil -} - -func sumID(data []byte, length int) ([]byte, error) { - if length >= 0 && length != len(data) { - return nil, fmt.Errorf("the length of the identity hash (%d) must be equal to the length of the data (%d)", - length, len(data)) - - } - return data, nil -} - -func sumSHA1(data []byte, length int) ([]byte, error) { - a := sha1.Sum(data) - return a[0:20], nil -} - -func sumSHA256(data []byte, length int) ([]byte, error) { - a := sha256.Sum256(data) - return a[0:32], nil -} - -func sumMD5(data []byte, length int) ([]byte, error) { - a := md5.Sum(data) - return a[0:md5.Size], nil -} - -func sumDoubleSHA256(data []byte, length int) ([]byte, error) { - val, _ := sumSHA256(data, len(data)) - return sumSHA256(val, len(val)) -} - -func sumSHA512(data []byte, length int) ([]byte, error) { - a := sha512.Sum512(data) - return a[0:64], nil -} -func sumKeccak256(data 
[]byte, length int) ([]byte, error) { - h := sha3.NewLegacyKeccak256() - h.Write(data) - return h.Sum(nil), nil -} - -func sumKeccak512(data []byte, length int) ([]byte, error) { - h := sha3.NewLegacyKeccak512() - h.Write(data) - return h.Sum(nil), nil -} - -func sumSHA3_512(data []byte, length int) ([]byte, error) { - a := sha3.Sum512(data) - return a[:], nil -} - -func sumMURMUR3(data []byte, length int) ([]byte, error) { - number := murmur3.Sum32(data) - bytes := make([]byte, 4) - for i := range bytes { - bytes[i] = byte(number & 0xff) - number >>= 8 - } - return bytes, nil -} - -func sumSHAKE128(data []byte, length int) ([]byte, error) { - bytes := make([]byte, 32) - sha3.ShakeSum128(bytes, data) - return bytes, nil -} - -func sumSHAKE256(data []byte, length int) ([]byte, error) { - bytes := make([]byte, 64) - sha3.ShakeSum256(bytes, data) - return bytes, nil -} - -func sumSHA3_384(data []byte, length int) ([]byte, error) { - a := sha3.Sum384(data) - return a[:], nil -} - -func sumSHA3_256(data []byte, length int) ([]byte, error) { - a := sha3.Sum256(data) - return a[:], nil -} - -func sumSHA3_224(data []byte, length int) ([]byte, error) { - a := sha3.Sum224(data) - return a[:], nil -} - -func registerStdlibHashFuncs() { - RegisterHashFunc(IDENTITY, sumID) - RegisterHashFunc(SHA1, sumSHA1) - RegisterHashFunc(SHA2_512, sumSHA512) - RegisterHashFunc(MD5, sumMD5) -} - -func registerNonStdlibHashFuncs() { - RegisterHashFunc(SHA2_256, sumSHA256) - RegisterHashFunc(DBL_SHA2_256, sumDoubleSHA256) - - RegisterHashFunc(KECCAK_256, sumKeccak256) - RegisterHashFunc(KECCAK_512, sumKeccak512) - - RegisterHashFunc(SHA3_224, sumSHA3_224) - RegisterHashFunc(SHA3_256, sumSHA3_256) - RegisterHashFunc(SHA3_384, sumSHA3_384) - RegisterHashFunc(SHA3_512, sumSHA3_512) - - RegisterHashFunc(MURMUR3_128, sumMURMUR3) - - RegisterHashFunc(SHAKE_128, sumSHAKE128) - RegisterHashFunc(SHAKE_256, sumSHAKE256) - - // Blake family of hash functions - // BLAKE2S - // - // We only support 32byte 
(256 bit) - RegisterHashFunc(BLAKE2S_MIN+31, sumBlake2s32) - // BLAKE2B - for c := uint64(BLAKE2B_MIN); c <= BLAKE2B_MAX; c++ { - size := int(c - BLAKE2B_MIN + 1) - RegisterHashFunc(c, func(buf []byte, _ int) ([]byte, error) { - return sumBlake2b(buf, size) - }) - } -} - -func init() { - registerStdlibHashFuncs() - registerNonStdlibHashFuncs() -} - -// RegisterHashFunc adds an entry to the package-level code -> hash func map. -// The hash function must return at least the requested number of bytes. If it -// returns more, the hash will be truncated. -func RegisterHashFunc(code uint64, hashFunc HashFunc) error { - if !ValidCode(code) { - return fmt.Errorf("code %v not valid", code) - } - - _, ok := funcTable[code] - if ok { - return fmt.Errorf("hash func for code %v already registered", code) + sum = sum[:length] } - funcTable[code] = hashFunc - return nil + // Put the multihash metainfo bytes at the front of the buffer. + // FIXME: this does many avoidable allocations, but it's the shape of the Encode method arguments that forces this. 
+ return Encode(sum, code) } diff --git a/sum_test.go b/sum_test.go index 37e256c..9dc4446 100644 --- a/sum_test.go +++ b/sum_test.go @@ -1,4 +1,4 @@ -package multihash +package multihash_test import ( "bytes" @@ -6,6 +6,9 @@ import ( "fmt" "runtime" "testing" + + "github.com/multiformats/go-multihash" + _ "github.com/multiformats/go-multihash/register/all" ) type SumTestCase struct { @@ -16,65 +19,63 @@ type SumTestCase struct { } var sumTestCases = []SumTestCase{ - {ID, 3, "foo", "0003666f6f"}, - {ID, -1, "foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo", "0030666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f"}, - {SHA1, -1, "foo", "11140beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"}, - {SHA1, 10, "foo", "110a0beec7b5ea3f0fdbc95d"}, - {SHA2_256, -1, "foo", "12202c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"}, - {SHA2_256, 31, "foo", "121f2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7"}, - {SHA2_256, 32, "foo", "12202c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"}, - {SHA2_256, 16, "foo", "12102c26b46b68ffc68ff99b453c1d304134"}, - {SHA2_512, -1, "foo", "1340f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7"}, - {SHA2_512, 32, "foo", "1320f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc663832"}, - {SHA3, 32, "foo", "14204bca2b137edc580fe50a88983ef860ebaca36c857b1f492839d6d7392452a63c"}, - {SHA3_512, 16, "foo", "14104bca2b137edc580fe50a88983ef860eb"}, - {SHA3_512, -1, "foo", "14404bca2b137edc580fe50a88983ef860ebaca36c857b1f492839d6d7392452a63c82cbebc68e3b70a2a1480b4bb5d437a7cba6ecf9d89f9ff3ccd14cd6146ea7e7"}, - {SHA3_224, -1, "beep boop", "171c0da73a89549018df311c0a63250e008f7be357f93ba4e582aaea32b8"}, - {SHA3_224, 16, "beep boop", "17100da73a89549018df311c0a63250e008f"}, - {SHA3_256, -1, "beep boop", "1620828705da60284b39de02e3599d1f39e6c1df001f5dbf63c9ec2d2c91a95a427f"}, - 
{SHA3_256, 16, "beep boop", "1610828705da60284b39de02e3599d1f39e6"}, - {SHA3_384, -1, "beep boop", "153075a9cff1bcfbe8a7025aa225dd558fb002769d4bf3b67d2aaf180459172208bea989804aefccf060b583e629e5f41e8d"}, - {SHA3_384, 16, "beep boop", "151075a9cff1bcfbe8a7025aa225dd558fb0"}, - {DBL_SHA2_256, 32, "foo", "5620c7ade88fc7a21498a6a5e5c385e1f68bed822b72aa63c4a9a48a02c2466ee29e"}, - {BLAKE2B_MAX, -1, "foo", "c0e40240ca002330e69d3e6b84a46a56a6533fd79d51d97a3bb7cad6c2ff43b354185d6dc1e723fb3db4ae0737e120378424c714bb982d9dc5bbd7a0ab318240ddd18f8d"}, - {BLAKE2B_MAX, 64, "foo", "c0e40240ca002330e69d3e6b84a46a56a6533fd79d51d97a3bb7cad6c2ff43b354185d6dc1e723fb3db4ae0737e120378424c714bb982d9dc5bbd7a0ab318240ddd18f8d"}, - {BLAKE2B_MAX - 32, -1, "foo", "a0e40220b8fe9f7f6255a6fa08f668ab632a8d081ad87983c77cd274e48ce450f0b349fd"}, - {BLAKE2B_MAX - 32, 32, "foo", "a0e40220b8fe9f7f6255a6fa08f668ab632a8d081ad87983c77cd274e48ce450f0b349fd"}, - {BLAKE2B_MAX - 19, -1, "foo", "ade4022dca82ab956d5885e3f5db10cca94182f01a6ca2c47f9f4228497dcc9f4a0121c725468b852a71ec21fcbeb725df"}, - {BLAKE2B_MAX - 19, 45, "foo", "ade4022dca82ab956d5885e3f5db10cca94182f01a6ca2c47f9f4228497dcc9f4a0121c725468b852a71ec21fcbeb725df"}, - {BLAKE2B_MAX - 16, -1, "foo", "b0e40230e629ee880953d32c8877e479e3b4cb0a4c9d5805e2b34c675b5a5863c4ad7d64bb2a9b8257fac9d82d289b3d39eb9cc2"}, - {BLAKE2B_MAX - 16, 48, "foo", "b0e40230e629ee880953d32c8877e479e3b4cb0a4c9d5805e2b34c675b5a5863c4ad7d64bb2a9b8257fac9d82d289b3d39eb9cc2"}, - {BLAKE2B_MIN + 19, -1, "foo", "94e40214983ceba2afea8694cc933336b27b907f90c53a88"}, - {BLAKE2B_MIN + 19, 20, "foo", "94e40214983ceba2afea8694cc933336b27b907f90c53a88"}, - {BLAKE2B_MIN, -1, "foo", "81e4020152"}, - {BLAKE2B_MIN, 1, "foo", "81e4020152"}, - {BLAKE2S_MAX, 32, "foo", "e0e4022008d6cad88075de8f192db097573d0e829411cd91eb6ec65e8fc16c017edfdb74"}, - {MURMUR3_128, 4, "beep boop", "2204243ddb9e"}, - {KECCAK_256, 32, "foo", "1b2041b1a0649752af1b28b3dc29a1556eee781e4a4c3a1f7f53f90fa834de098c4d"}, - 
{KECCAK_512, -1, "beep boop", "1d40e161c54798f78eba3404ac5e7e12d27555b7b810e7fd0db3f25ffa0c785c438331b0fbb6156215f69edf403c642e5280f4521da9bd767296ec81f05100852e78"}, - {SHAKE_128, 32, "foo", "1820f84e95cb5fbd2038863ab27d3cdeac295ad2d4ab96ad1f4b070c0bf36078ef08"}, - {SHAKE_256, 64, "foo", "19401af97f7818a28edfdfce5ec66dbdc7e871813816d7d585fe1f12475ded5b6502b7723b74e2ee36f2651a10a8eaca72aa9148c3c761aaceac8f6d6cc64381ed39"}, - {MD5, -1, "foo", "d50110acbd18db4cc2f85cedef654fccc4a4d8"}, + {multihash.ID, 3, "foo", "0003666f6f"}, + {multihash.ID, -1, "foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo", "0030666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f666f6f"}, + {multihash.SHA1, -1, "foo", "11140beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"}, + {multihash.SHA1, 10, "foo", "110a0beec7b5ea3f0fdbc95d"}, + {multihash.SHA2_256, -1, "foo", "12202c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"}, + {multihash.SHA2_256, 31, "foo", "121f2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7"}, + {multihash.SHA2_256, 32, "foo", "12202c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"}, + {multihash.SHA2_256, 16, "foo", "12102c26b46b68ffc68ff99b453c1d304134"}, + {multihash.SHA2_512, -1, "foo", "1340f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7"}, + {multihash.SHA2_512, 32, "foo", "1320f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc663832"}, + {multihash.SHA3, 32, "foo", "14204bca2b137edc580fe50a88983ef860ebaca36c857b1f492839d6d7392452a63c"}, + {multihash.SHA3_512, 16, "foo", "14104bca2b137edc580fe50a88983ef860eb"}, + {multihash.SHA3_512, -1, "foo", "14404bca2b137edc580fe50a88983ef860ebaca36c857b1f492839d6d7392452a63c82cbebc68e3b70a2a1480b4bb5d437a7cba6ecf9d89f9ff3ccd14cd6146ea7e7"}, + {multihash.SHA3_224, -1, "beep boop", "171c0da73a89549018df311c0a63250e008f7be357f93ba4e582aaea32b8"}, + {multihash.SHA3_224, 16, 
"beep boop", "17100da73a89549018df311c0a63250e008f"}, + {multihash.SHA3_256, -1, "beep boop", "1620828705da60284b39de02e3599d1f39e6c1df001f5dbf63c9ec2d2c91a95a427f"}, + {multihash.SHA3_256, 16, "beep boop", "1610828705da60284b39de02e3599d1f39e6"}, + {multihash.SHA3_384, -1, "beep boop", "153075a9cff1bcfbe8a7025aa225dd558fb002769d4bf3b67d2aaf180459172208bea989804aefccf060b583e629e5f41e8d"}, + {multihash.SHA3_384, 16, "beep boop", "151075a9cff1bcfbe8a7025aa225dd558fb0"}, + {multihash.DBL_SHA2_256, 32, "foo", "5620c7ade88fc7a21498a6a5e5c385e1f68bed822b72aa63c4a9a48a02c2466ee29e"}, + {multihash.BLAKE2B_MAX, -1, "foo", "c0e40240ca002330e69d3e6b84a46a56a6533fd79d51d97a3bb7cad6c2ff43b354185d6dc1e723fb3db4ae0737e120378424c714bb982d9dc5bbd7a0ab318240ddd18f8d"}, + {multihash.BLAKE2B_MAX, 64, "foo", "c0e40240ca002330e69d3e6b84a46a56a6533fd79d51d97a3bb7cad6c2ff43b354185d6dc1e723fb3db4ae0737e120378424c714bb982d9dc5bbd7a0ab318240ddd18f8d"}, + {multihash.BLAKE2B_MAX - 32, -1, "foo", "a0e40220b8fe9f7f6255a6fa08f668ab632a8d081ad87983c77cd274e48ce450f0b349fd"}, + {multihash.BLAKE2B_MAX - 32, 32, "foo", "a0e40220b8fe9f7f6255a6fa08f668ab632a8d081ad87983c77cd274e48ce450f0b349fd"}, + {multihash.BLAKE2B_MAX - 19, -1, "foo", "ade4022dca82ab956d5885e3f5db10cca94182f01a6ca2c47f9f4228497dcc9f4a0121c725468b852a71ec21fcbeb725df"}, + {multihash.BLAKE2B_MAX - 19, 45, "foo", "ade4022dca82ab956d5885e3f5db10cca94182f01a6ca2c47f9f4228497dcc9f4a0121c725468b852a71ec21fcbeb725df"}, + {multihash.BLAKE2B_MAX - 16, -1, "foo", "b0e40230e629ee880953d32c8877e479e3b4cb0a4c9d5805e2b34c675b5a5863c4ad7d64bb2a9b8257fac9d82d289b3d39eb9cc2"}, + {multihash.BLAKE2B_MAX - 16, 48, "foo", "b0e40230e629ee880953d32c8877e479e3b4cb0a4c9d5805e2b34c675b5a5863c4ad7d64bb2a9b8257fac9d82d289b3d39eb9cc2"}, + {multihash.BLAKE2B_MIN + 19, -1, "foo", "94e40214983ceba2afea8694cc933336b27b907f90c53a88"}, + {multihash.BLAKE2B_MIN + 19, 20, "foo", "94e40214983ceba2afea8694cc933336b27b907f90c53a88"}, + {multihash.BLAKE2B_MIN, -1, "foo", 
"81e4020152"}, + {multihash.BLAKE2B_MIN, 1, "foo", "81e4020152"}, + {multihash.BLAKE2S_MAX, 32, "foo", "e0e4022008d6cad88075de8f192db097573d0e829411cd91eb6ec65e8fc16c017edfdb74"}, + {multihash.MURMUR3_128, 4, "beep boop", "2204243ddb9e"}, + {multihash.KECCAK_256, 32, "foo", "1b2041b1a0649752af1b28b3dc29a1556eee781e4a4c3a1f7f53f90fa834de098c4d"}, + {multihash.KECCAK_512, -1, "beep boop", "1d40e161c54798f78eba3404ac5e7e12d27555b7b810e7fd0db3f25ffa0c785c438331b0fbb6156215f69edf403c642e5280f4521da9bd767296ec81f05100852e78"}, + {multihash.SHAKE_128, 32, "foo", "1820f84e95cb5fbd2038863ab27d3cdeac295ad2d4ab96ad1f4b070c0bf36078ef08"}, + {multihash.SHAKE_256, 64, "foo", "19401af97f7818a28edfdfce5ec66dbdc7e871813816d7d585fe1f12475ded5b6502b7723b74e2ee36f2651a10a8eaca72aa9148c3c761aaceac8f6d6cc64381ed39"}, + {multihash.MD5, -1, "foo", "d50110acbd18db4cc2f85cedef654fccc4a4d8"}, } func TestSum(t *testing.T) { - for _, tc := range sumTestCases { - - m1, err := FromHexString(tc.hex) + m1, err := multihash.FromHexString(tc.hex) if err != nil { t.Error(err) continue } - m2, err := Sum([]byte(tc.input), tc.code, tc.length) + m2, err := multihash.Sum([]byte(tc.input), tc.code, tc.length) if err != nil { t.Error(tc.code, "sum failed.", err) continue } if !bytes.Equal(m1, m2) { - t.Error(tc.code, Codes[tc.code], "sum failed.", m1, m2) + t.Error(tc.code, multihash.Codes[tc.code], "sum failed.", m1, m2) t.Error(hex.EncodeToString(m2)) } @@ -84,7 +85,7 @@ func TestSum(t *testing.T) { } s2 := m1.B58String() - m3, err := FromB58String(s2) + m3, err := multihash.FromB58String(s2) if err != nil { t.Error("failed to decode b58") } else if !bytes.Equal(m3, m1) { @@ -98,7 +99,7 @@ func TestSum(t *testing.T) { func BenchmarkSum(b *testing.B) { tc := sumTestCases[0] for i := 0; i < b.N; i++ { - Sum([]byte(tc.input), tc.code, tc.length) + multihash.Sum([]byte(tc.input), tc.code, tc.length) } } @@ -111,7 +112,7 @@ func BenchmarkBlake2B(b *testing.B) { b.ResetTimer() b.ReportAllocs() for i := 0; i < 
b.N; i++ { - m, err := Sum(arr, BLAKE2B_MIN+si/8-1, -1) + m, err := multihash.Sum(arr, multihash.BLAKE2B_MIN+si/8-1, -1) if err != nil { b.Fatal(err) } @@ -122,75 +123,22 @@ func BenchmarkBlake2B(b *testing.B) { } } -// Test that the identity hash function checks -// its `length` arguemnt for values that are -// different to its `data` length. -func TestSmallerLengthHashID(t *testing.T) { - - data := []byte("Identity hash input data.") - dataLength := len(data) - - // Normal case: `length == len(data)`. - _, err := sumID(data, dataLength) - if err != nil { - t.Fatal(err) - } - - // Unconstrained length (-1): also allowed. - _, err = sumID(data, -1) - if err != nil { - t.Fatal(err) - } - - // Any other variation of those two scenarios should fail. - for l := (dataLength - 1); l >= 0; l-- { - _, err = sumID(data, l) - if err == nil { - t.Fatal(fmt.Sprintf("identity hash of length %d smaller than data length %d didn't fail", - l, dataLength)) - } - } -} - -// Ensure that invalid codecs can't be registered, and existing hash funcs -// won't be overwritten. 
-func TestRegisterHashFunc(t *testing.T) { - tests := []struct { - code uint64 - shouldErr bool - }{ - {9999, true}, - {SHA1, true}, - } - - doesNothing := func(data []byte, length int) ([]byte, error) { - return []byte{}, nil - } - - for _, tt := range tests { - err := RegisterHashFunc(tt.code, doesNothing) - if err != nil && !tt.shouldErr { - t.Error(err) - } - } -} - func TestTooLargeLength(t *testing.T) { - _, err := Sum([]byte("test"), SHA2_256, 33) - if err != ErrLenTooLarge { + _, err := multihash.Sum([]byte("test"), multihash.SHA2_256, 33) + if err != multihash.ErrLenTooLarge { t.Fatal("bad error", err) } } func TestBasicSum(t *testing.T) { - for code, name := range Codes { - defaultLen, ok := DefaultLengths[code] + for code, name := range multihash.Codes { + defaultLen, ok := multihash.DefaultLengths[code] if !ok { defaultLen = 32 } - _, err := Sum([]byte("test"), code, defaultLen) + _, err := multihash.Sum([]byte("test"), code, defaultLen) switch err { - case ErrSumNotSupported, nil: + case multihash.ErrSumNotSupported, nil: default: t.Errorf("unexpected error for %s: %s", name, err) } From 382c37a002ea1e8bfd7fe333a72ae8832f6c452d Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:25:01 +0100 Subject: [PATCH 02/17] Return to using constant pool from root package. It seems likely that we should replace many of these with values from https://github.com/multiformats/go-multicodec in the future; however, we have a new issue to track that: https://github.com/multiformats/go-multihash/issues/137 And as noted in https://github.com/multiformats/go-multihash/pull/136#discussion_r588875295 it's time to rein in the amount of work going on in this PR.
--- register/miniosha256/multihash_miniosha256.go | 2 +- register/sha3/multihash_sha3.go | 16 ++++++++-------- registry.go | 14 ++++++-------- 3 files changed, 15 insertions(+), 17 deletions(-) diff --git a/register/miniosha256/multihash_miniosha256.go b/register/miniosha256/multihash_miniosha256.go index 6c72ff9..1b9028d 100644 --- a/register/miniosha256/multihash_miniosha256.go +++ b/register/miniosha256/multihash_miniosha256.go @@ -19,5 +19,5 @@ import ( ) func init() { - multihash.Register(0x12, sha256.New) + multihash.Register(multihash.SHA2_256, sha256.New) } diff --git a/register/sha3/multihash_sha3.go b/register/sha3/multihash_sha3.go index 225159d..f8b1692 100644 --- a/register/sha3/multihash_sha3.go +++ b/register/sha3/multihash_sha3.go @@ -22,14 +22,14 @@ import ( ) func init() { - multihash.Register(0x14, sha3.New512) - multihash.Register(0x15, sha3.New384) - multihash.Register(0x16, sha3.New256) - multihash.Register(0x17, sha3.New224) - multihash.Register(0x18, func() hash.Hash { return shakeNormalizer{sha3.NewShake128(), 128 / 8 * 2} }) - multihash.Register(0x19, func() hash.Hash { return shakeNormalizer{sha3.NewShake256(), 256 / 8 * 2} }) - multihash.Register(0x1B, sha3.NewLegacyKeccak256) - multihash.Register(0x1D, sha3.NewLegacyKeccak512) + multihash.Register(multihash.SHA3_512, sha3.New512) + multihash.Register(multihash.SHA3_384, sha3.New384) + multihash.Register(multihash.SHA3_256, sha3.New256) + multihash.Register(multihash.SHA3_224, sha3.New224) + multihash.Register(multihash.SHAKE_128, func() hash.Hash { return shakeNormalizer{sha3.NewShake128(), 128 / 8 * 2} }) + multihash.Register(multihash.SHAKE_256, func() hash.Hash { return shakeNormalizer{sha3.NewShake256(), 256 / 8 * 2} }) + multihash.Register(multihash.KECCAK_256, sha3.NewLegacyKeccak256) + multihash.Register(multihash.KECCAK_512, sha3.NewLegacyKeccak512) } // sha3.ShakeHash presents a somewhat odd interface, and requires a wrapper to normalize it to the usual hash.Hash interface. 
diff --git a/registry.go b/registry.go index f4b1c7e..7b28751 100644 --- a/registry.go +++ b/registry.go @@ -57,12 +57,10 @@ func GetHasher(indicator uint64) (hash.Hash, error) { } func init() { - Register(0x00, func() hash.Hash { return &identityMultihash{} }) - Register(0xd5, md5.New) - Register(0x11, sha1.New) - Register(0x12, sha256.New) - Register(0x13, sha512.New) - Register(0x1f, sha256.New224) - Register(0x20, sha512.New384) - Register(0x56, func() hash.Hash { return &doubleSha256{sha256.New()} }) + Register(IDENTITY, func() hash.Hash { return &identityMultihash{} }) + Register(MD5, md5.New) + Register(SHA1, sha1.New) + Register(SHA2_256, sha256.New) + Register(SHA2_512, sha512.New) + Register(DBL_SHA2_256, func() hash.Hash { return &doubleSha256{sha256.New()} }) } From 5647e22384ee5fb8aec908865bb40c9a45fe9f90 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:30:34 +0100 Subject: [PATCH 03/17] Optimize doubleSha256 construction. Avoid allocations by reusing the same slice for results. --- errata.go | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/errata.go b/errata.go index d3065ce..8a91b66 100644 --- a/errata.go +++ b/errata.go @@ -27,9 +27,8 @@ type doubleSha256 struct { } func (x doubleSha256) Sum(digest []byte) []byte { - intermediate := [sha256.Size]byte{} - x.Hash.Sum(intermediate[0:0]) + digest = x.Hash.Sum(digest) h2 := sha256.New() - h2.Write(intermediate[:]) - return h2.Sum(digest) + h2.Write(digest) + return h2.Sum(digest[0:0]) } From 6897150357f6028f38cf9140166d4a5ed44da17f Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:32:54 +0100 Subject: [PATCH 04/17] Drop murmur3 implementation. Agreed upon in discussion at: https://github.com/multiformats/go-multihash/pull/136#discussion_r588418533 The "REVIEW" comments in the outgoing diff identify the concerns that led to this choice.
--- go.mod | 1 - go.sum | 2 -- register/all/multihash_all.go | 1 - register/murmur3/multihash_murmur3.go | 38 --------------------------- sum_test.go | 1 - 5 files changed, 43 deletions(-) delete mode 100644 register/murmur3/multihash_murmur3.go diff --git a/go.mod b/go.mod index 9a825bb..2b31327 100644 --- a/go.mod +++ b/go.mod @@ -5,7 +5,6 @@ require ( github.com/minio/sha256-simd v0.1.1-0.20190913151208-6de447530771 github.com/mr-tron/base58 v1.1.3 github.com/multiformats/go-varint v0.0.5 - github.com/spaolacci/murmur3 v1.1.0 golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8 ) diff --git a/go.sum b/go.sum index 6425497..3a25627 100644 --- a/go.sum +++ b/go.sum @@ -10,8 +10,6 @@ github.com/multiformats/go-varint v0.0.4 h1:CplQWhUouUgTZ53vNFE8VoWr2VjaKXci+xyr github.com/multiformats/go-varint v0.0.4/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= github.com/multiformats/go-varint v0.0.5 h1:XVZwSo04Cs3j/jS0uAEPpT3JY6DzMcVLLoWOSnCxOjg= github.com/multiformats/go-varint v0.0.5/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= -github.com/spaolacci/murmur3 v1.1.0 h1:7c1g84S4BPRrfL5Xrdp6fOJ206sU9y293DDHaoy0bLI= -github.com/spaolacci/murmur3 v1.1.0/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8 h1:1wopBVtVdWnn03fZelqdXTqk7U7zPQCb+T4rbU9ZEoU= golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= diff --git a/register/all/multihash_all.go b/register/all/multihash_all.go index 1037913..711c4d7 100644 --- a/register/all/multihash_all.go +++ b/register/all/multihash_all.go @@ -18,6 +18,5 @@ package all import ( _ "github.com/multiformats/go-multihash/register/blake2" - _ "github.com/multiformats/go-multihash/register/murmur3" _ "github.com/multiformats/go-multihash/register/sha3" ) diff --git a/register/murmur3/multihash_murmur3.go 
b/register/murmur3/multihash_murmur3.go deleted file mode 100644 index f5b2051..0000000 --- a/register/murmur3/multihash_murmur3.go +++ /dev/null @@ -1,38 +0,0 @@ -/* - This package has no purpose except to perform registration of multihashes. - - It is meant to be used as a side-effecting import, e.g. - - import ( - _ "github.com/multiformats/go-multihash/register/murmur3" - ) - - This package registers multihashes for the murmur3 family. -*/ -package murmur3 - -// import ( -// "hash" -// -// "github.com/gxed/hashland/murmur3" -// -// "github.com/multiformats/go-multihash" -// ) - -func init() { - // REVIEW: what go-multihash has done historically is New32, but this doesn't match what the multihash table says, which is 128! Resolution needed. - // REVIEW: I have also heard that something in ipfs unixfsv1 uses a murmur hash, but that is yet different than this. Resolution needed. - // REVIEW: these bit-twiddling things *are* in fact load-bearing somehow. If you just return murmur3.New32 without this wrapper type, it produces different results. Resolution needed. 
- - // multihash.Register(0x22, func() hash.Hash { return murmur3.New32() }) - - // -or-, what previously existed: - - // number := murmur3.Sum32(data) - // bytes := make([]byte, 4) - // for i := range bytes { - // bytes[i] = byte(number & 0xff) - // number >>= 8 - // } - // return bytes, nil -} diff --git a/sum_test.go b/sum_test.go index 9dc4446..3402431 100644 --- a/sum_test.go +++ b/sum_test.go @@ -52,7 +52,6 @@ var sumTestCases = []SumTestCase{ {multihash.BLAKE2B_MIN, -1, "foo", "81e4020152"}, {multihash.BLAKE2B_MIN, 1, "foo", "81e4020152"}, {multihash.BLAKE2S_MAX, 32, "foo", "e0e4022008d6cad88075de8f192db097573d0e829411cd91eb6ec65e8fc16c017edfdb74"}, - {multihash.MURMUR3_128, 4, "beep boop", "2204243ddb9e"}, {multihash.KECCAK_256, 32, "foo", "1b2041b1a0649752af1b28b3dc29a1556eee781e4a4c3a1f7f53f90fa834de098c4d"}, {multihash.KECCAK_512, -1, "beep boop", "1d40e161c54798f78eba3404ac5e7e12d27555b7b810e7fd0db3f25ffa0c785c438331b0fbb6156215f69edf403c642e5280f4521da9bd767296ec81f05100852e78"}, {multihash.SHAKE_128, 32, "foo", "1820f84e95cb5fbd2038863ab27d3cdeac295ad2d4ab96ad1f4b070c0bf36078ef08"}, From 8c1b61b492e7972d15d78c5f459b123d2f8d17da Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:34:17 +0100 Subject: [PATCH 05/17] Use unexported constants to describe blake ranges. 
Per discussion in: https://github.com/multiformats/go-multihash/pull/136#discussion_r588418103 --- register/blake2/multihash_blake2.go | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/register/blake2/multihash_blake2.go b/register/blake2/multihash_blake2.go index 258e2a5..3d20041 100644 --- a/register/blake2/multihash_blake2.go +++ b/register/blake2/multihash_blake2.go @@ -22,21 +22,21 @@ import ( ) const ( - BLAKE2B_MIN = 0xb201 - BLAKE2B_MAX = 0xb240 - BLAKE2S_MIN = 0xb241 - BLAKE2S_MAX = 0xb260 + blake2b_min = 0xb201 + blake2b_max = 0xb240 + blake2s_min = 0xb241 + blake2s_max = 0xb260 ) func init() { - // BLAKE2S + // blake2s // This package only enables support for 32byte (256 bit) blake2s. - multihash.Register(BLAKE2S_MIN+31, func() hash.Hash { h, _ := blake2s.New256(nil); return h }) + multihash.Register(blake2s_min+31, func() hash.Hash { h, _ := blake2s.New256(nil); return h }) - // BLAKE2B + // blake2b // There's a whole range of these. - for c := uint64(BLAKE2B_MIN); c <= BLAKE2B_MAX; c++ { - size := int(c - BLAKE2B_MIN + 1) + for c := uint64(blake2b_min); c <= blake2b_max; c++ { + size := int(c - blake2b_min + 1) // special case these lengths to avoid allocations. switch size { From 57ca955ee1b79cc1accfabf2f7390a07f14745cb Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:37:46 +0100 Subject: [PATCH 06/17] Continue to use minio sha2 in multihash command. 
--- multihash/main.go | 1 + 1 file changed, 1 insertion(+) diff --git a/multihash/main.go b/multihash/main.go index 343045e..26f3fd8 100644 --- a/multihash/main.go +++ b/multihash/main.go @@ -9,6 +9,7 @@ import ( mh "github.com/multiformats/go-multihash" mhopts "github.com/multiformats/go-multihash/opts" _ "github.com/multiformats/go-multihash/register/all" + _ "github.com/multiformats/go-multihash/register/miniosha256" ) var usage = `usage: %s [options] [FILE] From 10afd220b75c0f4bde1a524c2dbf2ab39b5876b4 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Sat, 6 Mar 2021 13:42:11 +0100 Subject: [PATCH 07/17] Drop DefaultLengths; can now get from Hash.Size. Discussed in https://github.com/multiformats/go-multihash/pull/136#discussion_r588878137 --- multihash.go | 24 ------------------------ sum.go | 7 +------ sum_test.go | 6 +----- 3 files changed, 2 insertions(+), 35 deletions(-) diff --git a/multihash.go b/multihash.go index 35b3610..d70c0df 100644 --- a/multihash.go +++ b/multihash.go @@ -80,7 +80,6 @@ func init() { name := fmt.Sprintf("blake2b-%d", n*8) Names[name] = c Codes[c] = name - DefaultLengths[c] = int(n) } // Add blake2s (32 codes) @@ -89,7 +88,6 @@ func init() { name := fmt.Sprintf("blake2s-%d", n*8) Names[name] = c Codes[c] = name - DefaultLengths[c] = int(n) } } @@ -142,28 +140,6 @@ var Codes = map[uint64]string{ MD5: "md5", } -// DefaultLengths maps a hash code to it's default length -var DefaultLengths = map[uint64]int{ - IDENTITY: -1, - SHA1: 20, - SHA2_256: 32, - SHA2_512: 64, - SHA3_224: 28, - SHA3_256: 32, - SHA3_384: 48, - SHA3_512: 64, - DBL_SHA2_256: 32, - KECCAK_224: 28, - KECCAK_256: 32, - MURMUR3_128: 4, - KECCAK_384: 48, - KECCAK_512: 64, - SHAKE_128: 32, - SHAKE_256: 64, - X11: 64, - MD5: 16, -} - func uvarint(buf []byte) (uint64, []byte, error) { n, c, err := varint.FromUvarint(buf) if err != nil { diff --git a/sum.go b/sum.go index 131c532..b6cdb87 100644 --- a/sum.go +++ b/sum.go @@ -2,7 +2,6 @@ package multihash import ( "errors" - 
"fmt" ) // ErrSumNotSupported is returned when the Sum function code is not implemented @@ -30,11 +29,7 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { // Deal with any truncation. if length < 0 { - var ok bool - length, ok = DefaultLengths[code] - if !ok { - return nil, fmt.Errorf("no default length for code %d", code) - } + length = hasher.Size() } if len(sum) < length { return nil, ErrLenTooLarge diff --git a/sum_test.go b/sum_test.go index 3402431..0ad3b75 100644 --- a/sum_test.go +++ b/sum_test.go @@ -131,11 +131,7 @@ func TestTooLargeLength(t *testing.T) { func TestBasicSum(t *testing.T) { for code, name := range multihash.Codes { - defaultLen, ok := multihash.DefaultLengths[code] - if !ok { - defaultLen = 32 - } - _, err := multihash.Sum([]byte("test"), code, defaultLen) + _, err := multihash.Sum([]byte("test"), code, -1) switch err { case multihash.ErrSumNotSupported, nil: default: From 442f64ad9d4812d72bdc6219f3cc158a92d4e54e Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Mon, 8 Mar 2021 15:29:48 +0100 Subject: [PATCH 08/17] Remove misplaced optimism about escape analysis. --- sum.go | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/sum.go b/sum.go index b6cdb87..b0563f2 100644 --- a/sum.go +++ b/sum.go @@ -22,10 +22,9 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { // Feed data in. hasher.Write(data) - // Compute hash. - // Use a fixed size array here: should keep things on the stack. - var space [64]byte - sum := hasher.Sum(space[0:0]) + // Compute final hash. + // A new slice is allocated. FUTURE: see other comment below about allocation, and review together with this line to try to improve. + sum := hasher.Sum(nil) // Deal with any truncation. if length < 0 { @@ -39,6 +38,6 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { } // Put the multihash metainfo bytes at the front of the buffer. 
- // FIXME: this does many avoidable allocations, but it's the shape of the Encode method arguments that forces this. + // FUTURE: try to improve allocations here. Encode does several which are probably avoidable, but it's the shape of the Encode method arguments that forces this. return Encode(sum, code) } From 67c208ae7671aaf78685a83bf91427f8a0965aad Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Tue, 9 Mar 2021 08:00:41 +0100 Subject: [PATCH 09/17] Resolving review topics around Encode function. Leaving the error return in, per https://github.com/multiformats/go-multihash/pull/136#discussion_r587708992 May also be interesting to note: I did check if ValidCode (or, roughly the same thing but perhaps renamed to KnownCode) could be reintroduced, and the answer is... actually no, not without broaching other issues. A body of `_, ok := registry[code]` is enough to open a can of worms. See https://github.com/multiformats/go-multihash/pull/136#discussion_r589712826 --- multihash.go | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/multihash.go b/multihash.go index d70c0df..9eeb9ca 100644 --- a/multihash.go +++ b/multihash.go @@ -238,10 +238,10 @@ func Decode(buf []byte) (*DecodedMultihash, error) { // Encode a hash digest along with the specified function code. // Note: the length is derived from the length of the digest itself. +// +// The error return is legacy; it is always nil. func Encode(buf []byte, code uint64) ([]byte, error) { - // REVIEW: if we remove the strict ValidCode check, this can no longer error. Change signiture? - // REVIEW: this function always causes heap allocs... but when used, this value is almost always going to be appended to another buffer (either as part of CID creation, or etc) -- should this whole function be rethought and alternatives offered? - + // FUTURE: this function always causes heap allocs... 
but when used, this value is almost always going to be appended to another buffer (either as part of CID creation, or etc) -- should this whole function be rethought and alternatives offered? newBuf := make([]byte, varint.UvarintSize(code)+varint.UvarintSize(uint64(len(buf)))+len(buf)) n := varint.PutUvarint(newBuf, code) n += varint.PutUvarint(newBuf[n:], uint64(len(buf))) From bc5cc892202afdc535352ab232b5971c2e88f027 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 08:20:19 +0100 Subject: [PATCH 10/17] Replace need for DefaultLength in opts processing. This creates a hasher long enough to ask it its properties. This is arguably creating garbage; on the other hand, I don't think this is a codepath ever likely to be used in a hot loop anywhere. We can extract this to something that caches the properties later if it proves necessary. It quietly defaults to zero for unknown codes. I don't know if this makes sense, but it's what the old code would have done, so it's what the new code will do, and I'm not looking deeper into it. At this point I'm just trying to make surgically minimal alterations and get this changeset as a whole wrapped up so things can move on. --- opts/opts.go | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/opts/opts.go b/opts/opts.go index 745c83f..809693d 100644 --- a/opts/opts.go +++ b/opts/opts.go @@ -110,8 +110,13 @@ func (o *Options) ParseError() error { } o.Length = o.Length / 8 - if o.Length > mh.DefaultLengths[o.AlgorithmCode] { - o.Length = mh.DefaultLengths[o.AlgorithmCode] + h, _ := mh.GetHasher(o.AlgorithmCode) + hsize := 0 + if h != nil { + hsize = h.Size() + } + if o.Length > hsize { + o.Length = hsize } } return nil From 7be6719125318eb95e1b2697fa40976c2dcc2395 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 08:37:20 +0100 Subject: [PATCH 11/17] Proactively reject registration of nil functions. Surely no one would try to do this, nor then be surprised if it creates problems. 
On the other hand: if someone *does* do this, and the error doesn't appear until arbitrarily far away when the map is read, it's a pain to diagnose... and checking it up front is cheap. So, here we go: check it up front. --- registry.go | 3 +++ 1 file changed, 3 insertions(+) diff --git a/registry.go b/registry.go index 7b28751..70652c3 100644 --- a/registry.go +++ b/registry.go @@ -33,6 +33,9 @@ var registry = make(map[uint64]func() hash.Hash) // then this can be done by making a Register call with that effect at init time in the application's main package. // This should have the desired effect because the root of the import tree has its init time effect last. func Register(indicator uint64, hasherFactory func() hash.Hash) { + if hasherFactory == nil { + panic("not sensible to attempt to register a nil function") + } registry[indicator] = hasherFactory } From cbd218cd90910541ab7a67d6a52ba999716ff2d7 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 08:50:25 +0100 Subject: [PATCH 12/17] Improve situation around ErrSumNotSupported. The number that's missing is now reported in the error. This is done using the golang error wrap feature. We're keeping a constant value for ErrSumNotSupported for compatibility and change avoidance reasons. Code that cares about this exact error will still have to change; it is now necessary to use `errors.Is` to detect it. The text in the ErrSumNotSupported value is also updated, because the text no longer made sense given an open registry system. As a target-of-opportunity fix, it also now follows golang normative conventions for error messages (no capitalization, no punctuation). 
--- registry.go | 5 +++-- sum.go | 2 +- sum_test.go | 6 ++++-- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/registry.go b/registry.go index 70652c3..39a1cca 100644 --- a/registry.go +++ b/registry.go @@ -5,6 +5,7 @@ import ( "crypto/sha1" "crypto/sha256" "crypto/sha512" + "fmt" "hash" ) @@ -50,11 +51,11 @@ func Register(indicator uint64, hasherFactory func() hash.Hash) { // Other hash implementations can be made available by using the Register function. // The 'go-mulithash/register/*' packages can also be imported to gain more common hash functions. // -// If an error is returned, it will be ErrSumNotSupported. +// If an error is returned, it will match `errors.Is(err, ErrSumNotSupported)`. func GetHasher(indicator uint64) (hash.Hash, error) { factory, exists := registry[indicator] if !exists { - return nil, ErrSumNotSupported // REVIEW: it's unfortunate that this error doesn't say what code was missing. Also "NotSupported" is a bit of a misnomer now. + return nil, fmt.Errorf("unknown multihash code %d (0x%x): %w", indicator, indicator, ErrSumNotSupported) } return factory(), nil } diff --git a/sum.go b/sum.go index b0563f2..53c1805 100644 --- a/sum.go +++ b/sum.go @@ -5,7 +5,7 @@ import ( ) // ErrSumNotSupported is returned when the Sum function code is not implemented -var ErrSumNotSupported = errors.New("Function not implemented. 
Complain to lib maintainer.") +var ErrSumNotSupported = errors.New("no such hash registered") var ErrLenTooLarge = errors.New("requested length was too large for digest") diff --git a/sum_test.go b/sum_test.go index 0ad3b75..c2e9c1b 100644 --- a/sum_test.go +++ b/sum_test.go @@ -3,6 +3,7 @@ package multihash_test import ( "bytes" "encoding/hex" + "errors" "fmt" "runtime" "testing" @@ -132,8 +133,9 @@ func TestTooLargeLength(t *testing.T) { func TestBasicSum(t *testing.T) { for code, name := range multihash.Codes { _, err := multihash.Sum([]byte("test"), code, -1) - switch err { - case multihash.ErrSumNotSupported, nil: + switch { + case errors.Is(err, multihash.ErrSumNotSupported): + case err == nil: default: t.Errorf("unexpected error for %s: %s", name, err) } From 393ba0d0da68bda54a4eaf8f1c17ec9460c4c31f Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 09:59:32 +0100 Subject: [PATCH 13/17] Revive TestSmallerLengthHashID, and add a special case for identity multihash that rejects truncations. See https://github.com/multiformats/go-multihash/pull/136#discussion_r587880523 for discussion. This change means Sum behaves slightly differently for identity multihashes than it does for any other multihash. I'm not keeping score on the number of ways identity multihash is weird anymore, just documenting it and keeping tests passing. The error message is lifted from the old `sumID` function verbatim. --- sum.go | 7 +++++++ sum_test.go | 27 +++++++++++++++++++++++++++ 2 files changed, 34 insertions(+) diff --git a/sum.go b/sum.go index 53c1805..c18deb1 100644 --- a/sum.go +++ b/sum.go @@ -2,6 +2,7 @@ package multihash import ( "errors" + "fmt" ) // ErrSumNotSupported is returned when the Sum function code is not implemented @@ -27,6 +28,7 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { sum := hasher.Sum(nil) // Deal with any truncation. + // Unless it's an identity multihash. Those have different rules. 
if length < 0 { length = hasher.Size() } @@ -34,6 +36,11 @@ func Sum(data []byte, code uint64, length int) (Multihash, error) { return nil, ErrLenTooLarge } if length >= 0 { + if code == IDENTITY { + if length != len(sum) { + return nil, fmt.Errorf("the length of the identity hash (%d) must be equal to the length of the data (%d)", length, len(sum)) + } + } sum = sum[:length] } diff --git a/sum_test.go b/sum_test.go index c2e9c1b..512231c 100644 --- a/sum_test.go +++ b/sum_test.go @@ -123,6 +123,33 @@ func BenchmarkBlake2B(b *testing.B) { } } +func TestSmallerLengthHashID(t *testing.T) { + + data := []byte("Identity hash input data.") + dataLength := len(data) + + // Normal case: `length == len(data)`. + _, err := multihash.Sum(data, multihash.ID, dataLength) + if err != nil { + t.Fatal(err) + } + + // Unconstrained length (-1): also allowed. + _, err = multihash.Sum(data, multihash.ID, -1) + if err != nil { + t.Fatal(err) + } + + // Any other variation of those two scenarios should fail. + for l := (dataLength - 1); l >= 0; l-- { + _, err = multihash.Sum(data, multihash.ID, l) + if err == nil { + t.Fatal(fmt.Sprintf("identity hash of length %d smaller than data length %d didn't fail", + l, dataLength)) + } + } +} + func TestTooLargeLength(t *testing.T) { _, err := multihash.Sum([]byte("test"), multihash.SHA2_256, 33) if err != multihash.ErrLenTooLarge { From aff257041a0fdea8fbd110440b9fe3cd199bf7d7 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 10:56:39 +0100 Subject: [PATCH 14/17] travis: update go version. We'll need something a little more recent than 1.11.x to be able to use the fmt "%w" and errors.Is features. 
--- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 09f9a4c..02f3c37 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ os: language: go go: - - 1.11.x + - 1.15.x env: global: From 4b5e8aaf4ad0db7d9157c3c051e00d0e4ac6bf9b Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 11:18:30 +0100 Subject: [PATCH 15/17] update dependencies, and go mod tidy. I swear I did not mean to be doing this on a branch, but: https://travis-ci.com/github/multiformats/go-multihash/builds/219591152 - it became necessary to update the go version in CI for %w features - that apparently adds pointer alignment checking to the compiler... - which flunks some stuff in x/crypto... So, okay, we're updating libraries now! --- go.mod | 9 +++++---- go.sum | 28 ++++++++++++++-------------- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/go.mod b/go.mod index 2b31327..05b10f1 100644 --- a/go.mod +++ b/go.mod @@ -2,10 +2,11 @@ module github.com/multiformats/go-multihash require ( github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 - github.com/minio/sha256-simd v0.1.1-0.20190913151208-6de447530771 - github.com/mr-tron/base58 v1.1.3 - github.com/multiformats/go-varint v0.0.5 - golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8 + github.com/minio/sha256-simd v1.0.0 + github.com/mr-tron/base58 v1.2.0 + github.com/multiformats/go-varint v0.0.6 + golang.org/x/crypto v0.0.0-20210220033148-5ea612d1eb83 + golang.org/x/sys v0.0.0-20210309074719-68d13333faf2 // indirect ) go 1.13 diff --git a/go.sum b/go.sum index 3a25627..46bff15 100644 --- a/go.sum +++ b/go.sum @@ -1,20 +1,20 @@ +github.com/klauspost/cpuid/v2 v2.0.4 h1:g0I61F2K2DjRHz1cnxlkNSBIaePVoJIjjnHui8QHbiw= +github.com/klauspost/cpuid/v2 v2.0.4/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg= github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 h1:lYpkrQH5ajf0OXOcUbGjvZxxijuBwbbmlSxLiuofa+g= github.com/minio/blake2b-simd 
v0.0.0-20160723061019-3f5f724cb5b1/go.mod h1:pD8RvIylQ358TN4wwqatJ8rNavkEINozVn9DtGI3dfQ= -github.com/minio/sha256-simd v0.1.1-0.20190913151208-6de447530771 h1:MHkK1uRtFbVqvAgvWxafZe54+5uBxLluGylDiKgdhwo= -github.com/minio/sha256-simd v0.1.1-0.20190913151208-6de447530771/go.mod h1:B5e1o+1/KgNmWrSQK08Y6Z1Vb5pwIktudl0J58iy0KM= -github.com/mr-tron/base58 v1.1.3 h1:v+sk57XuaCKGXpWtVBX8YJzO7hMGx4Aajh4TQbdEFdc= -github.com/mr-tron/base58 v1.1.3/go.mod h1:BinMc/sQntlIE1frQmRFPUoPA1Zkr8VRgBdjWI2mNwc= -github.com/multiformats/go-varint v0.0.3 h1:1OZFaq4XbSNQE6ujqgr6/EIZlgHE7DmojAFsLqAJ26M= -github.com/multiformats/go-varint v0.0.3/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= -github.com/multiformats/go-varint v0.0.4 h1:CplQWhUouUgTZ53vNFE8VoWr2VjaKXci+xyrKyyFuSw= -github.com/multiformats/go-varint v0.0.4/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= -github.com/multiformats/go-varint v0.0.5 h1:XVZwSo04Cs3j/jS0uAEPpT3JY6DzMcVLLoWOSnCxOjg= -github.com/multiformats/go-varint v0.0.5/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= +github.com/minio/sha256-simd v1.0.0 h1:v1ta+49hkWZyvaKwrQB8elexRqm6Y0aMLjCNsrYxo6g= +github.com/minio/sha256-simd v1.0.0/go.mod h1:OuYzVNI5vcoYIAmbIvHPl3N3jUzVedXbKy5RFepssQM= +github.com/mr-tron/base58 v1.2.0 h1:T/HDJBh4ZCPbU39/+c3rRvE0uKBQlU27+QI8LJ4t64o= +github.com/mr-tron/base58 v1.2.0/go.mod h1:BinMc/sQntlIE1frQmRFPUoPA1Zkr8VRgBdjWI2mNwc= +github.com/multiformats/go-varint v0.0.6 h1:gk85QWKxh3TazbLxED/NlDVv8+q+ReFJk7Y2W/KhfNY= +github.com/multiformats/go-varint v0.0.6/go.mod h1:3Ls8CIEsrijN6+B7PbrXRPxHRPuXSrVKRY101jdMZYE= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= -golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8 h1:1wopBVtVdWnn03fZelqdXTqk7U7zPQCb+T4rbU9ZEoU= -golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= +golang.org/x/crypto v0.0.0-20210220033148-5ea612d1eb83 
h1:/ZScEX8SfEmUGRHs0gxpqteO5nfNW6axyZbBdw9A12g= +golang.org/x/crypto v0.0.0-20210220033148-5ea612d1eb83/go.mod h1:jdWPYTVW3xRLrWPugEBEK3UY2ZEsg3UU495nc5E+M+I= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= -golang.org/x/sys v0.0.0-20190412213103-97732733099d h1:+R4KGOnez64A81RvjARKc4UT5/tI9ujCIVX+P5KiHuI= -golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20210309074719-68d13333faf2 h1:46ULzRKLh1CwgRq2dC5SlBzEqqNCi8rreOZnNrbqcIY= +golang.org/x/sys v0.0.0-20210309074719-68d13333faf2/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= From 1a96911f4911167fa39e59c1e5f5ce9ea754dba4 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Wed, 10 Mar 2021 22:56:27 +0100 Subject: [PATCH 16/17] Reintroduce DefaultLengths; populate during Register. While it was possible to remove all use of this from this repo, when attempting to propagate changes to downstreams consuming it, it became apparent that other repos also rely on this symbol. Whether or not those usages are important and intentional, whether they're actually worth maintaining, and whether they'd be replaceable with other approaches... is not considered at this time. (Probably we should be asking this! The first occasions where this cropped up are in other functions that have been marked "deprecated" since... 2018! But... chasing those things down and straightening them out is becoming problematic. Perhaps we'll be more ready to revisit these things at a later date.)
--- registry.go | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/registry.go b/registry.go index 39a1cca..64e8af3 100644 --- a/registry.go +++ b/registry.go @@ -38,6 +38,7 @@ func Register(indicator uint64, hasherFactory func() hash.Hash) { panic("not sensible to attempt to register a nil function") } registry[indicator] = hasherFactory + DefaultLengths[indicator] = hasherFactory().Size() } // GetHasher returns a new hash.Hash according to the indicator code number provided. @@ -60,6 +61,12 @@ func GetHasher(indicator uint64) (hash.Hash, error) { return factory(), nil } +// DefaultLengths maps a multihash indicator code to the output size for that hash, in units of bytes. +// +// This map is populated when a hash function is registered by the Register function. +// It's effectively a shortcut for asking Size() on the hash.Hash. +var DefaultLengths = map[uint64]int{} + func init() { Register(IDENTITY, func() hash.Hash { return &identityMultihash{} }) Register(MD5, md5.New) From cebc9f8bf8d8725a8b23e4a2cfb7f61e2bcdcf64 Mon Sep 17 00:00:00 2001 From: Eric Myhre Date: Thu, 11 Mar 2021 01:15:52 +0100 Subject: [PATCH 17/17] Add core package (min deps) and give root package full transitive deps again. The transitive dependencies in the root package are still managed by this whole registry system (which now resides in the 'core' package), so we have parity and *mostly* just one suite of code to maintain. You can now use this 'core' package when interested in dependency minimization. In exchange, the "register/*" imports may be required. You can also just yank updates to go-multihash, and if you were already using it (and happen to be using one of these now-optional(-but-only-if-you-use-core) hashes)... nothing should actually change; the "register/*" imports won't be required because the root package does them for you. Some constants are replicated. This was the minimum step necessary to avoid import cycles.
I'm not spending time prettifying it because we really probably ought to be refactoring this to use the package of constants in go-multicodec that's automatically generated, and yet, that is a scope limit we're trying not to cross during this changeset. --- errata.go => core/errata.go | 0 core/magic.go | 26 +++++++ core/registry.go | 77 ++++++++++++++++++ register/blake2/multihash_blake2.go | 2 +- register/miniosha256/multihash_miniosha256.go | 2 +- register/sha3/multihash_sha3.go | 2 +- registry.go | 78 ++++--------------- sum.go | 4 +- 8 files changed, 125 insertions(+), 66 deletions(-) rename errata.go => core/errata.go (100%) create mode 100644 core/magic.go create mode 100644 core/registry.go diff --git a/errata.go b/core/errata.go similarity index 100% rename from errata.go rename to core/errata.go diff --git a/core/magic.go b/core/magic.go new file mode 100644 index 0000000..78ae8c1 --- /dev/null +++ b/core/magic.go @@ -0,0 +1,26 @@ +package multihash + +import "errors" + +// ErrSumNotSupported is returned when the Sum function code is not implemented +var ErrSumNotSupported = errors.New("no such hash registered") + +// constants +const ( + IDENTITY = 0x00 + SHA1 = 0x11 + SHA2_256 = 0x12 + SHA2_512 = 0x13 + SHA3_224 = 0x17 + SHA3_256 = 0x16 + SHA3_384 = 0x15 + SHA3_512 = 0x14 + KECCAK_224 = 0x1A + KECCAK_256 = 0x1B + KECCAK_384 = 0x1C + KECCAK_512 = 0x1D + SHAKE_128 = 0x18 + SHAKE_256 = 0x19 + MD5 = 0xd5 + DBL_SHA2_256 = 0x56 +) diff --git a/core/registry.go b/core/registry.go new file mode 100644 index 0000000..64e8af3 --- /dev/null +++ b/core/registry.go @@ -0,0 +1,77 @@ +package multihash + +import ( + "crypto/md5" + "crypto/sha1" + "crypto/sha256" + "crypto/sha512" + "fmt" + "hash" +) + +// registry is a simple map which maps a multihash indicator number +// to a standard golang Hash interface. +// +// Multihash indicator numbers are reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . 
+// The keys used in this map must match those reservations. +// +// Hashers which are available in the golang stdlib will be registered automatically. +// Others can be added using the Register function. +var registry = make(map[uint64]func() hash.Hash) + +// Register adds a new hash to the set available from GetHasher and Sum. +// +// Register has a global effect and should only be used at package init time to avoid data races. +// +// The indicator code should be per the numbers reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . +// +// If Register is called with the same indicator code more than once, the last call wins. +// In practice, this means that if an application has a strong opinion about what implementation to use for a certain hash +// (e.g., perhaps they want to override the sha256 implementation to use a special hand-rolled assembly variant +// rather than the stdlib one which is registered by default), +// then this can be done by making a Register call with that effect at init time in the application's main package. +// This should have the desired effect because the root of the import tree has its init time effect last. +func Register(indicator uint64, hasherFactory func() hash.Hash) { + if hasherFactory == nil { + panic("not sensible to attempt to register a nil function") + } + registry[indicator] = hasherFactory + DefaultLengths[indicator] = hasherFactory().Size() +} + +// GetHasher returns a new hash.Hash according to the indicator code number provided. +// +// The indicator code should be per the numbers reserved and described in +// https://github.com/multiformats/multicodec/blob/master/table.csv . +// +// The actual hashers available are determined by what has been registered. +// The registry automatically contains those hashers which are available in the golang standard libraries +// (which includes md5, sha1, sha256, sha384, sha512, and the "identity" multihash, among others).
+// Other hash implementations can be made available by using the Register function. +// The 'go-multihash/register/*' packages can also be imported to gain more common hash functions. +// +// If an error is returned, it will match `errors.Is(err, ErrSumNotSupported)`. +func GetHasher(indicator uint64) (hash.Hash, error) { + factory, exists := registry[indicator] + if !exists { + return nil, fmt.Errorf("unknown multihash code %d (0x%x): %w", indicator, indicator, ErrSumNotSupported) + } + return factory(), nil +} + +// DefaultLengths maps a multihash indicator code to the output size for that hash, in units of bytes. +// +// This map is populated when a hash function is registered by the Register function. +// It's effectively a shortcut for asking Size() on the hash.Hash. +var DefaultLengths = map[uint64]int{} + +func init() { + Register(IDENTITY, func() hash.Hash { return &identityMultihash{} }) + Register(MD5, md5.New) + Register(SHA1, sha1.New) + Register(SHA2_256, sha256.New) + Register(SHA2_512, sha512.New) + Register(DBL_SHA2_256, func() hash.Hash { return &doubleSha256{sha256.New()} }) +} diff --git a/register/blake2/multihash_blake2.go b/register/blake2/multihash_blake2.go index 3d20041..de8f51c 100644 --- a/register/blake2/multihash_blake2.go +++ b/register/blake2/multihash_blake2.go @@ -18,7 +18,7 @@ import ( "github.com/minio/blake2b-simd" "golang.org/x/crypto/blake2s" - "github.com/multiformats/go-multihash" + "github.com/multiformats/go-multihash/core" ) const ( diff --git a/register/miniosha256/multihash_miniosha256.go b/register/miniosha256/multihash_miniosha256.go index 1b9028d..66eccd5 100644 --- a/register/miniosha256/multihash_miniosha256.go +++ b/register/miniosha256/multihash_miniosha256.go @@ -15,7 +15,7 @@ package miniosha256 import ( "github.com/minio/sha256-simd" - "github.com/multiformats/go-multihash" + "github.com/multiformats/go-multihash/core" ) func init() { diff --git a/register/sha3/multihash_sha3.go
b/register/sha3/multihash_sha3.go index f8b1692..db70b2b 100644 --- a/register/sha3/multihash_sha3.go +++ b/register/sha3/multihash_sha3.go @@ -18,7 +18,7 @@ import ( "golang.org/x/crypto/sha3" - "github.com/multiformats/go-multihash" + "github.com/multiformats/go-multihash/core" ) func init() { diff --git a/registry.go b/registry.go index 64e8af3..1ca1790 100644 --- a/registry.go +++ b/registry.go @@ -1,77 +1,31 @@ package multihash import ( - "crypto/md5" - "crypto/sha1" - "crypto/sha256" - "crypto/sha512" - "fmt" "hash" -) -// registry is a simple map which maps a multihash indicator number -// to a standard golang Hash interface. -// -// Multihash indicator numbers are reserved and described in -// https://github.com/multiformats/multicodec/blob/master/table.csv . -// The keys used in this map must match those reservations. -// -// Hashers which are available in the golang stdlib will be registered automatically. -// Others can be added using the Register function. -var registry = make(map[uint64]func() hash.Hash) + mhreg "github.com/multiformats/go-multihash/core" -// Register adds a new hash to the set available from GetHasher and Sum. -// -// Register has a global effect and should only be used at package init time to avoid data races. -// -// The indicator code should be per the numbers reserved and described in -// https://github.com/multiformats/multicodec/blob/master/table.csv . + _ "github.com/multiformats/go-multihash/register/all" + _ "github.com/multiformats/go-multihash/register/miniosha256" +) + +// Register is an alias for Register in the core package. // -// If Register is called with the same indicator code more than once, the last call wins. 
-// In practice, this means that if an application has a strong opinion about what implementation to use for a certain hash -// (e.g., perhaps they want to override the sha256 implementation to use a special hand-rolled assembly variant -// rather than the stdlib one which is registered by default), -// then this can be done by making a Register call with that effect at init time in the application's main package. -// This should have the desired effect because the root of the import tree has its init time effect last. +// Consider using the core package instead of this multihash package; +// that package does not introduce transitive dependencies except for those you opt into, +// and can result in smaller application builds. func Register(indicator uint64, hasherFactory func() hash.Hash) { - if hasherFactory == nil { - panic("not sensible to attempt to register a nil function") - } - registry[indicator] = hasherFactory - DefaultLengths[indicator] = hasherFactory().Size() + mhreg.Register(indicator, hasherFactory) } -// GetHasher returns a new hash.Hash according to the indicator code number provided. -// -// The indicator code should be per the numbers reserved and described in -// https://github.com/multiformats/multicodec/blob/master/table.csv . -// -// The actual hashers available are determined by what has been registered. -// The registry automatically contains those hashers which are available in the golang standard libraries -// (which includes md5, sha1, sha256, sha384, sha512, and the "identity" mulithash, among others). -// Other hash implementations can be made available by using the Register function. -// The 'go-mulithash/register/*' packages can also be imported to gain more common hash functions. +// GetHasher is an alias for GetHasher in the core package. // -// If an error is returned, it will match `errors.Is(err, ErrSumNotSupported)`.
+// Consider using the core package instead of this multihash package; +// that package does not introduce transitive dependencies except for those you opt into, +// and can result in smaller application builds. func GetHasher(indicator uint64) (hash.Hash, error) { - factory, exists := registry[indicator] - if !exists { - return nil, fmt.Errorf("unknown multihash code %d (0x%x): %w", indicator, indicator, ErrSumNotSupported) - } - return factory(), nil + return mhreg.GetHasher(indicator) } // DefaultLengths maps a multihash indicator code to the output size for that hash, in units of bytes. -// -// This map is populated when a hash function is registered by the Register function. -// It's effectively a shortcut for asking Size() on the hash.Hash. -var DefaultLengths = map[uint64]int{} - -func init() { - Register(IDENTITY, func() hash.Hash { return &identityMultihash{} }) - Register(MD5, md5.New) - Register(SHA1, sha1.New) - Register(SHA2_256, sha256.New) - Register(SHA2_512, sha512.New) - Register(DBL_SHA2_256, func() hash.Hash { return &doubleSha256{sha256.New()} }) -} +var DefaultLengths = mhreg.DefaultLengths diff --git a/sum.go b/sum.go index c18deb1..6d01fe6 100644 --- a/sum.go +++ b/sum.go @@ -3,10 +3,12 @@ package multihash import ( "errors" "fmt" + + mhreg "github.com/multiformats/go-multihash/core" ) // ErrSumNotSupported is returned when the Sum function code is not implemented -var ErrSumNotSupported = errors.New("no such hash registered") +var ErrSumNotSupported = mhreg.ErrSumNotSupported var ErrLenTooLarge = errors.New("requested length was too large for digest")