slug | id | title | date | comments | tags | description | |
---|---|---|---|---|---|---|---|
2018-07-21-data-partition-and-routing | 2018-07-21-data-partition-and-routing | Data Partition and Routing | 2018-07-20 11:54 | true | | The advantages of implementing data partition and routing are availability and read efficiency while consistency is the weakness. The routing abstract model is essentially two maps: key-partition map and partition-machine map. | |
large dataset ⟶ scale out ⟶ data shard / partition ⟶ 1) routing for data access 2) replica for availability
- Pros
  - availability
  - read (parallelization, single-read efficiency)
- Cons
  - consistency
The routing abstract model is essentially just two maps: 1) key-partition map 2) partition-machine map
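
A minimal sketch of the two-map model, assuming a fixed partition count and made-up host names (`NUM_PARTITIONS`, `server-0` and friends are hypothetical, not from any particular system). It only illustrates that routing is the composition of the two maps:

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical fixed partition count


def stable_hash(key):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def key_to_partition(key):
    """Map 1: key -> partition."""
    return stable_hash(key) % NUM_PARTITIONS


# Map 2: partition -> machine (hypothetical host names).
partition_to_machine = {p: f"server-{p % 3}" for p in range(NUM_PARTITIONS)}


def route(key):
    """Routing composes the two maps: key -> partition -> machine."""
    return partition_to_machine[key_to_partition(key)]
```

Reassigning an entry in `partition_to_machine` moves a partition to another machine without touching `key_to_partition`. How tightly the two maps are coupled is what distinguishes the schemes in these notes.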
- Hash and mod
  - (+) simple
  - (-) inflexible: the two maps are tightly coupled, so adding or removing nodes (changing the partition-machine map) disrupts the existing key-partition map
- Virtual buckets: key --(hash)--> vBucket, vBucket --(table lookup)--> server
  - Use cases: Membase (a.k.a. Couchbase), Riak
  - (+) flexible: decouples the two maps
  - (-) centralized lookup table
- Consistent hashing and DHT
  - [Chord] implementation
  - virtual nodes: for load balancing in a heterogeneous data center
  - Use cases: Dynamo, Cassandra
  - (+) flexible: the hashing space decouples the two maps. Both maps use the same hash, but adding or removing a node ==only impacts its succeeding nodes==.
  - (-) network complexity, hard to maintain
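
To make the flexibility trade-off concrete, here is a rough sketch comparing hash-and-mod with a consistent hashing ring that uses virtual nodes. It is a single-process toy, not how Dynamo, Cassandra, or Chord implement it; `HashRing`, `mod_route`, the node names, and the `vnodes=100` default are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic hash onto the ring (a 64-bit integer space)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def mod_route(key, nodes):
    """Hash and mod: the node count is baked into the key mapping."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Toy consistent hashing ring; each node owns `vnodes` points."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def route(self, key):
        # First point at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._points, (stable_hash(key),))
        return self._points[idx % len(self._points)][1]


# Count how many keys change owner when a fifth node joins.
keys = [f"key-{i}" for i in range(10_000)]
old_nodes = ["n1", "n2", "n3", "n4"]
new_nodes = old_nodes + ["n5"]

moved_mod = sum(mod_route(k, old_nodes) != mod_route(k, new_nodes) for k in keys)

ring = HashRing(old_nodes)
before = {k: ring.route(k) for k in keys}
ring.add("n5")
moved_ring = sum(before[k] != ring.route(k) for k in keys)

print(moved_mod, moved_ring)
```

With a uniform hash, going from 4 to 5 nodes under hash-and-mod should relocate on the order of 4/5 of the keys, while the ring should relocate only about the new node's share (roughly 1/5), and every relocated key lands on the new node.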
Range-based partitioning: sort by primary key, shard by range of primary key
- range-server lookup table (e.g. HBase .META. table) + local tree-based index (e.g. LSM, B+)
- (+) supports range search
- (-) O(log n) lookup
- Use cases: Yahoo PNUTS, Azure, Bigtable
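
A minimal sketch of the range-server lookup, assuming string primary keys and a small in-memory table. `RANGE_TABLE` and the server names are hypothetical; a real system such as HBase stores this table in a dedicated metadata table rather than a Python list. The binary search over sorted start keys is where the log(n) cost comes from.

```python
import bisect

# Hypothetical range-server table: (inclusive start key, server), sorted by
# start key. The first range starts at "" so that every key is covered.
RANGE_TABLE = [
    ("", "server-a"),
    ("g", "server-b"),
    ("n", "server-c"),
    ("t", "server-d"),
]
START_KEYS = [start for start, _ in RANGE_TABLE]


def route(key):
    """Binary search for the last range whose start key is <= key."""
    idx = bisect.bisect_right(START_KEYS, key) - 1
    return RANGE_TABLE[idx][1]


def servers_for_range(lo, hi):
    """Servers whose ranges overlap the inclusive key interval [lo, hi]."""
    first = bisect.bisect_right(START_KEYS, lo) - 1
    last = bisect.bisect_right(START_KEYS, hi) - 1
    return [server for _, server in RANGE_TABLE[first:last + 1]]
```

Because adjacent keys live on the same or neighbouring servers, a range scan such as `servers_for_range("h", "p")` touches only the servers whose ranges overlap the interval, which a hash-based scheme cannot offer.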