Architecture#
Strongbox is a distributed, eventually consistent, highly available object store for arbitrarily large objects.
Design goals#
Must-haves:
- The data store shall be highly available. Specifically:
- Unavailability of one node shall have no impact at all.
- Unavailability of two nodes shall not impact read or write functionality. It is acceptable that a two-node outage affects the availability of some objects, which are reduced to a single available replica for the duration of the outage.
- It shall be possible to recover from a catastrophic loss of a node.
- The object store shall scale horizontally with regard to disk space and throughput.
- Objects can be arbitrarily large.
- No-master design: all nodes are equal.
Wants:
- Multi-tenant support, segregating objects into different namespaces.
- Support for heterogeneous hardware.
- 100 Mbit/s sustained write performance, 200 Mbit/s sustained read performance.
- Deleting objects shall have little impact on performance.
Concepts#
The Strongbox store consists of a number of nodes forming a loose cluster. The Strongbox common library, embedded in the user-facing service or application, provides a Strongbox client to access the store. All Strongbox store nodes share a relational database instance and have access to a message broker for asynchronous communication. In addition, each Strongbox store node uses its local filesystem to persist the payload.
Streams vs Containers & Chunks#
The data in the domain of the users, services or applications using Strongbox is seen as a stream of bytes that can be read from or written to. Strongbox store nodes hold data in chunks of at most 4 MB. The Strongbox common library translates between the two, splitting a stream into chunks, or reassembling it from chunks. The collection of chunks representing a stream is called a container.
The Strongbox common library uploads chunks to the store nodes, or fetches chunks from them. Chunk uploads exceeding 4 MB are rejected by the store. Chunks are identified by their unique ID and addressed by the combination of tenant UID, container UID and chunk UID. The Strongbox store helps clients identify which chunks of a container are needed to assemble a stream, or a slice thereof.
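As an illustration, here is a minimal sketch of the splitting step. The `ChunkSplitter` class and its method names are assumptions for this sketch, not the actual API of the Strongbox common library:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: split a stream into chunk payloads of at most 4 MB each.
final class ChunkSplitter {

    static final int MAX_CHUNK_SIZE = 4 * 1024 * 1024; // 4 MB upper bound enforced by the store

    // Reads the stream and returns the chunk payloads in order.
    static List<byte[]> split(InputStream stream) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[MAX_CHUNK_SIZE];
        int filled = 0;
        int read;
        while ((read = stream.read(buffer, filled, buffer.length - filled)) != -1) {
            filled += read;
            if (filled == buffer.length) {          // buffer full: emit a 4 MB chunk
                chunks.add(buffer.clone());
                filled = 0;
            }
        }
        if (filled > 0) {                           // final, possibly smaller chunk
            chunks.add(Arrays.copyOf(buffer, filled));
        }
        return chunks;
    }
}
```

Reassembling a stream is the inverse: the chunks of a container are fetched and concatenated in order.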
Consistent Hash Ring#
Chunks are distributed across the Strongbox store cluster; however, not every Strongbox store node holds all chunks. The distribution follows a deterministic and reproducible allocation scheme called consistent hashing:
- The Integer number range is connected at the ends, forming a circle or ring.
- Each Strongbox store node claims an equal part of the ring as belonging to it; in this role, it is called a ring node.
- The `hashcode()` of a chunk address, an Integer, points to a position on the ring, which belongs to a ring node.
As a result, anyone with the same understanding of
- the participating ring nodes
- the algorithm of `hashcode()`
can reproduce the mapping of a chunk address to the responsible Strongbox store node. This process is called locating a chunk on the ring. When a chunk address maps to a ring node, it is considered local to it.
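A hedged sketch of what locating a chunk could look like. The `HashRing` class, the string-based node IDs and the assumption that a ring node's claim starts at its ring position are illustrative choices, not the actual implementation:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: ring sections are keyed by the Integer position at which
// a ring node's claim starts; node IDs are plain strings.
final class HashRing {

    private final TreeMap<Integer, String> ringSections = new TreeMap<>();

    void claim(int startPosition, String ringNodeId) {
        ringSections.put(startPosition, ringNodeId);
    }

    // Locates a chunk on the ring: hashcode() of the chunk address points to a
    // position; the ring node whose section contains that position is responsible.
    String locate(String tenantUid, String containerUid, String chunkUid) {
        int position = (tenantUid + "/" + containerUid + "/" + chunkUid).hashCode();
        Map.Entry<Integer, String> section = ringSections.floorEntry(position);
        if (section == null) {
            section = ringSections.lastEntry();   // wrap around: the ring is connected at the ends
        }
        return section.getValue();
    }
}
```

Any client or store node that builds the same `HashRing` from the same set of participating ring nodes arrives at the same answer.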
Replication of Chunks#
Chunks are replicated to multiple ring nodes to protect against individual node failures affecting the overall service. Since all Strongbox store nodes are equal, any of them can be addressed when storing or fetching chunks. The ring nodes are selected by locating the chunk on the ring, selecting the ring node that claims that section of the hash ring, and then moving clockwise around the ring, selecting the following ring nodes as well.
The standard number of replicas is three.
As a consequence, locating a chunk on the ring yields a list of three equally valid ring nodes. It is recommended that clients shuffle the list before selecting a ring node to connect to.
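A sketch of that selection, using the same illustrative ring layout as above (a map of claimed start positions to node IDs); all names are assumptions:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: pick the owning ring node plus the next two clockwise.
final class ReplicaSelector {

    static final int REPLICA_COUNT = 3;

    // ringSections maps each ring node's claimed start position to its node ID.
    static List<String> replicasFor(TreeMap<Integer, String> ringSections, int chunkPosition) {
        List<String> replicas = new ArrayList<>(REPLICA_COUNT);
        Map.Entry<Integer, String> section = ringSections.floorEntry(chunkPosition);
        if (section == null) {
            section = ringSections.lastEntry();     // wrap around the ring
        }
        Integer key = section.getKey();
        while (replicas.size() < REPLICA_COUNT) {
            replicas.add(ringSections.get(key));
            key = ringSections.higherKey(key);      // move clockwise to the next ring node
            if (key == null) {
                key = ringSections.firstKey();      // wrap around the ring
            }
        }
        return replicas;
    }

    // Clients shuffle the equally valid replicas before picking one to connect to.
    static String pickNode(List<String> replicas) {
        List<String> shuffled = new ArrayList<>(replicas);
        Collections.shuffle(shuffled);
        return shuffled.get(0);
    }
}
```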
Push- vs Pull-Replication#
When a client uploads a new chunk, it uses the hash ring to locate three ring nodes responsible for the chunk. A single ring node is randomly selected from the set, and the client uploads data to the ring node.
The receiving ring node performs the same lookup to confirm the chunk is local to it, and to determine the other two responsible ring nodes.
The ring node first stores the chunk in its local filesystem and registers it in the relational database. To achieve the required redundancy, it replicates the chunk to the other two ring nodes by uploading it to them ("push-replication"). The ring node holds the client's upload request open until the first replication has succeeded.
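A compact sketch of one possible sequencing of this flow. The collaborator interfaces below (`LocalChunkStore`, `ChunkRegistry`, `RingClient`, `ReplicationQueue`) are hypothetical stand-ins for the real components and exist only to make the ordering explicit:

```java
import java.util.List;

// Illustrative sketch of push-replication on the receiving ring node.
final class PushReplication {

    interface LocalChunkStore  { void write(String chunkAddress, byte[] payload); }
    interface ChunkRegistry    { void register(String chunkAddress); }
    interface RingClient       { boolean pushTo(String ringNodeId, String chunkAddress, byte[] payload); }
    interface ReplicationQueue { void enqueue(String ringNodeId, String chunkAddress); }

    private final LocalChunkStore fileStore;
    private final ChunkRegistry database;
    private final RingClient ringClient;
    private final ReplicationQueue replicationQueue;

    PushReplication(LocalChunkStore fileStore, ChunkRegistry database,
                    RingClient ringClient, ReplicationQueue replicationQueue) {
        this.fileStore = fileStore;
        this.database = database;
        this.ringClient = ringClient;
        this.replicationQueue = replicationQueue;
    }

    // Runs while the client's upload request is still held open.
    void storeAndReplicate(String chunkAddress, byte[] payload, List<String> otherReplicas) {
        fileStore.write(chunkAddress, payload);            // persist to the local filesystem
        database.register(chunkAddress);                   // register the chunk in the shared database

        boolean replicated = false;
        for (String ringNodeId : otherReplicas) {          // push to the other two ring nodes
            if (ringClient.pushTo(ringNodeId, chunkAddress, payload)) {
                replicated = true;
            } else {
                // Unreachable node: queue the replication so it can be pulled later.
                replicationQueue.enqueue(ringNodeId, chunkAddress);
            }
        }
        if (!replicated) {
            throw new IllegalStateException("no replica reachable, cannot acknowledge upload");
        }
        // Only now is the held client upload request answered.
    }
}
```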
If a ring node is not available due to planned maintenance or a temporary outage, the replication is queued for later. When that ring node becomes available again, it works through its queue of replication requests using pull-replication: the address of the chunk, contained in the replication request, is located on the ring, and the chunk payload is requested from the responsible ring nodes. If successful, the data is stored in the local filesystem, and the availability expectation is fulfilled.
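A corresponding sketch of the pull-replication side on a ring node returning from an outage. Again, `HashRingLookup`, `RingClient` and `LocalChunkStore` are hypothetical names used purely to illustrate the flow:

```java
import java.util.List;
import java.util.Optional;
import java.util.Queue;

// Illustrative sketch of pull-replication after a node comes back online.
final class PullReplication {

    record ReplicationRequest(String chunkAddress) {}

    interface HashRingLookup  { List<String> locateReplicas(String chunkAddress); }
    interface RingClient      { Optional<byte[]> fetch(String ringNodeId, String chunkAddress); }
    interface LocalChunkStore { void write(String chunkAddress, byte[] payload); }

    private final HashRingLookup ring;
    private final RingClient ringClient;
    private final LocalChunkStore fileStore;
    private final String ownNodeId;

    PullReplication(HashRingLookup ring, RingClient ringClient,
                    LocalChunkStore fileStore, String ownNodeId) {
        this.ring = ring;
        this.ringClient = ringClient;
        this.fileStore = fileStore;
        this.ownNodeId = ownNodeId;
    }

    // Works through the queued replication requests once the node is back online.
    void drain(Queue<ReplicationRequest> queuedRequests) {
        ReplicationRequest request;
        while ((request = queuedRequests.poll()) != null) {
            // Locate the chunk on the ring and ask the other responsible ring nodes for it.
            for (String ringNodeId : ring.locateReplicas(request.chunkAddress())) {
                if (ringNodeId.equals(ownNodeId)) {
                    continue;                                  // skip ourselves
                }
                Optional<byte[]> payload = ringClient.fetch(ringNodeId, request.chunkAddress());
                if (payload.isPresent()) {
                    fileStore.write(request.chunkAddress(), payload.get());
                    break;                                     // redundancy restored for this chunk
                }
            }
        }
    }
}
```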
Multi-Tenancy#
The Strongbox store separates data from different user-facing applications or services into individual tenants. Chunks are addressed by the combination of tenant UID, container UID and chunk UID.
A tenant must be registered in the Strongbox store by an operator before an application can start using it.
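For illustration, the three-part chunk address could be modeled as a small value type. The record below is an assumption for this sketch, not the actual data model:

```java
// Illustrative value type for the chunk address: tenant UID + container UID + chunk UID.
record ChunkAddress(String tenantUid, String containerUid, String chunkUid) {

    // The record's hashCode(), derived from all three UIDs, is what places the
    // chunk on the consistent hash ring.
    int ringPosition() {
        return hashCode();
    }
}
```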