Problem Statement
With the current shard allocation and balancing mechanisms it is possible, for instance, that on a 2-node cluster with a table of 4 shards and 1 replica, one node ends up with 3 primaries and 1 replica and the other with 1 primary and 3 replicas, instead of 2 primaries and 2 replicas on each.
In the large majority of cases this is not a problem, but on very busy systems an ingestion degradation of up to 25% can be observed: nodes holding more primary shards become fully utilized on the CPU while nodes holding mostly replica shards do not.
The main reason primary shards aren't evenly balanced is that none of the current balancing logic (and related settings such as cluster.routing.allocation.balance.index/shard) distinguishes between a primary and a replica shard. The cluster/index.total_shards_per_node settings don't distinguish between them either; using those settings for shard balancing would be a workaround in any case, since they are intended as a protection rather than a control mechanism, and they can also lead to a situation where no shards can be allocated at all.
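Whether a table suffers from this skew can be checked by inspecting the per-node shard distribution. A sketch against CrateDB's sys.shards table (the `ALTER TABLE ... REROUTE` workaround below suggests CrateDB; the table name my_table is a placeholder):

```sql
-- Count primary and replica shards per node for one table.
-- 'my_table' is a placeholder; substitute the affected table.
SELECT node['name'] AS node_name,
       "primary",
       count(*) AS num_shards
FROM sys.shards
WHERE table_name = 'my_table'
GROUP BY node['name'], "primary"
ORDER BY node_name, "primary" DESC;
```

On a balanced 2-node cluster with 4 shards and 1 replica this should report 2 primaries and 2 replicas per node; the problematic case shows a 3/1 split.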
Possible Solutions
- Reduce the primary write load so that it is almost the same as a replica write.
- Introduce primary-aware balancing, e.g. backport the related changes from OpenSearch (as they do segment-based replication, their primaries carry more load in general).
- Improve the balancing logic to take write load into account, similar to what Elasticsearch did in "Improve shard balancing" elastic/elasticsearch#91603, although that seems to target a somewhat different problem (hot nodes in general) rather than primary vs. replica shard distribution.
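The "up to 25%" figure and the first solution can be connected with a simple back-of-the-envelope cost model (an illustration, not from the original issue): let p be the CPU cost per primary write and r the cost per replica write. With uniform ingestion across the 4 shards, the hot node carries 3p + r per unit of ingest instead of the balanced 2p + 2r, so the cluster-wide degradation is

```latex
D \;=\; 1 - \frac{2p + 2r}{3p + r}
```

For p = r the skew costs nothing (D = 0), which is exactly why making a primary write as cheap as a replica write would solve the problem; for p = 5r one gets D = 1 - 12/16 = 25%, and in the limit r → 0 (replicas nearly free, as with segment replication) D approaches 1/3.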
Considered Alternatives
1. Disable automatic rebalancing of primary shards by setting cluster.routing.rebalance.enable to replicas.
2. Use ALTER TABLE ... REROUTE commands to redistribute primary shards.
3. Re-enable automatic rebalancing by resetting cluster.routing.rebalance.enable.
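The steps above can be sketched as the following statement sequence (CrateDB syntax assumed; the shard id and node ids are placeholders — node ids can be looked up in sys.nodes):

```sql
-- 1. Restrict automatic rebalancing to replicas, so manually
--    placed primaries are not moved back by the balancer.
SET GLOBAL TRANSIENT "cluster.routing.rebalance.enable" = 'replicas';

-- 2. Manually move a shard copy (here: the primary of shard 0)
--    from the overloaded node to the underloaded one.
ALTER TABLE my_table REROUTE MOVE SHARD 0 FROM 'node-id-1' TO 'node-id-2';

-- 3. Restore the default rebalancing behaviour.
RESET GLOBAL "cluster.routing.rebalance.enable";
```

Note that this remains a manual, point-in-time fix: any later shard recovery or node restart can reintroduce the skew, which is why the issue proposes changing the balancing logic itself.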
Related to elastic/elasticsearch#41543, elastic/elasticsearch#17213, #14594.