Problem Statement
With the current shard allocation and balancing mechanisms it is possible, for instance, that on a 2-node cluster with a table of 4 shards and 1 replica, one node ends up with 3 primaries and 1 replica and the other with 1 primary and 3 replicas, instead of 2 primaries and 2 replicas on each.
In the large majority of cases this is not a problem, but on very busy systems an ingestion degradation of up to 25% can be observed: nodes holding more primary shards become fully utilized on the CPU while nodes holding mostly replica shards do not.
The main reason primary shards aren't evenly balanced is that none of the current balancing logic (and related settings such as cluster.routing.allocation.balance.index/shard) distinguishes between a primary and a replica shard. The cluster/index.total_shards_per_node settings don't distinguish between them either; using those settings for shard balancing would be a workaround in any case, since they are intended as a protection rather than a control mechanism, and they can also lead to a situation where no shards can be allocated at all.
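Whether a table suffers from this skew can be checked by inspecting the per-node shard distribution. A sketch against CrateDB's sys.shards table (the `ALTER TABLE ... REROUTE` workaround below suggests CrateDB; the table name my_table is a placeholder):

```sql
-- Count primary and replica shards per node for one table.
-- 'my_table' is a placeholder; substitute the affected table.
SELECT node['name'] AS node_name,
       "primary",
       count(*) AS num_shards
FROM sys.shards
WHERE table_name = 'my_table'
GROUP BY node['name'], "primary"
ORDER BY node_name, "primary" DESC;
```

On a balanced 2-node cluster with 4 shards and 1 replica this should report 2 primaries and 2 replicas per node; the problematic case shows a 3/1 split.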
Possible Solutions
- Reduce the primary write load so that it is almost the same as a replica write.
- Introduce primary-aware balancing, e.g. backport the related changes from OpenSearch (as they do segment-based replication, their primaries carry more load in general).
- Improve the balancing logic to take write load into account, similar to what Elasticsearch did in "Improve shard balancing" elastic/elasticsearch#91603, although that seems to target a somewhat different problem (hot nodes in general) rather than primary vs. replica shard distribution.
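The "up to 25%" figure and the first solution can be connected with a simple back-of-the-envelope cost model (an illustration, not from the original issue): let p be the CPU cost per primary write and r the cost per replica write. With uniform ingestion across the 4 shards, the hot node carries 3p + r per unit of ingest instead of the balanced 2p + 2r, so the cluster-wide degradation is

```latex
D \;=\; 1 - \frac{2p + 2r}{3p + r}
```

For p = r the skew costs nothing (D = 0), which is exactly why making a primary write as cheap as a replica write would solve the problem; for p = 5r one gets D = 1 - 12/16 = 25%, and in the limit r → 0 (replicas nearly free, as with segment replication) D approaches 1/3.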
Considered Alternatives
1. Disable automatic rebalancing of primary shards by setting cluster.routing.rebalance.enable to replicas.
2. Use ALTER TABLE ... REROUTE commands to redistribute primary shards.
3. Re-enable automatic rebalancing by resetting cluster.routing.rebalance.enable.
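The steps above can be sketched as the following statement sequence (CrateDB syntax assumed; the shard id and node ids are placeholders — node ids can be looked up in sys.nodes):

```sql
-- 1. Restrict automatic rebalancing to replicas, so manually
--    placed primaries are not moved back by the balancer.
SET GLOBAL TRANSIENT "cluster.routing.rebalance.enable" = 'replicas';

-- 2. Manually move a shard copy (here: the primary of shard 0)
--    from the overloaded node to the underloaded one.
ALTER TABLE my_table REROUTE MOVE SHARD 0 FROM 'node-id-1' TO 'node-id-2';

-- 3. Restore the default rebalancing behaviour.
RESET GLOBAL "cluster.routing.rebalance.enable";
```

Note that this remains a manual, point-in-time fix: any later shard recovery or node restart can reintroduce the skew, which is why the issue proposes changing the balancing logic itself.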
Related to elastic/elasticsearch#41543, elastic/elasticsearch#17213, #14594.