High Availability improvements coming? Use Replicas for reads? #7602

Open
gnat opened this issue May 17, 2024 · 0 comments

gnat commented May 17, 2024

So I've been extensively investigating the following horizontal scale databases:

  • Citus
  • Vitess
  • CockroachDB
  • Yugabyte

Citus wins in a lot of ways: performance, ease of re-balancing, licensing, and the overall architectural simplicity of how tables are sharded and how you define sharding keys. It's elegant! (In Vitess land, maintaining the VSchema and specifying shard boundaries yourself is a pain in the butt. Bravo to Citus.)

The biggest downside I've hit regarding Citus is the poor HA story in 2024. Please correct me if I'm missing something (please share your thoughts, solutions!).

  • Replicas seem to be standby only and not usable for reads? This means adding 2x hardware that sits idle, just to achieve HA.
    • If you want synchronous writes, you're gonna have 3x the hardware sitting idle, because the first replica needs its own replica so that writes don't hang when it goes down.
  • The ideal place to have HA logic is in Citus. Patroni is more moving parts when Citus could just talk to itself. It creates coordination issues like the one above, and blows up Ops complexity.
    • Patroni also seems to require Ops intervention to "reset" it (or a risky "fail back").

As a newbie, citus.shard_replication_factor = 2 looked like a low-friction path forward for HA (no manual Ops intervention is wonderful!), but it breaks in my testing for this purpose, and apparently can't be used for HA at all, even if you're willing to give up FKs and sacrifice a bit of consistency. HA should ideally be this easy, even if the usable feature set is a bit limited, so we all don't have to wait years for a better HA solution.

  • Vitess HA is built in with VTOrc: just put more replicas online whenever, and they will be promoted as needed. Queries to both primary and replicas are re-routed by VTGate.
  • CockroachDB and Yugabyte just get HA for free with the replication model.

Citus really needs a simple, baked-in answer to HA. The Citus default would be considered a dangerous, under-replicated state in the other databases listed.


Side note, example of the community getting confused by this:

These look great at first, but...

Not sure if Percona is aware, but these tutorials only work once. shard_replication_factor = 2 breaks the 2nd time you fail / recover a node (e.g. citus_disable_node(), then citus_activate_node(), then rebalance_table_shards()). You can watch the replicas disappear using: SELECT * FROM citus_shards;
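
A rough sketch of the fail / recover cycle described above, for anyone who wants to try reproducing it. The table name `ha_test`, the worker host `worker-1`, and port 5432 are made-up placeholders, and the exact behaviour may depend on the Citus version; this is not a verified reproduction script.

```sql
-- Assumption: run on the coordinator of a cluster with at least two workers.
-- 'ha_test' and 'worker-1' are hypothetical names, not from the tutorials.
SET citus.shard_replication_factor = 2;

CREATE TABLE ha_test (id bigint, payload text);
SELECT create_distributed_table('ha_test', 'id');

-- Baseline: count placements per shard (2 each would be expected here).
SELECT shardid, count(*) AS placements
FROM citus_shards
WHERE table_name = 'ha_test'::regclass
GROUP BY shardid
ORDER BY shardid;

-- Simulate one fail / recover cycle for a worker.
SELECT citus_disable_node('worker-1', 5432);
SELECT citus_activate_node('worker-1', 5432);
SELECT rebalance_table_shards();

-- Re-run the placement count above, then repeat the whole cycle a second time
-- and compare the number of placements per shard after each round.
```

Comparing the per-shard placement counts after the first and second rounds is the quickest way to see whether the replicas are still there.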
