Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load data to a table but different sum shards size in worker nodes #7600

Open
yinan8128 opened this issue May 15, 2024 · 1 comment
Open

Comments

@yinan8128
Copy link

yinan8128 commented May 15, 2024

Hi Team,

We load data to 1 table, but sum shards size are different in worker nodes

ALTER TABLE ONLY distribute_table ADD CONSTRAINT distribute_table_pkey PRIMARY KEY (tenant_uuid,id);

SELECT create_distributed_table('distribute_table', 'tenant_uuid');

psql -h localhost -p 5432 -d load15_taskmanager -U tm_dflt_user -f /opt/task_manager_dump/distribute_table_data.sql

SELECT nodename,sum(shard_size) FROM citus_shards where table_name='distribute_table'::regclass group by nodename;
        nodename         |     sum
-------------------------+-------------
 citus_vm_02 | 18296504320
 citus_vm_03 |  5631139840
 citus_vm_04 |  5836931072
 citus_vm_05 |  5470191616
(4 rows)

SELECT nodename,count(*) FROM citus_shards where table_name='tasks_export'::regclass group by nodename;
        nodename         | count
-------------------------+-------
 taskmanager_citus_vm_02 |    16
 taskmanager_citus_vm_03 |    16
 taskmanager_citus_vm_04 |    16
 taskmanager_citus_vm_05 |    16
(4 rows)

The shard count is same in each work node, so it is because that some tenant have more records than others,

Please help answer:
1.We know that rebalance can balance size, but just it is difficult for storage planning, for example we want to migrate normal postgresql 400GB to 4 worker nodes citusdb, plan 100GB per worker node, is there any solution can load data have same sum shard size in worker nodes?

2.If question 1 no solution:
2.1. If every time is the first work node have the biggest sum shards size?
2.2. If question 2.1 answer is yes, what biggest size percentage of total size will be in the first work node, 50% 200GB or other value?
2.3. If question 2.1 answer is yes, what biggest size percentage of total size will be in other work nodes?

@yinan8128
Copy link
Author

Hi Team,

Could you please help check? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant