Uniformly distribute data in all shards #7539

Open
tu5har opened this issue Feb 27, 2024 Discussed in #7535 · 0 comments

tu5har commented Feb 27, 2024

Discussed in #7535

Originally posted by tu5har February 24, 2024
PG: 15
Citus: 12.1

Hi All,

We have a large table distributed on a date column (event_date) with a shard count of 128.
It holds more than 400 days of data, and its total size has reached almost 4 TB.

We checked the shard sizes, and the data does not appear to be uniform across the shards. Some shards are fat, holding more than 100 GB of data, while many hold very little. The data size for each day is almost the same, so we expected all shards to have similar sizes rather than being so skewed. This hurts query latency when a query hits a fat shard.
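
One way to check how the individual days map to shards, assuming the table is hash-distributed on event_date (the default for create_distributed_table), is to ask Citus which shard each calendar day lands in. The sketch below uses get_shard_id_for_distribution_column. The date range is a placeholder, not taken from the real table, and should be replaced with the actual range of event_date. With roughly 400 distinct values spread over 128 shards, some shards can receive many days and others none, which would be consistent with the 8192-byte shards in the snapshot below.

```sql
-- Count how many calendar days hash into each shard of daily_data.
-- Assumption: the date range below is hypothetical; use the table's real range.
SELECT get_shard_id_for_distribution_column('daily_data', d::date) AS shardid,
       count(*)                                                    AS days_in_shard
FROM generate_series(timestamp '2023-01-01',
                     timestamp '2024-02-24',
                     interval '1 day') AS d
GROUP BY 1
ORDER BY days_in_shard DESC;
```

Shards that never appear in the result received no days at all in that range. Comparing days_in_shard with the shard sizes shows whether the size skew simply follows the number of days per shard.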

Can someone let us know how to redistribute the data uniformly among the shards?
Below is a snapshot of the shard sizes:

SELECT * FROM run_command_on_shards('daily_data', $cmd$
  SELECT json_build_object(
    'shard_name', '%1$s',
    'size',       pg_size_pretty(pg_table_size('%1$s'))
  );
$cmd$);
shardid | success | result
-- | -- | --
102170 | TRUE | {"shard_name" : "daily_data_102170", "size"   : "45 GB"}
102171 | TRUE | {"shard_name" : "daily_data_102171", "size"   : "39 GB"}
102172 | TRUE | {"shard_name" : "daily_data_102172", "size"   : "16 GB"}
102173 | TRUE | {"shard_name" : "daily_data_102173", "size"   : "39 GB"}
102174 | TRUE | {"shard_name" : "daily_data_102174", "size"   : "37 GB"}
102175 | TRUE | {"shard_name" : "daily_data_102175", "size"   : "60 GB"}
102176 | TRUE | {"shard_name" : "daily_data_102176", "size"   : "27 GB"}
102177 | TRUE | {"shard_name" : "daily_data_102177", "size"   : "60 GB"}
102178 | TRUE | {"shard_name" : "daily_data_102178", "size"   : "10 GB"}
102179 | TRUE | {"shard_name" : "daily_data_102179", "size"   : "25 GB"}
102180 | TRUE | {"shard_name" : "daily_data_102180", "size"   : "19 GB"}
102181 | TRUE | {"shard_name" : "daily_data_102181", "size"   : "31 GB"}
102182 | TRUE | {"shard_name" : "daily_data_102182", "size"   : "54 GB"}
102183 | TRUE | {"shard_name" : "daily_data_102183", "size"   : "47 GB"}
102184 | TRUE | {"shard_name" : "daily_data_102184", "size"   : "112 GB"}
102185 | TRUE | {"shard_name" : "daily_data_102185", "size"   : "21 GB"}
102186 | TRUE | {"shard_name" : "daily_data_102186", "size"   : "15 GB"}
102187 | TRUE | {"shard_name" : "daily_data_102187", "size"   : "7567 MB"}
102188 | TRUE | {"shard_name" : "daily_data_102188", "size"   : "8192 bytes"}
102189 | TRUE | {"shard_name" : "daily_data_102189", "size"   : "14 GB"}
102190 | TRUE | {"shard_name" : "daily_data_102190", "size"   : "53 GB"}
102191 | TRUE | {"shard_name" : "daily_data_102191", "size"   : "49 GB"}
102192 | TRUE | {"shard_name" : "daily_data_102192", "size"   : "48 GB"}
102193 | TRUE | {"shard_name" : "daily_data_102193", "size"   : "16 GB"}
102194 | TRUE | {"shard_name" : "daily_data_102194", "size"   : "40 GB"}
102195 | TRUE | {"shard_name" : "daily_data_102195", "size"   : "50 GB"}
102196 | TRUE | {"shard_name" : "daily_data_102196", "size"   : "56 GB"}
102197 | TRUE | {"shard_name" : "daily_data_102197", "size"   : "37 GB"}
102198 | TRUE | {"shard_name" : "daily_data_102198", "size"   : "33 GB"}
102199 | TRUE | {"shard_name" : "daily_data_102199", "size"   : "44 GB"}
102200 | TRUE | {"shard_name" : "daily_data_102200", "size"   : "9117 MB"}
102201 | TRUE | {"shard_name" : "daily_data_102201", "size"   : "9344 MB"}
102202 | TRUE | {"shard_name" : "daily_data_102202", "size"   : "30 GB"}
102203 | TRUE | {"shard_name" : "daily_data_102203", "size"   : "33 GB"}
102204 | TRUE | {"shard_name" : "daily_data_102204", "size"   : "60 GB"}
102205 | TRUE | {"shard_name" : "daily_data_102205", "size"   : "23 GB"}
102206 | TRUE | {"shard_name" : "daily_data_102206", "size"   : "25 GB"}
102207 | TRUE | {"shard_name" : "daily_data_102207", "size"   : "9212 MB"}
102208 | TRUE | {"shard_name" : "daily_data_102208", "size"   : "8192 bytes"}
102209 | TRUE | {"shard_name" : "daily_data_102209", "size"   : "71 GB"}
102210 | TRUE | {"shard_name" : "daily_data_102210", "size"   : "4295 MB"}
102211 | TRUE | {"shard_name" : "daily_data_102211", "size"   : "2553 MB"}
102212 | TRUE | {"shard_name" : "daily_data_102212", "size"   : "4334 MB"}
102213 | TRUE | {"shard_name" : "daily_data_102213", "size"   : "25 GB"}
.....
......
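
As an alternative way to collect the same numbers, the citus_shards view reports each shard's size in bytes along with the node that holds it, which makes the output easy to sort. A minimal sketch, assuming the table is named daily_data as in the query above:

```sql
-- List shards of daily_data from largest to smallest, with their hosting node.
SELECT shardid,
       shard_name,
       nodename,
       nodeport,
       pg_size_pretty(shard_size) AS size
FROM citus_shards
WHERE table_name = 'daily_data'::regclass
ORDER BY shard_size DESC;
```

Sorting by the raw shard_size column rather than the pretty-printed text keeps the ordering numeric.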