Shared-data cluster Query Profile Metric Problems with accuracy #45669

Closed
kaka-zb opened this issue May 15, 2024 · 4 comments
Labels
type/bug Something isn't working

Comments


kaka-zb commented May 15, 2024

Background

We are trying to replace Elasticsearch with StarRocks (sr) to build a logging platform. During testing we found that query latency was very high, so we used the query profile to locate the problem.

Problem description

image
image

From this query profile result, it looks like there is a data skew problem, but something seems wrong with the metric data.

According to the StarRocks documentation, InstanceNum is the number of all FragmentInstances for this Fragment, so this Fragment has 6 FragmentInstances. However, the I/O time metrics above do not seem correct: even the maximum I/O time divided by 6 is much larger than the average I/O time.

StarRocks version

  • 3.2.6-2585333 (shared-data StarRocks cluster)

kaka-zb added the type/bug label on May 15, 2024
Author

kaka-zb commented May 15, 2024

query_profile.txt

kaka-zb changed the title from "Query Profile Metric Merging and MIN/MAX Values Problems with accuracy" to "Shared-data cluster Query Profile Metric Problems with accuracy" on May 15, 2024
@wupan-olo

The instance count is 6, but each pipeline also has its own degree of parallelism:

Pipeline (id=0):
         - ActiveTime: 178.376ms
           - __MAX_OF_ActiveTime: 742.777ms
           - __MIN_OF_ActiveTime: 1.555ms
         - BlockByInputEmpty: 2.810K (2810)
           - __MAX_OF_BlockByInputEmpty: 502
           - __MIN_OF_BlockByInputEmpty: 4
         - BlockByOutputFull: 0
         - BlockByPrecondition: 0
         - DegreeOfParallelism: 4
         - DriverPrepareTime: 41.537us
           - __MAX_OF_DriverPrepareTime: 87.626us
           - __MIN_OF_DriverPrepareTime: 25.631us
         - DriverTotalTime: 1s687ms
           - __MAX_OF_DriverTotalTime: 5s857ms
           - __MIN_OF_DriverTotalTime: 709.111ms

The degree of parallelism is 4, so there are 4 * 6 = 24 I/O tasks in total.
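
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not StarRocks code: the function names are hypothetical, and it assumes the top-level value of each merged metric is the arithmetic mean of non-negative per-driver values, which the thread implies but does not confirm. Under that assumption, the mean of N values can never be smaller than max / N, so ceil(max / avg) is a lower bound on how many values were merged.

```python
import math


def total_pipeline_drivers(instance_num: int, degree_of_parallelism: int) -> int:
    """Number of pipeline drivers: one per (instance, parallel slot) pair."""
    return instance_num * degree_of_parallelism


def min_merged_values(avg: float, maximum: float) -> int:
    """Smallest count of non-negative samples that can produce this avg and max.

    Assumption: avg is the arithmetic mean of the merged samples. Because the
    sum of the samples is at least the maximum, avg >= maximum / N, which gives
    N >= maximum / avg.
    """
    if avg <= 0:
        raise ValueError("avg must be positive")
    return math.ceil(maximum / avg)


# Values taken from the Pipeline (id=0) profile above, in milliseconds.
drivers = total_pipeline_drivers(instance_num=6, degree_of_parallelism=4)
active_time_bound = min_merged_values(avg=178.376, maximum=742.777)
driver_total_bound = min_merged_values(avg=1687.0, maximum=5857.0)

print(f"drivers={drivers}")
print(f"ActiveTime needs at least {active_time_bound} merged values")
print(f"DriverTotalTime needs at least {driver_total_bound} merged values")
```

Both lower bounds are well below 24, so the pipeline-level metrics quoted above are at least consistent with being merged across 24 drivers.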

@before-Sunrise
Contributor

The instance count is not the finest granularity of concurrency in StarRocks. Every instance uses multiple threads from the I/O thread pool to execute its I/O tasks.

Author

kaka-zb commented May 16, 2024

Got it!

So, based on the profile results below, can we conclude that there is a data skew problem, regardless of whether the query hits the block cache?

An example that hits the cache:

- IOTimeLocalDisk: 355.869ms
 - __MAX_OF_IOTimeLocalDisk: 25s710ms
 - __MIN_OF_IOTimeLocalDisk: 51.257us
- IOTimeRemote: 0ns

An example that misses the cache:

- IOTimeLocalDisk: 6.265ms
 - __MAX_OF_IOTimeLocalDisk: 70.878ms
 - __MIN_OF_IOTimeLocalDisk: 0ns
- IOTimeRemote: 4s430ms
 - __MAX_OF_IOTimeRemote: 55s242ms
 - __MIN_OF_IOTimeRemote: 0ns

FYI, we enabled the block cache by setting starlet_use_star_cache=true.
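
One rough way to quantify the suspected skew is the ratio of each metric's __MAX_OF_ value to its top-level value. The Python sketch below is a hypothetical helper, not part of StarRocks. It assumes profile durations are written as concatenated number-plus-unit parts (as in `25s710ms` or `51.257us`) and that the top-level value is an average over the merged tasks; neither assumption is confirmed by the maintainers in this thread.

```python
import re

# Conversion factors to milliseconds for the unit suffixes seen in the profile.
# "m" and "h" are included as an assumption; they do not appear in this thread.
_UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3, "m": 6e4, "h": 3.6e6}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def parse_duration_ms(text: str) -> float:
    """Convert a profile duration such as '25s710ms' to milliseconds."""
    text = text.strip()
    parts = _PART.findall(text)
    # Reject input that is not made up entirely of number+unit parts.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNIT_TO_MS[unit] for num, unit in parts)


def max_to_avg_ratio(avg: str, maximum: str) -> float:
    """Ratio of the reported maximum to the reported average."""
    avg_ms = parse_duration_ms(avg)
    if avg_ms == 0:
        raise ValueError("average is zero; ratio is undefined")
    return parse_duration_ms(maximum) / avg_ms


# Values copied from the two examples above.
hit_local = max_to_avg_ratio("355.869ms", "25s710ms")
miss_local = max_to_avg_ratio("6.265ms", "70.878ms")
miss_remote = max_to_avg_ratio("4s430ms", "55s242ms")

print(f"cache hit,  IOTimeLocalDisk max/avg: {hit_local:.1f}")
print(f"cache miss, IOTimeLocalDisk max/avg: {miss_local:.1f}")
print(f"cache miss, IOTimeRemote    max/avg: {miss_remote:.1f}")
```

For the cache-hit example the ratio is roughly 72. If the top-level value really is an arithmetic mean, that is only possible when more than 24 values were merged, which would fit the comment that each instance runs I/O tasks on multiple threads. A large ratio on its own suggests that a few I/O tasks dominate, but it does not show whether the cause is uneven data distribution or something else, such as slow individual reads.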

kaka-zb closed this as completed on May 23, 2024