fix(compactor): Compactor potential oom risk of builder #16802
base: main
Can you share the results of `bench_multi_builder` before and after this PR?
metrics: Arc<CompactorMetrics>,
task_progress: Option<Arc<TaskProgress>>,
split_table_outputs: Vec<SplitTableOutput>,
ssts: &Vec<LocalSstableInfo>,
What benefit do we get from passing `&Vec<LocalSstableInfo>` instead of `Vec<SplitTableOutput>` here? After this PR, we need to create a new vec via `collect` every time before calling `report_progress`, which seems like avoidable overhead.
Previously, each call to `report_progress` also built a new vec from scratch, taking the `LocalSstableInfo` out of each `SplitTableOutput`, so the overhead is about the same.
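To make the allocation trade-off in this thread concrete, here is a minimal sketch. `LocalSstableInfo` and `SplitTableOutput` are simplified, hypothetical stand-ins for the real compactor types, and `Progress`/`summarize` are invented for illustration. Both versions allocate one `Vec<LocalSstableInfo>` per report; they differ only in whether the callee or the caller builds it. The sketch takes `&[LocalSstableInfo]` rather than `&Vec<LocalSstableInfo>`: it accepts the same arguments (a `&Vec<T>` coerces to `&[T]`) and is the form Clippy's `ptr_arg` lint suggests.

```rust
// Hypothetical, simplified stand-ins for the compactor types in the diff above.
#[derive(Clone, Debug, PartialEq)]
pub struct LocalSstableInfo {
    pub sst_id: u64,
    pub file_size: u64,
}

// The real type carries more state (e.g. the upload join handle); omitted here.
pub struct SplitTableOutput {
    pub sst_info: LocalSstableInfo,
}

/// Hypothetical progress summary produced by a report.
#[derive(Debug, PartialEq)]
pub struct Progress {
    pub num_ssts: usize,
    pub total_bytes: u64,
}

fn summarize(ssts: &[LocalSstableInfo]) -> Progress {
    Progress {
        num_ssts: ssts.len(),
        total_bytes: ssts.iter().map(|s| s.file_size).sum(),
    }
}

/// Before: the callee takes the outputs and moves each SST info into a fresh Vec.
pub fn report_progress_before(outputs: Vec<SplitTableOutput>) -> Progress {
    let ssts: Vec<LocalSstableInfo> = outputs.into_iter().map(|o| o.sst_info).collect();
    summarize(&ssts)
}

/// After: the caller collects the SST infos itself and lends them as a slice.
pub fn report_progress_after(ssts: &[LocalSstableInfo]) -> Progress {
    summarize(ssts)
}
```

Either way there is one `collect` per report, which is the point made in the reply above.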
Tested with local minio-ssd.
Intuitively, this change will significantly slow down SST building.
Yes, this PR prevents SSTs from being generated asynchronously, which slows down the execution pipeline, and the micro-benchmark shows significant performance degradation (so far only in the micro-benchmark, not in Nexmark). I haven't found a better way yet; what do you think, @zwang28?
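The trade-off in this exchange can be modeled with a small std-only sketch, where OS threads stand in for asynchronous upload tasks and a sleep stands in for a slow object store. `peak_inflight_bytes` and its parameters are hypothetical and do not correspond to RisingWave APIs. Each sealed buffer counts as in flight until its upload finishes; the function reports the peak in-flight bytes for the asynchronous mode (keep join handles, harvest them at the end) and the synchronous mode (wait for each upload before producing the next buffer).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical model: hand off `num_buffers` buffers of `buf_size` bytes, each
/// "uploaded" in `upload_time`. Returns the peak number of in-flight bytes.
pub fn peak_inflight_bytes(
    num_buffers: usize,
    buf_size: usize,
    upload_time: Duration,
    wait_each: bool,
) -> usize {
    let inflight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();

    for _ in 0..num_buffers {
        // The buffer is handed to the uploader; its memory is now held by the request.
        let now = inflight.fetch_add(buf_size, Ordering::SeqCst) + buf_size;
        peak.fetch_max(now, Ordering::SeqCst);

        let inflight_in_task = Arc::clone(&inflight);
        let handle = thread::spawn(move || {
            thread::sleep(upload_time); // simulated slow object store
            inflight_in_task.fetch_sub(buf_size, Ordering::SeqCst);
        });

        if wait_each {
            // Synchronous mode: block until this upload finishes.
            handle.join().expect("upload thread panicked");
        } else {
            // Asynchronous mode: keep the handle and harvest it after "compaction".
            handles.push(handle);
        }
    }

    for handle in handles {
        handle.join().expect("upload thread panicked");
    }
    peak.load(Ordering::SeqCst)
}
```

With a slow simulated store, the asynchronous mode ends up holding many buffers at once, while waiting on each upload caps in-flight memory at a single buffer but serializes building and uploading, which is the slowdown described above.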
@@ -328,6 +328,7 @@ message CompactTask {
JOIN_HANDLE_FAILED = 11;
TRACK_SST_OBJECT_ID_FAILED = 12;
NO_AVAIL_CPU_RESOURCE_CANCELED = 13;
HEARTBEAT_PROGRESS_CANCELED = 14;
Is this unused?
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR addresses out-of-memory (OOM) errors in the compactor caused by inaccurate memory estimation. In the current implementation of `StreamingUploader` and `SstableBuilder`, the builder passes its buffer to `StreamingUploader`, which triggers an asynchronous upload whenever the buffer reaches its size threshold; the resulting join handles are only harvested once compaction completes. If the object store service performs poorly, in-flight requests can hold a large amount of memory and push the compactor into OOM. Asynchronous uploads give better pipelining and higher throughput, but they make this situation worse. Therefore, this PR makes the following changes:
- refactor `HeartbeatProgressCancel` to distinguish between a lack of visible progress and a missing heartbeat
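The refactor listed above separates two reasons for cancelling a compaction task. The sketch below shows one way such a check could look; `CancelReason`, `TaskTracker`, the progress counter, and both timeouts are hypothetical names chosen for illustration, not the fields or logic used in RisingWave.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cancel reasons for a tracked compaction task.
#[derive(Debug, PartialEq)]
pub enum CancelReason {
    /// No heartbeat arrived within the heartbeat timeout.
    NoHeartbeat,
    /// Heartbeats keep arriving, but the reported progress has not advanced.
    NoVisibleProgress,
}

/// Hypothetical per-task state kept by whoever receives the heartbeats.
pub struct TaskTracker {
    last_heartbeat: Instant,
    last_progress: u64,
    last_progress_change: Instant,
}

impl TaskTracker {
    pub fn new(now: Instant) -> Self {
        TaskTracker { last_heartbeat: now, last_progress: 0, last_progress_change: now }
    }

    /// Record a heartbeat carrying a progress counter (e.g. keys processed).
    pub fn on_heartbeat(&mut self, now: Instant, progress: u64) {
        self.last_heartbeat = now;
        if progress != self.last_progress {
            self.last_progress = progress;
            self.last_progress_change = now;
        }
    }

    /// Check the missing-heartbeat case first, then the stalled-progress case.
    pub fn check(&self, now: Instant, heartbeat_timeout: Duration, progress_timeout: Duration) -> Option<CancelReason> {
        if now.duration_since(self.last_heartbeat) > heartbeat_timeout {
            Some(CancelReason::NoHeartbeat)
        } else if now.duration_since(self.last_progress_change) > progress_timeout {
            Some(CancelReason::NoVisibleProgress)
        } else {
            None
        }
    }
}
```

Keeping the two reasons distinct lets a task whose compactor is alive but stuck (for example, blocked on slow uploads) be reported differently from a task whose compactor has gone silent.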
Related to #15946.
Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.