Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spill is not work in Joins #22006

Open
AkiRaT0117 opened this issue May 17, 2024 · 1 comment
Open

spill is not work in Joins #22006

AkiRaT0117 opened this issue May 17, 2024 · 1 comment
Assignees

Comments

@AkiRaT0117
Copy link

AkiRaT0117 commented May 17, 2024

I referred to the official Trino document

my configuration is as follows:

spill-enabled=true
spiller-spill-path=/tmp/spill
query.max-memory=800MB
query.max-memory-per-node=100MB

When I use Order by or Aggregations operations. I see that the spill is activated. as shown in the following image.
image

When I use the join operator, it does not successfully spill as before, but instead returns the following error.

io.trino.ExceededMemoryLimitException: Query exceeded per-node memory limit of 100MB [Allocated: 99.37MB, Delta: 956.92kB, Top Consumers: {HashBuilderOperator=98.16MB, LazyOutputBuffer=1.20MB, PagePartitioner=3.05kB}] at io.trino.ExceededMemoryLimitException.exceededLocalUserMemoryLimit(ExceededMemoryLimitException.java:40) at io.trino.memory.QueryContext.enforceUserMemoryLimit(QueryContext.java:330) at io.trino.memory.QueryContext.updateUserMemory(QueryContext.java:165) at io.trino.memory.QueryContext.lambda$addTaskContext$0(QueryContext.java:250) at io.trino.memory.QueryContext$QueryMemoryReservationHandler.reserveMemory(QueryContext.java:311) at io.trino.memory.context.RootAggregatedMemoryContext.updateBytes(RootAggregatedMemoryContext.java:37) at io.trino.memory.context.ChildAggregatedMemoryContext.updateBytes(ChildAggregatedMemoryContext.java:38)

My sql like this:
select * from <table1> t1 inner join <table2> t2 on t1.column1 = t2.column1 and t1.column2 = t2.column2;

Stage Performance:
image

I wonder know what caused the join operation unable to spill. Is there any parameter or method to resolve.

@sug-ghosh
Copy link

In Trino, you can enable join spilling by setting the following session properties:

SET SESSION join_distribution_type = 'PARTITIONED';

join-distribution-type = 'BROADCAST' or 'PARTITIONED'.

In a broadcast join, one of the tables (usually the smaller one) is broadcast to all the nodes that hold parts of the larger table. Each node then performs a local join of the broadcast table with the part of the larger table it has. This operation is performed entirely in memory.

Spilling is a mechanism that comes into play when there's not enough memory to hold intermediate results of a computation. In the case of a broadcast join, the entire smaller table must fit into memory. If it doesn't, Trino doesn't perform the join and throws an out of memory exception.

The reason for this is efficiency. The idea behind a broadcast join is to save on the data shuffling across the network. If the smaller table has to be spilled to disk, it defeats the purpose of the broadcast join as disk I/O operations are generally slower than in-memory operations.

@sug-ghosh sug-ghosh self-assigned this May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants