-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: correct error when logging timed out and cancelled queries #124371
base: master
Are you sure you want to change the base?
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
298f0dd
to
58d99cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm continuing to think through this but publishing a few initial thoughts.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kyle-a-wong and @xinhaoz)
pkg/sql/conn_executor_exec.go
line 1869 at r3 (raw file):
bulkJobId = res.GetBulkJobId() } planner.maybeLogStatement(
can you explain how removing this maybeLogStatement
instance doesn't reduce logging in certain scenarios? I assume it's here to capture scenarios that the one in line 853 does not.
pkg/sql/conn_executor_exec.go
line 853 at r4 (raw file):
} p.maybeLogStatement(
Maybe it will help to discuss what's going on in person, but is it possible and perhaps less disruptive to move the location of maybeLogStatement
instead of moving the error rewriting logic into this defer?
pkg/sql/conn_executor_exec.go
line 983 at r4 (raw file):
} dispatchToExecEngine = true
why is this here?
58d99cd
to
db0b2af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
pkg/sql/conn_executor_exec.go
line 1869 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
can you explain how removing this
maybeLogStatement
instance doesn't reduce logging in certain scenarios? I assume it's here to capture scenarios that the one in line 853 does not.
Hmm I'll try that. I thought it might be important to preserve where logging was occuring in the defer / cleanup process but maybe moving it further down in the defer chain might just work 🤔 It'll be cleaner if we can keep the error setting where it currently is.
pkg/sql/conn_executor_exec.go
line 983 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
why is this here?
Added a comment for this at declaration. The previous behaviour here was to log for the admin audit log if we did not dispatch to the exec engine. I'm trying to preserve what seems like was 'special case' behaviour for logging in that case, where we needed to init the plan in a particular way.
db0b2af
to
fcc9e83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)
pkg/sql/conn_executor_exec.go
line 853 at r4 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Maybe it will help to discuss what's going on in person, but is it possible and perhaps less disruptive to move the location of
maybeLogStatement
instead of moving the error rewriting logic into this defer?
on the main issue, i had a similar instinct: #122722 (comment)
i feel like it would be less disruptive, which is especially relevant here since we'd want this fix backported. (so it should be the smallest diff we can achieve.) i'm happy to discuss the challenges with that here or in person.
Previously, rafiss (Rafi Shamim) wrote…
@rafiss Ah yeah I ended up doing both here (moving the logging to execStmt and also moving the error setting logic). Logging already existed in execStmtInOpenState for statements that don't get dispatched to the execution engine. I was not sure if the order of operations in execStmtInOpenState was significant w.r.t logging so I kept logging where it currently was and moved the error writing there. Perhaps moving the logging to happen after query cancellation is innocuous though, and then we can keep the error writing logic as is. I'm going to try that and see if anything breaks :) |
Previously, xinhaoz (Xin Hao Zhang) wrote…
There are a few things that make moving logging within execStmtInOpenState to occur after the error setting more difficult than the current solution of moving the error setting to the logging. Currently the order of operations is (omitting ops irrelevant to this situation): log, finish instrumentation helper (which finishes the tracing span used for some logging operations), set the query cancellation and timeout errors + cancel the query (these last 2 things happen in the same deferred function). The order above needs to be preserved as far as I've been able to tell since logging relies on the I.H. not being cleaned up, and parts of the instrumentation helper cleanup rely on the query's context still being active. It seems to me like it's easier to move the error setting logic, but I might be missing some context - @rafiss is there a specific reason it needs to occur right before query cancellation? I also noticed setting a query timeout error is not in the processCleanupFn for pausible portals - what's important there in terms of the order of operations that need to be preserved for pausible portals? I've currently moved it to be within the processCleanupFn for logging but can mirror the old behaviour with some additional comments if given the context. |
Setting a query timeout or cancellation error occurs in execStmtInOpenState, while logging occurs in dispatchToExecutionEngine (called on by the former). This previously resulted in the incorrect pg error code being surfaced when logging cancelled queries. This commit moves statement logging from dispatchToExecutionEngine to execStmtInOpenState so that we have the correct error information at the time of logging. Preserving pausible portal logging behaviour: There are some nuances in logging when we have a pausable portal that must be preserved. For pausible portals, logging was triggered as a cleanup action in `execStmtInOpenState` by appending it to a list of cleanup fns in `dispatch`. This is now accomplished by performing logging in the `processCleanupFunc` helper within `execStmtInOpenState`. Epic: none Fixes: cockroachdb#122722 Release note (bug fix): Fixes a bug in logging where error code was misreported for cancelled queries. Affected channels: SQL_PERF (slow query logs), SQL_EXEC (sql exec logs).
fcc9e83
to
f2e86d0
Compare
Note to reviewers: only the latest commit should be reviewed in this PR.
Setting a query timeout or cancellation error occurs in execStmtInOpenState,
while logging occurs in dispatchToExecutionEngine (called on by the former).
This previously resulted in the incorrect pg error code being surfaced
when logging cancelled queries. This commit moves statement logging
from dispatchToExecutionEngine to execStmtInOpenState so that we have the
correct error information at the time of logging.
Preserving pausible portal logging behaviour:
There are some nuances in logging when we have a pausable portal that
must be preserved. For pausible portals, logging was triggered as a cleanup
action in
execStmtInOpenState
by appending it to a list of cleanup fnsin
dispatch
. This is now accomplished by performing logging in theprocessCleanupFunc
helper withinexecStmtInOpenState
.Epic: none
Fixes: #122722
Release note (bug fix): Fixes a bug in logging where error code was misreported
for cancelled queries. Affected channels: SQL_PERF (slow query logs), SQL_EXEC
(sql exec logs).