
pgwire: replace deadline and connection polling with separate goroutine #25585

Closed

nvanbenschoten opened this issue May 16, 2018 · 10 comments · Fixed by #124373
Assignees
Labels
A-sql-pgwire pgwire protocol issues. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@nvanbenschoten
Member

nvanbenschoten commented May 16, 2018

From #25328 (comment):

Rather than setting a deadline on the read and periodically polling whether the session is cancelled, we could fire up another goroutine which blocks on ctx.Done() and then closes the connection. I recall there was some reason this approach wasn't taken when this code was written, but taking another look it isn't clear to me why it wouldn't work. This would be a bit more involved of a change, so I'm fine with the current PR as a stop-gap solution.

The reason we use a deadline approach in the first place is because:

even when the context is canceled, you still might need to use the connection to send back an error (e.g. sending an AdminShutdownError when draining).

However, this can be worked around by half closing the connection. That is, we can close the connection for reading but still write to it. See TCPConn.CloseRead.

Jira issue: CRDB-5704

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgwire pgwire protocol issues. labels May 16, 2018
@nvanbenschoten nvanbenschoten added this to the Later milestone May 16, 2018
@knz knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation Jul 9, 2018
@knz
Contributor

knz commented Jul 9, 2018

I think this was done, @andreimatei, right?

@nvanbenschoten
Member Author

No this was not. We still poll.

@knz knz moved this from Triage to Backlog in (DEPRECATED) SQL Front-end, Lang & Semantics Jul 10, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@jordanlewis jordanlewis moved this from Triage to Lower priority backlog in [DEPRECATED] Old SQLExec board. Don't move stuff here May 7, 2019
@asubiotto asubiotto added the E-starter Might be suitable for a starter project for new employees or team members. label May 20, 2019
@Zyqsempai

Is this issue still relevant?
If so, I'd be glad to take care of it, but since it's my first issue, I need some more information about where to start.

@asubiotto
Contributor

@Zyqsempai thanks for your interest! Yes, this issue is still available. The current landscape is that we create a readTimeoutConn here:

c.conn = newReadTimeoutConn(c.conn, func() error {

in order to have connections listen for context cancellation when the server is about to shut down (canceled in this method):

func (s *Server) drainImpl(drainWait time.Duration, cancelWait time.Duration) error {

Since Reads block, we periodically wake the connection up and have it check its context, which results in higher-than-necessary CPU usage. The proposal is to call CloseRead on this connection (as a TCP connection) in drainImpl so that the Read unblocks when necessary. A full Close cannot work because the server must still send a message to the client explaining why the connection was closed (see AdminShutdownErr). Note that CloseRead only exists on the TCPConn implementation of net.Conn, so we'll have to cast net.Conns to TCPConns (they can also sometimes be tls.Conns, in which case we'll have to get the underlying conn first). Doing this without introducing new error cases will probably be the hardest part.

@Zyqsempai

@asubiotto Hi, thanks for the explanation.
I have a few questions, though.
What is the best place to call CloseRead?

func (c *readTimeoutConn) Read(b []byte) (int, error) {

We have that Read function, which checks the exit conditions; does it make sense to rename the type and change the behavior of that function?

@asubiotto
Contributor

The CloseRead should be done by the server in drainImpl, on the connections we're currently canceling. Nothing should necessarily change in drainImpl itself, but the context.CancelFunc will probably now be a custom function that does this (and probably also sends a DrainRequest over the stmtBuf). The readTimeoutConn exists only for the purpose of checking exit conditions; since we would now preempt connections, there is no need for the wrapper any more, so we can probably just remove it.
I'm realizing that this issue is a bit more involved than it first seems, so I'm removing the "good-first-issue" label since it's probably not the best way to start contributing to cockroach, but please don't let this deter you from working on it!

@Zyqsempai

@asubiotto Thanks for pointing me in the right direction. I'm very eager to start contributing to cockroach, so no worries ;)

@asubiotto asubiotto moved this from Lower priority backlog to [TENT] Conn executor in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 3, 2020
@asubiotto asubiotto moved this from [TENT] Conn executor to pgwire in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 3, 2020
@asubiotto asubiotto removed their assignment Apr 3, 2020
@asubiotto asubiotto removed the E-starter Might be suitable for a starter project for new employees or team members. label Apr 21, 2020
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Oct 25, 2020
@yuzefovich yuzefovich moved this from Triage to [GENERAL BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Oct 25, 2020
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this issue Mar 8, 2022
…sors

Previously, we were treating request/response processors as forwarder methods,
and the original intention was to perform a connection migration inline with
the forwarding. However, this has proved to cause confusion and complications.
The new connection migration design uses a different approach by performing
the connection migration out-of-band with the forwarding processors. For this
to work, we would need to be able to suspend and resume those processors.
This commit implements support for that.

Additionally, we no longer wrap clientConn with a custom readTimeoutConn
wrapper as that seems to be problematic with idle connections. There's a
similar discussion here: cockroachdb#25585. Due to that, context cancellations from the
parent no longer close the connection if we are blocked on IO. We could
theoretically spin up a new goroutine in the forwarder to check for this, and
close the connections accordingly, but this case is rare, so we'll let the
caller handle this.

Release justification: sqlproxy-only change, and only used within
CockroachCloud.

Release note: None
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this issue Mar 10, 2022 (same commit message as above)
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 2, 2023
@yuzefovich yuzefovich reopened this May 2, 2024
@rafiss
Collaborator

rafiss commented May 17, 2024

Right now, the early exit condition is:

	rtc.checkExitConds = func() error {
		// If the context was canceled, it's time to stop reading. Either a
		// higher-level server or the command processor have canceled us.
		if err := ctx.Err(); err != nil {
			return err
		}
		// If the server is draining, we'll let the processor know by pushing a
		// DrainRequest. This will make the processor quit whenever it finds a good
		// time.
		if !sentDrainSignal && s.IsDraining() {
			_ /* err */ = c.stmtBuf.Push(ctx, sql.DrainRequest{})
			sentDrainSignal = true
		}
		return nil
	}

That means if we switch to using a separate goroutine, then we need to check ctx.Done() as well as check if the server is draining. I think we need to create another channel inside of pgwire.Server that can be used for that.

@rafiss
Collaborator

rafiss commented May 17, 2024

This issue is pretty old, so I believe there are a few out-of-date comments in the initial thread.

even when the context is canceled, you still might need to use the connection to send back an error (e.g. sending an AdminShutdownError when draining).

I don't think this is the case anymore. The only time AdminShutdownError is sent is when draining starts. If the context is canceled, the current code will just give up on reading and start returning errors:

	rtc.checkExitConds = func() error {
		// If the context was canceled, it's time to stop reading. Either a
		// higher-level server or the command processor have canceled us.
		if err := ctx.Err(); err != nil {
			return err
		}

Even so, I guess using CloseRead won't hurt.

The proposal is to call CloseRead on this connection (as a tcp connection) in drainImpl so that this causes the Read to unblock when necessary.

I don't think it will be possible to do this in drainImpl, since that function doesn't have a reference to the connection. (Nor should it IMO, since that would mean it needs to hold a reference to all the connections.) So instead, we should do this in a separate goroutine right before calling (pgwire.Server).serveImpl.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-queries SQL Queries Team labels May 20, 2024
@blathers-crl blathers-crl bot added this to Triage in SQL Foundations May 20, 2024
craig bot pushed a commit that referenced this issue May 20, 2024
124373: pgwire: remove readTimeoutConn in favor of a channel r=yuzefovich a=rafiss

Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received.

I wrote a simple benchmark of the idle connection loop to generate CPU profiles.

With the old readTimeoutConn: [CPU profile screenshot]

With the new goroutine approach: [CPU profile screenshot]

There's definitely less noise and overhead.

fixes #25585
Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@craig craig bot closed this as completed in fb4ee19 May 20, 2024
SQL Foundations automation moved this from Triage to Done May 20, 2024