
pgwire: replace deadline and connection polling with separate goroutine #25585

Closed

nvanbenschoten opened this issue May 16, 2018 · 10 comments · Fixed by #124373
Assignees
Labels
A-sql-pgwire pgwire protocol issues. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@nvanbenschoten
Member

nvanbenschoten commented May 16, 2018

From #25328 (comment):

Rather than setting a deadline on the read and periodically polling whether the session is cancelled, we could fire up another goroutine which blocks on ctx.Done() and then closes the connection. I recall there was some reason this approach wasn't taken when this code was written, but taking another look it isn't clear to me why it wouldn't work. This would be a bit more involved of a change, so I'm fine with the current PR as a stop-gap solution.

The reason we use a deadline approach in the first place is because:

even when the context is canceled, you still might need to use the connection to send back an error (e.g. sending an AdminShutdownError when draining).

However, this can be worked around by half closing the connection. That is, we can close the connection for reading but still write to it. See TCPConn.CloseRead.

Jira issue: CRDB-5704

@nvanbenschoten nvanbenschoten added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-pgwire pgwire protocol issues. labels May 16, 2018
@nvanbenschoten nvanbenschoten added this to the Later milestone May 16, 2018
@knz knz added this to Triage in (DEPRECATED) SQL Front-end, Lang & Semantics via automation Jul 9, 2018
@knz
Contributor

knz commented Jul 9, 2018

I think this was done, @andreimatei, right?

@nvanbenschoten
Member Author

No this was not. We still poll.

@knz knz moved this from Triage to Backlog in (DEPRECATED) SQL Front-end, Lang & Semantics Jul 10, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@jordanlewis jordanlewis moved this from Triage to Lower priority backlog in [DEPRECATED] Old SQLExec board. Don't move stuff here May 7, 2019
@asubiotto asubiotto added the E-starter Might be suitable for a starter project for new employees or team members. label May 20, 2019
@Zyqsempai

Is this issue still relevant?
If so, I'd be glad to take care of it, but since it's my first issue, I need some more information about where to start.

@asubiotto
Contributor

@Zyqsempai thanks for your interest! Yes, this issue is still available. The current landscape is that we create a readTimeoutConn here:

c.conn = newReadTimeoutConn(c.conn, func() error {

in order to have connections listen for context cancellation when the server is about to shut down (canceled in this method):

func (s *Server) drainImpl(drainWait time.Duration, cancelWait time.Duration) error {

Since Reads block, we periodically wake the connection up and have it check its context, which results in higher-than-necessary CPU usage. The proposal is to call CloseRead on this connection (as a TCP connection) in drainImpl so that the Read unblocks when necessary. A full Close cannot work because the server must still send a message to the client explaining why the connection was closed (see AdminShutdownErr). Note that CloseRead only exists on the TCPConn implementation of net.Conn, so we'll have to cast net.Conns to TCPConns (they can also sometimes be tls.Conns, in which case we'll have to get the underlying conn first). Doing this without introducing new error cases will probably be the hardest part.

@Zyqsempai

@asubiotto Hi, thanks for the explanation.
I have a few questions, though.
What is the best place to call CloseRead?

func (c *readTimeoutConn) Read(b []byte) (int, error) {

We have that Read function, which checks the exit conditions; does it make sense to rename the type and change the behavior of that function?

@asubiotto
Contributor

The CloseRead should be done by the server in drainImpl, on the connections we're currently canceling. Nothing should necessarily change in drainImpl itself, but the context.CancelFunc will probably now be a custom function that does this (and probably also sends a DrainRequest over the stmtBuf). The readTimeoutConn exists only for the purpose of checking exit conditions; since we would now preempt connections, there is no need for the wrapper any more, so we can probably just remove it.
I'm realizing that this issue is a bit more involved than it first seems, so I'm removing the "good-first-issue" label since it's probably not the best way to start contributing to cockroach, but please don't let this deter you from working on it!

@Zyqsempai

@asubiotto Thanks for pointing me in the right direction. I'm very eager to start contributing to cockroach, so no worries ;)

@asubiotto asubiotto moved this from Lower priority backlog to [TENT] Conn executor in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 3, 2020
@asubiotto asubiotto moved this from [TENT] Conn executor to pgwire in [DEPRECATED] Old SQLExec board. Don't move stuff here Apr 3, 2020
@asubiotto asubiotto removed their assignment Apr 3, 2020
@asubiotto asubiotto removed the E-starter Might be suitable for a starter project for new employees or team members. label Apr 21, 2020
@yuzefovich yuzefovich added this to Triage in BACKLOG, NO NEW ISSUES: SQL Execution via automation Oct 25, 2020
@yuzefovich yuzefovich moved this from Triage to [GENERAL BACKLOG] Enhancements/Features/Investigations in BACKLOG, NO NEW ISSUES: SQL Execution Oct 25, 2020
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this issue Mar 8, 2022
…sors

Previously, we were treating request/response processors as forwarder methods,
and the original intention was to perform a connection migration inline with
the forwarding. However, this has proved to cause confusion and complications.
The new connection migration design uses a different approach by performing
the connection migration out-of-band with the forwarding processors. For this
to work, we would need to be able to suspend and resume those processors.
This commit implements support for that.

Additionally, we no longer wrap clientConn with a custom readTimeoutConn
wrapper as that seems to be problematic with idle connections. There's a
similar discussion here: cockroachdb#25585. Due to that, context cancellations from the
parent no longer close the connection if we are blocked on IO. We could
theoretically spin up a new goroutine in the forwarder to check for this, and
close the connections accordingly, but this case is rare, so we'll let the
caller handle this.

Release justification: sqlproxy-only change, and only used within
CockroachCloud.

Release note: None
jaylim-crl added a commit to jaylim-crl/cockroach that referenced this issue Mar 10, 2022 (same commit message as above)
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 2, 2023
@yuzefovich yuzefovich reopened this May 2, 2024
@rafiss
Collaborator

rafiss commented May 17, 2024

Right now, the early exit condition is:

	rtc.checkExitConds = func() error {
		// If the context was canceled, it's time to stop reading. Either a
		// higher-level server or the command processor have canceled us.
		if err := ctx.Err(); err != nil {
			return err
		}
		// If the server is draining, we'll let the processor know by pushing a
		// DrainRequest. This will make the processor quit whenever it finds a good
		// time.
		if !sentDrainSignal && s.IsDraining() {
			_ /* err */ = c.stmtBuf.Push(ctx, sql.DrainRequest{})
			sentDrainSignal = true
		}
		return nil
	}

That means if we switch to using a separate goroutine, then we need to check ctx.Done() as well as check if the server is draining. I think we need to create another channel inside of pgwire.Server that can be used for that.

@rafiss
Collaborator

rafiss commented May 17, 2024

This issue is pretty old, so I believe there are a few out-of-date comments in the initial thread.

even when the context is canceled, you still might need to use the connection to send back an error (e.g. sending an AdminShutdownError when draining).

I don't think this is the case anymore. The only time AdminShutdownError is sent is when draining starts. If the context is canceled, the current code will just give up on reading and start returning errors:

	rtc.checkExitConds = func() error {
		// If the context was canceled, it's time to stop reading. Either a
		// higher-level server or the command processor have canceled us.
		if err := ctx.Err(); err != nil {
			return err
		}

Even so, I guess using CloseRead won't hurt.

The proposal is to call CloseRead on this connection (as a tcp connection) in drainImpl so that this causes the Read to unblock when necessary.

I don't think it will be possible to do this in drainImpl, since that function doesn't have a reference to the connection. (Nor should it IMO, since that would mean it needs to hold a reference to all the connections.) So instead, we should do this in a separate goroutine right before calling (pgwire.Server).serveImpl.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-queries SQL Queries Team labels May 20, 2024
@blathers-crl blathers-crl bot added this to Triage in SQL Foundations May 20, 2024
craig bot pushed a commit that referenced this issue May 20, 2024
124373: pgwire: remove readTimeoutConn in favor of a channel r=yuzefovich a=rafiss

Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received.

I wrote a simple benchmark of the idle connection loop to generate CPU profiles.

With the old readTimeoutConn: [CPU profile screenshot]

With the new goroutine approach: [CPU profile screenshot]

There's definitely less noise and overhead.

fixes #25585
Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@craig craig bot closed this as completed in fb4ee19 May 20, 2024
SQL Foundations automation moved this from Triage to Done May 20, 2024