server: TestDatabasesTablesV2 failed #124395
Comments
This failed last week here, which I chalked up to a flake. Given the newest failures, I'm going to give this a closer look.
I was able to reproduce locally under From what I can tell, the test cluster/server is receiving signals to shut down before the test completes. The specific call that's failing does not seem to be problematic on its own. It seems that this query is failing to finish executing. When we attempt to read through the result set,
Looking at the timestamps in the logs, quite a bit of time passes between the test cluster being initialized and the TableDetails endpoint being invoked. However, the time spent within the TableDetails endpoint before the test cluster begins shutting down is only ~10 seconds, which should not trigger the request timeout.
The endpoint itself seems to respond in an acceptable amount of time. However, the test seems to take too long to reach the point where it makes the request. Plenty of things happen between the test's first interaction with the test cluster and the timeout during the specific call to
I added some timers to various parts of the test.
...and the cluster log indicating that the cluster has received a signal to shut down:
22 seconds to create the DBs/tables/users/roles required by the test feels a bit long, but is that the true problem? From the first moment the test begins to interact with the test cluster until the test cluster receives a signal to shut down, only about 1 minute passes. I'm going to look into the default timeouts and see if this is expected, and/or if we need to increase the test timeout.
@abarganier I think it's significant that the test fails right after a test that checks for a 404 Not Found response. Is it possible that the helper we use to retrieve table data leaks resources in the error case? That could cause this odd error result. This is the handler which looks innocuous: cockroach/pkg/server/api_v2_sql_schema.go Lines 477 to 497 in c5e3cf4
But the
Previously, we'd use a 10s timeout on HTTP clients for all test runs. This change modifies this to be `40s` under race. Resolves cockroachdb#124395 Resolves cockroachdb#122892 Release note: None
124579: server: increase test http client timeouts under race r=kyle-a-wong a=dhartunian Previously, we'd use a 10s timeout on HTTP clients for all test runs. This change modifies this to be `30s` under race. Resolves #124395 Resolves #122892 Release note: None Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
server.TestDatabasesTablesV2 failed on master @ 93ad913106b6f0f6ec98bc2cfa788ff6d8085bd4:
Parameters:
attempt=1
race=true
run=2
shard=5
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-38868