Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SQLParseException to include query and approximate position of the error. #15826

Open
surister opened this issue Apr 9, 2024 · 2 comments

Comments

@surister
Copy link
Contributor

surister commented Apr 9, 2024

Problem Statement

Currently whenever a SQLParseException is returned to the client, we only return the line:column, this is often confusing as the user might just ignore this information as counting lines and characters is annoying.

In some scenarios, like using the admin ui, this is even a worse experience, e.g.

image

Latest versions of modern languages made special advancements in this developer experience part, for example Python, where errors are typically underlined in ^ chars, e.g.

>>> with open('file.txt' as f: f.read()
  File "<stdin>", line 1
    with open('file.txt' as f: f.read()
                         ^^
SyntaxError: invalid syntax

Possible Solutions

Include the query with the offending token underlined with ^ characters, perhaps under a flag? ie: http://localhost:4200/_sql?verbose_errors=true

I tested the modification myself in my Python parser of CrateDB.

class ExceptionErrorListener(ErrorListener):
    """
    Error listener that raises antlr4 errors as a Python exceptions.
    """

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        query = offendingSymbol.source[1].strdata
        offending_token_text: str = query[offendingSymbol.start: offendingSymbol.stop + 1]

        query_lines: list = query.split('\n')
        offending_line: str = query_lines[line - 1]

        # White spaces from the beginning of the offending line to the offending text, so the '^'
        # chars are correctly placed below the offending token.
        newline_offset = offending_line.index(offending_token_text)
        newline = offending_line + '\n' + (' ' * newline_offset + '^' * (offendingSymbol.stop - offendingSymbol.start + 1))

        query_lines[line - 1] = newline

        msg = msg + "\n\n" + "\n".join(query_lines)
        raise ParsingError(f'line {line}:{column} {msg}')

These are some examples:

    raise ParsingError(f'line {line}:{column} {msg}')
cratedb_sqlparse.parser.parser.ParsingError: line 1:0 mismatched input 'SLECT' expecting {'SELECT', 'DEALLOCATE', 'FETCH', 'END', 'WITH', 'CREATE', 'ALTER', 'KILL', 'CLOSE', 'BEGIN', 'START', 'COMMIT', 'ANALYZE', 'DISCARD', 'EXPLAIN', 'SHOW', 'OPTIMIZE', 'REFRESH', 'RESTORE', 'DROP', 'INSERT', 'VALUES', 'DELETE', 'UPDATE', 'SET', 'RESET', 'COPY', 'GRANT', 'DENY', 'REVOKE', 'DECLARE'}

SLECT * FROM SYS.SHARDS
^^^^^
    raise ParsingError(f'line {line}:{column} {msg}')
cratedb_sqlparse.parser.parser.ParsingError: line 1:89 no viable alternative at input 'SELECT ended FROM "sys"."jobs_log" WHERE classification['type'] = 'INSERT' AND stmt LIKE %'

SELECT ended FROM "sys"."jobs_log" WHERE classification['type'] = 'INSERT' AND stmt LIKE %%table_schema.%table_name%'
                                                                                         ^
  File "/home/surister/PycharmProjects/cratedb-sqlparse/cratedb_sqlparse/parser/parser.py", line 61, in syntaxError
    raise ParsingError(f'line {line}:{column} {msg}')
cratedb_sqlparse.parser.parser.ParsingError: line 19:12 no viable alternative at input 'SELECT\n        repos.name,\n        repos.type,\n        repos.settings,\n        (\n          SELECT\n            ARRAY_AGG(\n              {\n                "name" = snaps.name,\n                "state" = snaps.state,\n                "concrete_indices" = snaps.concrete_indices,\n                "failures" = snaps.failures,\n                "started" = snaps.started,\n                "finished" = snaps.finished,\n                "table_partitions" = snaps.table_partitions,\n                "tables" = snaps.tables,\n                "version" = snaps.version\n              \n            )'

      SELECT
        repos.name,
        repos.type,
        repos.settings,
        (
          SELECT
            ARRAY_AGG(
              {
                "name" = snaps.name,
                "state" = snaps.state,
                "concrete_indices" = snaps.concrete_indices,
                "failures" = snaps.failures,
                "started" = snaps.started,
                "finished" = snaps.finished,
                "table_partitions" = snaps.table_partitions,
                "tables" = snaps.tables,
                "version" = snaps.version
              
            )
            ^
          FROM
            sys.snapshots snaps
          WHERE repos.name = snaps.repository
        ) as snapshots
      FROM
        sys.repositories repos

I think having these literal pointers to where the error might be is a great help.

The antlr4 compile targets are extremely similar between languages, I would say it would be fairly easy to just port this to our Java implementation of the error listener or serve as inspiration for a solution.

Considered Alternatives

No response

@mfussenegger
Copy link
Member

I tried something like this a couple years back but discarded it, because I felt the long error messages are actually reducing the readability.

Putting it behind a flag may be a an option, but also increases product complexity.

Throwing in some alternatives:

  • Standalone Linter or integrated linter in the admin-ui & crash to highlight syntax errors already in the query before submitting them to the server.
  • A CrateSQL language server for editor integration

@surister
Copy link
Contributor Author

surister commented Apr 15, 2024

I'm totally fine with adding these improvements to the js/python parsers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants