Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Expose "_ignored" metadata field #108770

Merged

Conversation

ivancea
Copy link
Contributor

@ivancea ivancea commented May 17, 2024

Expose "_ignored" metadata field in ESQL queries

Fixes #108324

Copy link

Documentation preview:

@ivancea ivancea added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v8.15.0 >feature labels May 17, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @ivancea, I've created a changelog YAML for you.

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@@ -28,6 +29,8 @@ public class MetadataAttribute extends TypedAttribute {
tuple(DataTypes.KEYWORD, true),
IdFieldMapper.NAME,
tuple(DataTypes.KEYWORD, false), // actually searchable, but fielddata access on the _id field is disallowed by default
IgnoredFieldMapper.NAME,
tuple(DataTypes.KEYWORD, false),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "searchable"? I'm not sure of the implications

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure too! Maybe @astefan knows.

Copy link
Contributor Author

@ivancea ivancea May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking the _field_caps endpoint, it says it's searchable. Changed it; but still I don't know the implications, or how to test whether it should be that or not 👀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "searchable"? I'm not sure of the implications

If it's searchable, it'll be pushed down in a filter, if possible (WHERE _ignored == "some field"). However, this isn't always doable even if searchable and pushable (see _id), but it is in this case (and your tests cover that).

@ivancea ivancea requested a review from nik9000 May 17, 2024 15:08

count:long
0
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might just do all of these in a yaml test - it's possible for us to make _ignored work well in a csv test, but I'm not sure it's worth it. The yaml tests do all of the same bwc work as the csv tests and this feels enough like you want to roll your own index anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Named this "140_metadata". It would be nice to me if this was something like "141_metadata_ignored", but as there's no plain metadata yet, going with this, unless there are more suggestions 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't a clue why we use the number prefixes. We don't actually order the execution or anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

140_metadata is fine - we can rename them later when we add more. Or not if we don't. There isn't any risk involved.

refresh: true
body:
- { index: { } }
- { case: "ok", integer: 10, keyword: "ok" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Used a "case" field to reuse the index and avoid sorting flakyness/errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the names of the test case documents? Makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the addition of the yaml test, I wonder if we should keep this one (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure there's much point in keeping the _ignore csv tests. They just end up with null all the time. I'd just run them all in yaml tests. Probably worth porting one or two more of these fun cases to the yaml tests too - like the filtering and aggregation. Just out of paranoia. It should work, but these tests are more about making sure we don't accidentally break it in nine months without realizing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the interesting tests here (where, group, aggr), and removed the csv ones

@ivancea ivancea marked this pull request as ready for review May 20, 2024 11:59
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@ivancea ivancea requested a review from a team as a code owner May 20, 2024 13:53
@ivancea ivancea requested a review from nik9000 May 20, 2024 14:42
@@ -18,7 +18,7 @@ tasks.named('javaRestTest') {

restResources {
restApi {
include '_common', 'bulk', 'get', 'indices', 'esql', 'xpack', 'enrich', 'cluster'
include '_common', 'bulk', 'get', 'indices', 'esql', 'xpack', 'enrich', 'cluster', 'capabilities'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I noticed these were required by the capabilities support in yaml.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure there's much point in keeping the _ignore csv tests. They just end up with null all the time. I'd just run them all in yaml tests. Probably worth porting one or two more of these fun cases to the yaml tests too - like the filtering and aggregation. Just out of paranoia. It should work, but these tests are more about making sure we don't accidentally break it in nine months without realizing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

140_metadata is fine - we can rename them later when we add more. Or not if we don't. There isn't any risk involved.

refresh: true
body:
- { index: { } }
- { case: "ok", integer: 10, keyword: "ok" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the names of the test case documents? Makes sense.

@ivancea ivancea added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 21, 2024
@elasticsearchmachine elasticsearchmachine merged commit 47370a1 into elastic:main May 21, 2024
15 checks passed
@ivancea ivancea deleted the feat/esql-ignored-metadata-field branch May 21, 2024 11:32
ivancea added a commit that referenced this pull request May 21, 2024
jedrazb pushed a commit to jedrazb/elasticsearch that referenced this pull request May 21, 2024
Expose "_ignored" metadata field in ESQL queries
elasticsearchmachine pushed a commit that referenced this pull request May 22, 2024
Expose "_ignored" metadata field in ESQL queries.

This is the same code merged here:
#108770 Which got reverted
here: #108864

It was reverted because of a test failure:
https://gradle-enterprise.elastic.co/s/dpi2eib2x2fj2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Allow access to _ignored meta field
4 participants