Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow inspection/manipulation of record when all fields are null #725

Open
jordanrfrazier opened this issue Aug 30, 2023 · 5 comments
Assignees
Labels

Comments

@jordanrfrazier
Copy link
Collaborator

jordanrfrazier commented Aug 30, 2023

Summary

This behavior was first noticed in the the test described in #682, where it was expected that using coalesce would pick the second record, but it always gets the first.

When a record is created during execution, it will always be non-null, even if all fields are null. This is expected behavior (links?). In the below query, the extend creates a new record and thus the coalesce (which comes from the _else) always chooses that first record.

Feature Request

This use case may be summarized as

  1. I create a record from fields in Table A that I want
  2. I create a similar record from fields in Table B
  3. If all fields in record A are null, I want to use record B hereafter.

See Ben's latest comment for suggestions on how we may want to implement this existing behavior.

@jordanrfrazier
Copy link
Collaborator Author

This produces the correct results: (Commented out the extra extend user)

threads = content.filter(content.col("thread_ts").is_not_null())
non_threads = content.filter(content.col("thread_ts").is_null())

record=kd.record({"threads": threads, "non_threads": non_threads})

# non_threads = non_threads.extend({"user": non_threads.col("user")})

joined = kd.record({"threads": threads, "non_threads": non_threads})

thread=joined.col("threads")
msg=joined.col("non_threads")
result=joined.extend({"asdf": thread.else_(msg)})

result.explain().render(view=True)
result.preview()

Results in
Screen Shot 2023-10-03 at 11 05 14 AM

@jordanrfrazier
Copy link
Collaborator Author

jordanrfrazier commented Oct 3, 2023

Whereas with the extend user, it is creating a new Record, which by definition is never non-null and instead creates an empty record with null fields, as seen in the result.

Specifically in operation 3, expression 15, we create a new record that will never be null. Thus in the coalesce in 16, we'll always choose that record, even if all the fields are null.

Screen Shot 2023-10-03 at 11 06 30 AM

@bjchambers
Copy link
Collaborator

If I'm reading that correctly this is (as speculated on the PR) not a bug, but rather due to the fact that a record is never null (it's at-most a non-null record containing null fields).

It seems like changing the behavior to treat a record of all null fields would be confusing, and not really fit the definition of null. We could discuss doing it (although we'd likely also want to treat empty lists, empty strings, etc. as null at that point).

The other option would be to change the behavior of coalesce or have a variant for coalesce_fields which took two records and coalesced the fields. This seems like it could be useful in some cases, but would be confusing / prevent other uses (eg., if you really wanted to choose between a record which was null or not, but wanted to preserve the null values in it).

@jordanrfrazier
Copy link
Collaborator Author

It is not necessarily a bug in the sense that expressions are executing as we have defined the current behavior, yes. I'm not sure what I would expect here, but I would consider changing records to be null if all fields are null, or offer a similar method that operates on records to determine if all fields are null.

I would understand if user's generally follow the pattern below:

  1. I'm creating a record from this table with stuff I want
  2. I'm creating a record from this other table with similar stuff.
  3. If the first record is all nulls, I want to use my second record's data.

Of course it can be worked around by accessing fields, but I think just the presence of a method like records.all_null() in the documentation would serve as a warning that nullability on created records is different than other types.

@bjchambers
Copy link
Collaborator

We almost centaily should not change records of all null fields to be null. This would be inconsistent with Python, SQL extensions supporting records, and a variety of other things.

https://trinket.io/python/f7ad7f9864

Changing it to be all_null wouldn't really help -- this is talking about else and coalesce. As noted in the previous update, I think what we may actually want is a coalesce_fields which applies coalesce on each field separately.

Note this is actually different from "coalesce but treat all null as null". Instead, this is "coalesce, but applied to each field in the record". This would also be useful in cases where you had parts of a record filled in on one side and parts not filled in on the other.

Note that with that behavior, if the first record is "all nulls" then the fields from the second would be taken, so it would effectively be the same as "coalesce but treat all null as null", but it would also support other use patterns (and wouldn't lead to any "null means different things in different places".

@epinzur epinzur removed the bug Something isn't working label Oct 9, 2023
@jordanrfrazier jordanrfrazier changed the title bug: else is creating empty records in certain cases feat: Allow checking nullability and/or manipulation of record when all fields are null Oct 9, 2023
@jordanrfrazier jordanrfrazier changed the title feat: Allow checking nullability and/or manipulation of record when all fields are null feat: Allow inspection/manipulation of record when all fields are null Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants