Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added test of else method using differently ordered columns #682

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

epinzur
Copy link
Collaborator

@epinzur epinzur commented Aug 21, 2023

i was having trouble shortening this test, while still being able to reproduce the bug.

feel free to shorten if it still reproduces the issue.

note that I haven't fixed the issue yet. only added this failing test.

@cla-bot cla-bot bot added the cla-signed Set when all authors of a PR have signed our CLA label Aug 21, 2023
@epinzur epinzur force-pushed the esp/else_bug branch 2 times, most recently from a06147b to 6d235f6 Compare August 21, 2023 12:54
@@ -0,0 +1,4 @@
{"_time":"1970-01-01T00:00:00.000000001","_subsort":0,"_key_hash":17095134351192101601,"_key":"dev","joined":{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1.0,"key":"dev"},"threads":{"text":"Thread 1","user":"UCZ4","time":1.0,"thread_ts":1.0,"key":"dev"},"non_threads":null}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what is the expected results here? These appear to be correct. If threads is null then non_threads is used as the value of joined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def record_source_slack() -> kd.sources.JsonlString:
content = "\n".join(
[
"""{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}""",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't use """...""" here -- those are "docstrings". Lint will likely suggest changing it.

"""{"text":"Msg 2","user":"U016","time":4,"thread_ts":null,"key":"dev"}""",
]
)
return kd.sources.JsonlString(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyList may be even easier for this example:

source = kd.sources.PyList(
        [
            {"time": "1996-12-19T16:39:57", "text": "Thread 2", "user": "U016", "thread_ts": 1, "key": "dev" },
            ...,
        ],
        time_column_name="time",
        key_column_name="user",
    )

threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_not_null())
non_threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_null())

# this call re-orders the columns in the non_threads timestream
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the error, it doesn't seem to care that the fields are re-ordered. Instead, it doesn't like row 3:

  At positional index 2, first diff: {'user': None, 'text': None, 'time': None, 'thread_ts': None, 'key': None} != {'text': 'Msg 1', 'user': 'U016', 'time': 3, 'thread_ts': None, 'key': 'dev'}

non_threads = joined.col("non_threads")

golden.jsonl(
joined.extend({"joined": threads.else_(non_threads)})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I believe the problem here is that we're creating a new record containing joined. Even when threads.else_(non_threads) is null, this record will sometimes be not null. Eg.

Two questions / thoughts:

  1. Why do we need the joined at all? What happens if you just output threads.else_(non_threads)
  2. If we could do joined.extend(threads.else_(non_threads) that would also help (and it is currently planned)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed Set when all authors of a PR have signed our CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants