
Journalling of input batches #841

bjchambers opened this issue Oct 31, 2023 · 0 comments
@bjchambers

  • Write batches to a journal
  • Recover batches from the journal
  • Checkpoint journal segments to a "prepared" file
  • Prepare files directly to journal checkpoints
  • Sources for reading from the input journal / checkpoints
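The first two tasks (writing batches to a journal and recovering them) can be sketched with a length-prefixed, append-only record format. This is a minimal std-only illustration, not the `okaywal` API: `append_record` and `recover_records` are hypothetical names, and each payload stands in for one batch that has already been serialized (e.g. as Arrow IPC bytes). Recovery stops at the first incomplete record, which is one way to tolerate a torn write after a crash.

```rust
use std::io::{self, Read, Write};

// Hypothetical helper: append one record as a little-endian u32 length
// prefix followed by the payload bytes, then flush.
fn append_record<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

// Hypothetical helper: replay every complete record in order. A trailing
// record whose prefix or payload is cut short (a torn write) is ignored.
fn recover_records<R: Read>(mut r: R) -> io::Result<Vec<Vec<u8>>> {
    let mut buf = Vec::new();
    r.read_to_end(&mut buf)?;

    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 4 <= buf.len() {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&buf[pos..pos + 4]);
        let len = u32::from_le_bytes(len_bytes) as usize;
        let start = pos + 4;
        // Stop at an incomplete payload instead of reporting an error.
        if len > buf.len() - start {
            break;
        }
        records.push(buf[start..start + len].to_vec());
        pos = start + len;
    }
    Ok(records)
}
```

Because each record carries its own length, recovery needs no separate index. A production write-ahead log would typically also store a checksum per record so that corrupted entries, not just truncated ones, are detected.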
bjchambers self-assigned this Oct 31, 2023
bjchambers added a commit that referenced this issue Nov 1, 2023
This is part of #841.

- Persistence of batches using the Arrow IPC format.
- Journalling and recovery of batches via the append-only `okaywal` crate.
- Concatenating batches to produce checkpoints (also in IPC format).
- Recovering from checkpoints.
- Concatenating journal checkpoints with in-memory data for execution.
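Producing a checkpoint and presenting it to execution alongside in-memory data might look like the following sketch. `Batch`, `checkpoint`, and `execution_input` are hypothetical names; `Batch` is a stand-in for an Arrow record batch holding `(timestamp, value)` rows. The sketch assumes journaled batches arrive in timestamp order, so plain concatenation keeps the checkpoint ordered (handling late data is listed as future work).

```rust
// Hypothetical stand-in for an Arrow record batch: (timestamp, value) rows.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    rows: Vec<(i64, i64)>,
}

// Hypothetical: concatenate journaled batches into one checkpoint batch.
// Assumes the journaled batches are already in timestamp order.
fn checkpoint(journaled: &[Batch]) -> Batch {
    let rows: Vec<(i64, i64)> = journaled
        .iter()
        .flat_map(|b| b.rows.iter().copied())
        .collect();
    debug_assert!(
        rows.windows(2).all(|w| w[0].0 <= w[1].0),
        "assumption violated: journaled batches are not in timestamp order"
    );
    Batch { rows }
}

// Hypothetical: the input execution reads, with checkpointed batches first,
// followed by batches that are still only held in memory.
fn execution_input<'a>(
    checkpoints: &'a [Batch],
    in_memory: &'a [Batch],
) -> impl Iterator<Item = &'a Batch> + 'a {
    checkpoints.iter().chain(in_memory.iter())
}
```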

Next steps:
- Allow adding checkpointed batches directly (e.g., for import from
  Parquet).
- Allow subscribing to newly added batches (e.g., possibly replacing the
  existing in-memory implementation).
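Subscribing to newly added batches could be modeled with one channel per subscriber, as in this sketch using `std::sync::mpsc`. `BatchLog`, `subscribe`, and `add` are hypothetical names, not the project's API. Batches are shared via `Arc` so each subscriber receives a pointer rather than a copy, and subscribers whose receiver has been dropped are pruned on the next `add`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

// Hypothetical log of batches that notifies subscribers of new additions.
struct BatchLog<T> {
    batches: Vec<Arc<T>>,
    subscribers: Vec<Sender<Arc<T>>>,
}

impl<T> BatchLog<T> {
    fn new() -> Self {
        BatchLog { batches: Vec::new(), subscribers: Vec::new() }
    }

    // Returns a receiver that yields every batch added after this call.
    fn subscribe(&mut self) -> Receiver<Arc<T>> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    // Stores the batch and sends it to each live subscriber; a failed send
    // means the receiver was dropped, so that subscriber is removed.
    fn add(&mut self, batch: T) {
        let batch = Arc::new(batch);
        self.batches.push(Arc::clone(&batch));
        self.subscribers.retain(|tx| tx.send(Arc::clone(&batch)).is_ok());
    }
}
```

In this sketch a subscriber only sees batches added after it subscribes; replaying history would mean sending the existing `batches` to the new sender before registering it.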

Future work:
- Handle late data by merging with overlapping in-memory and possibly
  checkpointed batches.
- Write checkpoints to, and read them from, object stores. This may lead
  to storing checkpoints as Parquet, while still using IPC for the
  write-ahead log.
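For the late-data item, one possible approach is to detect a batch whose earliest timestamp precedes the newest stored timestamp, and then merge it with the overlapping sorted rows. The helpers `is_late` and `merge_sorted` are hypothetical and operate on `(timestamp, value)` rows already sorted by timestamp; treating an equal timestamp as not late, and keeping existing rows ahead of late rows on ties, are assumptions of this sketch.

```rust
// Hypothetical: a batch is late if it contains a timestamp earlier than the
// newest timestamp already stored. Equal timestamps are treated as on time.
fn is_late(max_stored_ts: Option<i64>, batch: &[(i64, i64)]) -> bool {
    match (max_stored_ts, batch.iter().map(|r| r.0).min()) {
        (Some(max), Some(min)) => min < max,
        _ => false,
    }
}

// Hypothetical: two-way merge of timestamp-sorted rows. On equal
// timestamps, rows from `existing` are emitted before rows from `late`.
fn merge_sorted(existing: &[(i64, i64)], late: &[(i64, i64)]) -> Vec<(i64, i64)> {
    let mut out = Vec::with_capacity(existing.len() + late.len());
    let (mut i, mut j) = (0, 0);
    while i < existing.len() && j < late.len() {
        if late[j].0 < existing[i].0 {
            out.push(late[j]);
            j += 1;
        } else {
            out.push(existing[i]);
            i += 1;
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&existing[i..]);
    out.extend_from_slice(&late[j..]);
    out
}
```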