Remove additional bytestring conversion. #125

Open
wants to merge 1 commit into master
Conversation

@dmjio (Member) commented Jun 18, 2019

When performing a checkpoint, the heap gets very large (eyeballing with ekg). I think the additional ByteString copying is to blame. Do we need to copy the ByteString? I think it gets copied twice: once with the strict conversion, and again with the call to copy.

cc @adamgundry @stepcut @lemmih @cleverca22

@dmjio self-assigned this on Jun 18, 2019
@ddssff (Collaborator) left a comment

The previous change to that line was

-	  encoded = Lazy.fromChunks [ Strict.copy $ Put.runPut (safePut object) ]
+	  encoded = Lazy.fromChunks [ Strict.copy $ Lazy.toStrict $
+	              serialiserEncode (logSerialiser (logIdentifier fLog)) object ]

It is hard to see how this change wouldn't be an improvement.

@stepcut (Member) commented Jun 18, 2019

This is the patch where the copy was originally added, but I am not sure why it was added:

0dfb394

What purpose could that Strict.copy have? That is not the sort of thing you just add without some reason in mind. Pretty much the only reason to Strict.copy, as far as I know, is that you have a space leak and need a copy of the ByteString without all the extra baggage.

Or maybe without it there is too much laziness and things are not durable when they are supposed to be?

Clearly that Strict.copy was intended to do something -- once we understand that bit, we can figure out how it interacts with the later addition of Lazy.toStrict.

It does seem like we would not need both Lazy.toStrict and Strict.copy, because I think toStrict already does a full copy. So a safer patch might be to remove the now-redundant copy, but then we should leave a note saying that if the toStrict is ever removed, the copy should be added back.
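A minimal sketch of that safer patch, reusing the names from the diff quoted above (the Lazy.toStrict stays, only the Strict.copy goes); this is illustrative only and not necessarily the exact change in this PR:

    -- NOTE: Lazy.toStrict already copies into a freshly allocated,
    -- right-sized buffer. If the toStrict is ever removed again, a
    -- Strict.copy should be reintroduced to avoid retaining oversized
    -- builder chunks.
    encoded = Lazy.fromChunks
      [ Lazy.toStrict $
          serialiserEncode (logSerialiser (logIdentifier fLog)) object ]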

@dmjio (Member, Author) commented Jun 18, 2019

Maybe @lemmih could chime in (if he's not too busy with LHC) on the motivation for introducing Strict.copy; we're coming up on a decade-old patch here.

@lemmih (Member) commented Jun 20, 2019

The encoder would allocate a minimum of something like 32KB, which would kill performance for small events. Copying small objects yielded more than a 10x speed improvement, but it's not a great thing to do for large checkpoints. If the serialization library can be configured to use a smaller initial buffer, then the copying can just be removed with no ill effects.
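For illustration, the classic retention problem that Strict.copy addresses: a ByteString slice shares its parent's buffer, so a few bytes carved out of a 32KB chunk keep the whole chunk alive until they are copied. A minimal sketch with hypothetical names:

    import qualified Data.ByteString as Strict

    -- O(1): shares bigChunk's entire buffer, so the full 32KB stays
    -- alive for as long as 'header' does.
    header :: Strict.ByteString -> Strict.ByteString
    header bigChunk = Strict.take 8 bigChunk

    -- O(n) in the slice length: a fresh 8-byte buffer, letting the
    -- 32KB chunk be garbage collected.
    headerCopy :: Strict.ByteString -> Strict.ByteString
    headerCopy bigChunk = Strict.copy (Strict.take 8 bigChunk)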

@ddssff (Collaborator) commented Aug 19, 2021

Maybe we should do an A-Bump, merge everything, and let the chips fall where they may?

@ddssff (Collaborator) commented Aug 19, 2021

Previous comment doesn't sound so great now that I'm rereading it.

@adamgundry (Contributor) commented

Sorry for neglecting this for so long. I think we should drop the copy, because it clearly hurts heap usage a lot with large checkpoints.

I don't fully understand why the copying helped, but it sounds like the issue is that the toLazyByteString call in packEntries could allocate too large a buffer for small events? Recent versions of bytestring have the builder allocate a 4k buffer first and then 32k buffers thereafter, which might avoid this without needing anything in acid-state? Or we could use toLazyByteStringWith to tweak the allocation strategy for the events archiver, or indeed merely perform the copying when serialising events, not checkpoints?
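A minimal sketch of the toLazyByteStringWith option mentioned above, with illustrative buffer sizes; runEventBuilder is a hypothetical name, not an existing acid-state function:

    import Data.ByteString.Builder (Builder)
    import Data.ByteString.Builder.Extra (safeStrategy, toLazyByteStringWith)
    import qualified Data.ByteString.Lazy as Lazy

    -- Start with a small (256-byte) first chunk and grow to 32KB chunks,
    -- so small events do not pay for a large initial buffer.
    runEventBuilder :: Builder -> Lazy.ByteString
    runEventBuilder = toLazyByteStringWith (safeStrategy 256 32768) Lazy.empty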

@stepcut (Member) commented Apr 7, 2022

I think the problem here is that we have no suitable benchmark/testsuite to help give us confidence that this improves things more than it hurts things.

@adamgundry (Contributor) commented

Agreed. I've actually been looking at this a little recently, and started some work on improving the existing benchmarks, which is currently at 110-benchmark-tweaks.

It's fairly clear that this change can bring down memory usage substantially when checkpoints are large. That alone would make it worth including for my use case, where we don't see high enough loads to be too worried about throughput.

However, it is harder to assess the impact on throughput from the existing loading-benchmark, particularly for the case of many small events. The problem is that the relevant Local benchmarks are disk IO bound, and I've found they tend to be very noisy, which is not terribly surprising. Thus it is difficult to have confidence in the results. Perhaps they would fare better on a quieter machine with spinning rust rather than my development laptop with SSD...

Another thing that could be nice is benchmarking a server that uses acid-state, using a load testing tool that would report the response latency distribution (e.g. I've heard good things about https://k6.io). That might be more suitable than criterion, which IIUC is designed for micro-benchmarking functions with more predictable runtime?
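A minimal sketch, assuming the default safecopy/cereal serialiser, of a criterion micro-benchmark that isolates the encode path (with and without the extra copy) from disk IO; sampleEvent and the group names are placeholders:

    import Criterion.Main (bench, bgroup, defaultMain, nf)
    import qualified Data.ByteString as Strict
    import qualified Data.ByteString.Lazy as Lazy
    import Data.SafeCopy (safePut)
    import Data.Serialize.Put (runPutLazy)

    main :: IO ()
    main = defaultMain
      [ bgroup "encode small event"
          [ bench "toStrict only" $
              nf (Lazy.toStrict . runPutLazy . safePut) sampleEvent
          , bench "toStrict + copy" $
              nf (Strict.copy . Lazy.toStrict . runPutLazy . safePut) sampleEvent
          ]
      ]
      where
        -- Placeholder for a realistic small event payload.
        sampleEvent :: [Int]
        sampleEvent = [1 .. 100]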
