Remove additional bytestring conversion. #125

Open
wants to merge 1 commit into master
Conversation

@dmjio (Member) commented Jun 18, 2019

When performing a checkpoint, the heap gets very large (eyeballing with ekg). I think the additional ByteString copying is to blame. Do we need to copy the ByteString? I think it gets copied twice: once with the strict conversion, and again with the call to copy.

cc @adamgundry @stepcut @lemmih @cleverca22

@dmjio self-assigned this on Jun 18, 2019
@ddssff (Collaborator) left a comment

The previous change to that line was

-	  encoded = Lazy.fromChunks [ Strict.copy $ Put.runPut (safePut object) ]
+	  encoded = Lazy.fromChunks [ Strict.copy $ Lazy.toStrict $
+	              serialiserEncode (logSerialiser (logIdentifier fLog)) object ]

It is hard to see how this change wouldn't be an improvement.

@stepcut (Member) commented Jun 18, 2019

This is the patch where the copy was originally added, but I am not sure why it was added:

0dfb394

What purpose could that Strict.copy have? That is not the sort of thing you just add without some reason in mind. Pretty much the only reason to Strict.copy, as far as I know, is that you have a space leak and need a copy of the ByteString without all the extra baggage.

Or maybe without it there is too much laziness and things are not durable when they are supposed to be?

Clearly that Strict.copy was intended to do something -- once we understand that bit, we can figure out how it interacts with the later addition of Lazy.toStrict.

It does seem like we would not need both Lazy.toStrict and Strict.copy, because I think toStrict already does a full copy. So a safer patch might be to remove the now-redundant copy, but then we should leave a note saying that if the toStrict is ever removed, the copy should be added back.
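A minimal sketch of that safer patch, reusing the names from the diff quoted above (the Lazy.toStrict stays, only the Strict.copy goes); this is illustrative only and not necessarily the exact change in this PR:

    -- NOTE: Lazy.toStrict already copies into a freshly allocated,
    -- right-sized buffer. If the toStrict is ever removed again, a
    -- Strict.copy should be reintroduced to avoid retaining oversized
    -- builder chunks.
    encoded = Lazy.fromChunks
      [ Lazy.toStrict $
          serialiserEncode (logSerialiser (logIdentifier fLog)) object ]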

@dmjio (Member, Author) commented Jun 18, 2019

Maybe @lemmih could chime in (if he's not too busy with LHC) on the motivation for introducing Strict.copy; we're coming up on a decade-old patch here.

@lemmih (Member) commented Jun 20, 2019

The encoder would allocate a minimum of something like 32KB, which would kill performance for small events. Copying small objects yielded more than a 10x speed improvement, but it's not a great thing to do for large checkpoints. If the serialization library can be configured to use a smaller initial buffer, then the copying can just be removed with no ill effects.
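For illustration, the classic retention problem that Strict.copy addresses: a ByteString slice shares its parent's buffer, so a few bytes carved out of a 32KB chunk keep the whole chunk alive until they are copied. A minimal sketch with hypothetical names:

    import qualified Data.ByteString as Strict

    -- O(1): shares bigChunk's entire buffer, so the full 32KB stays
    -- alive for as long as 'header' does.
    header :: Strict.ByteString -> Strict.ByteString
    header bigChunk = Strict.take 8 bigChunk

    -- O(n) in the slice length: a fresh 8-byte buffer, letting the
    -- 32KB chunk be garbage collected.
    headerCopy :: Strict.ByteString -> Strict.ByteString
    headerCopy bigChunk = Strict.copy (Strict.take 8 bigChunk)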

@ddssff (Collaborator) commented Aug 19, 2021

Maybe we should do an A-Bump, merge everything, and let the chips fall where they may?

@ddssff (Collaborator) commented Aug 19, 2021

Previous comment doesn't sound so great now that I'm rereading it.

@adamgundry (Contributor) commented

Sorry for neglecting this for so long. I think we should drop the copy, because it clearly hurts heap usage a lot with large checkpoints.

I don't fully understand why the copying helped, but it sounds like the issue is that the toLazyByteString call in packEntries could allocate too large a buffer for small events? Recent versions of bytestring have the builder allocate a 4k buffer first and then 32k buffers thereafter, which might avoid this without needing anything in acid-state? Or we could use toLazyByteStringWith to tweak the allocation strategy for the events archiver, or indeed merely perform the copying when serialising events, not checkpoints?
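A minimal sketch of the toLazyByteStringWith option mentioned above, with illustrative buffer sizes; runEventBuilder is a hypothetical name, not an existing acid-state function:

    import Data.ByteString.Builder (Builder)
    import Data.ByteString.Builder.Extra (safeStrategy, toLazyByteStringWith)
    import qualified Data.ByteString.Lazy as Lazy

    -- Start with a small (256-byte) first chunk and grow to 32KB chunks,
    -- so small events do not pay for a large initial buffer.
    runEventBuilder :: Builder -> Lazy.ByteString
    runEventBuilder = toLazyByteStringWith (safeStrategy 256 32768) Lazy.empty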

@stepcut (Member) commented Apr 7, 2022

I think the problem here is that we have no suitable benchmark/testsuite to help give us confidence that this improves things more than it hurts things.

@adamgundry (Contributor) commented

Agreed. I've actually been looking at this a little recently, and started some work on improving the existing benchmarks, which is currently at 110-benchmark-tweaks.

It's fairly clear that this change can bring down memory usage substantially when checkpoints are large. That alone would make it worth including for my use case, where we don't see high enough loads to be too worried about throughput.

However, it is harder to assess the impact on throughput from the existing loading-benchmark, particularly for the case of many small events. The problem is that the relevant Local benchmarks are disk IO bound, and I've found they tend to be very noisy, which is not terribly surprising. Thus it is difficult to have confidence in the results. Perhaps they would fare better on a quieter machine with spinning rust rather than my development laptop with SSD...

Another thing that could be nice is benchmarking a server that uses acid-state, using a load testing tool that would report the response latency distribution (e.g. I've heard good things about https://k6.io). That might be more suitable than criterion, which IIUC is designed for micro-benchmarking functions with more predictable runtime?
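A minimal sketch, assuming the default safecopy/cereal serialiser, of a criterion micro-benchmark that isolates the encode path (with and without the extra copy) from disk IO; sampleEvent and the group names are placeholders:

    import Criterion.Main (bench, bgroup, defaultMain, nf)
    import qualified Data.ByteString as Strict
    import qualified Data.ByteString.Lazy as Lazy
    import Data.SafeCopy (safePut)
    import Data.Serialize.Put (runPutLazy)

    main :: IO ()
    main = defaultMain
      [ bgroup "encode small event"
          [ bench "toStrict only" $
              nf (Lazy.toStrict . runPutLazy . safePut) sampleEvent
          , bench "toStrict + copy" $
              nf (Strict.copy . Lazy.toStrict . runPutLazy . safePut) sampleEvent
          ]
      ]
      where
        -- Placeholder for a realistic small event payload.
        sampleEvent :: [Int]
        sampleEvent = [1 .. 100]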
