
Performance tuning via reduced durability #54

Open
meteficha opened this issue May 28, 2015 · 6 comments
@meteficha
Contributor

I have some test code that currently takes 582s of wall time and 51s of CPU time to complete. If I completely remove the fsync calls by editing the Unix FileIO, it takes 16s of wall time and 35s of CPU time. That's a 36-fold improvement in wall time!

I'm not suggesting we should stop calling fsync altogether, as that would rename the package to aci-state. However, many DBMSs provide knobs to trade durability for performance. For example:

  • SQLite provides three different compromises.
  • Redis allows you to never fsync, to fsync every second, or to fsync on every query.
  • PostgreSQL has many knobs: you can disable fsync entirely, or keep fsync but return query results before it completes.

I'm not proposing anything concrete that should be done, I'd just like to start a conversation about possible tradeoffs.
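To make the design space concrete, here is one way such a knob could be modeled, roughly mirroring the settings listed above. This is purely a sketch; none of these names exist in acid-state.

```haskell
-- Hypothetical durability settings; illustrative names only.
data Durability
  = FsyncEveryUpdate      -- current acid-state behaviour: full durability
  | FsyncEveryMillis Int  -- flush at most once per interval (Redis-style)
  | NoFsync               -- leave flushing entirely to the OS

-- What acknowledged data is at risk on power failure under each setting?
atRisk :: Durability -> String
atRisk FsyncEveryUpdate     = "nothing"
atRisk (FsyncEveryMillis n) = "up to the last " ++ show n ++ "ms of updates"
atRisk NoFsync              = "anything the kernel has not yet flushed"
```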

@meteficha
Contributor Author

IIUC, currently acid-state will refuse to start if the log is corrupt. So going the Redis way of fsyncing every second seems a bad idea.

I like the option of continuing user code before data is written to disk. It looked like scheduleUpdate could help me here, but it doesn't work for my use case: a query after a scheduleUpdate may not see the update at all. What I wanted was for queries to see the updated state without waiting for the update to be flushed to disk.
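The desired behaviour can be sketched with plain concurrency primitives: apply the update to the in-memory state immediately (so subsequent queries see it), and only queue the serialized event for a background writer thread to log and fsync. `Store`, `asyncUpdate`, and `query` are illustrative names, not acid-state API.

```haskell
import Control.Concurrent.Chan (Chan, newChan, writeChan)
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar_, readMVar)

-- Sketch only: in-memory state plus a queue of events awaiting durability.
data Store s = Store
  { stateVar :: MVar s      -- current state, visible to queries at once
  , logChan  :: Chan String -- serialized events awaiting write + fsync
  }

asyncUpdate :: Store s -> (s -> s) -> String -> IO ()
asyncUpdate store f event = do
  modifyMVar_ (stateVar store) (return . f) -- state changes now
  writeChan (logChan store) event           -- durability happens later

query :: Store s -> IO s
query store = readMVar (stateVar store)
```

A background thread would drain `logChan`, append to the log file, and fsync in batches, so queries never wait on the disk.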

@lemmih
Member

lemmih commented May 28, 2015

We could add a 'reallyUnsafeUpdate' function which would make the result immediately available without waiting for serialization. This wouldn't work for the Remote backend, obviously. And God have mercy on your soul (and your data) if you use the function carelessly.

@meteficha
Contributor Author

Well, it's not "really unsafe"; it depends on the context.

My test code above implements a storage backend for server-side sessions. On most websites it's no big deal if the session data from the last few seconds is lost forever. However, adding hundreds of milliseconds per request due to fsync is a big no.

What about adding an option that could be set when opening the Local backend, instead of providing a different update function? The option could be named "really unsafe" without hurting the legibility of every bit of code that performs an update.

@lemmih
Member

lemmih commented May 28, 2015

Permanently losing data doesn't qualify as "really unsafe"? I hope you're not working in banking. :)

I would be willing to downgrade it from "reallyUnsafe" to just "unsafe". At first I was against adding an option to make this behavior the default, but actually that's the only approach that makes sense. An 'unsafeUpdate' function would taint your entire application, not just the thread that used it. So, we want an option for the Local backend to return results immediately without waiting for fsync, and we want a 'waitUntilMyDataIsSafe :: AcidState -> IO ()' function as well.
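One way such a wait function could work is with sequence numbers: every update gets one, and the background writer records the highest number it has fsynced. The caller blocks until everything issued before the call is on disk. None of these names exist in acid-state; this is only a sketch of the bookkeeping.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, newEmptyMVar, readMVar, takeMVar)

-- Hypothetical flush bookkeeping, maintained by a background writer thread.
data FlushState = FlushState
  { issuedVar  :: MVar Integer -- highest sequence number handed out
  , flushedVar :: MVar Integer -- highest sequence number fsynced
  , flushedSig :: MVar ()      -- pulsed by the writer after each fsync
  }

-- Block until every update issued so far has been fsynced.
waitUntilMyDataIsSafe :: FlushState -> IO ()
waitUntilMyDataIsSafe fs = do
  target <- readMVar (issuedVar fs)
  let loop = do
        done <- readMVar (flushedVar fs)
        if done >= target
          then return ()                        -- our data is on disk
          else takeMVar (flushedSig fs) >> loop -- wait for the next fsync
  loop
```

With the unsafe option enabled, updates would return immediately, and callers that do care about a particular write (say, before acknowledging a payment) could call this at the point where durability matters.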

@meteficha
Contributor Author

Well, even in banking I imagine losing session data would be harmless. You could log out some logged-in users, or forget that some users had asked to be logged out. That doesn't look so bad :).

@meteficha
Contributor Author

BTW, I like your suggestion.
