Hi,
I have a long-standing hobby project involving cross-platform, multi-threaded compression. Basically, the program takes chunks of the input file and passes them through a multi-step compression pipeline.
In doing so, it constantly allocates and frees memory on entering and leaving each step. Multiply this by the number of CPU threads and you get a lot of malloc/free calls.
So I thought that, to speed things up, I would switch to "arena type" memory allocation. After reworking my library, I was surprised that I didn't get much of a speedup at all. As it turns out, malloc/free is already very fast as is.
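For readers unfamiliar with the term, an "arena type" allocator can be sketched roughly as follows. This is an illustration only, not the code from the PR: the names `arena_t`, `arena_init`, `arena_alloc`, `arena_reset` and `arena_destroy` are hypothetical, and the sketch assumes a C11 compiler and one arena per worker thread, so it does no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump ("arena") allocator: one big block, one moving offset. */
typedef struct {
    unsigned char *base; /* start of the backing block */
    size_t cap;          /* total size of the block in bytes */
    size_t used;         /* bytes handed out so far */
} arena_t;

/* Grab the backing block once, e.g. once per worker thread.
   Returns 1 on success, 0 if malloc failed. */
int arena_init(arena_t *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Allocation: round the offset up to the alignment, then bump it.
   Returns NULL when the arena is exhausted. */
void *arena_alloc(arena_t *a, size_t size) {
    const size_t align = _Alignof(max_align_t); /* same guarantee as malloc */
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* "Free" everything at once by moving the offset back to the start. */
void arena_reset(arena_t *a) {
    a->used = 0;
}

/* Return the backing block to the system allocator. */
void arena_destroy(arena_t *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

In a pipeline like the one described, each thread would call `arena_init` once, use `arena_alloc` inside the steps, and call `arena_reset` after finishing a chunk, instead of pairing every malloc with a free.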
My question is: should I stick with the new "arena allocator", or should I leave things as they were, with a simple malloc/free inside self-contained pipeline steps, for the sake of code clarity?
If you're interested, I currently have an open PR for this. I'm not sure whether I should merge it, since it gained me hardly any speedup.
EDIT: If someone knows, I would also like to know the reason behind that. Is malloc/free really so well optimized that it costs about the same as moving a single pointer up and down in an arena allocator?