In the previous blog post, we developed an understanding of the READ and MMAP system calls. If you haven’t read Part 1, we suggest you go to this link. In this blog post, we are going to answer a simple question: “Why not use MMAP everywhere?”
Advantages of MMAP over READ
- With MMAP, data is copied only once (from disk into the page cache), whereas READ copies it twice (from disk into the page cache, then from the page cache into the application's buffer).
- Interacting with a memory-mapped file is similar to any other memory operation.
- With READ, every operation is a system call. With MMAP, once the mapping is set up, you operate directly on memory, so no system calls are involved.
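To make the difference concrete, here is a minimal sketch of both access paths using Java NIO from Scala. The object and method names (`ReadVsMmap`, `firstBytesViaRead`, `firstBytesViaMmap`) are hypothetical and chosen only for illustration. It assumes a non-empty file and is not meant as a benchmark.

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

object ReadVsMmap {
  // READ path: each channel.read call is a system call that copies bytes
  // from the kernel's page cache into our own buffer.
  def firstBytesViaRead(path: Path, n: Int): Array[Byte] = {
    val channel = FileChannel.open(path, StandardOpenOption.READ)
    try {
      val buffer = ByteBuffer.allocate(n)
      // Loop because a single read may return fewer bytes than requested.
      while (buffer.hasRemaining && channel.read(buffer) != -1) {}
      buffer.flip()
      val out = new Array[Byte](buffer.remaining())
      buffer.get(out)
      out
    } finally channel.close()
  }

  // MMAP path: one map call sets up the mapping. After that, get() is a
  // plain memory access (a page fault may still occur on first touch).
  def firstBytesViaMmap(path: Path, n: Int): Array[Byte] = {
    val channel = FileChannel.open(path, StandardOpenOption.READ)
    try {
      val length = math.min(n.toLong, channel.size())
      val mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, length)
      val out = new Array[Byte](length.toInt)
      mapped.get(out)
      out
    } finally channel.close()
  }
}
```

Both methods return the same bytes. What differs is how the bytes reach the application: through a copy per read call, or through pages mapped into its address space.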
Given those advantages, it seems like one should always use MMAP over READ. But there are certain cases where MMAP might not do the trick, or might even be worse.
Issues with MMAP
To explain the issues with MMAP, I will take the use cases of certain applications and talk about the issues we might run into if we use MMAP.
Application 1
Suppose I have an application that needs to read from and write to many large files, each around 1 GB. Memory mapping every one of these files will take a lot of virtual memory.
Another option is to create a mapping when we need a file and destroy it once we are done with that file. The major issue with this option is that creating and destroying mappings is a rather heavy operation and should be avoided.
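import java.io.RandomAccessFile
import java.nio.channels.FileChannel

// GreaterThan1GBSizeFiles is assumed to be a collection of java.io.File defined elsewhere.
val mappedByteBuffers = GreaterThan1GBSizeFiles.map { file =>
  // RandomAccessFile needs an access mode; "rw" matches the READ_WRITE mapping below.
  val randomAccessFile = new RandomAccessFile(file, "rw")
  // Map the first 100 bytes of the file into memory.
  randomAccessFile.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, 100)
}
val byteNum = 0 // buffer indices are zero-based, so 0 is the first byte
val allFilesFirstBytes = mappedByteBuffers.map { e => e.get(byteNum) }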
From an email by Linus Torvalds:
Downsides to mmap:
- quite noticeable setup and teardown costs. And I mean _noticeable_. It's things like following the page tables to unmap everything cleanly. It's the book-keeping for maintaining a list of all the mappings. It's The TLB flush needed after unmapping stuff.
- page faulting is expensive. That's how the mapping gets populated, and it's quite slow.
So in this case, the application would have been better off using normal file read operations instead of MMAP.
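As a rough sketch of that alternative, the snippet below reads one byte from each file with a positional read and never creates a mapping, so there is no setup or teardown cost to pay per file. The names `PositionalRead` and `byteAt` are hypothetical. It assumes the files are given as `java.nio.file.Path` values.

```scala
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

object PositionalRead {
  // Reads the single byte at `position` from each file with a positional read.
  // No mapping is created, so nothing has to be unmapped afterwards.
  def byteAt(paths: Seq[Path], position: Long): Seq[Option[Byte]] = {
    val buffer = ByteBuffer.allocate(1) // one small buffer reused for all files
    paths.map { path =>
      val channel = FileChannel.open(path, StandardOpenOption.READ)
      try {
        buffer.clear()
        // read(dst, position) leaves the channel's own position untouched.
        val n = channel.read(buffer, position)
        // None when `position` lies at or beyond the end of the file.
        if (n == 1) Some(buffer.get(0)) else None
      } finally channel.close()
    }
  }
}
```

Each call still costs a system call and a copy, but for a one-off access to a large file that is usually cheaper than building and tearing down a mapping.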
Application 2
Suppose I have an application that caches a large number of very small files (< 1 KB). If we use MMAP here, each file is assigned virtual address space in whole pages (4 KB by default, or a multiple of it), even though the file itself might hold only a few bytes. So we would waste a lot of virtual address space. And since virtual address space is limited, our application might then need to manage the lifecycle of the mappings for these small files, which unnecessarily complicates the application and degrades performance for the reasons mentioned above.
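The waste is easy to estimate. The sketch below assumes a 4 KB page size (the real value depends on the platform) and uses hypothetical names (`MappingFootprint`, `mappedBytes`, `wastedBytes`) to round each file size up to a whole number of pages.

```scala
object MappingFootprint {
  // Assumed page size. 4 KB is common, but the actual value is platform dependent.
  val PageSizeBytes: Long = 4096L

  // Address space reserved for one mapping: the file size rounded up to whole pages.
  def mappedBytes(fileSizeBytes: Long): Long = {
    require(fileSizeBytes >= 0, "file size must not be negative")
    ((fileSizeBytes + PageSizeBytes - 1) / PageSizeBytes) * PageSizeBytes
  }

  // Address space reserved beyond the actual file contents, summed over all files.
  def wastedBytes(fileSizes: Seq[Long]): Long =
    fileSizes.map(size => mappedBytes(size) - size).sum
}
```

For example, under this assumption one million files of 100 bytes each hold about 100 MB of data, yet `MappingFootprint.wastedBytes(Seq.fill(1000000)(100L))` comes to 3,996,000,000 bytes, roughly 3.7 GiB of address space that holds no file data.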
Conclusions
- MMAP is great if you want to reuse some data over and over again.
- MMAP is great if you want to share data in a file across different processes. Here MMAP wins fair and square when compared to a normal fd.read().
- Don’t use MMAP blindly. Understand your application's use cases and whether it actually needs MMAP, because in some cases MMAP might prove more costly due to its high setup and teardown costs.