Project MemShrink

Paul Bone — Sep 25, 2020

I asked, and have been given a new project at work. Together with another engineer I will be working on reducing Firefox’s memory footprint.

Mozilla has run a MemShrink project in the past. Even though it has wound down, Firefox is still regarded as being more memory efficient than Chrome and the other Chromium-based browsers. However, keeping this accolade is going to be difficult.

Lately we’re working towards Project Fission, a site isolation feature and new process model for Firefox. This will necessarily require more OS processes for Firefox (particularly for users who browse multiple different sites in many tabs), which in turn requires more memory.

We’ve started a new memory reduction project to ensure we can enable Fission without excessive memory use: Fission MemShrink. This was actually started in 2018, but now we have two full-time engineers on it, and one of them is me.

So far

A lot of the "easy wins" have already been won: we reduced the base memory usage on Windows from 21.5MB in 2018 to 14.8MB in 2019, and now 11.5MB in September 2020. This included fixing situations where memory was used excessively, or not released when it could be, and other low-hanging fruit. However, that’s just base memory: the memory used by a content process before it loads a page. Memory usage of the whole browser has increased over the past year (by about 10% as best I can tell from our tracking).

Where to optimise

(TL;DR: use a profiler.)

The main goal is to reduce the overhead from enabling Fission. There are several places to look:

Reduce the number / size of structures that are used more commonly with Fission.
Reduce the amount of memory an "empty" Firefox content process consumes (that’s the base memory above).
Reduce the overall memory usage of the browser - that’s always a good thing!

I’ve picked the second, or almost the second, one: reduce the memory overhead of a process loading a tiny page (or iframe). Firefox can generate a memory report, showing where memory is allocated.

With Fission enabled, I load http://example.com in one tab (because it’s a small page), open about:memory in another tab and click "Measure", and Firefox generates a memory report. We’re using this as the starting point for where to concentrate our optimisation efforts.

jemalloc

Table 1. jemalloc size classes
Category	Subcategory	Size
Small	Tiny	4
	Tiny	8
	Quantum-spaced	16
		32
		48
		…
		480
		496
		512
	Sub-page	1 kB
	Sub-page	2 kB
Large		4 kB
		8 kB
		12 kB
		…
		1012 kB
		1016 kB
		1020 kB
Huge		1 MB
		2 MB
		3 MB
		…

For a small process we noted that the memory allocator has more overhead than we’d like, >1.90MiB out of a 20.7MiB process (9.21%), and it may be worthwhile looking here as a place to reduce overheads. This overhead is the bin-unused item on these memory reports. So this is where I’m starting.

Firefox uses a fork of the jemalloc memory allocator that has since been heavily modified. jemalloc allocates memory in several size classes; the most popular comment in the mozjemalloc.cpp file contains a table explaining them (Table 1).

Note that these sizes can vary with different settings; the table shows the defaults on a 32-bit build with 4KB pages. On a 64-bit build the 4-byte class is missing, and Windows is always missing the word-sized class.

The 'bin-unused' item on the memory reports is only relevant for the small size classes in this table. In these size classes, an area of memory that jemalloc calls a run (one or more pages) is divided up into allocations of a single size class. So a request for 12 bytes is rounded up to 16 and allocated in a run that contains other 16-byte allocations. 'bin-unused' measures all the unallocated slots in these runs.
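The rounding described above can be sketched in a few lines. This is only an illustration of Table 1 (the 32-bit defaults with 4KB pages), not how mozjemalloc.cpp actually implements it; the function names are made up for this example, and I'm assuming the table's kB and MB mean 1024 and 1024 × 1024 bytes.

```python
KB = 1024        # assumption: "kB" in Table 1 means 1024 bytes
MB = 1024 * KB


def round_up(n: int, multiple: int) -> int:
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def size_class(request: int) -> int:
    """Return the Table 1 size class (in bytes) for a request.

    Hypothetical helper: it models the 32-bit defaults from the table only.
    """
    if request <= 0:
        raise ValueError("request must be positive")
    if request <= 8:
        # Tiny: 4 or 8 bytes.
        return 4 if request <= 4 else 8
    if request <= 512:
        # Quantum-spaced: multiples of 16 bytes up to 512.
        return round_up(request, 16)
    if request <= 2 * KB:
        # Sub-page: 1 kB or 2 kB.
        return KB if request <= KB else 2 * KB
    if request <= 1020 * KB:
        # Large: multiples of 4 kB up to 1020 kB.
        return round_up(request, 4 * KB)
    # Huge: multiples of 1 MB.
    return round_up(request, MB)


if __name__ == "__main__":
    for request in (12, 100, 513, 3000, 2 * MB + 1):
        print(request, "->", size_class(request))
```

The gap between the request and its size class (4 bytes for the 12-byte request) is a different kind of overhead from bin-unused, which counts whole slots that nobody has asked for.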

This is not wasted memory or lost memory. But it’s not "merely unused" either. For example, there may be many unused 16-byte slots, even some close together, but they cannot be used to satisfy a request for 32 bytes. And because they’re less than 4KB, they represent physical memory - they cannot be paged out without paging out the rest of that memory page.
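To make that concrete, here is a toy model of bins and runs. It is a deliberately simplified sketch: the `Bin` class is invented for this post, every run is exactly one 4KiB page with no header, and empty runs are never released, none of which is true of the real allocator. It only shows how free 16-byte slots add up to bin-unused without helping a 32-byte request.

```python
PAGE = 4096  # assumption: 4KiB pages, and one page per run


class Bin:
    """Toy bin: all runs hold slots of one size class (hypothetical model)."""

    def __init__(self, size_class: int):
        self.size_class = size_class
        self.slots_per_run = PAGE // size_class
        self.runs = []  # each run is a list of bools, True = slot in use

    def alloc(self):
        """Take the first free slot, creating a new run if none is free."""
        for r, run in enumerate(self.runs):
            for s, used in enumerate(run):
                if not used:
                    run[s] = True
                    return (r, s)
        run = [False] * self.slots_per_run
        run[0] = True
        self.runs.append(run)
        return (len(self.runs) - 1, 0)

    def free(self, handle):
        r, s = handle
        self.runs[r][s] = False

    def bin_unused(self) -> int:
        """Bytes in free slots of runs that are still committed."""
        free_slots = sum(run.count(False) for run in self.runs)
        return free_slots * self.size_class


# Fill two runs with 16-byte allocations, then free all but one per run.
bin16 = Bin(16)
handles = [bin16.alloc() for _ in range(512)]
for handle in handles:
    if handle[1] != 0:  # keep slot 0 of each run alive
        bin16.free(handle)

# A 32-byte request cannot use those free 16-byte slots; it needs its own run.
bin32 = Bin(32)
bin32.alloc()

print("16-byte bin:", len(bin16.runs), "runs,", bin16.bin_unused(), "bytes unused")
print("32-byte bin:", len(bin32.runs), "runs,", bin32.bin_unused(), "bytes unused")
```

In this toy scenario three pages stay committed for 64 bytes of live data, and none of them can be given back while a single slot in each is still in use.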

Note	I think I owe you an article on what a memory page is, since I haven’t defined it and have used the term fairly liberally. If you don’t know, then just pretend I’m saying "4KiB area" and await a future article!

Note	A run may be one or more pages. Packing 2KB allocations one per page would be inefficient, so when allocations are larger (or don’t align well with their header) jemalloc will use longer runs of pages. Here is a table with examples of run lengths.
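To get a feel for why longer runs help, here is a rough calculation. The 64-byte run header is a number I made up for illustration (the real header size depends on the build and the size class), and `run_overhead` is a hypothetical helper, so treat the output as the shape of the trade-off rather than as real mozjemalloc figures.

```python
PAGE = 4096
HEADER = 64  # hypothetical run header size, for illustration only


def run_overhead(size_class: int, pages: int, header: int = HEADER):
    """Return (slots, overhead fraction) for a run of `pages` pages.

    Overhead is everything in the run that is not a usable slot:
    the header plus any space left over after the last slot.
    """
    run_bytes = pages * PAGE
    slots = (run_bytes - header) // size_class
    overhead = 1 - slots * size_class / run_bytes
    return slots, overhead


if __name__ == "__main__":
    for pages in (1, 2, 4, 8):
        slots, overhead = run_overhead(2048, pages)
        print(f"{pages} page(s): {slots} slots, {overhead:.2%} overhead")
```

With these assumed numbers a single page holds only one 2KB allocation, because the header stops the second one from fitting, so half the page is lost. Longer runs spread that loss over more allocations.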

Each page (or sometimes more, but for this article I’ll assume a page) will only store items of one specific size class. This reduces overheads, since the size information (and, if this were a GC, other metadata) can be shared between all the items in the same page.

Wish me luck

I could go on with what I’ve found, and I will, but in future posts. By keeping this post shorter I can publish it today. More generally, this may help me write more regularly, if each post is less time-consuming. So forgive me for leaving you in suspense, but that’s what’s best to maintain a blog at all.

In the next post I’ll write more about bin-unused and something called slop and describe what I might be able to do / have done to fix them. I’ll also write an introductory post about paging.

Thanks for reading.