Web developers often fixate on optimizing the delivery of assets to the end-user's device, and overlook the computation that takes place on that device once the assets arrive.
- Bundle size doesn't matter since the bundles are stored in cache!
- We just completed our React migration, so we were expecting sizeable performance improvements, especially when the JS is cached!
Neither of these claims could be further from the truth!
Web application performance is fundamentally tied to the network and the speed at which we can deliver an asset to the end-user device.
However, even if every asset is optimally delivered and/or cached, that doesn't mean the application will execute or render quickly once it runs on the end-user's device.
At a high level, there are two primary performance bottlenecks on the web:
- Networking - the round-trip time to acquire an asset or data payload from a remote server
- End-user Device Compute - the amount of computational overhead required on the end-user's device
Consider the following React-based SPA:
```jsx
ReactDOM.render(<MyApp />, document.getElementById('root'))
```
A common and insightful optimization would be to apply Cache-Control headers to the vendor-bundle JS files. One might even go further and use a Service Worker to precache these assets.
The following diagram overviews the process of loading a script from disk cache:
Let's consider some of the various bottlenecks that arise here.
Inter-Process Communication (IPC) is how the browser's processes send messages to one another.
The browser cache stores files on disk on the end-user's device. The process executing your web application does not have direct disk access; instead, disk reads are performed by a dedicated Network process, which loads cached assets from disk for all tabs and windows across the browser.
The process running the Web Application sends an IPC message to a dedicated Network process to load an asset. The asset is loaded from disk by the Network process and transferred back to the Web Application process via an IPC message.
IPC messages are not instantaneous. Furthermore, a large asset will require more time to fully transfer across the IPC channels.
The asset size will have a direct impact on how long this IPC transfer takes, and text compression like GZip or Brotli will not help mitigate this cost!
In the Chromium DevTools Network panel, you may see a sizeable Content Download time associated with large scripts, even when they are loaded from the browser cache. This is the IPC overhead:
Disk access performed by the Network process is not instantaneous.
For some users, this disk access may be extremely slow. Many users are still on physical spinning hard drives, or have a disk that's under intense load (e.g. running an antivirus scan, or nearly at storage capacity).
Furthermore, high-frequency disk reads (e.g. trying to load many cached scripts from the browser cache at once) can back up the Network process's thread and event loop, incurring queueing time:
In addition, a large asset will take longer to read from disk, and text compression like GZip and Brotli will not help mitigate this cost.
Once an asset has been read off disk and copied back to the Web Application process, its journey is not over!
The above image is from Franziska Hinkelmann's blog.
At a high level, this process works like this:
- Parse the JS text into an Abstract Syntax Tree
- Traverse the Abstract Syntax Tree, and emit Bytecode
- Load the Bytecode onto the thread, and begin execution of the script
Browsers apply various optimizations in this phase, such as deferring parsing and compilation, and moving portions of this work onto dedicated threads to parallelize it. But despite the browser's best efforts, this cost is still significant.
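One hedged illustration of these heuristics: V8 treats a function expression wrapped in parentheses (a so-called "PIFE") as a hint that the function will be called soon, and compiles it eagerly, whereas a plain declaration may only be pre-parsed and then fully compiled on first call. A minimal sketch (the function names are illustrative):

```javascript
// Plain declaration: engines typically pre-parse this lazily and only
// fully compile it the first time it is called.
function add(a, b) {
  return a + b;
}

// Parenthesized function expression ("PIFE"): the leading "(" is a
// heuristic hint to V8 that the function will run soon, so it is
// parsed and compiled eagerly during the initial script compile.
const multiply = (function (a, b) {
  return a * b;
});

console.log(add(2, 3));      // 5
console.log(multiply(4, 5)); // 20
```

Both styles behave identically at runtime; the difference is only in when the engine pays the compilation cost.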
In a performance trace, one can observe this cost across the Main Thread and other threads:
A compilation block on the Main Thread:
A savvy web performance engineer might have discovered that the browser performs heuristic-based bytecode caching.
You can see the compilation cache size in the Profiler:
Furthermore, not all bytecode caching techniques optimize in the same way. For example, in Chromium, scripts cached during the Service Worker 'install' event generate a more comprehensive instruction set, but often produce a larger cached size. This reduces runtime codegen, but incurs more overhead during the initial code load.
Once the script finally has its initial bytecode generated and loaded onto the thread, the browser can begin execution.
You can use performance timing marks and measures to capture how long the codepaths in your critical path actually take.
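For example, a critical codepath can be bracketed with User Timing marks and a named measure (the 'hydrate' label and the work inside are illustrative placeholders):

```javascript
// Mark the start of the critical codepath.
performance.mark('hydrate-start');

// ...critical-path work would happen here; a stand-in loop for illustration.
let total = 0;
for (let i = 0; i < 1e6; i++) total += i;

// Mark the end, then create a named measure spanning the two marks.
performance.mark('hydrate-end');
performance.measure('hydrate', 'hydrate-start', 'hydrate-end');

// The measure appears in DevTools' Performance panel (Timings track),
// and can also be read programmatically, e.g. to ship to analytics.
const [entry] = performance.getEntriesByName('hydrate');
console.log(`hydrate took ${entry.duration.toFixed(2)}ms`);
```

Because these entries surface in performance traces, they make it easy to see exactly where execution time goes during real page loads.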
In most cases, your script still has more bytecode to produce once it executes!
As the script runs, the engine discovers previously skipped codepaths that must now be compiled, and this cost is often still observed even when the script is loaded from cache.
In the profiler, you'll observe the following: