- Security Model
- Content and Rendering
1. Rendering Process
The job of a browser is to fetch and display a web page. At a high level, most modern browsers carry out the following steps to render an HTML page:
- Load the HTML
- Parse it
- Apply styles
- Build frames
- Layout the frames (flow)
- Paint the frames
- Load: The browser tries to fetch the page from the specified location. Typically this happens through an HTTP client, though an HTML page may also be loaded from the filesystem. Either way, the loader fetches the HTML page from its location. The very important concept of the browser cache comes into play here - but more on that later. The way the HTML page gets loaded differs from the way resources get loaded: in WebKit there are two different pipelines, one for loading the page and another for loading resources.
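To make the role of the cache concrete, here is a minimal sketch of a loader that consults a cache before going to the network. All names (`Loader`, `fake_fetch`) are hypothetical stand-ins; real browser loaders involve validation, expiry and far more.

```python
# Minimal sketch: a loader that checks the browser cache before fetching.
class Loader:
    def __init__(self):
        self.cache = {}  # url -> body (a stand-in for the browser cache)

    def load(self, url, fetch):
        """Return the page body, hitting the cache first."""
        if url in self.cache:
            return self.cache[url]   # cache hit: no network round-trip
        body = fetch(url)            # cache miss: go to the network (or filesystem)
        self.cache[url] = body
        return body

loader = Loader()
fetches = []
def fake_fetch(url):
    fetches.append(url)
    return "<html>...</html>"

loader.load("http://example.com/", fake_fetch)
loader.load("http://example.com/", fake_fetch)  # second load is served from cache
print(len(fetches))  # 1 - only one real fetch happened
```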
- Parse: As the stream comes through from the loader, an HTML parser starts building the DOM (also called the "Content Tree") - each node here is an HTML element. A lot of HTML on the net is broken, and each browser has had to implement its own quirks to parse it, leading to subtle incompatibilities. HTML5, however, specifies the parsing algorithm; as it gets adopted, the cross-browser incompatibilities caused by parsing should go away. While parsing, the engine may come across external resources (JS, CSS, images, fonts, etc.). When that happens, the resource is queued for loading and parsing continues. Again, there is more to this, which we'll tackle later.
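The "queue and keep going" behaviour can be sketched as a toy parser - a crude regex scan, nothing like a real HTML5 tokenizer - that collects tag names into a flat DOM while queuing any `src` resources it discovers, without ever blocking on them:

```python
# Toy illustration of parse-time resource discovery: external resources are
# queued for loading while parsing continues. Not a real HTML parser.
import re

def parse(html):
    dom, queue = [], []
    for tag, src in re.findall(r'<(\w+)(?:[^>]*\bsrc="([^"]*)")?[^>]*>', html):
        dom.append(tag)          # build the content tree (flattened here)
        if src:
            queue.append(src)    # queue the resource; do not block parsing
    return dom, queue

dom, queue = parse('<p><img src="a.png"><script src="b.js"></script></p>')
print(dom)    # ['p', 'img', 'script']
print(queue)  # ['a.png', 'b.js']
```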
- Compute Styles: The browser provides a default stylesheet, and the HTML page usually specifies additional styles. These styles need to be applied to the Content Tree. For this purpose, a "Rendering Tree" is built, consisting only of the elements that are to be rendered. For example, an element with display set to none does not appear in this tree (nor do its descendants), and neither do elements like HEAD and SCRIPT. Nodes in the Rendering Tree carry the style information: the CSS box model, z-order and opacity are all specified here.
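The filtering step described above - dropping non-rendered nodes and their subtrees - can be sketched as a tree walk. `Node` here is a hypothetical stand-in for a DOM node, not a real browser structure:

```python
# Sketch of render-tree construction: keep only nodes that will be rendered.
class Node:
    def __init__(self, tag, display="inline", children=()):
        self.tag, self.display, self.children = tag, display, list(children)

SKIPPED = {"head", "script"}

def build_render_tree(node):
    """Return the render tree, or None if this subtree is not rendered."""
    if node.tag in SKIPPED or node.display == "none":
        return None  # the node and all its descendants are dropped
    rendered = [t for t in map(build_render_tree, node.children) if t]
    return (node.tag, rendered)

doc = Node("html", children=[
    Node("head"),
    Node("body", children=[
        Node("div", display="none", children=[Node("span")]),
        Node("p"),
    ]),
])
print(build_render_tree(doc))  # ('html', [('body', [('p', [])])])
```

Note how the `span` disappears along with its `display: none` parent, even though the `span` itself has no style that hides it.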
- Construct Frames: Most renderable elements follow the CSS box model: they have a height, width, border, padding, margin and position. For these objects, a rectangular box - called a Frame - is created. Not all objects have a frame - for example, the SVG image above does not have a frame; it is put inside an iframe, which has one. A frame has all the information on how the object itself is going to be rendered. What is not yet known, however, is how the element is going to be placed relative to other elements.
- Compute Flow: Flow computation, or layout computation, determines how elements are placed relative to each other and is mostly controlled by the CSS visual rendering model. It is typically a recursive process from the root of the tree to the leaves, and it is typically lazy - done on demand. When the layout engine determines that an element needs to be laid out (for example, a newly added node), it marks the element as such by setting a dirty bit. The actual layout is done only when some method is called that requires the new information. A visual representation of the layout process can be seen in these videos:
- Gecko reflow for Google homepage
- Gecko reflow for Wikipedia homepage
- Gecko reflow for Mozilla.org homepage
Most browsers do flow calculation at a higher resolution than any display would have. This supports zooming: when the user zooms in or out, the objects can be drawn correctly on the screen without requiring any extra steps beyond mapping the coordinates to real pixels.
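The lazy, dirty-bit layout described above can be sketched in a few lines. `Box` is a hypothetical stand-in for a frame; the point is only that mutations are cheap and the expensive layout pass runs once, on demand:

```python
# Minimal sketch of lazy, dirty-bit layout: style mutations only mark the
# node dirty; geometry is recomputed only when something asks for it.
class Box:
    def __init__(self):
        self.dirty = True
        self.layouts = 0    # counts how many real layout passes ran

    def set_style(self, **changes):
        self.dirty = True   # cheap: just set the dirty bit, don't lay out

    def offset_height(self):
        if self.dirty:      # layout happens on demand
            self.layouts += 1
            self.dirty = False
        return 42           # placeholder geometry

box = Box()
box.set_style(width=100)
box.set_style(width=200)
box.set_style(width=300)
box.offset_height()
print(box.layouts)  # 1 - three mutations, but only one layout pass
```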
- Paint: Once the engine knows exactly where the objects need to be drawn comes the process of actually rendering them on the screen. This process - called Painting - is described in agonizing detail in Appendix E of the CSS 2.1 spec. It is basically a tree walk from the root of the Rendering Tree, in which each node is asked to paint itself. The actual drawing is abstracted out through a graphics engine, which is responsible for turning on the pixels and for things like hardware acceleration.
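As a rough sketch, the paint step is a pre-order walk of the render tree in which each node paints itself and then its children through an abstract `draw` callback (standing in for the graphics engine). This ignores the stacking-context machinery that Appendix E actually specifies:

```python
# The paint step as a sketch: a pre-order walk of the render tree where each
# node paints itself, then its children, via an abstract graphics callback.
def paint(node, draw, depth=0):
    tag, children = node
    draw(tag, depth)                   # the node paints itself first...
    for child in children:
        paint(child, draw, depth + 1)  # ...then its descendants, in order

painted = []
tree = ("html", [("body", [("p", []), ("div", [])])])
paint(tree, lambda tag, depth: painted.append(tag))
print(painted)  # ['html', 'body', 'p', 'div']
```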
2. Rendering Modes
The actual execution of the rendering process described above can change completely based on the rendering mode the browser decides to use for a particular page. Browsers have different rendering modes because of the history of the web, and understanding them is very important to understanding how browsers behave. However, I will not cover them here, since http://hsivonen.iki.fi/doctype/ does an excellent job of capturing all the details. If you are just interested in the background, read http://en.wikipedia.org/wiki/Quirks_mode.
3. Dynamic Pages
- If DOM elements are added or removed, the typical response of the browser is to re-run the rendering process described earlier, largely in order
- If the Style attribute on an element is changed, the style for the element needs to be recomputed, the page re-flown and re-painted
- Browsers may optimize this by batching style re-computes by queuing them
- However, scripts often read back changes that they have just made, which requires the re-styling queue to be flushed
- For better performance, make style changes as a batch and then read them in a batch so that the queue is flushed less frequently
- Some style changes are cheaper:
- Changing size / location would not require style re-compute but only re-flowing and re-painting
- Color change does not require re-flowing, but only re-painting
- Scrolling also does not require re-computation, only re-painting - and this is typically done incrementally, so it may not even require a full repaint (though things like fixed background images do necessitate one). So moving elements by scrolling programmatically can be faster than moving them by modifying their style attributes
- Re-flow - because of position or size changes - is typically recursive (root to leaves)
- Some attribute changes in a child can trigger changes in the entire ancestry all the way up to the root. Example: Height changes
- Some attribute changes in a parent can trigger changes in all the descendants right down to leaves. Example: Width changes
- Browsers can detect that only a section of the tree may change and do re-flow only on that sub-tree
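The advice above about batching writes and reads can be demonstrated with a toy model of the re-style queue. `Page` is a hypothetical stand-in, not a browser API; the point is only that every read of layout information forces a flush of whatever writes are pending:

```python
# Sketch contrasting interleaved style writes/reads (each read flushes the
# pending re-style queue) with batched writes followed by batched reads.
class Page:
    def __init__(self):
        self.pending = []
        self.flushes = 0

    def write_style(self, change):
        self.pending.append(change)   # queued, not applied yet

    def read_layout(self):
        if self.pending:              # a read forces the queue to flush
            self.flushes += 1
            self.pending.clear()

interleaved = Page()
for i in range(3):
    interleaved.write_style(i)
    interleaved.read_layout()         # flush after every single write

batched = Page()
for i in range(3):
    batched.write_style(i)            # all writes first...
batched.read_layout()                 # ...then one read: a single flush

print(interleaved.flushes, batched.flushes)  # 3 1
```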
4. Resource Loading
As the parser builds the Content Tree, it may encounter an element referring to an external resource (image, CSS, JS, font, etc.) which needs to be loaded. This loading happens as follows:
Order of Loading
- Stylesheets are required to build the Rendering Tree, but have no impact on Content Tree, so HTML parsing and JS execution can continue while CSS is downloaded and loaded.
- A script could ask for style information even as the stylesheet is being downloaded and the Rendering Tree is being built. If that happens, the script may get wrong or stale style information, which is why browsers typically hold up script execution while stylesheets are still loading. So you want styles to be loaded before JS starts executing
Modern browsers maintain multiple persistent connections to a server, which allows parallel loading. Parallel loading is a good thing because it reduces the overall latency of delivering the page to the end user. However, out of consideration for the load on web servers, the HTTP 1.1 RFC recommends: "Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy".
Note that there is a trade-off here between the overhead of keeping many sockets open vs. the overhead of opening new ones, and the impact of each on latency. With the number of external resources fetched by a page going up, it makes sense to optimize for reducing the number of times a new connection has to be set up, to reduce latency and improve the user experience. Indeed, most browsers these days allow more than 2 simultaneous connections per host. Steve Souders summarizes the current situation nicely in his Roundup on Parallel Connections.
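A per-host connection cap can be sketched as a small pool: requests under the cap start immediately and in parallel, while requests over it are queued until a connection frees up. The cap of 6 here is an assumption for illustration; the actual number varies by browser and by era:

```python
# Sketch of a per-host connection limit: requests beyond the cap are queued.
from collections import defaultdict

MAX_PER_HOST = 6  # illustrative; real browsers vary

class ConnectionPool:
    def __init__(self):
        self.active = defaultdict(int)    # host -> open connections
        self.waiting = defaultdict(list)  # host -> queued requests

    def request(self, host, url):
        if self.active[host] < MAX_PER_HOST:
            self.active[host] += 1        # fetch in parallel right away
            return "started"
        self.waiting[host].append(url)    # over the cap: wait for a free slot
        return "queued"

pool = ConnectionPool()
results = [pool.request("example.com", f"/r{i}") for i in range(8)]
print(results.count("started"), results.count("queued"))  # 6 2
```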
Since a script can call document.write(), parsing can't proceed until the script is fully loaded and executed, and anything it wrote via document.write() has been inserted into the stream. This means that a script load blocks parsing, which in turn blocks further loading, preventing the parallelism mentioned above from being exploited. Modern browsers do help a bit. For example, in WebKit, when the main parser gets blocked by a script load, a side parser starts scanning the rest of the HTML for other resources to load. That, however, is WebKit - for other browsers, there are a couple of ways out:
- Put script blocks at the end - that way they do not block any further parsing
- Use a hack to download scripts asynchronously - Souders sums these up in his post on Loading Scripts Without Blocking
- HTML 5 specifies the async attribute on the script tag, which tells the browser that the script does not require synchronous execution and that the parser can continue. WebKit recently started supporting this attribute, and Firefox has supported it since 3.6.
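The difference between a blocking script, a speculative side parser, and an async script can be sketched with a toy model. The token tuples and function names here are invented for illustration; the point is which resources the loader has discovered before the blocking script finishes:

```python
# Toy model: which resources does the loader know about before a blocking
# script finishes? A sync script stalls the main parser; a speculative side
# parser (as in WebKit) or the async attribute restores the parallelism.
def discovered(tokens, preload_scan=False):
    """Return resources discovered before any blocking script completes."""
    found = []
    for i, (kind, name, is_async) in enumerate(tokens):
        found.append(name)
        if kind == "script" and not is_async:
            if preload_scan:  # side parser scans the rest of the markup
                found += [n for _, n, _ in tokens[i + 1:]]
            break             # the main parser is blocked here
    return found

page = [("css", "style.css", False),
        ("script", "app.js", False),
        ("img", "hero.png", False)]

print(discovered(page))                     # ['style.css', 'app.js'] - hero.png not seen yet
print(discovered(page, preload_scan=True))  # all three queued for download
page[1] = ("script", "app.js", True)        # async: the parser doesn't block
print(discovered(page))                     # all three queued for download
```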
5. Physical Architecture
Web browsers started off with a single-process, single-thread model. This was acceptable as long as web pages were just documents that had to be rendered. However, the web has evolved from being document-centric to application-centric - many sites these days are applications with a lot of active code, a far cry from the static content browsers were designed to render. This gives rise to problems of stability, performance and security. To address these, most browsers have moved (or are in the process of moving) to a multi-process architecture. There are three drivers behind this trend:
- Performance: Multiple processes exploit multiple cores
- Security: The browser can spin up a new process in a lower privilege mode, reducing / removing the impact of malicious code
- Stability: A badly behaved page / script / plugin does not impact others since it is isolated in a process.
Firefox uses a single-thread, single-process model, which means a single UI thread is shared by all windows. The reason for that, apparently, is to allow blocking cross-DOM calls between different pages from the same origin. More details at http://www.mail-archive.com/[email protected]/msg03580.html. Network calls and web-worker requests are handled on separate threads.
To provide better isolation and reliability, Firefox will move to a multi-process model with its Electrolysis project. However, this seems to be for plugins alone; pages would continue to be served from a single process.
The first browser to ship with multi-process support was IE 7, with each browser window running in its own process:
IE 8 improved upon this model by putting each tab in its own process, but moving the frame and the broker into a common process for improving startup time. Microsoft calls this architecture Loosely Coupled Internet Explorer (LCIE):
The actual model, however, is more sophisticated than the diagram above suggests, since IE 8 tries to balance the benefits of more processes against the extra overhead, without compromising on security. The actual process model is:
- Protected Mode processes: Irrespective of memory overhead, sites with different configured security levels open in different processes. This approach, called Protected Mode, is based on Mandatory Integrity Control
- Context-based tab-processes: The decision on whether to create a new tab-process or not is made depending upon the amount of memory available
- Max tab-processes: A maximum number of tab processes that can be created for a single isolated session at a specific MIC level
Chrome follows an approach similar to that of IE 8 - the host process for a tab is called a Renderer and the broker process is called Browser:
Chrome supports four process models:
- Process per site-instance: Different visits to a site run in separate processes. This provides the highest level of isolation but also the most overhead.
- Process per site: Different sites are isolated from each other, but visits to the same site run in the same process. This reduces overall memory overhead, but if you have several pages from a site open, the single Renderer serving them could grow quite large, perhaps slowing it down.
- Process per tab: While the previous models consider the source of origin, the process-per-tab model is based on the choices a user makes: one process renders one tab, and if you switch to a different site in the same tab, the same process continues to be used.
- Single process: This is the simplest model, with no isolation.
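The four models boil down to which key a page is mapped under when picking a renderer process. This sketch makes that explicit; `process_key` and its parameters are invented names, and a "site instance" is crudely approximated as a (site, instance id) pair:

```python
# Sketch of how Chrome's process models key a page to a renderer process.
def process_key(model, site, instance_id, tab_id):
    if model == "per-site-instance":
        return ("site-instance", site, instance_id)  # most isolation
    if model == "per-site":
        return ("site", site)      # all pages of a site share one renderer
    if model == "per-tab":
        return ("tab", tab_id)     # keyed by the user's tab, not by origin
    return ("single",)             # single process: everything shares

# Two tabs on the same site:
a = process_key("per-site", "example.com", 1, tab_id=1)
b = process_key("per-site", "example.com", 2, tab_id=2)
print(a == b)  # True - same site, so the same renderer process

c = process_key("per-site-instance", "example.com", 1, tab_id=1)
d = process_key("per-site-instance", "example.com", 2, tab_id=2)
print(c == d)  # False - separate site instances get separate processes
```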
In both Chrome and IE, a frame runs in the same process as its parent page. Also, separate processes may prevent legitimate interactions between two pages from the same origin. Chrome's solution is to not permit a cross-process call even if it is legal. IE instead proxies these specific calls and converts them behind the scenes into a form of IPC. Chrome may also support this at some later stage.