Best Practices on Web Site Performance Optimization

Over the last couple of years, the performance of web applications has become more important to businesses because search engines such as Google factor performance into their ranking. The chain is simple: better performance leads to better visibility, which leads to more users and more revenue. Read more in Why Web Performance Optimization is as Important as SEO.

The best way to measure the performance of your website is by looking at certain Key Performance Indicators (KPI’s) that tell you how fast or slow your web site is to the end user. Driven by efforts from web performance specialists such as Steve Souders and companies like Google and Yahoo, the industry has learned that factors such as page load time, number of network roundtrips and transferred size are important performance indicators for a web page.

With the tracing capabilities of dynaTrace AJAX Edition it is possible to extend the list of existing KPI’s to also include metrics such as Time to First Impression, Time to Fully Loaded and Time Spent in JavaScript. This document describes a list of KPI’s that should be tracked on every page, what we consider as good and bad and what you can do to improve these KPI's.

KPI’s on Load Time

There are 3 interesting phases of a web site from an end-user performance perspective. The dynaTrace AJAX Edition visualizes the page lifecycle in the TimeLine View where we can highlight First Impression, onLoad and Fully Loaded Time:

Time to First Impression

This is the time from when the URL is entered into the browser until the user has the first visual indication of the page that gets loaded. The first visual indication is the first drawing activity by the browser and can be traced with dynaTrace AJAX Edition. How soon the browser can start drawing content depends on the initial HTML document, and different Best Practices describe different strategies for this. Google, for example, downloads a minimalistic page to provide a fast first visual rendering. It then delay-loads more content after onLoad, or even later when the user starts interacting with the page.

Time to onLoad Event

This is the time until the browser triggers the onLoad event, which happens when the initial document and all referenced objects are fully downloaded. JavaScript onLoad handlers use this event to manipulate the initial state of the page. As mentioned above, onLoad handlers are also one place to trigger the download of additional or delay-loaded content.

Time to Fully Loaded

This is the time until all onLoad JavaScript handlers have finished their execution and all dynamically or delay-loaded content triggered by those handlers has been retrieved. It is sometimes hard to identify the exact time when the page is fully loaded, especially when JavaScript handlers use recurring timeouts that constantly modify the page, for example to implement a ticker.

KPI’s on Resources

A web page is composed of the initial HTML document, embedded resources such as images, CSS and JavaScript, and content that is downloaded dynamically via XHR or by modifying the DOM through JavaScript. The more resources there are on the page, the more network roundtrips between the browser and the server are needed to download the content. The larger these resources, the more bandwidth is required to transfer them. Please have a detailed look at the Best Practices on Network document, which describes in more detail why reducing roundtrips and payload size is important. The dynaTrace AJAX Edition provides the Network View to analyze each individual network resource that has been downloaded:

From the Network View we can read several Network Resource related KPI's that help us to understand the structure and size of the page.

Total Number of Requests

This is the total number of network requests that are issued to load the page. The ultimate goal is to keep this number as low as possible in order to reduce roundtrips. Monitoring this KPI gives you an early indication of newly introduced content that can negatively impact page performance.

Total Number of HTTP 300s/400s/500s

This is the total number of requests for which the server responded with an HTTP Status Code in the 3xx range (redirects), the 4xx range (client errors such as a missing resource or an authorization problem) or the 5xx range (server errors). These requests should be avoided as they have a negative impact on the page load time. The root cause of these problems is often a server-side implementation, configuration or deployment issue.

Total Size of Web Site

This is the total size of all resources that make up your page. It is important to keep track of the total payload size: the larger a page becomes, the longer it takes to download. Changes to the page, such as adding images or new JavaScript libraries, can have a significant impact on download time.

Total Size of Images/CSS/JS

Besides keeping track of the total page size it is important to look into the sizes of the individual content types such as Images, Style Sheets and JavaScript files. With this it is easier to spot the main contributors of page size.

Total Number of XHR Requests

This is the total number of XMLHttpRequests (XHR) sent via JavaScript to retrieve data asynchronously from the server. Monitor this KPI to identify sudden changes in dynamic content retrieval via XHR. Some JavaScript frameworks provide update mechanisms with the server side and use XHR for this purpose. Depending on the configuration, you can end up with too many XHR requests that not only impact client-side performance but also cause additional load on the application server.
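
As a rough illustration, the resource KPIs described in this section can be derived from a list of captured requests. The following Python sketch assumes a hypothetical Request record; the field names and the resource_kpis helper are made up for this example and are not taken from dynaTrace AJAX Edition.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record for one captured network request."""
    url: str
    status: int          # HTTP status code, e.g. 200 or 404
    content_type: str    # simplified category such as "image", "css", "js"
    size_bytes: int
    is_xhr: bool = False


def resource_kpis(requests):
    """Summarize the resource-related KPIs for one page."""
    size_by_type = Counter()
    for r in requests:
        size_by_type[r.content_type] += r.size_bytes
    return {
        "total_requests": len(requests),
        "total_size_bytes": sum(size_by_type.values()),
        "size_by_type": dict(size_by_type),
        # Every response in the 3xx, 4xx or 5xx status class is counted.
        "http_3xx_4xx_5xx": sum(1 for r in requests if r.status // 100 in (3, 4, 5)),
        "xhr_requests": sum(1 for r in requests if r.is_xhr),
    }
```

Tracking these numbers per build or per release makes it easier to spot when newly added content changes the structure of the page.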

KPI’s on Network Connections

The browser's underlying network connection has a major impact on the download speed of web site content. There are different phases when downloading content that impact the overall download time. The dynaTrace AJAX Edition shows all phases for every network request.

A request that gets handled by the browser runs through different stages. The following list explains these phases, what the measures tell us and how they get impacted by the browser, the network and other requests.

DNS Time

One DNS Lookup happens for every domain that hosts resources for the current web site. If you move between multiple pages the browser does not require another DNS lookup for a domain that has been resolved on the previous page. It is interesting to look at the total DNS time to identify problems with DNS Lookup Times that can be caused by DNS configuration problems.

Connect Time

Depending on the browser and the number of resources that are served by a domain the browser establishes one or multiple connections to each domain that hosts resources for the page. Connect Time is the time it takes to establish the TCP/IP connection to the web server. Connections usually stay open unless the Web Server directs the browser to close the connection (Connection HTTP Header). When using secure communication via SSL, the Connect Time also includes the time of the SSL handshake. High Connect Time can therefore have the following reasons: slow network connection to the web server, usage of SSL and not allowing the browser to keep the connection open.

Server Time

High Server Time means that the Web/Application Server required a long time to process the request. This is particularly relevant for requests that trigger application logic on the application server, where higher Server Times can be expected, especially during periods of heavy load. Monitoring Server Time is important to identify bottlenecks as well as performance and scalability problems with the application server. It is usually easier to scale static content delivery, by adding more web servers behind load balancers or by using a Content Delivery Network, than it is to scale a dynamic application in the same way.

Transfer Time

This time directly correlates with the size of the resource and the connection speed between browser and server. Keeping Transfer Time low is important to ensure faster load times. Transfer Time can be improved by lowering the Total Page Size and by bringing content closer to the end user through Content Delivery Networks (CDNs).

Wait Time

Wait Time is directly correlated with the number of resources that are served by the same domain. Browsers limit the number of parallel connections they open per domain, so resources have to wait for a free connection. Reducing the number of resources or spreading the resources over different domains will bring this time down. The average Wait Time is a better indicator than the total Wait Time of whether Wait Time is a concern.
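
To make the difference between total and average Wait Time concrete, here is a small Python sketch. It assumes the wait times have already been exported as (domain, wait in milliseconds) pairs; that input format and the function name are assumptions made for this example.

```python
from collections import defaultdict


def wait_time_by_domain(samples):
    """Return total and average Wait Time per domain.

    samples: iterable of (domain, wait_ms) pairs, one per request.
    """
    waits = defaultdict(list)
    for domain, wait_ms in samples:
        waits[domain].append(wait_ms)
    return {
        domain: {"total_ms": sum(w), "average_ms": sum(w) / len(w)}
        for domain, w in waits.items()
    }
```

A domain that serves many resources can show a large total even when each request waits only briefly, which is why the average is the more useful number.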

Number of Domains / Single Resource Domains

The number of domains that host the web site's resources is important as it affects DNS, Connect and Wait Time. Using additional domains to download resources directly reduces Wait Time because the browser ultimately uses more physical connections. It can also have the opposite effect, because more DNS lookups are needed and more time is spent establishing the physical connections. Single Resource Domains should be avoided as you pay a high price for performing the DNS lookup and the connect just to download a single resource. This is sometimes unavoidable when downloading content from external content providers (such as ad services). When the deployment is under your own control, make sure you do not have single resource domains.
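
Single Resource Domains are easy to find once the list of requested URLs is available. The following Python sketch uses only the standard library; the function name is made up for this example, and the URLs in the usage note are placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit


def single_resource_domains(urls):
    """Return the host names that serve exactly one resource of the page."""
    hosts = (urlsplit(url).hostname for url in urls)
    # URLs without a host part (e.g. relative URLs) are ignored.
    counts = Counter(host for host in hosts if host)
    return sorted(host for host, n in counts.items() if n == 1)
```

For a page that requests http://example.com/, http://example.com/site.css and http://ads.example.net/banner.gif, the function reports ads.example.net as the only Single Resource Domain.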

Performance Savings, Recommendations and Rank Calculation

The ultimate goal for a web site must be to load as fast as possible, so it must be a goal to lower the 3 Page Load KPI's. There are different options, such as reducing the network roundtrips (see the Best Practices on Network), making use of caching (see the Best Practices on Caching), optimizing server-side content generation (see the Best Practices on Server-Side Performance) and optimizing JavaScript/AJAX (see the Best Practices on JavaScript/AJAX). Optimizations in these 4 areas, as well as following best practices such as delay-loading JavaScript or loading JavaScript and CSS files at the bottom of the page, will improve your page load times.

Rank Calculations

dynaTrace AJAX Edition calculates a total page rank based on some of the KPI’s that were discussed in this article. We consider the 3 Page Load Times as most important indicators and we identified the following threshold values to define what is great, acceptable and bad page speed:

Time to First Impression is great if < 1s, acceptable if < 2.5s and slow if > 2.5s
Time to onLoad is great if < 2s, acceptable if < 4s and slow if > 4s
Time to Fully Loaded is great if < 2s, acceptable if < 5s and slow if > 5s
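
The thresholds above translate into a simple lookup. The Python sketch below works in milliseconds; the dictionary keys and the classify function are illustrative names, and treating a value that is exactly on a boundary as the worse category is an assumption, since the list above does not say which side the boundary belongs to.

```python
# (great below, acceptable below) in milliseconds, taken from the list above.
THRESHOLDS_MS = {
    "first_impression": (1000, 2500),
    "onload": (2000, 4000),
    "fully_loaded": (2000, 5000),
}


def classify(kpi, value_ms):
    """Classify a load-time KPI as 'great', 'acceptable' or 'slow'."""
    great_below, acceptable_below = THRESHOLDS_MS[kpi]
    if value_ms < great_below:
        return "great"
    if value_ms < acceptable_below:
        return "acceptable"
    # Assumption: a value exactly on the upper boundary counts as slow.
    return "slow"
```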

The most important factor is the Time to First Impression followed by Time to onLoad and then Time to Fully Loaded. We penalize a threshold violation higher for Time to First Impression and onLoad as compared to Fully Loaded. For details see the example calculation below.

We also factor in the total number of HTTP requests, as the number of roundtrips greatly impacts overall download time. Great sites require fewer than 40 requests, and up to 100 requests is acceptable. Sites with more than 100 HTTP roundtrips are considered bad.

dynaTrace AJAX Edition also calculates Ranks for Browser Caching, Network Resources, JavaScript/AJAX and Server-Side Activities.
The overall rank is calculated by taking 60% of the rank based on the KPI's and 10% each from the Caching, Network, JavaScript and Server-Side Sub Ranks.

Disclaimer

Of course – these are generic rules and may not be applicable for all web sites. CPU Power and Network Connectivity also have a great impact on load times. Across the board these thresholds seem to be fair.
The Rank Calculation is best suited to analyzing pages until they are fully loaded. When you interact with a page, more dynamic content gets downloaded and impacts the metrics that influence the rank calculation.
Our goal is to adjust all these rules over time based on more feedback we receive from our community.

Example

A page starts with a rank of 100, which is lowered for every missed threshold. Assume our page has a Time to First Impression of 1.6 seconds. For every 200ms that this KPI is slower than the value we consider great (1s in this case) we lower the rank by 1. The page is 600ms over that value, so Time to First Impression lowers its rank by 3.

If the page has an onLoad time of 3.2s, it is penalized an additional 6 points because 3.2s exceeds the 2s goal for a great time by 1.2s. The same rule of 1 point per 200ms that applies to Time to First Impression is used here.

If the page has a Fully Loaded time of 4s, it is penalized by a further 4 points: it misses the 2s goal by 2s, but Fully Loaded time is only penalized 1 point for every 500ms. Together, onLoad and Fully Loaded time lower the rank by 10 points (6 + 4).

Even if a page is fast, we penalize the Rank if it requires too many roundtrips to download all resources. For every 5 requests above 40 we deduct 1 point. A page with 55 roundtrips therefore loses 3 points.

The Rank calculation based on these KPI's therefore is 100-3-6-4-3=84.

Now it's time to look at the Sub Ranks. We assume the following values: Browser Caching (60 out of 100), Network (80 out of 100), JavaScript (80 out of 100) and Server-Side (70 out of 100). We weight the overall Rank by taking 60% of the KPI Rank and 10% each of the Sub Ranks.
This gives us the following calculation: 84*0.6+60*0.1+80*0.1+80*0.1+70*0.1=79.4, which gives a Rank of 79 and corresponds to a Grade C.
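
The weighting can be written down in a few lines of Python. This is only a sketch of the calculation described above; the function and parameter names are made up for this example, and how the tool rounds the final value is not stated in this document.

```python
def overall_rank(kpi_rank, caching, network, javascript, server_side):
    """Weight the KPI rank with 60% and each of the four sub ranks with 10%."""
    sub_ranks = caching + network + javascript + server_side
    # Integer weights avoid floating point noise before the final division.
    return (60 * kpi_rank + 10 * sub_ranks) / 100
```

With the values from the example, overall_rank(84, 60, 80, 80, 70) yields 79.4, which matches the Rank of 79 given above.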

Limiting the Time Based Penalties

We only penalize up to twice the values we consider bad, which means the penalties are capped at 5s/8s/10s for Time to First Impression, onLoad and Fully Loaded. A page with a Fully Loaded Time of 12s is therefore penalized in the same way as a page with a Fully Loaded Time of 10s.
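
The penalty rules from the example, together with the caps described above, can be combined into one small Python function. This is a sketch under a few assumptions that are not stated in this document: times are passed in whole milliseconds, partial penalty steps are rounded down, and the result is never allowed to drop below 0. All names are made up for this example.

```python
def time_penalty(value_ms, great_ms, step_ms, cap_ms):
    """One point per started-and-completed step above the 'great' value."""
    capped = min(value_ms, cap_ms)           # no extra penalty beyond the cap
    return max(0, (capped - great_ms) // step_ms)


def kpi_rank(first_impression_ms, onload_ms, fully_loaded_ms, requests):
    """Rank based on the three load-time KPI's and the request count."""
    rank = 100
    rank -= time_penalty(first_impression_ms, 1000, 200, 5000)
    rank -= time_penalty(onload_ms, 2000, 200, 8000)
    rank -= time_penalty(fully_loaded_ms, 2000, 500, 10000)
    rank -= max(0, (requests - 40) // 5)     # 1 point per 5 requests above 40
    return max(0, rank)                      # assumption: rank bottoms out at 0
```

Calling kpi_rank(1600, 3200, 4000, 55) reproduces the KPI Rank of 84 from the example, and a Fully Loaded Time of 12000ms produces the same result as 10000ms.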

Rank Explanation

The Rank concept has been taken from tools like YSlow and PageSpeed, which present their result as a Rank (100 is best, 0 is worst) that also corresponds to a Grade (A=100-90, B=89-80, C=79-70, D=69-60, E=59-50, F=49-0).
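
The mapping from Rank to Grade is a simple range lookup, sketched here in Python. The function name is made up for this example, and it expects a Rank that has already been rounded to a whole number.

```python
def grade(rank):
    """Map a rank between 0 and 100 to a grade from A to F."""
    # Lower bound of each grade, checked from best to worst.
    for lower_bound, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D"), (50, "E")):
        if rank >= lower_bound:
            return letter
    return "F"
```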