Penn Computing

Caching Service on www.upenn.edu

Most pages housed on www.upenn.edu are cached by the web caching service provider, Akamai.

How does the caching service work

When a user requests one of your pages (.html, .pdf, .doc, .gif, .jpg, etc.) from www.upenn.edu, the request does not go to the server that actually stores your page. Instead, the user is sent to one of the many Akamai EdgeSuite servers. The Akamai caching server checks whether it has a copy of your page in storage and whether the cache age of that copy has expired. If it has an unexpired copy, it serves that cached copy to the user. If it has no copy, or its copy has expired, the caching server first gets a fresh copy of your page from the www.upenn.edu server, stores it, and then serves it to the user. A detailed explanation of the service is available.
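The lookup described above can be modeled in a few lines. The sketch below is a simplified, hypothetical model of a single caching server, not Akamai's actual implementation. The names are invented for illustration: the EdgeCache class, its fetch_from_origin callback (standing in for the www.upenn.edu server), and the injectable clock. The 300-second default mirrors the 5-minute cache age described below.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one caching server: reuse a stored copy until it expires."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 cache_age: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin  # stands in for www.upenn.edu
        self.cache_age = cache_age                  # seconds; 300 = 5 minutes
        self.clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # path -> (copy, time fetched)
        self.origin_fetches = 0

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[1] < self.cache_age:
            return entry[0]  # unexpired copy: answer without contacting the origin
        # No copy, or the copy has expired: fetch, store, then answer.
        copy = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self._store[path] = (copy, now)
        return copy


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo does not have to wait 5 minutes
    cache = EdgeCache(lambda path: f"contents of {path}", clock=lambda: now[0])
    cache.get("/index.html")     # no copy yet: fetched from the origin
    now[0] = 120.0               # 2 minutes later
    cache.get("/index.html")     # copy still fresh: served from the cache
    now[0] = 301.0               # just over 5 minutes after the first fetch
    cache.get("/index.html")     # copy expired: fetched again
    print(cache.origin_fetches)  # 2
```

Passing a fake clock, as the demo does, makes the expiry behavior easy to check without waiting for real time to pass.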


Why cache

It might be a long trip across the Internet to get from a user's machine to our web server. That trip is subject to traffic jams, congestion, construction, and network failures. While the performance of our web server may look great here on campus, it may look worse elsewhere on the Internet. Because our content will be mirrored on Akamai's 1100 or so caches distributed around the edges of the Internet, connections from just about anywhere will be very short and fast.

Similarly, our server won't have to answer nearly as many requests as it does today, as the majority of them are expected to be handled by Akamai's caching servers. This greatly reduces the load on our infrastructure -- our Internet connection, PennNet, and the web server itself. This allows us to avoid expensive, complicated web server infrastructure, which means we can keep our costs down -- which means we can keep your costs down.


What is the cache age

For the initial rollout of our service we have chosen 5 minutes as the cache age for our pages on www.upenn.edu. That caching period will be adjusted as the service matures and providers become more familiar with caching and how caching affects their pages.

Most pages on www.upenn.edu are not modified daily, so a copy of a page that is up to 5 minutes old still shows the user current and correct information. Caching that information with a web caching service provider like Akamai ensures greater reliability, greater scalability and better response times for the user. It also helps us reduce the cost of running the server.

Such a short cache age will minimize propagation delays when a page is updated, but will also minimize the benefit of caching. A longer cache age means that more pages will already be in the cache when a user requests them, which means quicker response times for the user and reduced impact on our infrastructure. We hope to increase the default cache age over time, with an eventual target of 24 hours for static content such as images and pages that are rarely modified.
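The sketch below puts the trade-off in numbers. It computes an upper bound on how often a single page could be fetched from www.upenn.edu per day. It assumes, pessimistically, that every caching server receives a request for the page the moment its copy expires, so real traffic will usually be lower. The function name is hypothetical, and the figure of 1100 servers comes from the "Why cache" section above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_origin_fetches_per_day(cache_age_seconds: int, caching_servers: int = 1) -> int:
    """Upper bound on daily requests to www.upenn.edu for a single page.

    Assumes each caching server gets a request the moment its copy expires,
    so every server refetches the page once per cache age.
    """
    per_server = -(-SECONDS_PER_DAY // cache_age_seconds)  # ceiling division
    return per_server * caching_servers


if __name__ == "__main__":
    for label, age in (("5 minutes", 5 * 60), ("24 hours", SECONDS_PER_DAY)):
        print(label, max_origin_fetches_per_day(age),
              max_origin_fetches_per_day(age, caching_servers=1100))
```

Here is how the two cache ages compare:
- With a 5-minute cache age, each caching server may refetch the page up to 288 times a day, or 316,800 times across 1100 servers.
- With a 24-hour cache age, that drops to once per server, or 1,100 times in total.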

Pages that are modified more often could be considered dynamic content.


What is dynamic content

In this discussion of our caching service, dynamic content is information in a web page that is regularly updated more often than once every 5 minutes.

On www.upenn.edu, there are four mechanisms for automatically providing dynamic content to a page: Server Side Includes, JavaScript, CGI scripts, and PHP. The sections below give specific instructions for overriding the cache for each mechanism.

You can use any of these mechanisms without the content of the page actually changing. If your page uses one of them but its content does not change, you can rely on the default caching service and do not need to do anything differently when maintaining your pages.

If you are using any of these mechanisms and the content of your pages changes more often than the default caching period, you need to override the default cache.


How do I override the default cache

To benefit from the caching service, we strongly recommend that you do not override the default caching of pages on www.upenn.edu. Override it only if you have dynamic content that changes more often than the default cache age, or if you are certain that your information can be cached for a longer period.

If you need to override the default caching period, add a Cache-control HTTP header to your pages that redefines the cache period. Note that if your page is already cached, none of the following changes will take effect until the current cached copy expires.

.htaccess Directive

Server Side Includes and JavaScript have no mechanism for adding an HTTP header. You can still override the caching of files that use them to produce dynamic content: create a file called .htaccess in the directory that contains those files. To override the cache for index.html in a directory, the .htaccess file in that directory should contain lines like

<Files index.html>
Header append Cache-control "max-age=1800"
</Files>
where max-age is the number of seconds after which the caching server fetches a new copy of the file. In this example, the cache age for index.html is set to 30 minutes (1800 seconds).
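Because max-age is just a count of seconds, it is easy to check what a header value means. The parser below is a minimal sketch for illustration; max_age_seconds is a hypothetical helper, not part of any Penn or Akamai tool. It splits on commas for a reason given in the Apache mod_headers documentation: Header append joins its value to any existing header of the same name with a comma. A page can therefore end up with a value such as "public, max-age=1800".

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value, in seconds, from a Cache-control header value.

    Returns None when the header has no numeric max-age directive.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        # Directive names are compared case-insensitively ("Max-Age" == "max-age").
        if name.strip().lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


if __name__ == "__main__":
    seconds = max_age_seconds("max-age=1800")
    print(seconds, "seconds =", seconds // 60, "minutes")  # 1800 seconds = 30 minutes
```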

You may also want to override the cache for a whole class of files. To override the cache for all images in a directory, the .htaccess file in that directory could contain lines like

<Files ~ "\.(gif|jpe?g|png)$">
Header append Cache-control "max-age=1800"
</Files>
where max-age is again the number of seconds after which the caching server fetches a new copy of the file. In this example, the cache age for all files with the extension ".gif", ".jpeg", ".jpg", or ".png" is set to 30 minutes.
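You can check which file names the pattern in the <Files ~ ...> example catches by trying the same regular expression in Python. This assumes Apache's regex engine and Python's re module treat the pattern the same way, which should hold for a simple pattern like this one. The helper name matches_image_rule is hypothetical. As written, the pattern is case-sensitive in Python, so LOGO.GIF does not match. On a typical Unix Apache server it should behave the same way.

```python
import re

# The same regular expression used in the <Files ~ "..."> example above.
IMAGE_RULE = re.compile(r"\.(gif|jpe?g|png)$")


def matches_image_rule(filename: str) -> bool:
    """True if filename would receive the image Cache-control override."""
    return IMAGE_RULE.search(filename) is not None


if __name__ == "__main__":
    for name in ["logo.gif", "photo.jpeg", "photo.jpg", "icon.png",
                 "index.html", "logo.gif.bak", "LOGO.GIF"]:
        print(f"{name:14} {matches_image_rule(name)}")
```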

For more information on the Apache <Files> directive, please see the Apache documentation.

CGI

If you are running a CGI script on www.upenn.edu and you want to override the caching of the pages it generates, have the script output an HTTP header line like

Cache-control: max-age=1800
where max-age is the number of seconds after which the caching server fetches a new copy of the page. In this example, pages generated by the script are cached for 30 minutes.

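In a Perl script, for example, your code would look something like the lines below. The final \n\n ends the block of headers, so Cache-control must be printed before it: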

print "Content-type: text/html\n";
print "Cache-control: max-age=1800\n\n";
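The same idea can be written in Python for anyone writing CGI scripts in that language instead of Perl. This is a sketch: it assumes Python CGI scripts are permitted on the server, which the text does not say, and cgi_response is a hypothetical helper. It builds the header lines and the blank line that ends them, so the Cache-control header is always emitted before the page body.

```python
import sys


def cgi_response(body: str, max_age: int = 1800) -> str:
    """Build a CGI response whose Cache-control header overrides the default."""
    headers = [
        "Content-type: text/html",
        f"Cache-control: max-age={max_age}",
    ]
    # A blank line ends the headers; everything after it is the page body.
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response("<html><body>Hello</body></html>\n"))
```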

PHP

To override the cache for a page that uses PHP, add lines like the following to the PHP file itself. They must come before any output is sent, because PHP cannot add a header once output has started:

<?php
header("Cache-control: max-age=1800");
?>
where max-age is the number of seconds after which the caching server fetches a new copy of the page. In this example, the page is cached for 30 minutes.

Please note that you cannot add the HTTP Cache-control header using a <META> tag within your HTML page. <META> tags are a browser device and are ignored by the caching servers.
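Because only a real HTTP header counts, it is worth checking what the server actually sends. The sketch below uses Python's standard http.client module to read the Cache-control header of a page. The names get_cache_control and the demo server are illustrative. The local demo server stands in for your page, so the example runs without network access. A <META> tag in the page body would not show up here, which is exactly the point.

```python
import http.client
import http.server
import threading
from typing import Optional


def get_cache_control(host: str, path: str, port: int = 80) -> Optional[str]:
    """Return the Cache-control header a server actually sends for path."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        # Header names are case-insensitive, so this also finds "Cache-control".
        return response.getheader("Cache-Control")
    finally:
        conn.close()


class _DemoPage(http.server.BaseHTTPRequestHandler):
    """Local stand-in for a page that sends the header (as .htaccess, CGI, or PHP would)."""

    def do_GET(self):
        body = b"<html><body>demo</body></html>"
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Cache-control", "max-age=1800")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def demo() -> Optional[str]:
    """Serve _DemoPage on a free local port and read its header back."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _DemoPage)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return get_cache_control("127.0.0.1", "/index.html", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())  # max-age=1800
```

To check a real page, you might call get_cache_control("origin.www.upenn.edu", "/almanac/between/between.html"). This asks the origin server directly rather than a caching server, assuming that host is reachable from where you run it.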


How do I delete the cache

There is no mechanism that lets you delete cached copies on demand. If you have an emergency and must delete the cache for your page(s), please open a ticket at http://www.upenn.edu/prodesk/. Deleting the cache is a very intensive operation, because the request has to propagate through the many caching servers. Removing a page from the cache can take as long as 15 minutes.


How do I upload my pages to www.upenn.edu

www.upenn.edu providers use FTP to upload data to the server. Since the hostname www.upenn.edu now points to the Akamai servers, you can no longer FTP to www.upenn.edu. You must change the configuration for your FTP client to point to

origin.www.upenn.edu
Additional information on uploading data is available.


How do I preview changes before they are written to cache

If you would like to preview your modifications to your page before they are written to cache, you can point your browser directly to

http://origin.www.upenn.edu/
followed by the path to your page. This bypasses the caching servers, so your browser negotiates directly with the server that houses your real content.

If your page on www.upenn.edu is

http://www.upenn.edu/almanac/between/between.html
you can preview changes to this page by pointing your browser to
http://origin.www.upenn.edu/almanac/between/between.html
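If you preview many pages, the rewrite is mechanical: keep the path and swap the hostname. The helper below does that with Python's urllib.parse; to_origin_url is a hypothetical name. Use the result only for previewing, never in links you publish.

```python
from urllib.parse import urlsplit, urlunsplit

CACHED_HOST = "www.upenn.edu"
ORIGIN_HOST = "origin.www.upenn.edu"


def to_origin_url(url: str) -> str:
    """Rewrite a www.upenn.edu URL so it bypasses the caching servers."""
    parts = urlsplit(url)
    if parts.hostname != CACHED_HOST:
        raise ValueError(f"not a {CACHED_HOST} URL: {url}")
    # Keep any explicit port; replace only the hostname.
    netloc = ORIGIN_HOST if parts.port is None else f"{ORIGIN_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(to_origin_url("http://www.upenn.edu/almanac/between/between.html"))
```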

Please do not create links in your pages that go directly to origin.www.upenn.edu. Linking directly to origin.www.upenn.edu defeats the purpose of caching.


Do I have to change my links to www.upenn.edu

You do not have to change your links. In fact, to take advantage of the caching, you should not change them. Continue to use www.upenn.edu when linking to pages on the Penn server.

The one exception to this rule is links to the AltaVista search indexes, which have moved to search.www.upenn.edu:

Search index                      Old URL                       New URL
Advanced search                   http://www.upenn.edu:8080/    http://search.www.upenn.edu:8080/
Computing Web search              http://www.upenn.edu:9000/    http://search.www.upenn.edu:9000/
Publications Web search           http://www.upenn.edu:9020/    http://search.www.upenn.edu:9020/
University Archives Web search    http://www.upenn.edu:9025/    http://search.www.upenn.edu:9025/
Ben Knows search                  http://www.upenn.edu:9100/    http://search.www.upenn.edu:9100/
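If you maintain many pages, the table can be applied mechanically. In the sketch below, update_search_link is a hypothetical helper. It rewrites only links on the five search-index ports listed above and leaves every other www.upenn.edu link alone, as the text recommends.

```python
from urllib.parse import urlsplit, urlunsplit

# Ports of the search indexes listed in the table above.
SEARCH_PORTS = {8080, 9000, 9020, 9025, 9100}


def update_search_link(url: str) -> str:
    """Move an old www.upenn.edu search-index URL to search.www.upenn.edu."""
    parts = urlsplit(url)
    if parts.hostname == "www.upenn.edu" and parts.port in SEARCH_PORTS:
        return urlunsplit(parts._replace(netloc=f"search.www.upenn.edu:{parts.port}"))
    return url  # every other link should keep pointing at www.upenn.edu


if __name__ == "__main__":
    for link in ["http://www.upenn.edu:9000/", "http://www.upenn.edu/almanac/"]:
        print(link, "->", update_search_link(link))
```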

An easy way to find links that you may have to any of these search indexes is to use the Advanced Search. To find all pages on www.sas.upenn.edu that are linking to the old URL for the Computing Web search, go to the Advanced Search and enter the search term

host:www.sas.upenn.edu link:www.upenn.edu:9000


Why do I get the error message, "Invalid URL" when requesting a page from www.upenn.edu

Your browser may have been configured so that you could request a page from www.upenn.edu without fully qualifying the domain name. Instead of typing

http://www.upenn.edu/almanac/
you may be accustomed to typing
http://www/almanac/

Since www.upenn.edu now points to the Akamai caching service gateway, and that gateway must know the actual hostname of the server you are trying to reach, you must use the full hostname when requesting www.upenn.edu pages with your browser.

You may also receive this error message if you are using a very old browser that doesn't support the newer HTTP/1.1 protocol.
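The reason the full name matters is visible in the request itself. HTTP/1.1 requires a Host header, and the browser fills it with the hostname exactly as it appears in the URL. The sketch below builds a minimal request line and Host header for a URL; http11_request is a hypothetical helper, and real browsers send many more headers. Typing http://www/almanac/ sends "Host: www", which gives the gateway no way to tell that you meant www.upenn.edu.

```python
from urllib.parse import urlsplit


def http11_request(url: str) -> str:
    """Build the request line and Host header a browser would send for url."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries the hostname exactly as typed in the URL.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


if __name__ == "__main__":
    print(repr(http11_request("http://www.upenn.edu/almanac/")))
    print(repr(http11_request("http://www/almanac/")))
```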


Why can't people authenticate for pages with their PennKeys

You may use the Apache/Websec module to restrict access to your pages on www.upenn.edu, so that users must authenticate with their PennKeys before viewing them. If so, you must turn off both caching and IP-checking for those pages. IP-checking will not work because the pages are served by the Akamai servers.

To restrict access to your pages and require that a user authenticate first with a PennKey, create a file called .htaccess in the directory to be restricted and enter the following into that file

AuthPennNet
AuthIPCkOff
<Files *>
Header append Cache-control "no-cache"
</Files>

For more information, please see our full documentation on restricting pages with a PennKey.


How does this affect my www statistics report

Since users actually retrieve your pages from the many Akamai caching servers, the statistics on how your pages are used are stored on the Akamai servers. We download those statistics daily so that we can continue to provide your web statistics, but reports may be delayed by up to two days.

Link checking for your pages will continue to run on our server, so there should be no delay in that report.


Are other virtual hosts also cached

Our contract with Akamai is not currently sized to accommodate the volume of traffic associated with virtual hosts that may also have space on www.upenn.edu. Once the service matures, we will be happy to discuss how to extend it to virtual hosts and what the associated costs would be.


Information Systems and Computing
University of Pennsylvania