Caching Service on www.upenn.edu (effective March 25, 2004)

Most pages housed on www.upenn.edu are cached by the web caching service provider, Akamai.
How does the caching service work?
When a user requests one of your pages (.html, .pdf, .doc, .gif, .jpg, etc.) from www.upenn.edu, the request does not go to the server that actually stores your page. Instead, the user is sent to one of the many Akamai EdgeSuite servers. The Akamai caching server checks whether it has a copy of your page in storage and whether the cache age of that copy has expired. If the caching server has a copy and the cache age has not expired, the cached copy of the page is displayed to the user. If the caching server does not have a copy of your page, or the cache age has expired, the caching server first comes to the www.upenn.edu server, gets a copy of your page, stores it on the caching server, and then displays the copy to the user.
A detailed explanation of the service is available.

Why cache?
It might be a long trip across the Internet to get from a user's machine to our web server. That trip is subject to traffic jams, congestion, construction, and network failures. While the performance of our web server may look great here on campus, it may look worse elsewhere on the Internet. Because our content will be mirrored on Akamai's 1,100 or so caches distributed around the edges of the Internet, connections from just about anywhere will be very short and fast.
Similarly, our server won't have to answer nearly as many requests as it does today, because the majority of them are expected to be handled by Akamai's caching servers. This greatly reduces the load on our infrastructure: our Internet connection, PennNet, and the web server itself. It allows us to avoid expensive, complicated web server infrastructure, which means we can keep our costs down, which in turn means we can keep your costs down.

What is the cache age?
For the initial rollout of our service we have chosen 5 minutes as the cache age for our pages on www.upenn.edu.
That caching period will be adjusted as the service matures and providers become more familiar with caching and how it affects their pages. Most pages on www.upenn.edu are not modified on a daily basis, and a copy of a page that is up to 5 minutes old still gives the user current and correct information. Caching that information with a web caching service provider like Akamai ensures more reliability and scalability and better response time for the user, and helps us reduce the maintenance costs of running the server.
Such a short cache age minimizes propagation delays when a page is updated, but it also minimizes the benefit of caching. A longer cache age means that more pages will already be in the cache when a user requests them, which means quicker response times for the user and reduced impact on our infrastructure. We hope to increase the default cache age over time, with an eventual target of 24 hours for static content such as images and pages that are rarely modified. Pages that are modified more often could be considered dynamic content.

What is dynamic content?
In this discussion of our caching service, dynamic content is information in a web page that is regularly updated more often than once every 5 minutes. On www.upenn.edu, there are four mechanisms for automatically providing dynamic content to a page, and the links below lead to specific instructions for overriding the cache for each mechanism.
You can use any of the above mechanisms without the content of the page actually changing. If you are using these mechanisms and the content of your page doesn't change, you can use the default caching service and you don't need to do anything different when maintaining your pages. If you are using any of the above mechanisms and the content of your pages changes more often than the default caching period, you need to override the default cache.
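
To make the cache-age behavior described in the sections above concrete, here is a minimal Python sketch of a single caching server. It is a toy model written for illustration only, not Akamai's implementation: the class name `EdgeCache`, the `fetch_from_origin` callback, and the injectable `clock` are all hypothetical. Only the 5-minute cache age and the "serve the copy if fresh, otherwise go back to the origin server" rule come from the text.

```python
import time
from typing import Callable, Dict, Tuple

# The initial cache age stated above: 5 minutes.
CACHE_AGE_SECONDS = 5 * 60


class EdgeCache:
    """Toy model of one caching server (hypothetical, for illustration)."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        cache_age: float = CACHE_AGE_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin   # stands in for the origin web server
        self._cache_age = cache_age
        self._clock = clock               # injectable so expiry can be simulated
        self._store: Dict[str, Tuple[float, str]] = {}
        self.origin_requests = 0          # how often the origin had to answer

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None:
            stored_at, body = entry
            if now - stored_at < self._cache_age:
                return body               # copy exists and has not expired
        # No copy, or the copy expired: fetch from the origin and store it.
        body = self._fetch(path)
        self.origin_requests += 1
        self._store[path] = (now, body)
        return body
```

In this model, a page updated on the origin server stays invisible to users until the stored copy expires, and raising `cache_age` lowers `origin_requests` at the price of a longer propagation delay. That is the trade-off between a short and a long cache age described above.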
How do I override the default cache?
So that you benefit from the caching service, we strongly recommend that you do not override the default caching of pages on www.upenn.edu unless you have dynamic content that changes more often than the default cache age, or you are certain that your information can be cached for a longer period. If it is necessary to override the default caching period, you can do so by adding the HTTP header Cache-Control to your pages to redefine the cache period. Please note that if your page is already cached, none of the following changes will be applied to your page until the current cache expires.
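
As a generic illustration of what "adding the HTTP header Cache-Control" means, the sketch below builds a response that carries the header alongside the page body. It is not one of the www.upenn.edu mechanisms referred to above: the use of Python and WSGI, the function names, and the `max-age` value of 60 seconds are all assumptions chosen for the example, and which directives the caching servers actually honor is not stated in this document.

```python
from typing import Callable, Iterable, List, Tuple


def cache_control_value(max_age_seconds: int) -> str:
    """Build a standard Cache-Control value such as 'max-age=60'."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must not be negative")
    return f"max-age={max_age_seconds}"


def make_app(body: bytes, max_age_seconds: int) -> Callable:
    """Return a WSGI application (hypothetical) that serves `body`
    with a Cache-Control response header."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        headers: List[Tuple[str, str]] = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            # The header travels in the HTTP response, not in the HTML.
            ("Cache-Control", cache_control_value(max_age_seconds)),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```

Such an application could be served locally with `wsgiref.simple_server.make_server` from the Python standard library to inspect the headers it sends. The point of the sketch is only that the header is part of the HTTP response itself.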
Please note that you cannot add the HTTP Cache-Control header using a <META> tag within your HTML page. <META> tags are a browser device and are ignored by the caching servers.

How do I delete the cache?
There is no mechanism that allows a user to delete the cache on demand. If you have an emergency and must delete the cache for your page(s), please send mail to webtech@isc.upenn.edu. Please note that deleting the cache is a very intensive operation, since the request has to propagate through the many caching servers. Deleting the cache for a page can take as long as 15 minutes.

How do I upload my pages to www.upenn.edu?
www.upenn.edu providers use FTP to upload data to the server. Since the hostname www.upenn.edu now points to the Akamai servers, you can no longer FTP to www.upenn.edu. You must change the configuration of your FTP client to point to origin.www.upenn.edu
Additional information on uploading data is available.

How do I preview changes before they are written to cache?
If you would like to preview modifications to your page before they are written to cache, you can point your browser directly to http://origin.www.upenn.edu/ followed by the path to your page. This bypasses the caching servers, and your browser will be talking directly to the server that houses your real content. If your page on www.upenn.edu is
http://www.upenn.edu/almanac/between/between.html
you can preview changes to this page by pointing your browser to
http://origin.www.upenn.edu/almanac/between/between.html
Please do not create links in your pages that go directly to origin.www.upenn.edu. Linking directly to origin.www.upenn.edu defeats the purpose of caching.

Do I have to change my links to www.upenn.edu?
You do not have to change your links, and in order to take advantage of the caching, you shouldn't change them. You should continue to use www.upenn.edu when linking to pages on the Penn server.
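
The preview address described under "How do I preview changes before they are written to cache" differs from the public address only in its hostname. The following Python sketch shows that mapping, assuming nothing beyond the two hostnames given above; the function name `preview_url` is hypothetical. The result is meant for your own browser only and should not be used in links, for the reason given above.

```python
from urllib.parse import urlsplit, urlunsplit

PUBLIC_HOST = "www.upenn.edu"
ORIGIN_HOST = "origin.www.upenn.edu"


def preview_url(public_url: str) -> str:
    """Map a public www.upenn.edu URL to its origin (uncached) preview URL."""
    parts = urlsplit(public_url)
    # Require the fully qualified public hostname; a short form such as
    # "www" is rejected rather than guessed at.
    if parts.hostname != PUBLIC_HOST:
        raise ValueError(f"expected a URL on {PUBLIC_HOST}: {public_url!r}")
    return urlunsplit(
        (parts.scheme, ORIGIN_HOST, parts.path, parts.query, parts.fragment)
    )


print(preview_url("http://www.upenn.edu/almanac/between/between.html"))
# prints http://origin.www.upenn.edu/almanac/between/between.html
```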
The one exception to the rule of continuing to link to www.upenn.edu is linking to the Altavista search indexes.
An easy way to find links that you may have to any of these search indexes is to use the Advanced Search. To find all pages on www.sas.upenn.edu that link to the old URL for the Computing Web search, go to the Advanced Search and enter the search term
host:www.sas.upenn.edu link:www.upenn.edu:9000

Why do I get the error message "Invalid URL" when requesting a page from www.upenn.edu?
Your browser may have been configured so that you could request a page from www.upenn.edu without fully qualifying the domain name. Instead of typing
http://www.upenn.edu/almanac/
you may be accustomed to typing
http://www/almanac/
Since www.upenn.edu now redirects to the Akamai caching service gateway, and that gateway must know the actual hostname of the server that you're trying to reach, you must specify the full URL when requesting www.upenn.edu pages with your browser. You may also receive this error message if you are using a very old browser that doesn't support the newer HTTP/1.1 protocol.

Why can't people authenticate for pages with their PennKeys?
If you are using the Apache/Websec module to restrict access to your web pages on www.upenn.edu, so that a user must first authenticate with his/her PennKey before viewing your pages, you must turn off caching for your pages and turn off IP-checking. Since pages are being served by the Akamai servers, IP-checking will not work. To restrict access to your pages and require that a user authenticate first with a PennKey, create a file called .htaccess in the directory to be restricted and enter the following into that file
For more information, please see our full documentation on restricting pages with a PennKey.

How does this affect my www statistics report?
Since users actually go to the many Akamai caching servers to get your pages, the statistics on how your pages are being used are actually housed on the Akamai servers. We will pull down those statistics daily so that we can continue to provide you with your web statistics, but there will be a delay of up to two days in reporting. The checking of links within your pages will continue to run on our server, and there should be no delay in that report.

Are other virtual hosts also cached?
Our contract with Akamai is not currently sized to accommodate the volume of traffic associated with virtual hosts that may also have space on www.upenn.edu. Once the service matures, we will be happy to discuss how to extend the service to virtual hosts, and what the associated costs would be.