Primary Audiences
CacheFu, and caching in general, can be quite complex for those who want or need to understand every detail. Rather than trying to teach everything to everybody, this section teaches the concepts for different audiences. People building sites in Plone need to know less than people who build complex products for others to use; people who deploy ordinary sites need to know less than people who deploy enormous sites with thousands of users or more.
End Users
End users of the site (the people who use the site, but don't create content or design the skins, etc.) need to know nothing about CacheFu, nor do they need to make any changes in their browsers. CacheFu already sends out the right commands (caching headers, as they're called) to say things like "don't cache this, even if the browser normally wants to cache things".
Tip
With the standard install, there's only one case where users (and only anonymous users) might see stale content. When a user is looking at a page (say, news-items/project-a), Plone will normally show navigation on the left-hand side that might include "sibling items": other items in the news-items folder. If you are using Squid, it will serve cached pages to anonymous users, and the cached copy of project-a might show navigation that doesn't include the newly-created project-b news item. This only happens with anonymous users, and only when Squid is involved (because CacheFu tries to cache things most aggressively in these cases).
This case "times out" after an hour. Even if nothing else changes, CacheFu has told Squid not to serve project-a for more than an hour without re-checking, so Squid will pick up the new project-a page, showing project-b in the left-hand navigation, within an hour at the latest.
If you can't tolerate this, you could reduce the window of time to less than an hour or, if you absolutely need 100% up-to-date navigation for sibling items, turn off Squid caching for this case (covered later).
For logged-in users, there are no stale content opportunities with the standard settings: non-anonymous users never have their content pages cached in Squid by CacheFu's default settings.
Content Managers
For the people who edit content, there's nothing they need to know, except for the warning about stale sibling items in "End Users", above. The content managers themselves won't experience this, but the people who view their content might.
If a content creator sends a link to newly-created content, everyone will be able to get to it (assuming, of course, they have the right permission to do so). However, should a content manager send out a link to news-items/project-a, that page might not show the even-more-newly-created project-b in the navigation for anonymous users, as described above.
ZMI Customizers
This audience makes up the bulk of people who build Plone sites and customize them. It includes people who do things like:
- customize templates and CSS
- write Python Scripts and use External Methods
- make setting/configuration changes, like changing how the navigation displays, etc.
It's important to understand that you don't have to change anything to get many of the benefits of CacheFu. However, as you write new portlets or skins, you may need to adjust the product's settings so that you continue to avoid stale pages.
First, let's look at the general configuration for CacheFu.
Cache Configuration Tool
Most of the settings for CacheFu's technologies are made in the "Cache Configuration Tool", found in Site Setup -> Cache Configuration Tool.
This tool has the following tabs:
- "Cache configuration tool". This is where broad configurations to the behavior are made.
- "Caching rules". This tab allows you to determine what kind of caching is chosen for different situations.
- "Caching header sets". This tab allows you to decide exactly how the caching rules get carried out.
- "Page cache". This allows you to clear the in-memory storage of some kinds of cached content.
All of these are described below in more detail.
Cache Configuration Tool Tab
- Cache configuration
CacheFu can be used on a site that is "Zope only". This means that no proxy server (either Apache or Squid or anything else) is sitting in front of Zope.
In this case, things can be cached in Zope, or in a web browser.
If, however, you put your Zope behind Apache (or any other non-Squid caching proxy), CacheFu can send out headers to tell the proxy server to cache JavaScript, CSS, etc.
If you're behind Squid, CacheFu can send out headers to cache those things as well. Plus, since Squid supports "purge requests" (which tell it to discard a cached copy so that the next request fetches a fresh one), CacheFu can also send out headers telling Squid to cache many more pages, since it can selectively purge those from Squid's cache as needed.
In some cases, your organization may want or need to use Apache (as a very full-featured web server, it can do things that Squid can't, and has dozens of add-on products). However, you may still want to get the benefit of Squid. This is a case to use the "Zope behind Squid behind Apache" setting: the public talks to your Apache server (which might also handle things like PHP applications). Requests for anything on the Plone site are delegated to Squid, which can either respond itself or further delegate to the actual Zope/Plone server.
If you are going to run with Squid, either by itself or behind Apache, be sure to read the "Setup with Squid" section, below, for important information on how to configure Squid.
If you are going to run with Apache, you will need to have a few settings in your httpd.conf virtual host block -- see below.
If you are running your Zope site by itself, there are no special configurations required.
Site Domains
In order for CacheFu to be able to tell Squid to purge a cached page, CacheFu needs to know the domains that Squid might have that page under.
For example, if you're serving www.example.com, a cached page, about-us, could be at http://www.example.com/about-us. Depending on your Squid (or Squid+Apache) settings, people might also be able to visit http://example.com/about-us (note the missing "www") and find the same page.
CacheFu will need to tell Squid separately to purge both of those URLs; while CacheFu understands that this is the same page, Squid has no idea that these two pages are the same thing, and must be told separately.
Therefore, you'll want to list all of the domain names at which your site is reachable. Be sure to put the port number at the end, even for port 80 (for example, http://www.example.com:80), and don't forget to include https://example.com:443 and https://www.example.com:443 if you also run on HTTPS.
Please note that the port numbers here are the ones the public visits, not the port your Zope instance is really listening on. Most Zope servers serve content themselves on port 8080 (8282 is common on Mac OS X); however, as far as Squid is concerned, that content was asked for on port 80, since that was the original request.
If you're not using Squid, you can leave this empty. Values entered here are only meaningful if you chose "Squid" or "Squid behind Apache" above.
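To make the domain list concrete, here is a minimal sketch, assuming each entry has the form scheme://host:port, of how one page path expands into every public URL under which Squid might have cached it. The helper name purge_urls is hypothetical; it is not CacheFu's internal code.

```python
from urllib.parse import urlsplit


def purge_urls(site_domains, path):
    """Expand each configured site domain into the public URL of one page.

    Hypothetical helper for illustration; not part of CacheFu.
    """
    urls = []
    for domain in site_domains:
        parts = urlsplit(domain)
        if parts.port is None:
            # The port must be given explicitly, even for 80 and 443.
            raise ValueError(f"missing port in site domain: {domain!r}")
        base = f"{parts.scheme}://{parts.hostname}:{parts.port}"
        urls.append(f"{base}/{path.lstrip('/')}")
    return urls


domains = [
    "http://www.example.com:80",
    "http://example.com:80",
    "https://www.example.com:443",
    "https://example.com:443",
]
for url in purge_urls(domains, "about-us"):
    print(url)  # one URL per domain that Squid must be told to purge
```

Squid treats each of these URLs as a separate object, which is why every reachable domain has to be listed.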
Squid URLs
If you're running Squid and Zope alone, Squid normally answers web requests from the outside world on port 80. Therefore, CacheFu knows how to reach Squid (on port 80) to send purge requests. You can leave this blank.
In some cases, Squid isn't running on port 80. Most likely this is because you're running Squid behind Apache, with Apache on port 80 so that it is the first server to handle the request. In this case, you need to tell CacheFu how to reach Squid to send purge requests. Normally, this will be http://127.0.0.1:3128, the address and port number to reach Squid on the local machine.
If you are running Squid on a different box, or on a different port number, you'll want to enter that instead. If you have several Squid instances, you'll want to list all of them so that each purge request can be sent to each one.
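Sending each purge request to every configured Squid instance can be sketched like this. It is an illustration only, assuming Squid is configured to accept PURGE requests from the Zope host: plan_purges and send_purge are hypothetical names, the second Squid address is made up, and CacheFu's own purge code may build its requests differently. The page URLs use the domain-plus-port form described under Site Domains.

```python
import http.client
from urllib.parse import urlsplit


def plan_purges(squid_urls, page_urls):
    """Pair every Squid instance with every public URL that must be purged.

    Hypothetical helper for illustration; not part of CacheFu.
    """
    plan = []
    for squid in squid_urls:
        parts = urlsplit(squid)
        for url in page_urls:
            plan.append((parts.hostname, parts.port or 80, url))
    return plan


def send_purge(host, port, url, timeout=5.0):
    """Send one PURGE request to a Squid instance and return the HTTP status.

    The absolute page URL is used as the request target, as for any request
    sent to a proxy. Squid must be configured to allow PURGE from this host.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PURGE", url)
        return conn.getresponse().status
    finally:
        conn.close()


squids = ["http://127.0.0.1:3128", "http://10.0.0.2:3128"]  # second one is hypothetical
pages = ["http://www.example.com:80/about-us", "http://example.com:80/about-us"]
for host, port, url in plan_purges(squids, pages):
    print(host, port, url)  # send_purge(host, port, url) would do the real work
```

Two Squid instances and two domains already mean four purge requests for a single edited page, which is why both lists should contain only what is really needed.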
Compression
Separate from caching, CacheFu can also compress web pages.
This compression is a standard part of HTTP: pages can be sent with "gzip compression", and most web browsers can receive the compressed pages, uncompress them, and render them for the user.
You can choose to "Never" do this, which is the safest option--you won't have any browser incompatibility issues to worry about.
You can choose to "Always" do this. This is an unusual choice, and probably not correct--some browsers can't deal with gzipped pages, and always sending them will not allow people with these browsers to use your site.
You can have CacheFu decide on a case-by-case basis depending on what the web browser sent in the "Accept-Encoding" header. Web browsers send this header to indicate which kinds of encoded content they can receive. If the browser indicates that it can receive gzipped content, CacheFu will send it gzipped.
You can have CacheFu decide based on both the "Accept-Encoding" header and the "User-Agent" header. "User-Agent" identifies which web browser the client claims to be. If this option is selected, CacheFu checks both that the web browser says it can handle gzipped content and, just to be safe, that it is a browser CacheFu knows can do this successfully (some early versions of Netscape are buggy here!). Checking the user agent comes at a high cost: Squid will need to cache separate versions of each of your pages for every single browser / operating system combination, which makes cache hits much less likely and increases the disk space required by Squid by a factor of 20-100! Using Accept-Encoding alone is the recommended practice, since the buggy browsers are rarely in use.
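The recommended Accept-Encoding option boils down to a check like the one below. This is a simplified sketch, not CacheFu's implementation: accepts_gzip is a hypothetical name, and it only honors the gzip and x-gzip tokens and their q-values, ignoring the "*" wildcard.

```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value allows a gzip response.

    Simplified: looks only for "gzip" / "x-gzip" and their q-values, and
    ignores the "*" wildcard.
    """
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() not in ("gzip", "x-gzip"):
            continue
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # q=0 explicitly means "not acceptable".
        return quality > 0
    return False


print(accepts_gzip("gzip, deflate"))  # True
print(accepts_gzip("gzip;q=0"))       # False
```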
Vary Header
In order to not serve stale content, cache systems (either proxy caches like Squid or in-browser caches) need to know more than just the URL of the object. Returning a cached copy of http://example.com/about-us, for example, might be wrong if the user prefers to speak Greek, and the cached copy is the English version.
Therefore, in this field, you list the request headers on which responses can vary; browsers and cache proxies use this "Vary" list to decide which cached copy, if any, may be returned.
For example, if the Vary header is just "Accept-Language", and a request comes for http://example.com/about-us, your cache program will cache it while keeping track of the Accept-Language value it received from the browser. Then, when another request comes, it will make sure to hand back the cached copy only if the Accept-Language value for the new request is the same as the old. Otherwise, it will cache this second copy, and hand that back only for the same URL and Accept-Language header.
If your site has multilingual capabilities, you'll want "Accept-Language" in here. This will make sure you don't return the English copy for the Greek speaker.
If you don't have any multilingual content, and you don't even want the standard Plone templates to be returned in other languages (which you can prevent by removing PlacelessTranslationService from the products), you should remove this. Keeping this value would mean that your caching systems keep one copy of a page for Accept-Language=en (English speakers) and a different copy for Accept-Language=el (Greek speakers), even though the pages themselves don't vary by language. If you leave it in, your caches will use more memory than necessary, and some requests that could have been answered from cache will go to Zope instead.
If you allow compression (see above), you'll want to add "Accept-Encoding" here. This ensures that a gzipped version of a page and a non-gzipped version are not considered the same thing. In other words, caches will keep two copies of each page: one for browsers that say they can accept gzip, and one for those that can't.
If you have a multilingual site and allow gzipping, you'll want to leave both in.
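The way a cache uses the Vary list can be modeled as a lookup key made of the URL plus the values of the listed request headers. This is a deliberately simplified model (cache_key is a hypothetical helper, and real caches such as Squid store variants differently), but it shows why an English copy is never handed to a Greek speaker when Accept-Language is listed.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary.

    Simplified model of a Vary-aware cache; not how Squid stores objects.
    """
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    # Header names are case-insensitive, so normalize them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    return (url,) + tuple((name, headers.get(name, "")) for name in names)


url = "http://example.com/about-us"
vary = "Accept-Language, Accept-Encoding"
english = {"Accept-Language": "en", "Accept-Encoding": "gzip"}
greek = {"Accept-Language": "el", "Accept-Encoding": "gzip"}

cache = {cache_key(url, english, vary): "<html>English version</html>"}
print(cache.get(cache_key(url, greek, vary)))  # None: the Greek speaker misses
```

Dropping Accept-Language from the vary string makes the English and Greek requests share one key, which is the memory saving described above for sites without multilingual content.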
Caching Rules
Caching rules are part of the core concepts of CacheFu. They are a set of rules which are analyzed for each web request that gets to Zope, and, if the rule matches the request, the caching rule puts its behavior into effect.
For example, if a web browser requests some CSS, and this request gets to Zope (i.e., it isn't answered by Squid or some other cache), the request goes through the rules here, in order (top to bottom), until it matches one rule. The matching rule can do things like:
- Cache the page in memory in Zope
- Request that headers be sent out with the response. These headers are interpreted by browsers and cache proxies to say things like "keep this in your cache for 1 hour" or "never, ever cache this" (for some details on headers, see the next section).
Since we take the first rule that matches, only one can ever apply to any given request. If no rule matches, no particular action is taken, and the content is still rendered and returned normally.
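First-match evaluation can be sketched in a few lines of Python. The Rule class, the request dictionary, and the rule and header set ids below are hypothetical stand-ins for CacheFu's configured rules; the point is only the ordering behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for one configured caching rule."""
    name: str
    matches: Callable[[dict], bool]
    header_set: str


def first_matching_rule(rules, request) -> Optional[Rule]:
    """Return the first rule, top to bottom, that matches the request."""
    for rule in rules:
        if rule.matches(request):
            return rule
    return None  # no rule matched: the page is rendered normally


rules = [
    Rule("httpcache", lambda r: r.get("cache_manager") == "HTTPCache", "cache-in-browser-24h"),
    Rule("content", lambda r: r.get("portal_type") == "News Item", "cache-in-proxy-1h"),
    Rule("news-path", lambda r: "news" in r.get("path", ""), "some-other-header-set"),
]

request = {"portal_type": "News Item", "path": "/news-items/project-a"}
print(first_matching_rule(rules, request).name)  # content
```

The third rule would also match this news item, but it is never consulted, because the rule above it matches first.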
Let's walk through the rules that CacheFu ships with. Understanding them will help you understand what CacheFu does.
- HTTPCache
This rule matches nothing out of the box, since nothing is associated with the HTTPCache cache manager by default (see below), but it can be quite useful if you customize things.
It is used where you have pages (or images, such as your site's logo) that can be cached entirely and do not require any invalidation. Consider a page that _always_ looks the same, regardless of whether you're logged in or not, and which doesn't need to be purged when any content changes. A good example might be a page that pops up with static help about your site, and doesn't show content, rely on content, or change based on your login status. For this example, that page is a Page Template called site-help.
To tell CacheFu that this page can be cached like this, you'd need to associate it with the "HTTPCache" cache manager. For example, we could go to our site-help template and, under the Cache tab, associate it with the HTTPCache.
Traditionally, to do this in Zope, you'd be associating the PageTemplate with a cache called "HTTPCache" which is an "Accelerated HTTPCache Manager", and that manager sets the headers itself. In CacheFu, however, HTTPCache doesn't do anything itself--it's essentially a "marker" to indicate that a piece of content can be cached like this, and therefore, it's picked up by this rule.
Since no content types are selected in the box below, all types of content can meet this rule (assuming, of course, they're associated with the HTTPCache).
Several options aren't used here (they will be discussed where they are used, later).
The two boxes for the header sets used for anonymous users and for authenticated users are the primary "outcomes" of rules, and this is the case here: content that meets this rule is, by default, given the cache-in-browser-for-24-hours header set, both for anonymous and logged-in users. If you wanted to cache this content for just 1 hour, you could change to the cache-in-browser-for-1-hour header set. If you wanted a different possibility (say, cache-in-browser-for-10-mins), you'd have to create it in the caching header sets tab, described below.
The last choice, "Last-Modified Expression" is the expression that will be used to decide when this content was last modified; this is used so that CacheFu only caches it for 24 hours beyond that (or one hour, or however long you chose).
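A header set ultimately turns into ordinary HTTP response headers. The sketch below shows, under stated assumptions, what a "cache in browser for 24 hours" style header set could send, with the Last-Modified value coming from something like the Last-Modified Expression. The helper name header_set and the exact choice of headers are hypothetical; CacheFu's built-in header sets may send different or additional headers. The optional s-maxage directive is the standard HTTP way to give shared caches such as Squid a different lifetime from browsers.

```python
import time
from email.utils import formatdate


def header_set(browser_max_age, last_modified, proxy_max_age=None, now=None):
    """Return response headers for a simple "cache for N seconds" header set.

    browser_max_age and proxy_max_age are in seconds; last_modified and now
    are Unix timestamps. Hypothetical helper, not CacheFu's API.
    """
    now = time.time() if now is None else now
    directives = [f"max-age={browser_max_age}"]
    if proxy_max_age is not None:
        # s-maxage applies only to shared caches such as Squid.
        directives.append(f"s-maxage={proxy_max_age}")
    return {
        "Cache-Control": ", ".join(directives),
        "Expires": formatdate(now + browser_max_age, usegmt=True),
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }


# Roughly "cache in browser for 24 hours", with fixed timestamps for clarity.
headers = header_set(24 * 3600, last_modified=0, now=0)
print(headers["Cache-Control"])  # max-age=86400
print(headers["Expires"])        # Fri, 02 Jan 1970 00:00:00 GMT
```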
- Content
This rule is used to cache views of normal (non-folder, non-image) content, like a News Item.
By default, the normal Plone content types are cached by this; you don't have to associate them with any cache manager, like we did above.
The content types matched by this rule are selected in the content types box. These are the content types that aren't File or Image (those are handled separately) or folderish (those are also handled separately).
Of course, we don't really want to cache the news item itself; that's not what is sent to web browsers. Instead, we want to cache the HTML view of a news item.
The "Default view" box, when checked, means that this rule will be in effect when a request is made for a skin object (Page Template, Python Script, etc.) for the content types listed if that skin object is the default view for that content type. So, for example, it will catch:
http://news-items/project-a
and
http://news-items/project-a/newsitem_view
since newsitem_view is the default view for news items. It won't catch
http://news-items/project-a/special_newsitem_view
Should you want to cache additional views, like special_newsitem_view, add them to the "Templates" field, which holds the ids of additional skin objects that should be cached.
"Cache Templates" in memory means that, in addition to whatever other outcome a matching request gets, CacheFu should cache the template results in memory. This way, even if you have no proxy cache, CacheFu can still speed things up with the in-memory cache.
Note that
http://news-item/project-a
and
http://news-item/project-a?form_var=1
are different requests, and, as such, will be cached separately.
"Cache Preventing Request Values" is used to specify form variables (or other things in the request object) whose presence signals that this request should not be cached. By default, "portal_status_message" is listed here, plus "statusmessages", which is the name of the cookie used by recent Plone 2.5 and 3.0 versions. "portal_status_message" is the request variable that holds feedback messages (usually shown in bright orange at the top of pages). Since the same template can be shown with dozens of these feedback messages ("Changes saved", "Content added", "Email sent", etc.), it would be expensive to cache each copy individually, and probably not very helpful, since few people would request the exact same page with the same message anyway. Therefore, if a status message is being sent, CacheFu won't cache this page.
"Predicate" lets us add an arbitrary TALES expression as a further check on whether this rule matches. To match, a request would have to a) be for a view of a content type listed in the content types box, b) be for the default view (if that's checked) or for an explicitly-named template, c) not have a portal_status_message (or statusmessages cookie), and d) pass this expression.
The "Header Set for Anonymous Users" is set so that the proxy cache (usually Squid) is told to cache this for 1 hour. For logged-in users, the page is cached with an ETag, a mechanism that lets it be cached while still ensuring that a fresh copy is produced when something relevant changes.
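The four matching conditions for the Content rule (content type, default view or listed template, no cache-preventing value, predicate) can be sketched as a single function. This is an assumption-laden illustration, not CacheFu's code: the request is modeled as a plain dictionary, content_rule_matches is a hypothetical name, and the predicate is a Python callable standing in for the TALES expression.

```python
def content_rule_matches(request, *, content_types, default_views,
                         use_default_view=True, templates=(),
                         preventing_values=("portal_status_message", "statusmessages"),
                         predicate=lambda request: True):
    """Return True if a (dict-modeled) request matches the Content rule."""
    portal_type = request.get("portal_type")
    view = request.get("view", "")  # "" means the bare URL of the object
    # a) the object must be one of the selected content types
    if portal_type not in content_types:
        return False
    # b) the default view (if enabled) or an explicitly listed template
    is_default = view in ("", default_views.get(portal_type))
    if not ((use_default_view and is_default) or view in templates):
        return False
    # c) no cache-preventing value in the form variables or cookies
    form = request.get("form", {})
    cookies = request.get("cookies", {})
    if any(name in form or name in cookies for name in preventing_values):
        return False
    # d) the extra predicate (a TALES expression in CacheFu) must pass
    return bool(predicate(request))


settings = dict(
    content_types={"News Item", "Document"},
    default_views={"News Item": "newsitem_view", "Document": "document_view"},
)
print(content_rule_matches({"portal_type": "News Item", "view": "newsitem_view"}, **settings))  # True
print(content_rule_matches({"portal_type": "News Item", "view": "special_newsitem_view"}, **settings))  # False
```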
Tip
ETags
ETags (entity tags) are a standard HTTP mechanism that lets web servers and browsers pass around a token summarizing information about a response. This information can include things like the kind of user who requested it, the date it was last changed, etc.
So, if you request
http://news-items/project-a
the server might give this back to you with an ETag of
ETag: joelburton-2006/12/25-en
This means the request was made by Joel Burton, the content was modified on Christmas 2006, and it was in English. The cache program (be it in the browser or a cache proxy) will store this ETag along with the content. (Note: this isn't the exact format ETags appear in, but the concept is accurate.)
Now, when the browser wants to re-request this page from Zope, it will take that old ETag and give it to Zope. Zope will compare it with the current ETag it would create. If they are the same, Zope will return an HTTP response (304 Not Modified) saying "nothing has changed, feel free to use your current copy". If they differ, it will return the new page.
So, for example, if Joel re-visits http://news-items/project-a when he is logged out, his browser will hand Zope the ETag it has from his last visit, joelburton-2006/12/25-en. Zope looks at the logged in user (now Anonymous) and creates a new ETag:
ETag: anonymous-2006/12/25-en
(requested by Anonymous, content last modified on Christmas, preferred in English). Since this is different from the ETag that the browser submitted, Zope will return the content itself, since anonymous users might need a different page (without Joel's name, for instance, or with different things showing because of security).
Squid cannot handle requests with ETags, since it has no way of knowing who is logged in, when content was created, and so on, so all of these requests get routed around Squid.
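The ETag exchange in Joel's example can be sketched as a conditional GET handler. This is a minimal illustration: make_etag and respond are hypothetical names, the ETag format mirrors the simplified example above rather than CacheFu's real format, and real If-None-Match headers may carry several tags, which this sketch ignores.

```python
def make_etag(*components):
    """Join ETag components into a quoted entity tag, as HTTP requires."""
    return '"' + "-".join(str(c) for c in components) + '"'


def respond(if_none_match, current_etag, render):
    """Answer a conditional GET: 304 if the client's ETag is current, else 200."""
    if if_none_match == current_etag:
        return 304, None  # "nothing has changed, use your current copy"
    return 200, render()


# Joel's earlier visit stored this ETag; now he comes back logged out.
stored = make_etag("joelburton", "2006/12/25", "en")
current = make_etag("anonymous", "2006/12/25", "en")
status, body = respond(stored, current, lambda: "<html>page for Anonymous</html>")
print(status)  # 200: the user changed, so the page is rendered again
```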
The "ETag Components" choice allows you to select what things should make up the ETag; as you choose things here, if any of these things changes, it would cause the ETag to change, and cause the page to be recalculated.
The default settings are sane but cautious: they cause a re-examination if there are changes to the user who requested the page, the current skin path (i.e., Plone Default, Plone Tableless, or a custom skin path), whether they want gzipped content or not, and the time of the last catalog change.
The last item is an interesting idea. If you edit a news item, of course, we'd want CacheFu to stop returning the old copy of that news item view. However, if you edit the title of the folder that news item was in, we'd also want CacheFu to stop returning the old copy of that news item view, since the old title of the parent folder would appear in the navigation portlet, breadcrumbs, etc.
By having the date/time of the last catalog change factor into the ETag, any change to any object will cause CacheFu to re-create the content pages. This guarantees fresh navigation, but at the expense of re-creating things where staleness might have never been an issue, or where it might have been acceptable.
This ETag setting is just for logged-in users; for anonymous users, remember, the page is cached in the proxy for an hour regardless of ETags, so a change to the site won't make everything unusable from cache.
If you don't care about this, and desire more performance, you could opt to change the "Time of last catalog change" to "Context modification time". This would mean that editing anything won't invalidate all ETags, but it does mean that an old title for a sibling might appear in the navigation tree or breadcrumbs, or such. This can significantly speed up performance, and may be worth considering.
In addition to the checkbox ETag choices, you also can indicate which request values should be factored into the ETag. The default is month, year, orig_query. This is used for the calendar portlet--you might have changed nothing on the site since this page was requested a while ago but this request is for a user who has clicked the next-month button on the calendar, and, as such, needs a different response: the same piece of content, but with the calendar portlet showing the next month.
If you add other form variables to your view templates and changes to these should cause the page to be cached differently, you should add them here.
Note the difference between the "Cache Preventing Request Values", above (which held "portal_status_message") and this. The cache-preventing-values piece says that if this value exists, don't match the rule (and therefore, don't send out the caching headers, etc., for this rule). The "ETag request values" field says you can cache it if this value exists, but consider it a different request.
"ETag Timeout" is a failsafe for ETags: the maximum number of seconds an ETag should last. At 3600 (the default here), even if the ETag comparison suggests nothing has changed, Zope will generate a new page once the ETag is more than an hour old. It's good to have this: if you forget to make your caching depend on a certain value, you still won't hold onto stale content views for more than one hour.
"ETag Expression" allows you to calculate anything as an additional ETag value.
"Purge Expression" is a script that can handle the purging in Squid of content views once changed. You shouldn't have to edit this.
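Putting the ETag settings together, an ETag could be composed from the default components (user, skin path, gzip preference, time of the last catalog change), the ETag request values, and a time window for the timeout. This is a sketch under assumptions: build_etag is a hypothetical name, and the time-window approach is just one way to guarantee that no ETag outlives the timeout; CacheFu may enforce the timeout differently.

```python
import time


def build_etag(user, skin, gzip, last_catalog_change, request,
               request_values=("month", "year", "orig_query"),
               timeout=3600, now=None):
    """Compose an ETag from hypothetical components mirroring the defaults."""
    now = time.time() if now is None else now
    parts = [user, skin, "gzip" if gzip else "identity", str(last_catalog_change)]
    # ETag request values: the same page showing another calendar month is a
    # different response, so these values become part of the tag.
    parts += [f"{name}={request.get(name, '')}" for name in request_values]
    # Timeout failsafe: this window number changes every `timeout` seconds,
    # so no ETag can stay valid for longer than that.
    parts.append(str(int(now // timeout)))
    return '"' + "|".join(parts) + '"'


base = dict(user="joelburton", skin="Plone Default", gzip=True,
            last_catalog_change=1000, request={})
print(build_etag(**base, now=0) == build_etag(**base, now=3599))  # True
print(build_etag(**base, now=0) == build_etag(**base, now=3600))  # False
```

Swapping last_catalog_change for the context's own modification time, as suggested above for better performance, would only change which value goes into that slot.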
Containers
This rule is used for folderish content types (Folder and Large Plone Folder). It is similar to the Content rule.
The difference is that, for anonymous users, the Content rule caches the view of a content object until the object changes; doing that for folderish objects would serve too much stale content. The view of a folder is traditionally a list of its child objects, so when a folder gets a new document added to it, or has one of its child documents edited, the view of the entire folder needs to change.
It's difficult to decide if any child object was changed; instead, CacheFu relies on the quick (but much more general) rule of: has any object in the entire catalog been changed? If so, the view for a Folderish object must be regenerated.
Therefore, these are never cached in Squid or Apache (it would be difficult to purge every possible view of every folderish content item on every catalog change!) and are cached in Zope, using the same ETag strategy as for the Content rule, above. Essentially, both anonymous and logged-in users who view container views are treated the same as logged-in users viewing content with the Content rule.
Templates
This rule is for "templates"--skin objects like Page Templates that are not presenting a piece of content, and are not forms. A good example is the "accessibility-info" template that comes with Plone; this isn't content, is not a form, but is a page that could be cached until it changes.
The "templates" field is a list of those templates that match this rule. As you add new templates that could be matched by the rule, you can list them here.
These are cached with ETags, using the same strategy as for Content and Containers rules, above.
Similar to the Container rule strategy, the cache is cleared for these every time the catalog changes. Almost all templates rely on the main template, which relies on a dozen other templates, and it would be very difficult to track all of these dependencies. Therefore, any change to anything in Zope might affect what these templates should show, and, as such, clears them.
This is safe and conservative. Some sites may decide that it is acceptable, for anonymous users, to cache these templates in a proxy server, so that they are cached for an hour (or 24 hours, or more). Changes to the template itself, or to main_template or other templates, won't appear, but these sorts of changes might be very rare on a production site, and, if they happened, the sysadmin could restart the proxy server to clear its entire cache.
Tip
Why Not Forms?
So, why aren't forms cacheable with the Template rule?
Forms are used in Plone both to show the form and to re-present it with validation errors; if a form were cached and served from cache, it might show validation errors and already-entered data from a previous user.
Adding form.submitted to the "Cache preventing request values" box would probably address this, but caching forms is unlikely to be a big win.
- More likely to use Apache than Squid, or to use nothing at all
- But everything should "work" without Squid, even if some things are cached in memory rather than in Squid when they could be
- Should know to associate full pages that are fine to cache (e.g., the homepage, if it has no personalization or portlets) with HTTPCache
- Cloned content types that are non-folderish should be added to the Content rule
- New views for content (e.g., news_item_short) that aren't among the display menu choices should be added to the Content rule
- If you add portlets that use form variables (e.g., show-more-news-in-portlet or show-less-news-in-portlet), add these form variables to "ETag request values" in all rules
- Cloned folderish types should be added to the Containers rule
- New views for folderish types that aren't among the display menu choices should be added to the Containers rule
- New templates that aren't contentish should be added to the Templates rule, as long as they don't depend on things like the clock
- If a template uses form variables, add them to "Cache Preventing Request Values", unless they're commonly shared (worst case, we cache too many variants)
- Nothing to know about for CSS/JS
- If you clone new File/Image types, add them to the file/image rule
- Know when to clear the page cache (mainly useful for debugging)
- It might be helpful for them to understand the header sets, but only the most basic aspects (e.g., cache-for-1h vs. cache-for-1d)
- Open question: should we ship additional header sets (e.g., "cache-in-mem-for-1h" vs. "cache-in-mem-for-1d")? Both could still be cleared, of course, but would have different maximum cache lengths.
- RAMCaches still work exactly the same, so you can still do things like cache production of news-item-portlet-search
- Nothing here to clear those RAM caches in advance, or in any smart way
- RelDB caches still work the same
- Nothing here to clear those reldb caches in advance
Intermediate Developers
- [These are people who are building Archetypes, building web apps, but are not necessarily hard-core geeks]
- FIXME
Advanced Developers
- [These are people who are seriously customizing Plone, and will learn more knobs to get more power]
- FIXME
System Admins
- [People who administer the Squid/Apache stuff, but aren't necessarily Zope/Plone people]
- FIXME