1 | |
---|
2 | [August 15, 2007 - archived for future reference - newbery] |
---|
3 | ___________________________________________________________ |
---|
4 | |
---|
5 | |
---|
6 | |
---|
7 | See CacheSetup/docs/audiences.rest for more documentation |
---|
8 | |
---|
9 | Tested with Plone 2.5, Plone 2.1.3, Plone 2.1.2, Plone 2.0.5 and CMF 1.4.8/1.5/1.6 |
---|
10 | |
---|
11 | IMPORTANT: If you have a pre-beta CacheFu installation, you MUST |
---|
12 | uninstall it before putting any of the CacheFu code into your Products |
---|
13 | directory. |
---|
14 | |
---|
15 | If you have a 1.0 beta CacheFu installed, you will need to uninstall |
---|
16 | then install. DON'T REINSTALL - reinstalling will not trigger updates |
---|
17 | of the header sets and you will be stuck with a bunch of IE problems. |
---|
18 | |
---|
19 | Quick start: |
---|
20 | ------------ |
---|
21 | |
---|
22 | * Make sure you are using Plone 2.0, 2.1, or 2.5 |
---|
23 | |
---|
24 | * IMPORTANT: Uninstall any previous version of CacheFu you have installed |
---|
25 | |
---|
26 | * Stop Zope |
---|
27 | |
---|
28 | * Copy CacheSetup, PageCacheManager, CMFSquidTool, and |
---|
29 | PolicyHTTPCacheManager into your Products directory |
---|
30 | (You do not need MemcachedManager) |
---|
31 | |
---|
32 | * Start Zope |
---|
33 | |
---|
34 | * Use the QuickInstaller to install CacheSetup (you do not need to |
---|
35 | install any other products) |
---|
36 | |
---|
37 | * As a manager, go to Site Setup, select Cache Configuration Tool, |
---|
38 | and then indicate your site's configuration |
---|
39 | |
---|
40 | If you are using squid: |
---|
41 | |
---|
42 | * Change the settings in squid/squid.cfg to reflect your system. Run |
---|
43 | squid/makeconfig to generate your squid/apache files |
---|
44 | |
---|
45 | Alternatively, you can use Nic Benders' makefile to generate the squid |
---|
46 | files -- you will need to edit the file to set correct ZOPELIB and |
---|
47 | PYTHON variables. |
---|
48 | |
---|
49 | ------------ |
---|
50 | |
---|
51 | See CacheSetup/docs/audiences.rest for more documentation. There is |
---|
52 | also some older (deprecated) documentation below that should give you |
---|
53 | an idea of how things work internally. |
---|
54 | |
---|
55 | There are tips below for getting squid to work. Please remember that |
---|
56 | I do not have time to help everyone set up squid. |
---|
57 | |
---|
58 | Geoff |
---|
59 | |
---|
60 | |
---|
61 | |
---|
62 | |
---|
63 | Configuring Squid |
---|
64 | ----------------- |
---|
65 | |
---|
66 | The makeconfig file in the squid directory generates a basic squid |
---|
67 | configuration file, squid.conf. There are also several squid helper |
---|
68 | applications included: |
---|
69 | |
---|
70 | - iRedirector.py |
---|
71 | |
---|
72 | - squidRewriteRules.py |
---|
73 | |
---|
74 | - squidAcl.py |
---|
75 | |
---|
76 | The file 'makeconfig' generates a deploy script that will copy the |
---|
77 | files to the appropriate locations and set permissions on them. |
---|
78 | |
---|
79 | 'iRedirector.py' and 'squidRewriteRules.py' together form a squid |
---|
80 | redirector. These files are used by squid to rewrite URLs when |
---|
81 | handing off to Zope. The files were originally written by Simon |
---|
82 | Eisenmann (longsleep) and modified to have a more mod_rewrite-like |
---|
83 | syntax by Florian Schulze (fschulze). |
---|
84 | |
---|
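To make the redirector's job concrete, here is a minimal Python sketch of the kind of rewrite it performs. It is not the code shipped in iRedirector.py or squidRewriteRules.py. It assumes Zope listens on 127.0.0.1:8080, that the Plone site id is 'plone', and that a VirtualHostMonster is present in the Zope root; the function name is made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical values: where Zope listens and the id of the Plone site.
ZOPE_BACKEND = "http://127.0.0.1:8080"
SITE_PATH = "/plone"


def rewrite_for_zope(url, backend=ZOPE_BACKEND, site_path=SITE_PATH):
    """Map a public URL onto a VirtualHostMonster URL on the Zope backend."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    rewritten = "%s/VirtualHostBase/%s/%s:%d%s/VirtualHostRoot%s" % (
        backend, parts.scheme, parts.hostname, port, site_path, parts.path)
    if parts.query:
        rewritten += "?" + parts.query
    return rewritten
```

A real redirector would read one request per line from squid on stdin and write the rewritten URL to stdout; that loop is left out here.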
85 | 'squidAcl.py' is used to test whether a request is being made by an |
---|
86 | authenticated user (in which case we should not cache the response) or |
---|
87 | if the request is a conditional GET with an If-None-Match header |
---|
88 | (also should not be served up by squid). |
---|
89 | |
---|
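The decision described for squidAcl.py can be sketched roughly as follows. This is an illustration, not the shipped script: it assumes that an authenticated Plone user can be recognized by an Authorization header or an "__ac" cookie, and the function name is hypothetical.

```python
def squid_may_answer(headers):
    """Return True if squid may serve this request from its cache."""
    names = {name.lower(): value for name, value in headers.items()}
    # Assumption: a logged-in user sends an "__ac" cookie or an
    # Authorization header.
    cookies = names.get("cookie", "")
    authenticated = "authorization" in names or "__ac=" in cookies
    # Conditional GETs that carry an ETag must be answered by Zope.
    conditional = "if-none-match" in names
    return not (authenticated or conditional)
```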
90 | Squid Debugging Tips |
---|
91 | -------------------- |
---|
92 | |
---|
93 | Buy and read "Squid: The Definitive Guide":http://www.amazon.com/gp/product/0596001622&tag=phdsorgsciencmat |
---|
94 | |
---|
95 | Squid's logs are a good place to start looking if you have problems |
---|
96 | with it. |
---|
97 | |
---|
98 | Make sure you initialize the squid cache the first time you run squid |
---|
99 | (use 'squid -z'). |
---|
100 | |
---|
101 | While debugging, it's a good idea to run squid from the command line |
---|
102 | and tell it to echo problems to the console. Start squid using |
---|
103 | '/usr/sbin/squid -d1' |
---|
104 | |
---|
105 | To stop squid from the command line, use '/usr/sbin/squid -k kill', |
---|
106 | and to reconfigure squid after you have modified 'squid.conf', use |
---|
107 | '/usr/sbin/squid -k reconfigure'. |
---|
108 | |
---|
109 | If squid won't start, check '/var/log/squid/cache.log' to see why. |
---|
110 | |
---|
111 | If squid blocks your access to a particular page, uncomment the line |
---|
112 | 'debug_options ALL,1 33,2' in squid.conf, reconfigure squid, then |
---|
113 | look at '/var/log/squid/cache.log'. |
---|
114 | |
---|
115 | The redirector and external ACL python scripts can all log their |
---|
116 | activity. Set 'debug=1' in each of the .py files to see what they are |
---|
117 | up to. |
---|
118 | |
---|
119 | Use the LiveHTTPHeaders Firefox extension to see if you are getting |
---|
120 | cache hits for your pages. If there is a cache hit, you will see |
---|
121 | 'X-Cache: HIT from yourserver'; otherwise you will see 'X-Cache: MISS |
---|
122 | from yourserver'. If you are getting misses, try clearing your |
---|
123 | browser cache. |
---|
124 | |
---|
125 | Note that a MISS can mean either that squid tried to retrieve your |
---|
126 | page from the cache but did not find it or that squid has been |
---|
127 | disallowed from responding to the type of request you made (e.g. you |
---|
128 | were authenticated or making a conditional GET with an |
---|
129 | 'If-None-Match' header). |
---|
130 | |
---|
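If you would rather check the X-Cache header from a script than from a browser extension, a small helper along these lines may be enough. It is a sketch: the function name is invented, and it only inspects a dictionary of response headers that you have already fetched by whatever means you prefer.

```python
def squid_cache_status(headers):
    """Return the first word of the X-Cache header (e.g. 'HIT'), or None."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            # The value looks like "HIT from yourserver".
            words = value.split()
            return words[0].upper() if words else None
    return None
```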
131 | |
---|
132 | |
---|
133 | |
---|
134 | |
---|
135 | |
---|
136 | |
---|
137 | |
---|
138 | |
---|
139 | |
---|
140 | |
---|
141 | |
---|
142 | THE TEXT BELOW IS OUT OF DATE BUT SHOULD GIVE SOME INSIGHT INTO HOW |
---|
143 | CACHE-FU WORKS. |
---|
144 | |
---|
145 | Caching Dynamic Content in the CMF |
---|
146 | ================================== |
---|
147 | |
---|
148 | Geoff Davis - September 19, 2005 - geoff at geoffdavis dot net |
---|
149 | |
---|
150 | |
---|
151 | Overview |
---|
152 | -------- |
---|
153 | |
---|
154 | Proxy caching is a way to dramatically speed up a web site. The |
---|
155 | products and tutorial below will show you how to set up your site in |
---|
156 | such a way that: |
---|
157 | |
---|
158 | 1. Users do not receive stale pages when your site's content changes |
---|
159 | |
---|
160 | 2. The load on Zope is minimized, and |
---|
161 | |
---|
162 | 3. Overall bandwidth is reduced. |
---|
163 | |
---|
164 | Who This Is For |
---|
165 | --------------- |
---|
166 | |
---|
167 | This strategy is a bit complicated to implement and is still fairly |
---|
168 | new. If you are a newbie, this is not for you (though you may find |
---|
169 | the background section below useful). You should have experience |
---|
170 | setting up Zope behind Apache, and ideally should have some experience |
---|
171 | with Squid. |
---|
172 | |
---|
173 | Background |
---|
174 | ---------- |
---|
175 | |
---|
176 | To achieve our goals, we will be working with 2 basic properties of |
---|
177 | the HTTP protocol: caching headers and conditional GETs. |
---|
178 | |
---|
179 | HTTP headers and caching |
---|
180 | ------------------------ |
---|
181 | |
---|
182 | HTTP headers are used to control when and where web pages are cached. |
---|
183 | The header that matters most is the Cache-Control header, which |
---|
184 | consists of a list of cache parameters. The main values we will be |
---|
185 | working with are as follows: |
---|
186 | |
---|
187 | max-age -- This tells browsers how long they can cache content before |
---|
188 | they have to check back with the server to make sure the content is |
---|
189 | up-to-date. For static content that is unlikely to change very often |
---|
190 | (images, css, javascript), we will typically set a long max-age so |
---|
191 | that browsers will store these and neither re-request them nor check |
---|
192 | to see if their copies are current very often. |
---|
193 | |
---|
194 | s-maxage -- This tells proxy caches how long they can cache content. |
---|
195 | If no value is specified, the proxy cache will use max-age. For |
---|
196 | dynamic content (views of documents, etc), we will tell browsers that |
---|
197 | the content is immediately stale (max-age = 0) and will tell our proxy |
---|
198 | cache (Squid) to hold on to the content for a while (e.g. s-maxage = |
---|
199 | 7200). When we change our documents, we will tell Squid to remove the |
---|
200 | old, cached versions. |
---|
201 | |
---|
202 | must-revalidate -- This tells browsers that they must check back with |
---|
203 | the server to see if pages are up to date before serving anything in |
---|
204 | their local cache that is stale. |
---|
205 | |
---|
206 | public -- This tells proxy caches that they can cache content even if |
---|
207 | they otherwise wouldn't be able to (i.e. if the user is |
---|
208 | authenticated). We'll indicate that static content (images, |
---|
209 | javascript, css, etc) is public. |
---|
210 | |
---|
211 | private -- This tells proxy caches not to cache content. We'll |
---|
212 | indicate that personalized pages and anything requiring authorization |
---|
213 | is private. |
---|
214 | |
---|
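The directives above combine into a single Cache-Control header value. The following sketch shows one way to assemble the three combinations discussed. The helper name and the one-day lifetime for static content are assumptions for illustration; the 7200-second s-maxage is the example value from the text.

```python
def cache_control(max_age, s_maxage=None, must_revalidate=False,
                  public=False, private=False):
    """Build a Cache-Control header value from the directives above."""
    if public and private:
        raise ValueError("public and private are mutually exclusive")
    tokens = ["max-age=%d" % max_age]
    if s_maxage is not None:
        tokens.append("s-maxage=%d" % s_maxage)
    if must_revalidate:
        tokens.append("must-revalidate")
    if public:
        tokens.append("public")
    if private:
        tokens.append("private")
    return ", ".join(tokens)


# Static content: long browser lifetime (assumed 1 day), cacheable by proxies.
STATIC = cache_control(max_age=86400, public=True)
# Dynamic content for anonymous visitors: stale in browsers, kept by Squid.
ANONYMOUS = cache_control(max_age=0, s_maxage=7200, must_revalidate=True)
# Personalized content: not stored by proxy caches.
PERSONALIZED = cache_control(max_age=0, must_revalidate=True, private=True)
```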
215 | Conditional GETs and browser caching |
---|
216 | ------------------------------------ |
---|
217 | |
---|
218 | When a browser first requests a page, it makes an HTTP request like the following:: |
---|
219 | |
---|
220 | GET /some/page/on/the/site |
---|
221 | |
---|
222 | When the server responds, it has the option to send back several |
---|
223 | useful pieces of information about the object, including the time at |
---|
224 | which the object was last modified (present in almost all responses) |
---|
225 | and an ETag (optional; included using a Caching Policy in the CMF). |
---|
226 | The browser can use these pieces of information in subsequent requests |
---|
227 | to see if the page it is currently holding in its cache is up to date. |
---|
228 | |
---|
229 | When re-visiting a web page, the browser first checks the |
---|
230 | Cache-Control header for a max-age parameter to see if the page it is |
---|
231 | holding has expired. If there is no max-age header, the browser then |
---|
232 | checks the Expires header for an expiration date. If the content has |
---|
233 | not yet expired, the browser serves up the page from its cache. If |
---|
234 | the page has expired, the browser sends a conditional GET instead of a |
---|
235 | regular GET request. The conditional GET looks like the following:: |
---|
236 | |
---|
237 | GET /some/page/on/the/site |
---|
238 | If-Modified-Since: [last-modified date for the page the browser currently has in cache] |
---|
239 | If-None-Match: [the ETag for the page the browser has in cache] |
---|
240 | |
---|
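The browser-side freshness check described above (max-age first, then Expires) can be sketched as below. This is a simplification under stated assumptions: header names are already lower-cased, the caller knows how long the copy has been cached, and a response with neither header is treated as stale, which real browsers do not always do.

```python
import re
from email.utils import parsedate_to_datetime


def is_fresh(headers, age_seconds, now):
    """Decide whether a cached response may be reused without revalidation.

    headers: response headers stored with the cached copy (lower-case keys)
    age_seconds: how long the copy has been in the cache
    now: current time as a timezone-aware datetime
    """
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("cache-control", ""))
    if match:
        return age_seconds < int(match.group(1))
    expires = headers.get("expires")
    if expires:
        return now < parsedate_to_datetime(expires)
    # No explicit lifetime: treat the copy as stale and revalidate.
    return False
```

When this returns False, the browser falls back to the conditional GET shown above.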
241 | The server has 2 options: First, it can respond as it usually would |
---|
242 | to a GET request by sending the page to the browser along with a |
---|
243 | 'Status: 200 (OK)' header. Alternatively, it can do something |
---|
244 | smarter: it can examine the date and ETag for the user's cached |
---|
245 | content, see if the user is holding the page that the server would |
---|
246 | serve anyway, and if s/he is, it can send an empty page with a Status: |
---|
247 | 304 (Not Modified) header. |
---|
248 | |
---|
249 | This new option is a win for all parties concerned: the server does |
---|
250 | not have to render the page, so the server load is reduced, nor does |
---|
251 | it have to send the full page, so bandwidth is reduced. The user, in |
---|
252 | turn, gets a much faster response from the server, and hence |
---|
253 | experiences a more responsive site. CMFCore has recently been |
---|
254 | modified to allow Page Templates to send 304s under the appropriate |
---|
255 | circumstances. |
---|
256 | |
---|
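The server's two options can be illustrated with a short sketch. It is not CMFCore's implementation: it compares a single ETag only, ignores If-Modified-Since, and uses invented names. The point is that the page is rendered only when the client's copy is out of date.

```python
def respond(request_headers, current_etag, render):
    """Answer a GET, sending 304 when the client's cached copy is current.

    request_headers: dict with lower-case header names
    current_etag: ETag the server would send for the page right now
    render: callable that produces the full page body
    """
    client_etag = request_headers.get("if-none-match")
    if client_etag is not None and client_etag == current_etag:
        # The cached copy is still valid: empty body, no rendering.
        return 304, {"ETag": current_etag}, ""
    # Regular GET, or the cached copy is out of date: render and send it.
    return 200, {"ETag": current_etag}, render()
```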
257 | Conditional Requests and the WinInet Cache (for Internet Explorer users) |
---|
258 | ------------------------------------------------------------------------ |
---|
259 | |
---|
260 | Internet Explorer takes advantage of the caching services provided by |
---|
261 | Microsoft Windows Internet Services (WinInet). WinInet allows the user |
---|
262 | to configure the size and behavior of the cache. The vast majority of |
---|
263 | users leave the setting at the default of Automatically, but we still |
---|
264 | have the "Every visit to the page", "Every time you start Internet |
---|
265 | Explorer" and "Never" options. |
---|
266 | |
---|
267 | The most important fact to keep in mind is that these four options |
---|
268 | mostly impact the behavior when there are no caching headers on the |
---|
269 | HTTP responses; when caching headers are present, Internet Explorer |
---|
270 | will always respect them (however, some bugs in Internet Explorer |
---|
271 | seem to cause real trouble in practice). |
---|
272 | |
---|
273 | The Automatically setting bears some explanation. How can WinInet know |
---|
274 | if the cached resource is fresh when no caching directives were |
---|
275 | provided on the server's HTTP response? The answer is that WinInet |
---|
276 | can't know for sure and a Heuristic process is followed to make a |
---|
277 | "best guess" effort. In the Automatically state, the Heuristic will |
---|
278 | issue a conditional request unless all of the following criteria are |
---|
279 | met: |
---|
280 | |
---|
281 | The cached resource bears a Content-Type that begins with image/. |
---|
282 | |
---|
283 | The cached resource has a Last-Modified time. |
---|
284 | |
---|
285 | The URL to the cached resource does not contain a question mark. |
---|
286 | |
---|
287 | The cached resource has been conditionally requested at least once |
---|
288 | within the most recent 25 percent of its overall age in the cache |
---|
289 | (this one is evil to debug, isn't it :-) |
---|
290 | |
---|
291 | If all of the criteria above are met, no request is made. However, |
---|
292 | there also seem to be bugs in Internet Explorer's cache engine that |
---|
293 | cause similar behavior, where the page is requested but an old |
---|
294 | copy of the page is used from the cache. |
---|
295 | |
---|
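The four criteria of the Automatically heuristic translate into a single boolean test. The sketch below restates them as written above; it is not WinInet's code, and the parameter names are invented for the example.

```python
def wininet_skips_request(content_type, last_modified, url,
                          seconds_in_cache, seconds_since_last_check):
    """Return True if, per the four criteria, no request would be made."""
    return (
        content_type.lower().startswith("image/")
        and last_modified is not None
        and "?" not in url
        # Checked at least once in the most recent 25% of its cache age.
        and seconds_since_last_check <= 0.25 * seconds_in_cache
    )
```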
296 | There are also some situations where Internet Explorer will ignore its |
---|
297 | cache and always make a new request. One example is the use of images |
---|
298 | which are inserted via innerHTML (see |
---|
299 | http://www.bazon.net/mishoo/articles.epl?art_id=958 for more |
---|
300 | information). |
---|
301 | |
---|
302 | IE 5.5+ introduces some proprietary Cache-Control tokens, pre-check |
---|
303 | and post-check, that let IE ignore the headers to some extent (see |
---|
304 | http://msdn.microsoft.com/workshop/author/perf/perftips.asp). |
---|
305 | Fortunately, this behavior can be turned off by setting Cache-Control: |
---|
306 | pre-check=0, post-check=0 (this was recently added to the CMF caching |
---|
307 | policies as well). |
---|
308 | |
---|
309 | ETags |
---|
310 | ----- |
---|
311 | |
---|
312 | Most of the discussion about caching and Plone has made use of |
---|
313 | time-based caching. In time-based caching, the server sends |
---|
314 | 'Last-Modified', 'Expires', and 'Cache-Control: max-age' headers with |
---|
315 | content. Browsers serve up content until it expires, then do |
---|
316 | conditional GETs with an 'If-Modified-Since' header. This kind of |
---|
317 | caching enables browsers to cache content for a specified length of |
---|
318 | time. This kind of caching is unsuitable for any kind of content that |
---|
319 | might be personalized because the browser has no way of telling the |
---|
320 | server whether it has an anonymous view of content or a personalized |
---|
321 | one, nor can the browser distinguish between content personalized for |
---|
322 | different users. To cache personalized content, we need more |
---|
323 | information. |
---|
324 | |
---|
325 | An ETag is an arbitrary string that the server uses to determine |
---|
326 | whether or not content is fresh. An ETag should be designed to have |
---|
327 | the property that if the ETag for a cached view of an object matches |
---|
328 | the object's current ETag, then the view the server would generate for |
---|
329 | the object should be the same as the view in cache. |
---|
330 | |
---|
331 | We will use ETags to enable browsers to cache personalized content |
---|
332 | and to then handle it appropriately. The key to doing so is to use |
---|
333 | an ETag generator that serves up a different ETag any time the content |
---|
334 | in question changes. The kinds of changes we are concerned with are |
---|
335 | |
---|
336 | 1. The content changes |
---|
337 | |
---|
338 | 2. The user changes (e.g. s/he logs in or out, or somebody new logs |
---|
339 | in and hence requires a different personalized view). |
---|
340 | |
---|
341 | The ETag we will use for content object views is a string consisting |
---|
342 | of the following: |
---|
343 | |
---|
344 | 'user name of the currently authenticated member' + delimiter + |
---|
345 | 'modification time in seconds for the content object being viewed' + |
---|
346 | delimiter + 'current time rounded to the nearest hour' |
---|
347 | |
---|
348 | The first part of the tag ensures that if the user logs out or |
---|
349 | changes, then the ETag will change. The second part of the tag |
---|
350 | ensures that if the content object changes, the ETag will change. The |
---|
351 | third part of the tag ensures that the tag will time out after an hour |
---|
352 | at most. |
---|
353 | |
---|
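A minimal sketch of the three-part ETag described above follows. It is not the getContentETag.py script: the "|" delimiter and the function name are assumptions, and the caller is expected to pass an empty user name for anonymous visitors.

```python
import time


def content_etag(username, modified_seconds, now=None, delimiter="|"):
    """Build an ETag from the user, the object's mtime, and the current hour."""
    if now is None:
        now = time.time()
    # Round the current time to the nearest hour so the tag changes hourly.
    hour = int((now + 1800) // 3600)
    return delimiter.join([username, str(int(modified_seconds)), str(hour)])
```

Any change of user, any modification of the object, or the passage of an hour produces a different string, so a cached view whose ETag still matches can be served safely.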
354 | To be thorough, we might also want to include things like a hash of |
---|
355 | the current query string, and / or a hash of the contents of |
---|
356 | REQUEST.form. Alternatively, we can simply arrange to not respond to |
---|
357 | a REQUEST with form variables from cache. |
---|
358 | |
---|
359 | Our Cache Strategy |
---|
360 | ------------------ |
---|
361 | |
---|
362 | Our general strategy is as follows: |
---|
363 | |
---|
364 | 1. We put Squid in front of Zope. Squid will handle all static |
---|
365 | content as well as initial requests for dynamic content from |
---|
366 | anonymous clients. |
---|
367 | |
---|
368 | 2. We set up caching policies for dynamic content in the CMF. The |
---|
369 | caching policies will set HTTP headers on our pages that ensure that |
---|
370 | |
---|
371 | a. Squid stores content for an appropriate amount of time, |
---|
372 | |
---|
373 | b. browsers cache pages for an appropriate amount of time, |
---|
374 | |
---|
375 | c. browsers check back with the server to make sure their |
---|
376 | cached content is fresh, and |
---|
377 | |
---|
378 | d. browsers whose content is fresh don't request an entire new |
---|
379 | page (thanks to recent improvements in CMFCore). |
---|
380 | |
---|
381 | 3. We cache dynamic content (views of documents, etc) for anonymous |
---|
382 | visitors in Squid. CMFSquidTool will be used to purge old views |
---|
383 | from Squid when content objects change. |
---|
384 | |
---|
385 | 4. We will also cache dynamic content in RAM using PageCacheManager. |
---|
386 | PageCacheManager will be used to ensure that conditional GETs from |
---|
387 | clients are handled rapidly when content changes. |
---|
388 | |
---|
389 | Here are the layers of cache we use and what each layer does: |
---|
390 | |
---|
391 | 1. Squid: Squid handles all static content (images, javascript, css, |
---|
392 | etc) and all initial requests for dynamic content by anonymous |
---|
393 | users. Squid will not serve up personalized content (we will ensure |
---|
394 | that such content is cached in the user's browser) nor will it handle |
---|
395 | some kinds of subsequent requests (Squid doesn't understand ETags). |
---|
396 | |
---|
397 | 2. PageCacheManager (optional): PageCacheManager will handle |
---|
398 | conditional GETs that use ETags and will cache pages from both |
---|
399 | anonymous and authenticated users. PageCacheManager's primary benefit |
---|
400 | is its efficient handling of conditional GETs from clients when a |
---|
401 | content object changes. |
---|
402 | |
---|
403 | 3. CMFCore: Recent modifications to CMFCore let Zope handle |
---|
404 | conditional GETs efficiently. When a user visits a page that s/he |
---|
405 | has in cache, Zope will return an empty page with a 304 status (Not |
---|
406 | Modified) in response if the content has not changed instead of the |
---|
407 | full page and a 200 (OK) status. |
---|
408 | |
---|
409 | 4. CMFSquidTool: CMFSquidTool hooks the reindex method on content |
---|
410 | objects. When an object changes, CMFSquidTool purges views of the |
---|
411 | content from Squid to make sure Squid is not holding on to stale |
---|
412 | content. |
---|
413 | |
---|
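Purging amounts to sending Squid an HTTP request with the PURGE method for each URL that may hold a stale view. The sketch below is not CMFSquidTool's code: the list of view names is hypothetical, and it assumes squid.conf has been set up to accept PURGE requests from the Zope host.

```python
import http.client


def urls_to_purge(object_url, view_names=("", "view")):
    """List the URLs whose cached copies should be dropped for one object."""
    base = object_url.rstrip("/")
    return [base + ("/" + name if name else "") for name in view_names]


def purge(squid_host, squid_port, url):
    """Send one PURGE request to Squid and return the HTTP status code."""
    connection = http.client.HTTPConnection(squid_host, squid_port, timeout=5)
    try:
        connection.request("PURGE", url)
        return connection.getresponse().status
    finally:
        connection.close()
```

A reindex hook would call purge() once for every URL returned by urls_to_purge().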
414 | The life cycle of a view of a content object is as follows: |
---|
415 | |
---|
416 | 1. An anonymous user requests an initial view of a document. If the |
---|
417 | document view is in Squid, Squid serves it up. If not, Squid |
---|
418 | hands off to Zope, Zope serves up the request, and Squid stores it for |
---|
419 | future requests. |
---|
420 | |
---|
421 | 2. The anonymous user re-visits a page. The user now has the page in |
---|
422 | her local browser cache, so the browser does a conditional GET: |
---|
423 | basically it asks the server for the page only if it differs from the |
---|
424 | page the browser has in cache. Squid can't handle this kind of |
---|
425 | conditional GET (since we use ETags), so the request gets handed to |
---|
426 | Zope. If PageCacheManager is installed, it will handle the request: |
---|
427 | if the object has not changed since the user last accessed the view, |
---|
428 | PageCacheManager will send a 304 (Not Modified) status header and an |
---|
429 | empty page; if the object has changed, PageCacheManager will serve the |
---|
430 | page from RAM if it has a copy or will regenerate the page and cache |
---|
431 | it. Alternatively, if PageCacheManager is not installed, the CMF will |
---|
432 | send a 304 if the object has not changed and will re-render and send |
---|
433 | the full page to the user if it has. |
---|
434 | |
---|
435 | 3. When an object changes, CMFSquidTool will purge old views of the |
---|
436 | object from Squid's cache. The object's ETag will change, too, so |
---|
437 | PageCacheManager will no longer serve the old, cached version, and CMF |
---|
438 | will no longer respond to conditional GETs for the old object with a |
---|
439 | 304. |
---|
440 | |
---|
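The life cycle above can be condensed into a small routing function that says which layer answers a request and with what status. This is only a model of the description, with invented names; it leaves out the detail of whether PageCacheManager serves a 200 response from RAM or regenerates the page.

```python
def route_request(anonymous, has_etag, in_squid, pcm_installed, etag_current):
    """Return (layer, status) for a request for a content object view."""
    if anonymous and not has_etag:
        # Initial anonymous request: Squid answers if it holds a copy,
        # otherwise Zope renders the page and Squid stores the result.
        return ("squid", 200) if in_squid else ("zope", 200)
    # Conditional GETs with ETags and authenticated requests reach Zope.
    layer = "page_cache_manager" if pcm_installed else "cmf"
    if has_etag and etag_current:
        return layer, 304
    return layer, 200
```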
441 | Tools Required |
---|
442 | -------------- |
---|
443 | |
---|
444 | You will need Squid, the CacheSetup product from the CacheFu project |
---|
445 | in svn collective, and CMF 1.4.9 (for Plone 2.0.x) or 1.5.5 (for Plone |
---|
446 | 2.1.x). |
---|
447 | |
---|
448 | If CMF 1.4.9 / 1.5.5 or Plone 2.1.2 are not yet released, grab CMF from svn: |
---|
449 | |
---|
450 | CMF 1.4: 'svn co svn://svn.zope.org/repos/main/CMF/branches/1.4' |
---|
451 | |
---|
452 | CMF 1.5: 'svn co svn://svn.zope.org/repos/main/CMF/branches/1.5' |
---|
453 | |
---|
454 | CacheFu contains a patched version of CMFSquidTool (don't use the one from Enfold). |
---|
455 | |
---|
456 | The LiveHTTPHeaders extension for Firefox |
---|
457 | ("http://livehttpheaders.mozdev.org":http://livehttpheaders.mozdev.org) |
---|
458 | is invaluable for diagnosing cache problems. The Fiddler tool for IE |
---|
459 | ("http://www.fiddlertool.com/fiddler/":http://www.fiddlertool.com/fiddler/) |
---|
460 | provides similar functionality. |
---|
461 | |
---|
462 | Configuring Squid |
---|
463 | ----------------- |
---|
464 | |
---|
465 | CacheFu contains a basic Squid configuration file, squid.conf. The |
---|
466 | file is set up for two different types of configurations: (1) direct |
---|
467 | access to Squid, and (2) Squid behind apache. Files for these 2 |
---|
468 | different configurations are contained in the squid_direct directory |
---|
469 | and the squid_behind_apache directory, respectively. Look at the |
---|
470 | directory appropriate to your setup. |
---|
471 | |
---|
472 | The 'squid.conf' file and the following discussion assume that |
---|
473 | Squid's configuration files are in '/etc/squid/', Squid's binary is in |
---|
474 | '/usr/sbin/squid', Squid's logs are in '/var/log/squid', and that |
---|
475 | Squid runs as user 'squid'. There are several Squid helper |
---|
476 | applications included: |
---|
477 | |
---|
478 | - iRedirector.py |
---|
479 | |
---|
480 | - redirector_class.py |
---|
481 | |
---|
482 | - squidAcl.py |
---|
483 | |
---|
484 | Copy these to '/etc/squid', and make sure that user 'squid' has read |
---|
485 | and execute access for them. |
---|
486 | |
---|
487 | 'iRedirector.py' and 'redirector_class.py' together form a Squid |
---|
488 | redirector. These files are used by Squid to rewrite URLs when |
---|
489 | handing off to Zope. The files were originally written by Simon |
---|
490 | Eisenmann (longsleep) and modified to have a more mod_rewrite-like |
---|
491 | syntax by Florian Schulze (fschulze). |
---|
492 | |
---|
493 | 'squidAcl.py' is used to test whether a request is being made by an |
---|
494 | authenticated user (in which case we should not cache the response) or |
---|
495 | if the request is a conditional GET with an If-None-Match header |
---|
496 | (also should not be served up by Squid). |
---|
497 | |
---|
498 | You will need to customize 2 files: put information about your site's |
---|
499 | URL in squid.conf and configure redirector_class.py to do appropriate |
---|
500 | redirection. |
---|
501 | |
---|
502 | Squid Debugging Tips |
---|
503 | -------------------- |
---|
504 | |
---|
505 | Squid's logs are a good place to start looking if you have problems |
---|
506 | with it. |
---|
507 | |
---|
508 | While debugging, it's a good idea to run Squid from the command line |
---|
509 | and tell it to echo problems to the console. Start Squid using |
---|
510 | '/usr/sbin/squid -d1' |
---|
511 | |
---|
512 | To stop Squid from the command line, use '/usr/sbin/squid -k kill', |
---|
513 | and to reconfigure Squid after you have modified 'squid.conf', use |
---|
514 | '/usr/sbin/squid -k reconfigure'. |
---|
515 | |
---|
516 | If Squid won't start, check '/var/log/squid/cache.log' to see why. |
---|
517 | |
---|
518 | If Squid blocks your access to a particular page, uncomment the line |
---|
519 | 'debug_options ALL,1 33,2' in squid.conf, reconfigure Squid, then |
---|
520 | look at '/var/log/squid/cache.log'. |
---|
521 | |
---|
522 | The redirector and external ACL python scripts can all log their |
---|
523 | activity. Set 'debug=1' in each of the .py files to see what they are |
---|
524 | up to. |
---|
525 | |
---|
526 | Use the LiveHTTPHeaders Firefox extension to see if you are getting |
---|
527 | cache hits for your pages. If there is a cache hit, you will see |
---|
528 | 'X-Cache: HIT from yourserver'; otherwise you will see 'X-Cache: MISS |
---|
529 | from yourserver'. Note that a MISS can mean either that Squid tried |
---|
530 | to retrieve your page from the cache but did not find it or that Squid |
---|
531 | has been disallowed from responding to the type of request you made |
---|
532 | (e.g. you were authenticated or making a conditional GET with an |
---|
533 | 'If-None-Match' header). |
---|
534 | |
---|
535 | Installing Caching Policies |
---|
536 | --------------------------- |
---|
537 | |
---|
538 | We use CMF's Caching Policy Manager to set headers for the pages we |
---|
539 | will be serving. Install the CacheSetup product using the |
---|
540 | QuickInstaller. CacheSetup installs the following basic caching |
---|
541 | policies (go to the ZMI to 'caching_policy_manager' to see them): |
---|
542 | |
---|
543 | Policies for Static Content |
---|
544 | |
---|
545 | anonymous_cache_template -- This policy is for static page templates |
---|
546 | on the site that are not associated with a content object (e.g. |
---|
547 | accessibility-info). Content served to anonymous visitors is cached |
---|
548 | in Squid for ANON_PAGE_TEMPLATE_CACHE_DURATION_DAYS (default = 1 day); |
---|
549 | browsers are told that the content is immediately stale. ETags are |
---|
550 | generated using the script getPageTemplateETag.py. The script |
---|
552 | getAnonPageTemplatesToCache.py returns a list of page template ids for |
---|
553 | which this policy should apply. |
---|
554 | |
---|
555 | Policies for Dynamic Content |
---|
556 | |
---|
557 | anonymous_cache_policy -- This policy is for content object views |
---|
558 | served to anonymous visitors. Content is cached in Squid for |
---|
559 | ANON_PAGE_TEMPLATE_CACHE_DURATION_DAYS (default = 1 day); browsers are |
---|
560 | told that the content is immediately stale. ETags are generated by |
---|
561 | the script getContentETag.py. The script doCache.py is used to |
---|
562 | determine which templates to cache. |
---|
563 | |
---|
564 | authenticated_cache_policy -- Same as anonymous_cache_policy, but |
---|
565 | content is flagged as private and is not cached in Squid. |
---|
566 | |
---|
567 | These policies should be a good starting point. You can customize |
---|
568 | the various ETag generating scripts and policy membership scripts (in |
---|
569 | CacheSetup/skins/cache_setup) to suit your needs. |
---|
570 | |
---|
571 | Debugging |
---|
572 | --------- |
---|
573 | |
---|
574 | LiveHTTPHeaders lets you see what (if any) headers |
---|
575 | CachingPolicyManager has set on your content. If CachingPolicyManager |
---|
576 | has set headers on your content, there will be a header labeled |
---|
577 | "X-Cache-Headers-Set-By: CachingPolicyManager" in the server's |
---|
578 | response. |
---|
579 | |
---|
580 | A few things to note: |
---|
581 | |
---|
582 | - CacheSetup requires CMF 1.4.9 or 1.5.5 (or the current CMFCore from |
---|
583 | svn) |
---|
584 | |
---|
585 | - CachingPolicyManager only sets cache headers on page templates. |
---|
586 | Some content objects (Documents, for example), when accessed |
---|
587 | directly via a URL, implicitly call a view template. Others, such as |
---|
588 | File objects or ATFile objects, do not (they render the result of |
---|
589 | their __call__ method, which is not a page template) |
---|
590 | |
---|
591 | - Zope's HTTP compression will disable the caching policy manager |
---|
592 | headers (and causes other problems as well). The file |
---|
593 | enableHTTPCompression.py in Plone's skins/plone_scripts toggles Zope's |
---|
594 | compression. The version of enableHTTPCompression.py included in the |
---|
595 | CacheSetup skin turns HTTP compression off in Plone. |
---|
596 | |
---|
597 | - A bug in Internet Explorer 6 (fixed in its service packs) causes |
---|
598 | content with "Content-Encoding: gzip" to always be cached |
---|
599 | even if you use "Cache-Control: no-cache" |
---|
600 | (http://support.microsoft.com/default.aspx?scid=kb;en-us;326489). It |
---|
601 | is another good reason to use enableHTTPCompression.py shipped with |
---|
602 | CacheFu. |
---|
603 | |
---|
604 | - The Enable 304s box must be checked for a caching policy for Zope |
---|
605 | to return a Status 304 response. CacheSetup turns on 304s upon |
---|
606 | installation, but if you add your own policies, you will need to be |
---|
607 | sure to explicitly turn on 304 handling for your policies. |
---|
608 | |
---|
609 | Installing PageCacheManager |
---|
610 | --------------------------- |
---|
611 | |
---|
612 | Install PageCacheManager by placing it in your Products directory (no |
---|
613 | other installation is needed). PageCacheManager will cache pages that |
---|
614 | have a caching policy associated with them that has the Enable 304s |
---|
615 | flag set. Create a PageCacheManager instance in the ZMI (give it an |
---|
616 | id of page_cache_manager, say) and then associate your content view |
---|
617 | page templates with the cache manager by modifying their .metadata |
---|
618 | files. You will need something like the following in the .metadata |
---|
619 | files for your view templates:: |
---|
620 | |
---|
621 | [default] |
---|
622 | title=Title of page template here |
---|
623 | cache=page_cache_manager |
---|
624 | |
---|
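To check which cache manager a .metadata file names, something like the following Python sketch may help. It assumes the file can be read as INI-style text with Python's standard `configparser`, which may not match CMF's own parser exactly; the function name `cache_manager_for` is hypothetical.

```python
import configparser


def cache_manager_for(metadata_text):
    """Return the 'cache' value from the [default] section, or None.

    Hypothetical helper; assumes INI-style .metadata content.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(metadata_text)
    return parser.get("default", "cache", fallback=None)


if __name__ == "__main__":
    sample = (
        "[default]\n"
        "title=Title of page template here\n"
        "cache=page_cache_manager\n"
    )
    print(cache_manager_for(sample))
```

A result of None would mean the template has no cache manager set in its .metadata file.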
625 | Testing PageCacheManager |
---|
626 | ------------------------ |
---|
627 | |
---|
628 | Visit a content object view that you have associated with the |
---|
629 | PageCacheManager. The initial visit should load the page into the |
---|
630 | cache. Now clear your browser's cache and revisit the page with |
---|
631 | LiveHTTPHeaders enabled. You should see the header 'X-PageCache: |
---|
632 | HIT'. Now edit the content object and revisit the page. You should |
---|
633 | receive an updated version of the page. |
---|
634 | |
---|
635 | If you do not see an 'X-PageCache: HIT', verify that there is a |
---|
636 | Caching Policy associated with the page. The easiest way to check |
---|
637 | this is to verify that the header "X-Cache-Headers-Set-By: |
---|
638 | CachingPolicyManager" is present in your response headers. If there |
---|
639 | is a caching policy present, make sure it has the Enable 304s box |
---|
640 | checked. If that is the case, make sure that the view is in fact |
---|
641 | associated with the PageCacheManager: visit page_cache_manager in |
---|
642 | the ZMI, click the Associate tab, check 'Associated with this cache |
---|
643 | manager' and click the Locate button to verify the association. |
---|
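The troubleshooting steps above can be sketched as a small Python helper that inspects response headers. The header names come from this document; the function names `diagnose` and `fetch_headers` and the returned messages are illustrative assumptions, not part of CacheFu.

```python
import urllib.request


def diagnose(headers):
    """Suggest the next troubleshooting step from a mapping of headers."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    if lowered.get("x-pagecache", "").upper() == "HIT":
        return "served from PageCacheManager"
    if lowered.get("x-cache-headers-set-by") != "CachingPolicyManager":
        return "no caching policy applied to this page"
    return "policy applied; check Enable 304s and the cache manager association"


def fetch_headers(url):
    """Fetch a URL and return its response headers as a dict."""
    with urllib.request.urlopen(url) as response:
        return dict(response.headers.items())


if __name__ == "__main__":
    # Example with hand-written headers; no network access needed.
    print(diagnose({"X-Cache-Headers-Set-By": "CachingPolicyManager"}))
```

To check a live page, pass the result of `fetch_headers` for your own page's URL to `diagnose`.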
644 | |
---|
645 | |
---|
646 | Installing and Configuring CMFSquidTool |
---|
647 | --------------------------------------- |
---|
648 | |
---|
649 | Install CMFSquidTool from the CacheFu distribution (it contains |
---|
650 | several useful patches that the Enfold version does not yet have). It |
---|
651 | will create a tool called 'portal_squid' in the ZMI. This is where |
---|
652 | you configure CMFSquidTool. |
---|
653 | |
---|
654 | For Cache type, select 'Squid'. |
---|
655 | |
---|
656 | If you have Squid directly responding to requests: |
---|
657 | |
---|
658 | - For Portal URLs for the cache, enter |
---|
659 | 'http://your.domain.name.here'. If it is possible to access your |
---|
660 | site through multiple URLs (e.g. 'http://www.mysite.com', |
---|
661 | 'http://mysite.com', 'https://www.mysite.com'), enter all of those |
---|
662 | URLs. |
---|
663 | |
---|
664 | - For URLs to purge, enter:: |
---|
665 | |
---|
666 | python:object.getUrlsToPurge(setup='squid_direct') |
---|
667 | |
---|
668 | If you have Squid behind Apache: |
---|
669 | |
---|
670 | - For Portal URLs for the cache, enter 'http://127.0.0.1:3128' |
---|
671 | |
---|
672 | - For URLs to purge, enter:: |
---|
673 | |
---|
674 | python:object.getUrlsToPurge(setup='squid_behind_apache') |
---|
675 | |
---|
676 | Customize the script getSiteUrls.py in CacheSetup/skins/cache_setup so |
---|
677 | that it returns your site's URLs. |
---|
678 | |
---|
679 | Testing CMFSquidTool |
---|
680 | -------------------- |
---|
681 | |
---|
682 | Visit a document that is cached by Squid as an anonymous user. Make |
---|
683 | sure you get an 'X-Cache: HIT' header in response (you may need to |
---|
684 | make a second visit). |
---|
685 | |
---|
686 | If you have Squid responding directly to requests: |
---|
687 | |
---|
688 | - Enter the URL you just visited relative to the portal root in the |
---|
689 | "Purge URL" box and click Go!. For example, for |
---|
690 | http://mysite.com/foo/bar, enter "foo/bar". |
---|
691 | |
---|
692 | If you have Squid behind Apache: |
---|
693 | |
---|
694 | - Enter the prefix 'http/your.site.url/' followed by the URL you just |
---|
695 | visited relative to the portal root in the 'Purge URL' box and |
---|
696 | click 'Go!'. For example, for 'http://mysite.com/foo/bar', enter |
---|
697 | 'http/mysite.com/foo/bar'. |
---|
698 | |
---|
699 | You should see '200 http://someurlhere' on the next page. The 200 |
---|
700 | means that CMFSquidTool successfully purged the content. If you get a |
---|
701 | 404, something is wrong (you already purged the page from the cache, the |
---|
702 | page was not in the cache, or you do not have permission to purge pages |
---|
703 | from the cache). |
---|
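The two forms of the "Purge URL" value, and the meaning of the status codes, can be sketched in Python. The helper names `purge_path` and `explain_purge_status` are hypothetical, and the sketch assumes the portal root sits at the root of the site URL, as in the examples above.

```python
from urllib.parse import urlsplit


def purge_path(url, setup):
    """Build the value to type into the 'Purge URL' box.

    setup is 'squid_direct' or 'squid_behind_apache'. Assumes the portal
    root is the root of the site URL (an assumption of this sketch).
    """
    parts = urlsplit(url)
    relative = parts.path.lstrip("/")
    if setup == "squid_direct":
        return relative
    if setup == "squid_behind_apache":
        # Prefix with scheme and host, separated by slashes.
        return f"{parts.scheme}/{parts.netloc}/{relative}"
    raise ValueError(f"unknown setup: {setup!r}")


def explain_purge_status(status):
    """Translate the status shown after a purge into a short message."""
    if status == 200:
        return "purged"
    if status == 404:
        return "not purged: not in cache, already purged, or no permission"
    return "unexpected status"


if __name__ == "__main__":
    print(purge_path("http://mysite.com/foo/bar", "squid_direct"))
    print(purge_path("http://mysite.com/foo/bar", "squid_behind_apache"))
```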
704 | |
---|
705 | Notes |
---|
706 | ----- |
---|
707 | |
---|
708 | - xiru points out that if you are running Squid and Zope on separate |
---|
709 | boxes, you should make sure they have synchronized clocks. Use |
---|
710 | NTP to keep them in sync. |
---|
711 | |
---|
712 | - It appears that there is a weird IE bug connected with the |
---|
713 | Last-Modified header. If you have a page with a Last-Modified |
---|
714 | header and "Cache-Control: max-age=0", under some circumstances IE |
---|
715 | will (properly) determine that the page in cache is stale and request |
---|
716 | a new copy. However, it will then render the old cached page instead |
---|
717 | of the new one. P-J Grizel speculates that this might have to do with |
---|
718 | a bug in how IE parses the time zone in the Last-Modified header. xiru |
---|
719 | believes that this bug is related to Internet Explorer's cache |
---|
720 | cleanup implementation and says that it does NOT happen in Firefox. |
---|
721 | For now, I am disabling the Last-Modified header in the caching |
---|
722 | policies. |
---|
723 | |
---|
724 | - Using the standard ETags to cache content pages can result in |
---|
725 | stale portlets and navigational elements. To overcome this issue |
---|
726 | an optional behavior has been added to CacheSetup that allows the ETag |
---|
727 | for content objects and page templates to change whenever any content |
---|
728 | object in the portal is modified in any way (other than changes made |
---|
729 | to local roles and changes made in the ZMI). This is accomplished by |
---|
730 | adding the last time that a persistent change was made to the |
---|
731 | portal_catalog to the ETag. The result is that, when this feature is |
---|
732 | enabled, any changes to content objects will change the ETags for all |
---|
733 | content objects. This is particularly useful for sites with a very |
---|
734 | high proportion of read vs. write operations which want all portlets |
---|
735 | and navigation to update immediately when a change is made. |
---|
736 | Unfortunately, anonymous users visiting a page for the first time may |
---|
737 | still get a page with stale portlets served from a proxy cache |
---|
738 | (Squid), but all authenticated users, all users revalidating a cached |
---|
739 | page with an ETag, and all users not using a proxy cache will be |
---|
740 | served a page with updated portlets. To enable this feature, simply |
---|
741 | uncomment the appropriate line in getContentETag.py and/or |
---|
742 | getPageTemplateETag.py. |
---|
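The idea behind this optional behavior can be sketched in Python. This is not the code in getContentETag.py or getPageTemplateETag.py: the function `build_etag`, the "|" separator, and the component values are hypothetical, chosen only to show how appending the catalog's last-modified time makes every ETag change when any content changes.

```python
def build_etag(components, catalog_modified=None):
    """Join ETag components, optionally adding the catalog's last change time.

    Hypothetical sketch; the real scripts may build their ETags differently.
    """
    parts = [str(component) for component in components]
    if catalog_modified is not None:
        # Any persistent catalog change alters this part, and so the ETag.
        parts.append(str(catalog_modified))
    return "|".join(parts)


if __name__ == "__main__":
    before = build_etag(["doc1", "en"], catalog_modified=1000)
    after = build_etag(["doc1", "en"], catalog_modified=1001)
    # The page itself did not change, but its ETag did.
    print(before != after)
```

With `catalog_modified` left out, the ETag depends only on the page's own components, which corresponds to the default behavior described above.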