source: products/quintagroup.plonegooglesitemaps/branches/sitemap_date/quintagroup/plonegooglesitemaps/filters.txt @ 3497

Last change on this file since 3497 was 3497, checked in by potar, 12 years ago

Blackout filtering
==================

Introduction
============

The Sitemap portal type has an option for filtering out objects that
should be excluded from a sitemap. This option is accessible
on the sitemap edit form and is labeled "Blackout entries".

In earlier versions of the package (<4.0.1 for the plone-4 branch
and <3.0.7 for the plone-3 branch) this field could
filter objects only by their ids, and it looked like this:

<pre>
  index.html
  index_html
</pre>

As a result, all objects with "index.html" or "index_html" ids
were excluded from the sitemap.

In newer versions of GoogleSitemaps, filtering was refactored
into a pluggable architecture. Filters are now named multi-adapters.
There are only two default filters: "id" and "path".

Because different filters can be used, a new syntax was introduced
for the "Blackout entries" field. Every record in the field
should follow the specification:

  [<filter name>:]<filter arguments>

* If no <filter name> is specified, the "id" filter is used.
* If <filter name> is specified, the system looks for a
  multi-adapter named <filter name> that provides the IBlackoutFilter
  interface. If no such multi-adapter is found, the filter is ignored
  without raising any errors.
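
The record syntax above can be sketched in a few lines of plain Python.
This is only an illustration, not the package's own parser: the names
parse_blackout_entry and DEFAULT_FILTER are hypothetical, and splitting
at the first colon is an assumption about how a record is separated
into its two parts.

```python
# Hypothetical constant: the text above names "id" as the default filter.
DEFAULT_FILTER = "id"


def parse_blackout_entry(line):
    """Split one "Blackout entries" record into (filter name, arguments).

    Assumption: the filter name is everything before the first colon.
    """
    entry = line.strip()
    name, sep, args = entry.partition(":")
    if not sep:
        # No "<filter name>:" prefix, so fall back to the default filter.
        return DEFAULT_FILTER, entry
    return name, args


# Old-style records (bare ids) and new-style records side by side.
parsed = [parse_blackout_entry(record)
          for record in ("index.html", "id:doc1", "path:/Members/test_user_1_/doc2")]
```

Under this sketch an old-style record such as "index.html" is treated
exactly like "id:index.html", which matches the default described above.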

The following sections demonstrate how to work with filtering.
Aspects of the default filters ("id" and "path") are also
covered.

Demonstration environment setup
===============================

First, we have to do some setup. We use the testbrowser that is
shipped with Five, as this provides proper Zope 2 integration. Most
of the documentation, though, is in the underlying zope.testbrowser
package.

    >>> from Products.Five.testbrowser import Browser
    >>> browser = Browser()
    >>> portal_url = self.portal.absolute_url()

The following setting is useful when writing and debugging testbrowser
tests. It lets us see all error messages in the error_log.

    >>> self.portal.error_log._ignored_exceptions = ()

With that in place, we can go to the portal front page and log in.
We will do this using the default user from PloneTestCase:

    >>> from Products.PloneTestCase.setup import portal_owner, default_password
    >>> browser.open(portal_url)

We have the login portlet, so let's use that.

    >>> browser.open('http://nohost/plone/login_form')
    >>> browser.getControl('Login Name').value = portal_owner
    >>> browser.getControl('Password').value = default_password
    >>> browser.getControl('Log in').click()
    >>> "You are now logged in" in browser.contents
    True
    >>> "Login failed" in browser.contents
    False
    >>> browser.url
    'http://nohost/plone/login_form'


Functionality
=============

First, create some content for demonstration purposes.

In the root of the portal:

    >>> self.addDocument(self.portal, "doc1", "Document 1 text")
    >>> self.addDocument(self.portal, "doc2", "Document 2 text")

And in the member's folder:

    >>> self.addDocument(self.folder, "doc1", "Member Document 1 text")
    >>> self.addDocument(self.folder, "doc2", "Member Document 2 text")

We need to add a sitemap for the demonstration.

    >>> browser.open(portal_url + "/prefs_gsm_settings")
    >>> browser.getControl('Add Content Sitemap').click()

We have now landed on the edit form of the newly created sitemap.
We are interested in the "Blackout entries" field on this
form; it should be empty by default.

    >>> file("/tmp/browser.test.html","wb").write(browser.contents)
    >>> blackout_list = browser.getControl("Blackout entries")
    >>> blackout_list
    <Control name='blackout_list:lines' type='textarea'>
    >>> blackout_list.value == ""
    True
    >>> save_button = browser.getControl("Save")
    >>> save_button
    <SubmitControl name='form...' type='submit'>
    >>> save_button.click()


Clicking the "Save" button leads us to the sitemap view.

    >>> print browser.contents
    <?xml version="1.0" encoding=...


The "sitemap.xml" link should appear on the "Settings" page of the
Plone Google Sitemap configlet after the "Content Sitemap"
was added.

    >>> browser.open(portal_url + "/prefs_gsm_settings")
    >>> smedit_link = browser.getLink('sitemap.xml')
    >>> smedit_url = smedit_link.url

This link points to the edit form of the newly created sitemap.xml.
Let's prepare a view link to simplify the following demonstrations.

    >>> smedit_url.endswith("sitemap.xml/edit")
    True
    >>> smview_url = smedit_url[:-5]


No filters
==========

The newly created sitemap has no filters applied, so all documents should appear in it.

    >>> browser.open(smview_url)
    >>> file("/tmp/browser.test.html","wb").write(browser.contents)
    >>> no_filters_content = browser.contents

Check that the resulting page really is a sitemap.

    >>> print browser.contents
    <?xml version="1.0" encoding=...


Create a regular expression that will help us test which URLs pass the filters.

    >>> import re
    >>> reloc = re.compile("<loc>%s([^\<]*)</loc>" % self.portal.absolute_url(), re.S)

Test that all 4 documents are in the sitemap when no filters are applied.

    >>> no_filters_res = reloc.findall(no_filters_content)
    >>> no_filters_res.sort()
    >>> print "\n".join(no_filters_res)
    /Members/test_user_1_/doc1
    /Members/test_user_1_/doc2
    /doc1
    /doc2

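
To see what the "reloc" expression extracts without a running portal,
the same idea can be tried on a hand-written sitemap fragment. This is
a standalone sketch: PORTAL_URL, SAMPLE_SITEMAP and the helper
portal_relative_locs are made up for illustration, and re.escape is
added here as a precaution that the doctest above does not use.

```python
import re

# Hypothetical portal URL and sitemap body, used only for illustration.
PORTAL_URL = "http://nohost/plone"
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://nohost/plone/doc2</loc></url>
  <url><loc>http://nohost/plone/Members/test_user_1_/doc1</loc></url>
</urlset>
"""


def portal_relative_locs(sitemap_xml, portal_url):
    """Return the sorted portal-relative paths found in <loc> elements."""
    # re.escape keeps dots and other special characters in the URL literal.
    pattern = re.compile("<loc>%s([^<]*)</loc>" % re.escape(portal_url), re.S)
    return sorted(pattern.findall(sitemap_xml))


paths = portal_relative_locs(SAMPLE_SITEMAP, PORTAL_URL)
```

The captured group is the part of each URL after the portal URL, which
is why the expected results in this document are portal-relative paths.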

Check "id" filter
=================

Go to the sitemap edit form and add a "doc1" line with the "id:"
prefix to the "Blackout entries" field.

    >>> browser.open(smedit_url)
    >>> filtercontrol = browser.getControl("Blackout entries")
    >>> filtercontrol.value = """
    ...     id:doc1
    ... """
    >>> browser.getControl("Save").click()
    >>> id_filter_content = browser.contents

All "doc1" documents should now be excluded from the
sitemap.

    >>> id_filter_res = reloc.findall(id_filter_content)
    >>> id_filter_res.sort()
    >>> print "\n".join(id_filter_res)
    /Members/test_user_1_/doc2
    /doc2


Check "path" filter
===================

Suppose we want to exclude the "doc2" document
located in the test_user_1_ home folder, but leave "doc2"
in the portal root, and all other objects, untouched.

    >>> browser.open(smedit_url)
    >>> filtercontrol = browser.getControl("Blackout entries")
    >>> filtercontrol.value = """
    ...    path:/Members/test_user_1_/doc2
    ... """
    >>> browser.getControl("Save").click()
    >>> path_filter_content = browser.contents

Only the "/Members/test_user_1_/doc2" object should
be excluded from the sitemap.

    >>> path_filter_res = reloc.findall(path_filter_content)
    >>> path_filter_res.sort()
    >>> print "\n".join(path_filter_res)
    /Members/test_user_1_/doc1
    /doc1
    /doc2

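
The difference between the two default filters can be modelled without
Plone. The sketch below is not the package's implementation: Brain is a
stand-in for a catalog brain, the FILTERS dict stands in for the named
multi-adapter lookup, and exact matching on a portal-relative path is
an assumption about how the "path" filter compares its argument.

```python
from collections import namedtuple

# Hypothetical stand-in for a catalog brain; only two fields are modelled.
Brain = namedtuple("Brain", ["getId", "path"])


def id_filter(brains, fargs):
    """Drop every object whose id equals the filter argument."""
    return [b for b in brains if b.getId != fargs]


def path_filter(brains, fargs):
    """Drop the object at exactly this portal-relative path (assumption)."""
    return [b for b in brains if b.path != fargs]


# A plain dict stands in for the named multi-adapter registry.
FILTERS = {"id": id_filter, "path": path_filter}


def apply_filters(brains, entries):
    """Apply (filter name, arguments) pairs in order."""
    for name, fargs in entries:
        blackout = FILTERS.get(name)
        if blackout is None:
            continue  # unknown filter names are ignored without errors
        brains = blackout(brains, fargs)
    return brains


BRAINS = [
    Brain("doc1", "/doc1"),
    Brain("doc2", "/doc2"),
    Brain("doc1", "/Members/test_user_1_/doc1"),
    Brain("doc2", "/Members/test_user_1_/doc2"),
]


def remaining_paths(entries):
    """Return the sorted paths that survive the given filter entries."""
    return sorted(b.path for b in apply_filters(BRAINS, entries))
```

With this model, an "id" entry removes same-named objects everywhere,
while a "path" entry removes a single object, which mirrors the two
results shown above.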

Check default filter
====================

Which filter is used when no filter name prefix is specified
(as in old-style entries)?

Go to the sitemap edit form and add a "doc1" line
without any filter name prefix to the "Blackout entries"
field.

    >>> browser.open(portal_url + "/sitemap.xml/edit")
    >>> filtercontrol = browser.getControl("Blackout entries")
    >>> filtercontrol.value = """
    ...     doc1
    ... """
    >>> browser.getControl("Save").click()
    >>> default_filter_content = browser.contents

The "id" filter must be used as the default filter, so the "doc1"
objects should be excluded from the sitemap.

    >>> default_filter_res = reloc.findall(default_filter_content)
    >>> default_filter_res.sort()
    >>> print "\n".join(default_filter_res)
    /Members/test_user_1_/doc2
    /doc2


Create your own filters
=======================

Suppose we want to create our own blackout filter, which
behaves like the id filter, but with some differences. Our filter
has the following format:

  (+|-)<filtered id>

- if the first character is "+", only objects with <filtered id>
  are left in the sitemap after filtering;
- if the first character is "-", all objects with <filtered id>
  are excluded from the sitemap (like the default id filter).

You need to create a new IBlackoutFilter multi-adapter and register
it under a unique name.

    >>> from zope.component import adapts
    >>> from zope.interface import Interface, implements
    >>> from zope.publisher.interfaces.browser import IBrowserRequest
    >>> from quintagroup.plonegooglesitemaps.interfaces import IBlackoutFilter
    >>> class SignedIdFilter(object):
    ...     adapts(Interface, IBrowserRequest)
    ...     implements(IBlackoutFilter)
    ...     def __init__(self, context, request):
    ...         self.context = context
    ...         self.request = request
    ...     def filterOut(self, fdata, fargs):
    ...         sign = fargs[0]
    ...         fid = fargs[1:]
    ...         if sign == "+":
    ...             return [b for b in fdata if b.getId==fid]
    ...         elif sign == "-":
    ...             return [b for b in fdata if b.getId!=fid]
    ...         return fdata

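
The logic of SignedIdFilter.filterOut can also be tried outside Zope.
The sketch below copies that logic into a plain function; Brain and
signed_id_filter are hypothetical names, and Brain only imitates the
"getId" attribute that the filter reads. Unlike the class above, it
slices with fargs[:1] so that an empty argument does not raise.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects passed to filterOut as fdata.
Brain = namedtuple("Brain", ["getId", "path"])


def signed_id_filter(fdata, fargs):
    """Keep ("+") or drop ("-") the objects whose id follows the sign."""
    sign, fid = fargs[:1], fargs[1:]
    if sign == "+":
        return [b for b in fdata if b.getId == fid]
    if sign == "-":
        return [b for b in fdata if b.getId != fid]
    # Any other first character leaves the data untouched.
    return fdata


BRAINS = [
    Brain("doc1", "/doc1"),
    Brain("doc2", "/doc2"),
    Brain("doc1", "/Members/test_user_1_/doc1"),
    Brain("doc2", "/Members/test_user_1_/doc2"),
]

kept = sorted(b.path for b in signed_id_filter(BRAINS, "+doc1"))
dropped = sorted(b.path for b in signed_id_filter(BRAINS, "-doc1"))
```

Here "kept" holds only the "doc1" paths and "dropped" holds only the
"doc2" paths, the two behaviours described in the format above.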

Now register this new filter as a named multi-adapter.

    >>> from zope.component import provideAdapter
    >>> provideAdapter(SignedIdFilter,
    ...                name=u'signedid')

That is all that is needed to add a new filter. Now test the newly created
filter.

Check whether whitelist filtering ("+" prefix) works correctly.
Go to the sitemap edit form and add "signedid:+doc1"
to the "Blackout entries" field.

    >>> browser.open(smedit_url)
    >>> filtercontrol = browser.getControl("Blackout entries")
    >>> filtercontrol.value = """
    ...    signedid:+doc1
    ... """
    >>> browser.getControl("Save").click()
    >>> signedid_filter_content = browser.contents

Only objects with the "doc1" id should be left in the sitemap.

    >>> signedid_filter_res = reloc.findall(signedid_filter_content)
    >>> signedid_filter_res.sort()
    >>> print "\n".join(signedid_filter_res)
    /Members/test_user_1_/doc1
    /doc1


Finally, check whether blacklist filtering ("-" prefix) works correctly.
Go to the sitemap edit form and add "signedid:-doc1" to the "Blackout
entries" field.

    >>> browser.open(smedit_url)
    >>> filtercontrol = browser.getControl("Blackout entries")
    >>> filtercontrol.value = """
    ...     signedid:-doc1
    ... """
    >>> browser.getControl("Save").click()
    >>> signedid_filter_content = browser.contents

All objects except those with the "doc1" id must be included in
the sitemap.

    >>> signedid_filter_res = reloc.findall(signedid_filter_content)
    >>> signedid_filter_res.sort()
    >>> print "\n".join(signedid_filter_res)
    /Members/test_user_1_/doc2
    /doc2