Blackout filtering
==================
Filtering introduction
======================
The Sitemap portal type has an option for filtering out
objects that should be excluded from a sitemap. This option is
accessible in the sitemap edit form and is labeled
"Blackout entries".
In earlier versions of the package (<4.0.1 for the plone-4 branch
and <3.0.7 for the plone-3 branch) this field could
filter objects only by their ids, and looked like:
index.html
index_html
So, all objects with "index.html" or "index_html" ids were
excluded from the sitemap.
In the new versions of GoogleSitemaps, filtering was reworked
into a pluggable architecture. Filters are now named multi
adapters. There are only two default filters: "id" and
"path".
Since different filters can be used, a new syntax was introduced
for the "Blackout entries" field. Every record in the field
should follow the specification:
[<filter name>:]<filter arguments>
If no <filter name> is specified, the "id" filter will
be used. If <filter name> is specified, the system will look
for a multiadapter with that name providing the IBlackoutFilter
interface. If no such multiadapter is found, the record
will be ignored without raising any errors.
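The lookup rule above can be sketched in plain Python. This is only an
illustration, not the package's actual parser: the function name
``parse_blackout_entry`` and the ``known_filters`` argument are hypothetical,
and the real implementation resolves filter names through named multiadapter
lookup rather than a fixed tuple.

```python
def parse_blackout_entry(entry, known_filters=("id", "path")):
    """Split one "Blackout entries" record into (filter name, arguments).

    Hypothetical helper: returns None for an empty record or an unknown
    filter name, mirroring "ignored without raising any errors".
    """
    entry = entry.strip()
    if not entry:
        return None
    name, sep, args = entry.partition(":")
    if not sep:
        # No "<filter name>:" prefix, so fall back to the default "id" filter.
        return ("id", entry)
    if name not in known_filters:
        # Unknown filter name: silently skip the record.
        return None
    return (name, args)


records = ["index.html", "path:/front-page", "nosuch:foo"]
parsed = [parse_blackout_entry(r) for r in records]
```

Under these assumptions an old-style record such as ``index.html`` is treated
exactly like ``id:index.html``, which is why existing blackout lists keep
working after an upgrade.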
The following sections demonstrate how the filtering works.
The default filters ("id" and "path")
are covered as well.
Setup demonstration environment
===============================
First, we must perform some setup. We use the testbrowser that is
shipped with Five, as this provides proper Zope 2 integration. Most
of the documentation, though, is in the underlying zope.testbrowser
package.
>>> from Products.Five.testbrowser import Browser
>>> browser = Browser()
>>> portal_url = self.portal.absolute_url()
The following is useful when writing and debugging testbrowser tests.
It lets us see all error messages in the error_log.
>>> self.portal.error_log._ignored_exceptions = ()
With that in place, we can go to the portal front page and log in.
We will do this using the default user from PloneTestCase:
>>> from Products.PloneTestCase.setup import portal_owner, default_password
>>> browser.open(portal_url)
We have the login portlet, so let's use that.
>>> browser.open('http://nohost/plone/login_form')
>>> browser.getLink('Log in').click()
>>> browser.url
'http://nohost/plone/login_form'
>>> browser.getControl('Login Name').value = portal_owner
>>> browser.getControl('Password').value = default_password
>>> browser.getControl('Log in').click()
>>> "You are now logged in" in browser.contents
True
>>> "Login failed" in browser.contents
False
>>> browser.url
'http://nohost/plone/login_form'
Functionality
=============
First create some content for demonstrations.
In the root of the portal
>>> self.addDocument(self.portal, "doc1", "Document 1 text")
>>> self.addDocument(self.portal, "doc2", "Document 2 text")
And in the member's folder
>>> self.addDocument(self.folder, "doc1", "Member Document 1 text")
>>> self.addDocument(self.folder, "doc2", "Member Document 2 text")
We need to add a sitemap for the demonstration.
>>> browser.open(portal_url + "/prefs_gsm_settings")
>>> browser.getControl('Add Content Sitemap').click()
This brings us to the edit form of the newly created content sitemap.
We are interested in two things: the "Blackout entries" field must be
present in the form, and by default it should be empty.
>>> file("/tmp/browser.0.html","wb").write(browser.contents)
>>> blackout_list = browser.getControl("Blackout entries")
>>> blackout_list
>>> blackout_list.value == ""
True
>>> save_button = browser.getControl("Save")
>>> save_button
>>> save_button.click()
Clicking the "Save" button leads us to the resulting sitemap view.
>>> print browser.contents
Return to the sitemap settings page and find the link to the new sitemap.
>>> browser.open(portal_url + "/prefs_gsm_settings")
>>> smedit_link = browser.getLink('sitemap.xml')
>>> smedit_url = smedit_link.url
This link points to the edit form of the newly created sitemap.xml.
Let's also prepare a view link to simplify the following demonstrations.
>>> smedit_url.endswith("sitemap.xml/edit")
True
>>> smview_url = smedit_url[:-5]
No filters
==========
Created sitemap has no filters and all documents should appear in it.
>>> browser.open(smview_url)
>>> file("/tmp/browser.1.html","wb").write(browser.contents)
>>> no_filters_content = browser.contents
Check that the resulting page really is a sitemap...
>>> print browser.contents
Create a regular expression that helps us test which URLs pass the filters.
>>> import re
>>> reloc = re.compile("%s([^\<]*)" % self.portal.absolute_url(), re.S)
Test that all 4 documents and the default front-page are present in the
sitemap when no filters are set.
>>> no_filters_res = reloc.findall(no_filters_content)
>>> no_filters_res.sort()
>>> print "\n".join(no_filters_res)
/Members/test_user_1_/doc1
/Members/test_user_1_/doc2
/doc1
/doc2
/front-page
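The extraction step used above can be tried outside the test environment.
The following is a small sketch under stated assumptions: the sample XML is
made up for illustration, and it assumes the sitemap wraps each URL in a
``<loc>`` element as the sitemaps.org protocol prescribes. The pattern
captures everything after the portal URL up to the next ``<``.

```python
import re

portal_url = "http://nohost/plone"

# Made-up sample output; a real sitemap carries more elements per <url>.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://nohost/plone/doc2</loc></url>"
    "<url><loc>http://nohost/plone/doc1</loc></url>"
    "</urlset>"
)

# re.escape keeps the dots in the URL from acting as wildcards.
reloc = re.compile("%s([^<]*)" % re.escape(portal_url), re.S)
paths = sorted(reloc.findall(sample))
```

Here ``paths`` holds the portal-relative paths in sorted order, which is the
form every check in this document prints and compares.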
Check "id" filter
=================
Go to the edit form of the sitemap and add "doc1"
and "front-page" lines with "id:" prefix to the
"Blackout entries" field.
>>> browser.open(smedit_url)
>>> filtercontrol = browser.getControl("Blackout entries")
>>> filtercontrol.value = """
... id:doc1
... id:front-page
... """
>>> browser.getControl("Save").click()
>>> id_filter_content = browser.contents
"doc1" and "front-page" documents should be excluded from the
sitemap.
>>> id_filter_res = reloc.findall(id_filter_content)
>>> id_filter_res.sort()
>>> print "\n".join(id_filter_res)
/Members/test_user_1_/doc2
/doc2
Check "path" filter
===================
Suppose we want to exclude the "front-page" from the portal root
and the "doc2" document located in the test_user_1_ home folder,
but leave "doc2" in the portal root and all other objects untouched.
>>> browser.open(smedit_url)
>>> filtercontrol = browser.getControl("Blackout entries")
>>> filtercontrol.value = """
... path:/Members/test_user_1_/doc2
... path:/front-page
... """
>>> browser.getControl("Save").click()
>>> path_filter_content = browser.contents
"/Members/test_user_1_/doc2" and "/front-page" objects should
be excluded from the sitemap.
>>> path_filter_res = reloc.findall(path_filter_content)
>>> path_filter_res.sort()
>>> print "\n".join(path_filter_res)
/Members/test_user_1_/doc1
/doc1
/doc2
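The difference between the two default filters can be illustrated on plain
path strings. This is a sketch only: ``exclude_by_id`` and ``exclude_by_path``
are hypothetical helpers, and it assumes the "path" filter matches the exact
path relative to the portal root, which is what the example above shows. The
real filters operate on catalog results, not on strings.

```python
paths = [
    "/Members/test_user_1_/doc1",
    "/Members/test_user_1_/doc2",
    "/doc1",
    "/doc2",
    "/front-page",
]


def exclude_by_id(paths, ids):
    # The id is the last path segment, so it matches at any depth.
    return [p for p in paths if p.rsplit("/", 1)[-1] not in ids]


def exclude_by_path(paths, blocked):
    # Assumed semantics: only an exact portal-relative path is excluded.
    return [p for p in paths if p not in blocked]


by_id = exclude_by_id(paths, {"doc1", "front-page"})
by_path = exclude_by_path(paths, {"/Members/test_user_1_/doc2", "/front-page"})
```

The "id" variant removes both "doc1" documents at once, while the "path"
variant removes a single object and leaves same-named objects elsewhere alone.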
Check default filter
====================
Now the question is: "Which filter will be used when no
filter name prefix is specified (old-style entries, for
example)?"
Go to the edit form of the sitemap and add "doc1" and
"front-page" lines without any filter name prefix to the
"Blackout entries" field.
>>> browser.open(portal_url + "/sitemap.xml/edit")
>>> filtercontrol = browser.getControl("Blackout entries")
>>> filtercontrol.value = """
... doc1
... front-page
... """
>>> browser.getControl("Save").click()
>>> default_filter_content = browser.contents
"id" filter must be used as default filter. So all "doc1" and
"front-page" objects should be excluded from the sitemap.
>>> default_filter_res = reloc.findall(default_filter_content)
>>> default_filter_res.sort()
>>> print "\n".join(default_filter_res)
/Members/test_user_1_/doc2
/doc2
Creating your own filters
=========================
Suppose we want to create our own blackout filter
which behaves like the id filter, but with some differences.
Our filter has the following format:
(+|-)<id>
- when the first character is "+", only objects with that id
should remain in the sitemap after filtering;
- when the first character is "-", all objects with that id
should be excluded from the sitemap (like the default id
filter).
You need to create a new IBlackoutFilter multi-adapter
and register it under a unique name.
>>> from zope.component import adapts
>>> from zope.interface import Interface, implements
>>> from zope.publisher.interfaces.browser import IBrowserRequest
>>> from quintagroup.plonegooglesitemaps.interfaces import IBlackoutFilter
>>> class SignedIdFilter(object):
...     adapts(Interface, IBrowserRequest)
...     implements(IBlackoutFilter)
...     def __init__(self, context, request):
...         self.context = context
...         self.request = request
...     def filterOut(self, fdata, fargs):
...         # fargs is the text after "signedid:", e.g. "+doc1"
...         sign = fargs[0]
...         fid = fargs[1:]
...         if sign == "+":
...             # keep only the items with the given id
...             return [b for b in fdata if b.getId==fid]
...         elif sign == "-":
...             # drop the items with the given id
...             return [b for b in fdata if b.getId!=fid]
...         return fdata
Now register this new filter as named multiadapter ...
>>> from zope.component import provideAdapter
>>> provideAdapter(SignedIdFilter,
... name=u'signedid')
That is all that is needed to add a new filter.
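The filtering logic itself can be tried without Zope. The sketch below is a
stand-alone illustration, not part of the package: ``Brain`` is a hypothetical
stand-in for the items passed to the filter, modelling only a ``getId``
attribute and a path, and ``signed_id_filter`` repeats the body of
``filterOut`` as a plain function.

```python
from collections import namedtuple

# Hypothetical stand-in for the filtered items; only getId is needed.
Brain = namedtuple("Brain", ["getId", "path"])


def signed_id_filter(fdata, fargs):
    # Split "+doc1" into the sign and the id it applies to.
    sign, fid = fargs[:1], fargs[1:]
    if sign == "+":
        # White filtering: keep only items with the given id.
        return [b for b in fdata if b.getId == fid]
    if sign == "-":
        # Black filtering: drop items with the given id.
        return [b for b in fdata if b.getId != fid]
    # Unrecognised sign: leave the data untouched.
    return fdata


brains = [
    Brain("doc1", "/doc1"),
    Brain("doc2", "/doc2"),
    Brain("front-page", "/front-page"),
]
kept = [b.path for b in signed_id_filter(brains, "+doc1")]
dropped = [b.path for b in signed_id_filter(brains, "-doc1")]
```

Slicing with ``fargs[:1]`` instead of indexing is a small deviation from the
adapter above: it avoids an IndexError when the argument string is empty.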
Now test the newly created filter.
First check that white filtering ("+" prefix) works correctly.
Go to the edit form of the sitemap and add "signedid:+doc1"
to the "Blackout entries" field.
>>> browser.open(smedit_url)
>>> filtercontrol = browser.getControl("Blackout entries")
>>> filtercontrol.value = """
... signedid:+doc1
... """
>>> browser.getControl("Save").click()
>>> signedid_filter_content = browser.contents
Only objects with the "doc1" id should remain in the sitemap.
>>> signedid_filter_res = reloc.findall(signedid_filter_content)
>>> signedid_filter_res.sort()
>>> print "\n".join(signedid_filter_res)
/Members/test_user_1_/doc1
/doc1
Finally, check that black filtering ("-" prefix)
works correctly.
Go to the edit form of the sitemap and add "signedid:-doc1"
to the "Blackout entries" field.
>>> browser.open(smedit_url)
>>> filtercontrol = browser.getControl("Blackout entries")
>>> filtercontrol.value = """
... signedid:-doc1
... """
>>> browser.getControl("Save").click()
>>> signedid_filter_content = browser.contents
All objects except those having "doc1" id must be included in
the sitemap.
>>> signedid_filter_res = reloc.findall(signedid_filter_content)
>>> signedid_filter_res.sort()
>>> print "\n".join(signedid_filter_res)
/Members/test_user_1_/doc2
/doc2
/front-page