Blackout filtering ================== Sitemaps has option to filterout objects, which shouldn't present in a sitemap. This option is accessable in sitemap edit form and present as "Blackout entries" lines field. In earlier (<3.0.7 and <4.0.1) versions of the package the field filter-out objects only by its ids, and looks like:
  index.html
  index_html
So all objects with "index.html" or "index_html" ids excluded from the sitemap. In the new versions of GoogleSitemaps filtering was remaked to pluggable architecture. Now filters are named mutli adapters. By default there are only two most useful filters - "id" and "path". Because of different filters can be used - new syntax applied to the "Blackout entries" field. Every record in the field should follow the spec: [:] By default (if no specified) - "id" filter will be used. If specified - system looking for name multiadapter to IBlackoutFilter interface. If such multiadapter was not found - it's ignored silently. Setup ===== First, we must perform some setup. We use the testbrowser that is shipped with Five, as this provides proper Zope 2 integration. Most of the documentation, though, is in the underlying zope.testbrower package. >>> from Products.Five.testbrowser import Browser >>> browser = Browser() >>> portal_url = self.portal.absolute_url() The following is useful when writing and debugging testbrowser tests. It lets us see all error messages in the error_log. >>> self.portal.error_log._ignored_exceptions = () With that in place, we can go to the portal front page and log in. We will do this using the default user from PloneTestCase: >>> from Products.PloneTestCase.setup import portal_owner, default_password >>> browser.open(portal_url) We have the login portlet, so let's use that. >>> browser.open('http://nohost/plone/login_form') >>> browser.getLink('Log in').click() >>> browser.url 'http://nohost/plone/login_form' >>> browser.getControl('Login Name').value = portal_owner >>> browser.getControl('Password').value = default_password >>> browser.getControl('Log in').click() >>> "You are now logged in" in browser.contents True >>> "Login failed" in browser.contents False >>> browser.url 'http://nohost/plone/login_form' Functionality ============= First create several documents for demonstrations. In the root of the portal >>> self.addDocument(self.portal, "doc1", "Document 1 text") >>> self.addDocument(self.portal, "doc2", "Document 2 text") And in the memeber's folder >>> self.addDocument(self.folder, "doc1", "Member Document 1 text") >>> self.addDocument(self.folder, "doc2", "Member Document 2 text") We need add sitemap, of corse, for demonstration. >>> browser.open(portal_url + "/prefs_gsm_settings") >>> browser.getControl('Add Content Sitemap').click() Now we bring-up to edit form of the newly created content sitemap. We interested in two things: "Blackout entries" field must present in the form and it should be empty by default. >>> file("/tmp/browser.0.html","wb").write(browser.contents) >>> blackout_list = browser.getControl("Blackout entries") >>> blackout_list >>> blackout_list.value == "" True >>> save_button = browser.getControl("Save") >>> save_button >>> save_button.click() Click on "Save" button lead us to result sitemap view. >>> print browser.contents >> browser.open(portal_url + "/prefs_gsm_settings") >>> smedit_link = browser.getLink('sitemap.xml') >>> smedit_url = smedit_link.url This link lead to edit form of the newly created sitemap.xml. Also prepare view link to simplifier following demonstrations. >>> smedit_url.endswith("sitemap.xml/edit") True >>> smview_url = smedit_url[:-5] No filters ========== Resulted sitemap has no filters - all document should present in it. >>> browser.open(smview_url) >>> file("/tmp/browser.1.html","wb").write(browser.contents) >>> no_filters_content = browser.contents Check if resulted page is real sitemap... >>> print browser.contents >> reloc = re.compile("%s([^\<]*)" % self.portal.absolute_url(), re.S) With help of reloc regular expression - check if all 4 documents + default front-page present in the sitemap without filters. >>> no_filters_res = reloc.findall(no_filters_content) >>> no_filters_res.sort() >>> print "\n".join(no_filters_res) /Members/test_user_1_/doc1 /Members/test_user_1_/doc2 /doc1 /doc2 /front-page Check "id" filter ================= Go to the edit form of the sitemap and add "doc1" and "front-page" lines with "id:" prefix to the "Blackout entries" field. >>> browser.open(smedit_url) >>> filtercontrol = browser.getControl("Blackout entries") >>> filtercontrol.value = """ ... id:doc1 ... id:front-page ... """ >>> browser.getControl("Save").click() >>> id_filter_content = browser.contents As result - all "doc1" and "front-page" documents must be filtered-out from the sitemap. >>> id_filter_res = reloc.findall(id_filter_content) >>> id_filter_res.sort() >>> print "\n".join(id_filter_res) /Members/test_user_1_/doc2 /doc2 Check "path" filter =================== Suppouse we wont to filter-out doc2 of the test_user_1_'s (but not from the portal root) and the front-page from the portal root. >>> browser.open(smedit_url) >>> filtercontrol = browser.getControl("Blackout entries") >>> filtercontrol.value = """ ... path:/Members/test_user_1_/doc2 ... path:/front-page ... """ >>> browser.getControl("Save").click() >>> path_filter_content = browser.contents As result - "doc2" of the pointed member and "front-page" documents must be filtered-out from the sitemap. >>> path_filter_res = reloc.findall(path_filter_content) >>> path_filter_res.sort() >>> print "\n".join(path_filter_res) /Members/test_user_1_/doc1 /doc1 /doc2 Check default filter ==================== Lets check what filter should be used for old-feshion filters (without any filter name prefixes)? Go to the edit form of the sitemap and add "doc1" and front-page lines without any filter name prefix to the "Blackout entries" field. >>> browser.open(portal_url + "/sitemap.xml/edit") >>> filtercontrol = browser.getControl("Blackout entries") >>> filtercontrol.value = """ ... doc1 ... front-page ... """ >>> browser.getControl("Save").click() >>> default_filter_content = browser.contents By default "id" filter must be used, so all "doc1" and "front-page" objects must be filtered-out from the sitemap. >>> default_filter_res = reloc.findall(default_filter_content) >>> default_filter_res.sort() >>> print "\n".join(default_filter_res) /Members/test_user_1_/doc2 /doc2 Creation own filters ==================== Suppouse we want to create own blackout filter, which behave like id-filter, but with some differencies. Our fitler has following format: (+|-) - when 1st sign "+" then only objects with must leave after filetering, - if 1st sign is "-" or all objects with must be filtered-out (like default id filter) You need create new IBlckoutFilter multi-adapter, and register it with unique name. >>> from zope.component import adapts >>> from zope.interface import Interface, implements >>> from zope.publisher.interfaces.browser import IBrowserRequest >>> from quintagroup.plonegooglesitemaps.interfaces import IBlackoutFilter >>> class SignedIdFilter(object): ... adapts(Interface, IBrowserRequest) ... implements(IBlackoutFilter) ... def __init__(self, context, request): ... self.context = context ... self.request = request ... def filterOut(self, fdata, fargs): ... sign = fargs[0] ... fid = fargs[1:] ... if sign == "+": ... return [b for b in fdata if b.getId==fid] ... elif sign == "-": ... return [b for b in fdata if b.getId!=fid] ... return fdata Now register this new filter as named multiadapter ... >>> from zope.component import provideAdapter >>> provideAdapter(SignedIdFilter, ... name=u'signedid') So thet's all what neede to add new filter. No test if newly added filter really take into consideration. Check if white filtering ("+" prefix) work correctly. Go to the edit form of the sitemap and add "signedid:+doc1" to the "Blackout entries" field. >>> browser.open(smedit_url) >>> filtercontrol = browser.getControl("Blackout entries") >>> filtercontrol.value = """ ... signedid:+doc1 ... """ >>> browser.getControl("Save").click() >>> signedid_filter_content = browser.contents As result - only objects with "doc1" id must present in the sitemap. >>> signedid_filter_res = reloc.findall(signedid_filter_content) >>> signedid_filter_res.sort() >>> print "\n".join(signedid_filter_res) /Members/test_user_1_/doc1 /doc1 An for the last - check black filtering ("-" prefix) is working. Go to the edit form of the sitemap and add "signedid:-doc1" to the "Blackout entries" field. >>> browser.open(smedit_url) >>> filtercontrol = browser.getControl("Blackout entries") >>> filtercontrol.value = """ ... signedid:-doc1 ... """ >>> browser.getControl("Save").click() >>> signedid_filter_content = browser.contents As result - all except objects with "doc1" id must present in the sitemap. >>> signedid_filter_res = reloc.findall(signedid_filter_content) >>> signedid_filter_res.sort() >>> print "\n".join(signedid_filter_res) /Members/test_user_1_/doc2 /doc2 /front-page