[2947] | 1 | |
---|
| 2 | Blackout filtering |
---|
| 3 | ================== |
---|
| 4 | |
---|
[3005] | 5 | Introduction |
---|
| 6 | ============ |
---|
[2947] | 7 | |
---|
[3005] | 8 | The Sitemap portal type has an option that filters out objects that |
---|
| 9 | should be excluded from a sitemap. This option is accessible |
---|
| 10 | on the sitemap edit form and is labeled "Blackout entries". |
---|
[2947] | 11 | |
---|
[2997] | 12 | In earlier versions of the package (<4.0.1 for the plone-4 branch |
---|
| 13 | and <3.0.7 for the plone-3 branch) this field could |
---|
[3005] | 14 | filter objects only by their ids, and it looked like this: |
---|
[2997] | 15 | |
---|
[2947] | 16 | <pre> |
---|
| 17 | index.html |
---|
| 18 | index_html |
---|
| 19 | </pre> |
---|
| 20 | |
---|
[3005] | 21 | As a result, all objects with "index.html" or "index_html" ids |
---|
| 22 | were excluded from the sitemap. |
---|
[2947] | 23 | |
---|
[3005] | 24 | In newer versions of GoogleSitemaps, filtering was refactored |
---|
| 25 | into a pluggable architecture. Filters are now named |
---|
| 26 | multi-adapters. There are only two default filters: "id" and "path". |
---|
[2947] | 27 | |
---|
[2997] | 28 | Since different filters can be used, a new syntax was introduced |
---|
[2947] | 29 | for the "Blackout entries" field. Every record in the field |
---|
[2997] | 30 | should follow the specification: |
---|
[2947] | 31 | |
---|
| 32 | [<filter name>:]<filter arguments> |
---|
| 33 | |
---|
[3005] | 34 | * If no <filter name> is specified, the "id" filter is used. |
---|
| 35 | * If <filter name> is specified, the system looks for a |
---|
| 36 | multi-adapter named <filter name> that provides the IBlackoutFilter interface. |
---|
| 37 | If no such multi-adapter is found, the filter is ignored without |
---|
| 38 | raising any errors. |
---|
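The record syntax and lookup rules above can be sketched in plain Python. This is only an illustration, not the package's code: the names `FILTERS`, `filter_by_id`, `parse_record` and `apply_records` are hypothetical, and a plain dict stands in for the named multi-adapter lookup. Splitting on the first colon is also an assumption about how the prefix is detected.

```python
def filter_by_id(ids, arg):
    """Drop every id equal to arg (a simplified stand-in for the "id" filter)."""
    return [i for i in ids if i != arg]


# Hypothetical registry; the real package looks up named multi-adapters instead.
FILTERS = {"id": filter_by_id}


def parse_record(record):
    """Split '[<filter name>:]<filter arguments>' into (name, arguments)."""
    name, sep, args = record.partition(":")
    if not sep:
        # No prefix given: fall back to the "id" filter.
        return "id", record
    return name, args


def apply_records(ids, records):
    """Apply every record in turn, silently skipping unknown filter names."""
    for record in records:
        record = record.strip()
        if not record:
            continue  # ignore empty lines in the field
        name, args = parse_record(record)
        func = FILTERS.get(name)
        if func is None:
            continue  # unknown filter: ignored without raising an error
        ids = func(ids, args)
    return ids
```

With this sketch, `apply_records(["doc1", "doc2"], ["nosuch:doc1", "doc2"])` skips the unknown "nosuch" filter and applies the default "id" filter to the second record.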
[2947] | 39 | |
---|
[3005] | 40 | The following sections demonstrate how to work with filtering. |
---|
| 41 | The behavior of the default filters ("id" and "path") is also |
---|
| 42 | covered. |
---|
[2947] | 43 | |
---|
[3005] | 44 | Demonstration environment setup |
---|
[2997] | 45 | =============================== |
---|
[2947] | 46 | |
---|
[3005] | 47 | First, we have to do some setup. We use the testbrowser that is |
---|
[2997] | 48 | shipped with Five, as this provides proper Zope 2 integration. Most |
---|
| 49 | of the documentation, though, is in the underlying zope.testbrowser |
---|
| 50 | package. |
---|
| 51 | |
---|
[2947] | 52 | >>> from Products.Five.testbrowser import Browser |
---|
| 53 | >>> browser = Browser() |
---|
| 54 | >>> portal_url = self.portal.absolute_url() |
---|
| 55 | |
---|
[3005] | 56 | The following is useful when writing and debugging testbrowser tests: |
---|
| 57 | it lets us see all error messages in the error_log. |
---|
[2947] | 58 | |
---|
| 59 | >>> self.portal.error_log._ignored_exceptions = () |
---|
| 60 | |
---|
[2997] | 61 | With that in place, we can go to the portal front page and log in. |
---|
| 62 | We will do this using the default user from PloneTestCase: |
---|
[2947] | 63 | |
---|
| 64 | >>> from Products.PloneTestCase.setup import portal_owner, default_password |
---|
| 65 | >>> browser.open(portal_url) |
---|
| 66 | |
---|
| 67 | We have the login portlet, so let's use that. |
---|
| 68 | |
---|
| 69 | >>> browser.open('http://nohost/plone/login_form') |
---|
| 70 | >>> browser.getControl('Login Name').value = portal_owner |
---|
| 71 | >>> browser.getControl('Password').value = default_password |
---|
| 72 | >>> browser.getControl('Log in').click() |
---|
| 73 | >>> "You are now logged in" in browser.contents |
---|
| 74 | True |
---|
| 75 | >>> "Login failed" in browser.contents |
---|
| 76 | False |
---|
| 77 | >>> browser.url |
---|
| 78 | 'http://nohost/plone/login_form' |
---|
| 79 | |
---|
| 80 | |
---|
| 81 | Functionality |
---|
| 82 | ============= |
---|
| 83 | |
---|
[3005] | 84 | First, create some content for demonstration purposes. |
---|
[2947] | 85 | |
---|
| 86 | In the root of the portal |
---|
| 87 | |
---|
| 88 | >>> self.addDocument(self.portal, "doc1", "Document 1 text") |
---|
| 89 | >>> self.addDocument(self.portal, "doc2", "Document 2 text") |
---|
| 90 | |
---|
| 91 | And in the member's folder |
---|
| 92 | |
---|
| 93 | >>> self.addDocument(self.folder, "doc1", "Member Document 1 text") |
---|
| 94 | >>> self.addDocument(self.folder, "doc2", "Member Document 2 text") |
---|
| 95 | |
---|
[2997] | 96 | We need to add a sitemap for the demonstration. |
---|
[2947] | 97 | |
---|
| 98 | >>> browser.open(portal_url + "/prefs_gsm_settings") |
---|
| 99 | >>> browser.getControl('Add Content Sitemap').click() |
---|
| 100 | |
---|
[3005] | 101 | We have now landed on the edit form of the newly created sitemap. |
---|
| 102 | We are interested in the "Blackout entries" field on this |
---|
| 103 | form; it should be empty by default. |
---|
[2997] | 104 | |
---|
[3497] | 105 | >>> file("/tmp/browser.test.html","wb").write(browser.contents) |
---|
[2947] | 106 | >>> blackout_list = browser.getControl("Blackout entries") |
---|
| 107 | >>> blackout_list |
---|
| 108 | <Control name='blackout_list:lines' type='textarea'> |
---|
[2949] | 109 | >>> blackout_list.value == "" |
---|
| 110 | True |
---|
[2948] | 111 | >>> save_button = browser.getControl("Save") |
---|
[2947] | 112 | >>> save_button |
---|
[2992] | 113 | <SubmitControl name='form...' type='submit'> |
---|
[2948] | 114 | >>> save_button.click() |
---|
[2947] | 115 | |
---|
| 116 | |
---|
[3005] | 117 | Clicking the "Save" button takes us to the sitemap view. |
---|
[2947] | 118 | |
---|
[2950] | 119 | >>> print browser.contents |
---|
| 120 | <?xml version="1.0" encoding=... |
---|
[2947] | 121 | |
---|
[2950] | 122 | |
---|
[3005] | 123 | A "sitemap.xml" link should appear on the "Settings" page of the |
---|
| 124 | Plone Google Sitemap configlet after the "Content Sitemap" |
---|
[2997] | 125 | was added. |
---|
[2947] | 126 | |
---|
[2949] | 127 | >>> browser.open(portal_url + "/prefs_gsm_settings") |
---|
| 128 | >>> smedit_link = browser.getLink('sitemap.xml') |
---|
[2950] | 129 | >>> smedit_url = smedit_link.url |
---|
[2947] | 130 | |
---|
[3005] | 131 | This link points to the edit form of the newly created sitemap.xml. |
---|
| 132 | Let's prepare a view link to simplify the following demonstrations. |
---|
[2947] | 133 | |
---|
[2950] | 134 | >>> smedit_url.endswith("sitemap.xml/edit") |
---|
[2949] | 135 | True |
---|
[2950] | 136 | >>> smview_url = smedit_url[:-5] |
---|
[2949] | 137 | |
---|
| 138 | |
---|
| 139 | No filters |
---|
| 140 | ========== |
---|
| 141 | |
---|
[3005] | 142 | The created sitemap has no filters applied and all documents should appear in it. |
---|
[2949] | 143 | |
---|
[2950] | 144 | >>> browser.open(smview_url) |
---|
[3497] | 145 | >>> file("/tmp/browser.test.html","wb").write(browser.contents) |
---|
[2949] | 146 | >>> no_filters_content = browser.contents |
---|
| 147 | |
---|
[3005] | 148 | Check that the result page really is a sitemap. |
---|
[2949] | 149 | |
---|
[2950] | 150 | >>> print browser.contents |
---|
| 151 | <?xml version="1.0" encoding=... |
---|
[2949] | 152 | |
---|
[2950] | 153 | |
---|
[3005] | 154 | Create a regular expression that will help us test which URLs pass the filters. |
---|
[2949] | 155 | |
---|
[3163] | 156 | >>> import re |
---|
[2949] | 157 | >>> reloc = re.compile("<loc>%s([^\<]*)</loc>" % self.portal.absolute_url(), re.S) |
---|
| 158 | |
---|
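To make the role of the regular expression concrete, here is a small standalone sketch that applies the same kind of pattern to a hand-written XML fragment. The portal URL and the sample markup are made up for the illustration, and `re.escape` is added so that the URL is matched literally.

```python
import re

# Assumed portal URL, chosen to match the test environment used in this document.
portal_url = "http://nohost/plone"

# Hand-written fragment standing in for real sitemap output.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<urlset>\n"
    "<url><loc>http://nohost/plone/doc1</loc></url>\n"
    "<url><loc>http://nohost/plone/Members/test_user_1_/doc2</loc></url>\n"
    "</urlset>\n"
)

# Capture everything between the portal URL and the closing </loc> tag.
reloc = re.compile("<loc>%s([^<]*)</loc>" % re.escape(portal_url), re.S)

# Portal-relative paths of all entries, sorted for stable comparison.
paths = sorted(reloc.findall(sample))
```

Each captured group is the path of one sitemap entry relative to the portal root, which is why the checks in this document compare against values such as `/doc1`.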
[3494] | 159 | Test if all 4 documents are in the sitemap without filters. |
---|
[2949] | 160 | |
---|
| 161 | >>> no_filters_res = reloc.findall(no_filters_content) |
---|
| 162 | >>> no_filters_res.sort() |
---|
| 163 | >>> print "\n".join(no_filters_res) |
---|
| 164 | /Members/test_user_1_/doc1 |
---|
| 165 | /Members/test_user_1_/doc2 |
---|
| 166 | /doc1 |
---|
| 167 | /doc2 |
---|
| 168 | |
---|
| 169 | |
---|
| 170 | Check "id" filter |
---|
| 171 | ================= |
---|
| 172 | |
---|
[3494] | 173 | Go to the sitemap edit form and add a "doc1" line with the "id:" |
---|
[3005] | 174 | prefix to the "Blackout entries" field. |
---|
[2949] | 175 | |
---|
[2950] | 176 | >>> browser.open(smedit_url) |
---|
[2949] | 177 | >>> filtercontrol = browser.getControl("Blackout entries") |
---|
[2952] | 178 | >>> filtercontrol.value = """ |
---|
| 179 | ... id:doc1 |
---|
| 180 | ... """ |
---|
[2949] | 181 | >>> browser.getControl("Save").click() |
---|
| 182 | >>> id_filter_content = browser.contents |
---|
| 183 | |
---|
[3494] | 184 | All documents with the "doc1" id should now be excluded from the |
---|
[2997] | 185 | sitemap, regardless of the folder they are in. |
---|
[2949] | 186 | |
---|
| 187 | >>> id_filter_res = reloc.findall(id_filter_content) |
---|
| 188 | >>> id_filter_res.sort() |
---|
| 189 | >>> print "\n".join(id_filter_res) |
---|
| 190 | /Members/test_user_1_/doc2 |
---|
| 191 | /doc2 |
---|
| 192 | |
---|
| 193 | |
---|
| 194 | Check "path" filter |
---|
| 195 | =================== |
---|
| 196 | |
---|
[3494] | 197 | Suppose we want to exclude the "doc2" document |
---|
| 198 | located in the test_user_1_ home folder, but leave "doc2" |
---|
[3005] | 199 | in the portal root untouched, along with all other objects. |
---|
[2949] | 200 | |
---|
[2950] | 201 | >>> browser.open(smedit_url) |
---|
[2949] | 202 | >>> filtercontrol = browser.getControl("Blackout entries") |
---|
[2952] | 203 | >>> filtercontrol.value = """ |
---|
| 204 | ... path:/Members/test_user_1_/doc2 |
---|
| 205 | ... """ |
---|
[2949] | 206 | >>> browser.getControl("Save").click() |
---|
| 207 | >>> path_filter_content = browser.contents |
---|
| 208 | |
---|
[3494] | 209 | "/Members/test_user_1_/doc2" object should |
---|
[2997] | 210 | be excluded from the sitemap. |
---|
[2949] | 211 | |
---|
| 212 | >>> path_filter_res = reloc.findall(path_filter_content) |
---|
| 213 | >>> path_filter_res.sort() |
---|
| 214 | >>> print "\n".join(path_filter_res) |
---|
[3000] | 215 | /Members/test_user_1_/doc1 |
---|
[2949] | 216 | /doc1 |
---|
| 217 | /doc2 |
---|
| 218 | |
---|
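The difference between the two default filters can be summarized with a small standalone sketch. It does not use the package's code: the tuples, `drop_by_id` and `drop_by_path` are hypothetical, and the exact-match comparison on portal-relative paths is an assumption based only on the behavior shown above.

```python
# Each tuple is (id, path relative to the portal root); the data mirrors
# the four demonstration documents.
docs = [
    ("doc1", "/doc1"),
    ("doc2", "/doc2"),
    ("doc1", "/Members/test_user_1_/doc1"),
    ("doc2", "/Members/test_user_1_/doc2"),
]


def drop_by_id(items, wanted_id):
    """Remove every item with the given id, wherever it lives."""
    return [d for d in items if d[0] != wanted_id]


def drop_by_path(items, wanted_path):
    """Remove only the item at the given path (assumed exact match)."""
    return [d for d in items if d[1] != wanted_path]


# Filtering by id removes both "doc2" documents.
after_id = sorted(path for _, path in drop_by_id(docs, "doc2"))

# Filtering by path removes only the member's "doc2".
after_path = sorted(
    path for _, path in drop_by_path(docs, "/Members/test_user_1_/doc2")
)
```

An id record is the blunt tool and a path record the precise one: the first affects every object sharing the id, the second a single location.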
| 219 | |
---|
| 220 | Check default filter |
---|
| 221 | ==================== |
---|
| 222 | |
---|
[3005] | 223 | Which filter is used when no filter name prefix is |
---|
| 224 | specified (e.g. for old-style records)? |
---|
[2949] | 225 | |
---|
[3494] | 226 | Go to the sitemap edit form and add a "doc1" line |
---|
| 227 | without any filter name prefix to the "Blackout entries" |
---|
[3005] | 228 | field. |
---|
[2949] | 229 | |
---|
| 230 | >>> browser.open(portal_url + "/sitemap.xml/edit") |
---|
| 231 | >>> filtercontrol = browser.getControl("Blackout entries") |
---|
[2952] | 232 | >>> filtercontrol.value = """ |
---|
| 233 | ... doc1 |
---|
| 234 | ... """ |
---|
[2949] | 235 | >>> browser.getControl("Save").click() |
---|
| 236 | >>> default_filter_content = browser.contents |
---|
| 237 | |
---|
[3494] | 238 | The "id" filter must be used as the default filter, so "doc1" |
---|
| 239 | objects should be excluded from the sitemap. |
---|
[2949] | 240 | |
---|
| 241 | >>> default_filter_res = reloc.findall(default_filter_content) |
---|
| 242 | >>> default_filter_res.sort() |
---|
| 243 | >>> print "\n".join(default_filter_res) |
---|
| 244 | /Members/test_user_1_/doc2 |
---|
| 245 | /doc2 |
---|
| 246 | |
---|
| 247 | |
---|
[3005] | 248 | Create your own filters |
---|
| 249 | ======================= |
---|
[2951] | 250 | |
---|
[3005] | 251 | Suppose we want to create our own blackout filter that |
---|
| 252 | behaves like the id filter, with some differences. Our filter |
---|
| 253 | has the following format: |
---|
[2951] | 254 | |
---|
| 255 | (+|-)<filtered id> |
---|
| 256 | |
---|
[3005] | 257 | - if the first character is "+", only objects with <filtered id> |
---|
| 258 | should be left in the sitemap after filtering; |
---|
| 259 | - if the first character is "-", all objects with <filtered id> |
---|
| 260 | should be excluded from the sitemap (like the default id filter). |
---|
[2951] | 261 | |
---|
[3005] | 262 | You need to create a new IBlackoutFilter multi-adapter and register |
---|
| 263 | it with a unique name. |
---|
[2951] | 264 | |
---|
| 265 | >>> from zope.component import adapts |
---|
| 266 | >>> from zope.interface import Interface, implements |
---|
| 267 | >>> from zope.publisher.interfaces.browser import IBrowserRequest |
---|
| 268 | >>> from quintagroup.plonegooglesitemaps.interfaces import IBlackoutFilter |
---|
| 269 | >>> class SignedIdFilter(object): |
---|
| 270 | ... adapts(Interface, IBrowserRequest) |
---|
| 271 | ... implements(IBlackoutFilter) |
---|
| 272 | ... def __init__(self, context, request): |
---|
| 273 | ... self.context = context |
---|
| 274 | ... self.request = request |
---|
| 275 | ... def filterOut(self, fdata, fargs): |
---|
| 276 | ... sign = fargs[0] |
---|
| 277 | ... fid = fargs[1:] |
---|
| 278 | ... if sign == "+": |
---|
| 279 | ... return [b for b in fdata if b.getId==fid] |
---|
| 280 | ... elif sign == "-": |
---|
| 281 | ... return [b for b in fdata if b.getId!=fid] |
---|
| 282 | ... return fdata |
---|
| 283 | |
---|
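The filtering logic of this adapter can be tried outside Plone with stand-in objects. The sketch below is an illustration only: `FakeBrain` and `signed_id_filter` are hypothetical names, and it assumes that the items passed to `filterOut` expose `getId` as a plain attribute (as catalog brains do for metadata columns), which is why the comparison does not call it.

```python
class FakeBrain(object):
    """Hypothetical stand-in for a catalog brain; only getId is modelled."""

    def __init__(self, id):
        self.getId = id  # attribute, not a method, mirroring b.getId above


def signed_id_filter(fdata, fargs):
    """Same decision logic as SignedIdFilter.filterOut, without Zope."""
    sign, fid = fargs[:1], fargs[1:]
    if sign == "+":
        # Keep only items whose id matches.
        return [b for b in fdata if b.getId == fid]
    if sign == "-":
        # Drop items whose id matches.
        return [b for b in fdata if b.getId != fid]
    # Unknown sign: leave the data untouched.
    return fdata


brains = [FakeBrain("doc1"), FakeBrain("doc2"), FakeBrain("doc1")]
kept = [b.getId for b in signed_id_filter(brains, "+doc1")]
dropped = [b.getId for b in signed_id_filter(brains, "-doc1")]
```

If the items were content objects rather than brains, `getId` would be a method and the comparison would need to call it.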
| 284 | |
---|
| 285 | Now register this new filter as a named multi-adapter. |
---|
| 286 | |
---|
| 287 | >>> from zope.component import provideAdapter |
---|
| 288 | >>> provideAdapter(SignedIdFilter, |
---|
| 289 | ... name=u'signedid') |
---|
| 290 | |
---|
[3005] | 291 | That is all that is needed to add a new filter. Now test the newly created |
---|
| 292 | filter. |
---|
[2951] | 293 | |
---|
[2997] | 294 | Check whether inclusive filtering ("+" prefix) works correctly. |
---|
[3005] | 295 | Go to the sitemap edit form and add "signedid:+doc1" |
---|
[2951] | 296 | to the "Blackout entries" field. |
---|
| 297 | |
---|
| 298 | >>> browser.open(smedit_url) |
---|
| 299 | >>> filtercontrol = browser.getControl("Blackout entries") |
---|
[2952] | 300 | >>> filtercontrol.value = """ |
---|
| 301 | ... signedid:+doc1 |
---|
| 302 | ... """ |
---|
[2951] | 303 | >>> browser.getControl("Save").click() |
---|
| 304 | >>> signedid_filter_content = browser.contents |
---|
| 305 | |
---|
[3005] | 306 | Only objects with "doc1" id should be left in the sitemap. |
---|
[2951] | 307 | |
---|
| 308 | >>> signedid_filter_res = reloc.findall(signedid_filter_content) |
---|
| 309 | >>> signedid_filter_res.sort() |
---|
| 310 | >>> print "\n".join(signedid_filter_res) |
---|
| 311 | /Members/test_user_1_/doc1 |
---|
| 312 | /doc1 |
---|
| 313 | |
---|
| 314 | |
---|
[3005] | 315 | Finally, check whether exclusive filtering ("-" prefix) works correctly. |
---|
| 316 | Go to the sitemap edit form and add "signedid:-doc1" to the "Blackout |
---|
| 317 | entries" field. |
---|
[2951] | 318 | |
---|
| 319 | >>> browser.open(smedit_url) |
---|
| 320 | >>> filtercontrol = browser.getControl("Blackout entries") |
---|
[2952] | 321 | >>> filtercontrol.value = """ |
---|
| 322 | ... signedid:-doc1 |
---|
| 323 | ... """ |
---|
[2951] | 324 | >>> browser.getControl("Save").click() |
---|
| 325 | >>> signedid_filter_content = browser.contents |
---|
| 326 | |
---|
[3005] | 327 | All objects, except those having "doc1" id, must be included in |
---|
[2997] | 328 | the sitemap. |
---|
[2951] | 329 | |
---|
| 330 | >>> signedid_filter_res = reloc.findall(signedid_filter_content) |
---|
| 331 | >>> signedid_filter_res.sort() |
---|
| 332 | >>> print "\n".join(signedid_filter_res) |
---|
| 333 | /Members/test_user_1_/doc2 |
---|
| 334 | /doc2 |
---|