
You may have seen my blog post announcing WWWizzzard. You may have toyed with it and found that many webpages have frequent, irrelevant tiny changes, such as a carousel, or a list shown in random order or drawn from a random selection. I have now added a feature to help ignore those changes.
WWWizzzard now accepts two blocks of text in each site description, and each block is treated as a list of strings. In the first block, each string is a piece of text: WWWizzzard ignores every line that contains any of those strings (compared case-insensitively). In the second block, each string is a CSS selector: WWWizzzard ignores every DOM element that matches any of those selectors. (If you don’t know what DOM elements and CSS selectors are, just leave the second block blank, and it won’t change anything.)
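To make the first block's behavior concrete, here is a minimal Python sketch of that kind of line filtering. It is not WWWizzzard's actual code: the function names (`parse_block`, `filter_ignored_lines`, `page_changed`) are hypothetical, and it assumes each entry sits on its own line of the block and that blank entries are skipped, which keeps an empty block from changing anything.

```python
def parse_block(block: str) -> list[str]:
    """Split a text block into entries.

    Assumption (hypothetical format): one entry per line, surrounding
    whitespace trimmed, blank lines skipped, so an empty block yields [].
    """
    return [line.strip() for line in block.splitlines() if line.strip()]


def filter_ignored_lines(text: str, ignore: list[str]) -> str:
    """Drop every line of `text` that contains any entry in `ignore`,
    comparing case-insensitively via str.casefold()."""
    needles = [s.casefold() for s in ignore]
    kept = [
        line for line in text.splitlines()
        if not any(n in line.casefold() for n in needles)
    ]
    return "\n".join(kept)


def page_changed(old: str, new: str, ignore_block: str) -> bool:
    """Hypothetical helper: report a change only if the filtered snapshots differ."""
    ignore = parse_block(ignore_block)
    return filter_ignored_lines(old, ignore) != filter_ignored_lines(new, ignore)


old = "Welcome\nToday's pick: Widget A\nContact us"
new = "Welcome\nTODAY'S PICK: Widget B\nContact us"
print(page_changed(old, new, "today's pick"))  # False
print(page_changed(old, new, ""))              # True
```

With "today's pick" in the block, the rotating line is dropped from both snapshots regardless of case, so the remaining text is identical and no change is reported; with a blank block, every line is kept and the difference shows up.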
Once I figure out a decent UI for it, I may add a feature to ignore blocks of text running from some starting string to some ending string. That would help ignore changing sections on pages that don’t use semantic classes or IDs on their elements. :-(
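A feature like that could work roughly as below. This is only a guess at one possible approach, not a description of anything WWWizzzard does today: the hypothetical `strip_blocks` function removes every span running from a start string to the nearest following end string (both markers included), matching case-insensitively to mirror the line filter. A start string with no matching end string is left alone.

```python
import re


def strip_blocks(text: str, start: str, end: str) -> str:
    """Hypothetical: remove each span from `start` to the nearest following `end`.

    re.escape treats the markers as literal text, the non-greedy `.*?`
    stops at the first `end` after each `start`, DOTALL lets a block span
    several lines, and IGNORECASE mirrors the case-insensitive line filter.
    """
    pattern = re.escape(start) + ".*?" + re.escape(end)
    return re.sub(pattern, "", text, flags=re.IGNORECASE | re.DOTALL)


page = "Header\nBEGIN CAROUSEL\nslide 3 of 7\nEND CAROUSEL\nFooter"
print(repr(strip_blocks(page, "begin carousel", "end carousel")))
# 'Header\n\nFooter'
```

The non-greedy match is the important design choice here: with a greedy `.*`, two separate blocks on the same page would be merged into one removal, swallowing whatever sat between them.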