We have recently landed the hide-if-matches-xpath snippet, which lets filters use XPath queries to target elements directly.
This document summarizes how to use XPath and explains how the snippet helps hide unwanted ads on a generic page.
XPath is a web standard that has been available since 1999. It is much less common than CSS, but unlike CSS selectors, it lets queries crawl a document along different axes.
For example, with CSS selectors it is not possible to target the parent node of an element that matches some specific rule, while XPath provides axes-related syntax to move around any specific target node.
This possibility alone makes XPath unique and more powerful than CSS selectors, which is one of the reasons we decided to ship the hide-if-matches-xpath snippet.
Visiting example.com after adding the following filter hides the main <div> element by checking the content of any node in the document:
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/ancestor::div'
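To see what the XPath part of that filter resolves to, here is a minimal sketch that evaluates it with the XPath 1.0 engine bundled with the JDK (javax.xml.xpath). The PAGE markup is an assumption: a hand-written, simplified XHTML stand-in for example.com, and the class and method names are hypothetical. The snippet itself runs in the browser, so this only illustrates the query, not the snippet's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AncestorQuery {
    // Hypothetical, simplified XHTML stand-in for the example.com page (not fetched).
    static final String PAGE = "<html><body><div>"
            + "<h1>Example Domain</h1>"
            + "<p>This domain is for use in illustrative examples in documents.</p>"
            + "<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>"
            + "</div></body></html>";

    // Evaluates an XPath 1.0 expression against PAGE and returns the matched node names.
    static List<String> match(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same expression as the filter: find the node whose text contains
        // "More information...", then climb to its <div> ancestor.
        System.out.println(match("//*[contains(text(),\"More information...\")]/ancestor::div"));
        // prints [div]
    }
}
```

Only the <a> element contains that text, so the ancestor step is what lets the filter reach the container.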
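However, it is also possible to target any other node by crawling the hierarchy of such an ancestor via /, which here results in hiding only the first <p> element of the container. Example:
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/ancestor::div/p[1]'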
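But if it is the parent's previous sibling that we are after, we can obtain the same result via the following selector:
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/../preceding-sibling::p[1]'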
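Climbing to the container and stepping down to its first <p>, or stepping up to the link's parent and taking its nearest previous <p> sibling, should land on the same element. A minimal sketch checking this with the JDK's javax.xml.xpath, again assuming a simplified, hypothetical XHTML copy of the example.com markup (class and helper names are illustrative):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SameTarget {
    // Hypothetical, simplified XHTML stand-in for the example.com page.
    static final String PAGE = "<html><body><div>"
            + "<h1>Example Domain</h1>"
            + "<p>This domain is for use in illustrative examples in documents.</p>"
            + "<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>"
            + "</div></body></html>";
    // Finds the element whose text contains "More information..." (the <a> link).
    static final String LINK = "//*[contains(text(),\"More information...\")]";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the first node matched by xpath (in document order), or null if none.
    static Node first(Document doc, String xpath) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        // Climb to the <div> container, then take its first <p> child.
        Node viaAncestor = first(doc, LINK + "/ancestor::div/p[1]");
        // Go up to the link's parent <p>, then take its nearest previous <p> sibling.
        Node viaSibling = first(doc, LINK + "/../preceding-sibling::p[1]");
        System.out.println(viaAncestor.getTextContent());
        System.out.println(viaAncestor.isSameNode(viaSibling)); // true
    }
}
```

On preceding-sibling, which is a reverse axis, [1] means the closest sibling, which is why it picks the <p> immediately before the link's parent.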
In these examples, we have already used a few XPath concepts, such as:
- the * wildcard or a tag name, to reach any node or a specific kind of node
- // to search from the document root down through all descendants, or / to move to the immediate next step of the path and continue crawling via other queries
- axes, such as ancestor or preceding, to move around the initial node
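These concepts can be compared side by side by counting what each path selects. This is a minimal sketch using the JDK's javax.xml.xpath engine and XPath's count() function; the markup is the same hypothetical, simplified stand-in for example.com, so the counts only hold for that markup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PathBasics {
    // Hypothetical, simplified XHTML stand-in for the example.com page.
    static final String PAGE = "<html><body><div>"
            + "<h1>Example Domain</h1>"
            + "<p>This domain is for use in illustrative examples in documents.</p>"
            + "<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>"
            + "</div></body></html>";
    static final Document DOC = parse(PAGE);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Wraps the path in XPath's count() and returns how many nodes it selects.
    static int count(String path) {
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + path + ")", DOC, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//*"));              // wildcard, anywhere: 7 elements
        System.out.println(count("//p"));              // tag name, anywhere: 2
        System.out.println(count("/html/body/div/p")); // one step at a time from the root: 2
        System.out.println(count("/p"));               // the root's only child is <html>: 0
        System.out.println(count("//a/ancestor::*"));  // <p>, <div>, <body>, <html>: 4
        System.out.println(count("//a/preceding::*")); // <h1> and the first <p>: 2
    }
}
```

The preceding axis skips the link's own ancestors, which is why only <h1> and the first <p> are counted there.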
MDN has a list of the available XPath functions, but the most interesting ones for ad-blocking purposes are:
- concat() - to concatenate strings from various attributes or multiple nodes
- contains() - to verify if a specific node attribute or content contains a specific string
- last() - to retrieve the last element that matches a specific query, as in (//p)[last()]
- not() - to negate any expression, as in //p[not(a)]
- position() - to compare a specific element position, as in //p[position() = 2]; mostly useful together with last(), as in //p[position() < last()]
- starts-with() - to check for a specific value at the beginning of some text or attribute, as in //a[starts-with(@href, "https:")]
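Here is a minimal sketch that runs one expression per listed function through the JDK's javax.xml.xpath engine and converts each result to a string. The expressions and the simplified example.com markup are illustrative assumptions, chosen only so that every function has something to match.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathFunctions {
    // Hypothetical, simplified XHTML stand-in for the example.com page.
    static final String PAGE = "<html><body><div>"
            + "<h1>Example Domain</h1>"
            + "<p>This domain is for use in illustrative examples in documents.</p>"
            + "<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>"
            + "</div></body></html>";
    static final Document DOC = parse(PAGE);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Evaluates an expression and converts the result to a string (first match only).
    static String eval(String expr) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, DOC, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // concat(): join element text and an attribute value into one string
        System.out.println(eval("concat(//h1, \" - \", //a/@href)"));
        // contains(): the <p> whose text mentions "illustrative"
        System.out.println(eval("//p[contains(., \"illustrative\")]"));
        // last(): the last <p> in the document
        System.out.println(eval("(//p)[last()]"));
        // not(): the <p> without an <a> child
        System.out.println(eval("//p[not(a)]"));
        // position() with last(): every <p> except the last one
        System.out.println(eval("//p[position() < last()]"));
        // starts-with(): the href of the link pointing to an https: URL
        System.out.println(eval("//a[starts-with(@href, \"https:\")]/@href"));
    }
}
```

The parentheses in (//p)[last()] apply last() to all matches in the document; without them, //p[last()] picks the last <p> within each parent, which happens to be the same element here.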
There are surely other functions that might be handy in specific cases, but the best part is that all functions can be combined and used as expressions.
MDN also has a list of the available axes, all of which are useful.
The handiest tip regarding axes is that self can be abbreviated as . (a single dot) and parent as .. (two dots).
For example, a query can hide the first previous sibling of an https: link, regardless of that sibling's tag name.
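A minimal sketch of the abbreviations and of a previous-sibling query, evaluated with the JDK's javax.xml.xpath. The markup is hypothetical (a made-up ad container with a label right before an https: link), and //a[starts-with(@href, "https:")]/preceding-sibling::*[1] is one assumed way to express the idea, not necessarily the exact query from the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SelfAndParent {
    // Hypothetical markup: a "Sponsored" label right before an https: link.
    static final String PAGE = "<body><div>"
            + "<span>Sponsored</span>"
            + "<a href=\"https://ads.example/\">Buy now</a>"
            + "<a href=\"http://plain.example/\">Read more</a>"
            + "</div></body>";
    static final Document DOC = parse(PAGE);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the names of the nodes selected by xpath.
    static List<String> names(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, DOC, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String https = "//a[starts-with(@href, \"https:\")]";
        // First previous sibling of the https: link, whatever its tag name
        System.out.println(names(https + "/preceding-sibling::*[1]")); // [span]
        // ".." is short for parent::node()
        System.out.println(names(https + "/.."));                // [div]
        System.out.println(names(https + "/parent::node()"));    // [div]
        // "." is short for self::node()
        System.out.println(names("//span/."));                   // [span]
    }
}
```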
The list of operators on the W3Schools pages resembles that of most programming languages, and the only one that might cause confusion is div, which divides numbers (because / is already the path separator).
For ad-blocking use cases, besides = and similar comparisons needed in conjunction with position(), it is important to remember that the pipe operator | can group multiple queries at once, which is basically the equivalent of the comma (,) separator in CSS.
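The div operator and the | union can be tried out with a minimal sketch on the JDK's javax.xml.xpath engine, again assuming the simplified, hypothetical example.com markup; the class and helper names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XPathOperators {
    // Hypothetical, simplified XHTML stand-in for the example.com page.
    static final String PAGE = "<html><body><div>"
            + "<h1>Example Domain</h1>"
            + "<p>This domain is for use in illustrative examples in documents.</p>"
            + "<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>"
            + "</div></body></html>";
    static final Document DOC = parse(PAGE);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Evaluates an expression that yields a number.
    static double number(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, DOC, XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expr, e);
        }
    }

    // Evaluates an expression that yields a node-set and returns the node names.
    static List<String> names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, DOC, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // div divides numbers: 7 elements / 2 paragraphs
        System.out.println(number("count(//*) div count(//p)")); // 3.5
        // | merges two queries, like a comma between CSS selectors:
        // the result holds the <h1> and both <p> elements
        System.out.println(names("//h1 | //p"));
    }
}
```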
Back in the standard XPath documentation, there is a long list of path examples, but the most important ones for ad-blocking purposes are text(), which selects a node's text so it can be searched, and node(), which matches any kind of node and therefore grabs all children of a specific target.
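A minimal sketch of the difference between the two, using the JDK's javax.xml.xpath engine and a hypothetical ad container whose text is split around a <b> element. It also shows an XPath 1.0 detail worth knowing when writing filters: when text() is passed to contains(), only the first text child is compared.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TextAndNode {
    // Hypothetical ad container mixing text nodes and an element.
    static final String PAGE = "<body><div>Sponsored: <b>Buy now</b> today</div></body>";
    static final Document DOC = parse(PAGE);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns the names of the selected nodes ("#text" for text nodes).
    static List<String> names(String xpath) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, DOC, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // node() matches every child: text, element, text
        System.out.println(names("//div/node()"));  // [#text, b, #text]
        // text() matches only the text children
        System.out.println(names("//div/text()"));  // [#text, #text]
        // contains(text(), ...) only compares the first text child ...
        System.out.println(names("//div[contains(text(), \"Sponsored\")]")); // [div]
        System.out.println(names("//div[contains(text(), \"today\")]"));     // []
        // ... so test each text node when the string may sit in a later one
        System.out.println(names("//div[text()[contains(., \"today\")]]"));  // [div]
    }
}
```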