We have recently landed the hide-if-matches-xpath snippet, which allows XPath queries to target elements directly.
This document summarizes how to use XPath and explains how the snippet helps hide unwanted ads on a generic page.
Background
XPath is a web standard available since the year 2000. It is far less common than CSS, but unlike CSS selectors it lets queries crawl any document along different axes.
For example, with CSS selectors it is not possible to target the parent node of an element that matches some specific rule, while XPath provides axis syntax to move around any specific target node.
This ability alone makes XPath more powerful than CSS selectors, which is one of the reasons we decided to ship the hide-if-matches-xpath snippet.
Basic Examples
Visiting example.com after adding the following filter hides the main <div> element by checking the content of any node in the document:
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/ancestor::div'
It is also possible to target any other node by crawling the hierarchy of that ancestor via /. The following query hides only the first <p> element of the container:
//*[contains(text(),"More information...")]/ancestor::div/p[1]
If it is the parent's previous sibling that we are after, the following query obtains the same result:
//*[contains(text(),"More information...")]/ancestor::*/preceding::p
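Wrapped in the same filter syntax as the first example, the last two queries would look like the following sketch. It assumes the snippet accepts them exactly as it accepts the first query, and that the page markup is the one these examples imply.

```
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/ancestor::div/p[1]'
example.com#$#hide-if-matches-xpath '//*[contains(text(),"More information...")]/ancestor::*/preceding::p'
```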
In these examples, we have already used a few XPath concepts, such as:
- a wildcard (*) or a tag name to reach any, or a specific kind of, node
- any descendant of the document root via //, or the immediate next path via /, to continue crawling via other queries
- the contains(source, match) function
- axes, such as ancestor or preceding, to move around the initial node
Functions
There is a list of available XPath functions on MDN, but the most interesting for ad-blocking purposes are:
- concat() - to concatenate strings from various attributes or multiple nodes
- contains() - to verify if a specific node attribute or content contains a specific string
- last() - to retrieve the last element that matches a specific query, as in //p[last()]
- not() - to negate any expression, as in //*[not(*[last()])]
- position() - to compare a specific element position, as in //p[position() = 2]; mostly useful together with not()
- starts-with() - to search a specific value at the beginning of some text or attribute, as in //a[starts-with(@href,"https://www.iana.org/")]
There are surely other functions that might be handy in specific cases, but the best part is that all functions can be combined and used as expressions.
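As a sketch of how functions combine, the following filters reuse only functions from the list above. The target markup is an assumption: a page where the last <p> among its siblings wraps a link to iana.org, as the previous examples imply for example.com.

```
example.com#$#hide-if-matches-xpath '//p[position() = last() and a[starts-with(@href,"https://www.iana.org/")]]'
example.com#$#hide-if-matches-xpath '//p[not(a)]'
```

The first filter should match a <p> only if it is the last <p> child of its parent and has a direct <a> child whose href starts with the given prefix. The second should match every <p> that has no direct <a> child.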
Axes
MDN also has a list of the usable axes, all of which are useful.
The handiest tip regarding axes is that self can be abbreviated as a single dot (.) and parent as a double dot (..).
The following query, for example, hides the first previous sibling of the parent of an https: link, regardless of its tag name.
//a[starts-with(@href,"https:")]/../preceding-sibling::*[position()=1]
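For comparison, here is the same query with the abbreviations spelled out. In XPath 1.0, .. is shorthand for parent::node() and a bare number in a predicate is shorthand for a position() test, so both forms should select the same node:

```
//a[starts-with(@href,"https:")]/parent::node()/preceding-sibling::*[1]
```

Because preceding-sibling is a reverse axis, position 1 is the sibling closest to the context node, not the first one in document order.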
Operators
This time from the W3Schools pages: the list of operators resembles that of most programming languages, and the only one that might cause confusion is div, which divides numbers.
For ad-blocking use cases though, besides <, >, =, and similar operators needed in conjunction with position(), for example, it is important to remember that the pipe | operator can be used to group multiple queries at once, which is basically the equivalent of the comma , CSS separator.
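A minimal sketch of the pipe operator, assuming a page with an <h1> heading and several sibling <p> elements (the tag names are only illustrative):

```
example.com#$#hide-if-matches-xpath '//h1 | //p[position() > 1]'
```

This should hide every <h1> plus every <p> that is not the first <p> child of its parent, which is roughly what the CSS selector list h1, p:not(:first-of-type) would match.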
Location paths
Back to the standard XPath documentation, there is a long list of path examples, but the most important ones for ad-blocking purposes are text(), which retrieves the node text to search, and node(), which grabs all children of a specific target.
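One detail about text() is worth a sketch. In XPath 1.0, passing text() directly to a string function such as contains() tests only the first text node child, because a node-set is converted to a string using its first node. Moving the test into a predicate on text() checks every text node child instead. The two queries below illustrate the difference, reusing the text from the earlier examples:

```
//*[contains(text(),"More information...")]
//*[text()[contains(.,"More information...")]]
```

For an element with a single text node child, which is what the earlier examples appear to target, both forms should match the same element.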