Tree reduction

4.3. Tree reduction

We reduce the problem of graph pattern matching to matching on rooted trees – as we will see in section 4.6, a much simpler problem to solve. The map between graphs and (rooted) trees is given by rooted dual trees. Call GG tree-like if GG is connected and the underlying undirected graph GUG_U of GG is acyclic.

Definition 4.2Rooted dual tree

Let GG be a tree-like graph with operations OO. Then given a root operation rOr \in O, the rooted dual tree of GG rooted at rr, written τr(G)\tau_r(G) is the tree given by

  • the nodes of the tree are the operations OO of GG,
  • the parent and children of oOo \in O are the operations that share a value with oo in GG; the parent is the unique operation on the path from oo to rr,
  • the children of an operation are ordered according to the port labels.

Unlike graphs, tree nodes are identified uniquely by their path from the root. Trees isomorphic as graphs with identical root are thus considered equal.

A tree reduction using path splitting #

To reduce a graph GG to a tree using the rooted dual tree construction, it suffices to reduce GG to a tree-like graph. The following result shows that this can always be achieved by repeatedly applying operation splitting transformations.

Proposition 4.6Path splitting

A tree-like graph can be obtained from any connected graph GG by applying operation splittings. The resulting graph is a path-split graph (PSG) of GG.

Proof

Consider the undirected simple graph I\mathcal{I}, where vertices are linear paths, and there is an edge between two linear paths for every operation that belongs to both paths. We call I\mathcal{I} the interface graph of GG.

Splitting an operation oo in a graph GG corresponds to removing the corresponding edge in I\mathcal{I}. On the other hand, the underlying undirected graph GUG_U of GG has a cycle if and only if there is a cycle in I\mathcal{I}. Indeed, a cycle in GUG_U cannot belong to a single linear path in GG, by acyclicity of minIR graphs. There is, therefore, a cycle of operations that span multiple linear paths, thus forming a cycle in I\mathcal{I}.

Hence, the operations to be split to turn GG into a tree-like graph are given by the set of edges EE^- in I\mathcal{I} that must be removed to obtain a spanning tree of I.\mathcal{I}.1

As we consider typed graph, the splitting of an operation is unique; however the choice of spanning tree of I\mathcal I is not unique, and thus multiple PSGs exist for a given graph GG.

If GG' is a PSG of some graph GG, then call an operation oo of GG an anchor operation if it is on two linear paths and it is not split in GG'. The set of all anchors operations πO\pi \subseteq O fully determines the path-split graph. We write Gπ=GG^\pi = G' for the PSG of GG obtained by anchors π\pi.

Proposition 4.3Rooted dual trees are ternary
Consider a PSG GπG^\pi of a graph GG. There is a root operation rOr \in O such that the rooted dual tree τr(Gπ)\tau_r(G^\pi) is a ternary tree, i.e. every node of τr(Gπ)\tau_r(G^\pi) has at most three children.
Proof

We have assumed in section 4.2 that every operation in GG is on at most two linear paths and thus can be connected to at most four values. Each value is linear and hence connected to at most one other operation. It results that every operation in τr(Gπ)\tau_r(G^\pi) has at most four neighbouring operations – one parent and three children. A tree leaf can be chosen as the root operation to ensure the root does not have four children.

We can make the path splitting transformation GGπG \to G^\pi reversible by separately storing the set of split operations in GπG^\pi that correspond to a single operation in GG. As every operation of GG can get split in at most two split operations, we can store the pairs of split operations in GπG^\pi that correspond to an operation in GG in a partial map that defines weights for (a subset of) the operations OπO^\pi of GπG^\pi:

split:OπP.split: O^\pi \rightharpoonup P^\ast.

This maps a split operation oo to the unique undirected path in GπG^\pi from oo to the other half of the split operation.

This defines a map σ1:(Gπ,split)G\sigma_1: (G^\pi, split) \mapsto G, the inverse of the path splitting transformation GGπG \to G^\pi.

Contracted path-split graphs #

We can further simplify the structure of the data of a PSG by contracting all operations of GπG^\pi that are on a single linear path. The result is the contracted path-split graph (cPSG) of GG, written c(Gπ)c(G^\pi).

We employ a similar trick as above to make this transformation reversible, this time by introducing weights on the values of c(Gπ)c(G^\pi) that store the string of operations that were contracted2

contract:VC(Γπ)contract: V_C \rightharpoonup (\Gamma^\pi)^\ast

where VCV_C are the values of c(Gπ)c(G^\pi) and Γπ\Gamma^\pi are the optypes of operations in GπG^\pi, i.e. the optypes of the minIR graph GG along with the optypes of the split operations. This defines a second map σ2:(c(Gπ),contract)Gπ\sigma_2: (c(G^\pi), contract) \mapsto G^\pi that is the inverse of the path-split graph contraction transformation c()c(\cdot). In summary, we have the composition

(c(Gπ),contract,split)σ2×Id(Gπ,split)σ1G.(c(G^\pi), contract, split) \xrightarrow{\sigma_2 \times Id} (G^\pi, split) \xrightarrow{\sigma_1} G.

Contracted PSGs are particularly useful for the study of the asymptotic complexity of the pattern matching algorithm we propose, as they have a very regular structure. This is expressed by the following proposition that further extends the statement of Proposition 4.3:

Proposition 4.4Contracted PSG
Consider a PSG GπG^\pi of a graph GG. There is a root operation rOr \in O such that the rooted dual tree of the contracted PSG τr(c(Gπ))\tau_r(c(G^\pi)) is a ternary tree with width(G)1width(G) - 1 nodes.
Proof

That the tree is ternary follows from Proposition 4.3. Every node of the tree corresponds to an operation in c(Gπ)c(G^\pi), which is on exactly two linear paths. As a result of acyclicity of the tree, a tree of kk nodes spans k+1k+1 linear paths – and hence, we conclude k=width(G)1k = width(G) - 1.

We conclude the construction presented in this section with the following result, expressing graph pattern matching in terms of tree equality:

Proposition 4.5Reduction to Tree Pattern matching

Let PP be a pattern graph and GG a graph. Let PπP^\pi be a PSG of GG. There is an embedding PGP \hookrightarrow G if and only if there is HGH \subseteq G and a PSG HπH^{\pi'} of HH such that

τ(c(Pπ))=τ(c(Hπ))\tau(c(P^\pi)) = \tau(c(H^{\pi'}))

and the trees have equal weight maps splitsplit and contractcontract.

The proof of this follows directly from our construction, the unicity of trees under isomorphism and the bijection between the graphs P,HP, H and their cPSGs.

We have thus successfully reduced the problem of pattern matching to the problem of matching on trees. Given that the ordering of children of a node in a tree is fixed, checking trees for equality is a simple matter of checking node and weight equality, one node (and edge) at a time.

We conclude this section with a figure summarising the constructions we have presented.

A graph GGG, along with the path-split graph GπG^\piGπ, the contracted path-split graph c(Gπ)c(G^\pi)c(Gπ) and their rooted dual trees. The anchor operations are ddd (grey) and eee (red). The root of the rooted dual trees is eee.

A graph GG, along with the path-split graph GπG^\pi, the contracted path-split graph c(Gπ)c(G^\pi) and their rooted dual trees. The anchor operations are dd (grey) and ee (red). The root of the rooted dual trees is ee.


  1. It is a simple result from graph theory that such a set of edges always exists – it suffices to remove one edge from every cycle in the graph. ↩︎

  2. Because all contracted operations apply on a single, shared, linear path, they indeed form a string of operations. ↩︎