4.5. Enumerating all path-split graphs
The CanonicalPathSplit procedure in the previous section defines for all
graphs  and choice of root operation  a canonical PSG , and thus a
canonical set of anchors  that we write as 
Instead of CanonicalPathSplit, we can equivalently consider a
CanonicalAnchors procedure, which computes  directly instead of the
graph .
We formulate this computation below, using recursion instead of a for loop.
This form generalises better to the AllAnchors procedure that we will
introduce next.
The equivalence of the CanonicalAnchors procedure with CanonicalPathSplit
follows from the observation made in
section 4.2 that ordering operations in
lexicographic order of the port labels is equivalent to a depth-first traversal
of the graph.
CanonicalAnchors implements a recursive depth-first traversal (DFS), with the
twist that the recursion is explicit only on the anchor nodes and otherwise
relying on the lexicographic ordering just like in CanonicalPathSplit: lines
5–15 of CanonicalAnchors correspond to the iterations of the for loop (line
11–20) of CanonicalPathSplit until an anchor operation is found (i.e. the
else branch on lines 18–20 is executed). From there, the graph traversal
proceeds recursively.
We introduce the ConnectedComponent, Neighbours and RemoveOperation
procedures; the first returns the connected component of the current operation,
whereas the other two procedures are used to traverse, respectively modify, the
graph . Importantly, Neighbours(G, op) returns the neighbours of op
ordered by port label order.
To ensure that the recursive DFS does not visit the same operation twice, we
modify the graph with RemoveOperation on lines 11 and 15, ensuring that no
visited operation remains in G. As a consequence, CanonicalAnchors may be
called on disconnected graphs, which explains why an additional call to
ConnectedComponent (line 4) is required.
CanonicalPathSplit and CanonicalAnchorsLet  be a connected graph and let  be a root operation in . Then
CanonicalAnchors maps the graph to the canonical anchor set:
where is the set of all paths in and designates the empty graph.
The proof follows directly from the previous paragraphs.
 1def CanonicalAnchors(
 2    G: Graph, root: Operation, seen_paths: Set[int]
 3) -> (Set[Operation], Set[int], Graph):
 4  operations = Operations(ConnectedComponent(G, root))
 5  # sort by PathAsPortLabels, as previously
 6  sorted_operations := sort(operations)
 7  operations_queue := queue(sorted_operations)
 8
 9  # Skip all operations that are not anchors
10  op := operations_queue.pop() # never emtpy, contains root
11  G = RemoveOperation(G, op)
12  while len(LinearPaths(G, op)) == 1 or
13        issubset(LinearPaths(G, op), seen_paths):
14    op = operations_queue.pop() or return ({}, {}, G)
15    G = RemoveOperation(G, op)
16
17  # op is anchor, update seen_paths and recurse
18  seen_paths = union(seen_paths, LinearPaths(G, op))
19  anchors := [op]
20  # sort by port labels
21  for child in Neighbours(G, op):
22    (new_anchors, seen_paths, G) = CanonicalAnchors(
23        G, child, seen_paths
24    )
25    anchors += new_anchors
26
27  return (anchors, seen_paths, G)
Maximal PSGs #
In addition to “simplifying” the data required to define path splitting, the definition of PSGs using anchor operations has another advantage that is fundamental to the pattern matching algorithm.
Consider the rooted dual tree of a PSG with root operation in . Recall that tree nodes are uniquely identified by their path from the root and thus are considered equal if they are isomorphic as graphs. We can in the same way define a tree inclusion relation on rooted dual trees that corresponds to checking that the trees have the same root and that the left-hand side is isomorphic to a subtree of the right-hand side. We also require that the operation weights given by the map map coincide on the common subtree.
Let be a connected graph, a set of operations in and a root operation. Consider the set
There is a subgraph such that for all subgraphs : . Furthermore, for all graph , there is and such that
We call the maximal PSG with anchors in .
The proof gives an explicit construction for .
Proof
Assume , otherwise the statement is trivial.
Construction of . Let be the set of linear paths in that go through at least one operation in . Consider the set of operations in given by the operations whose linear paths are contained in . This defines a subgraph of . Since , there exists . By assumption, is connected, and thus the anchors of are connected in . There is therefore a connected component that contains the set .
Well-definedness of . Consider the PSG of . We must show that is a tree-like graph for the proposition statement to be well-defined. In other words, we must show that the interaction graph of is acyclic and connected. is connected by construction, which implies connectedness of and thus of . It is acyclic because and has exactly operations on more than one linear path. is a thus a tree.
. For any subgraph , its operations must be contained in . Since any is connected and contains , it must further hold that .
We can now prove the equivalence of (1).
: If , then there exists with rooted dual tree
Furthermore, by definition of on rooted trees, a map is defined on , given by the map of on the domain . Recall from section 4.3 that there is a map that maps
: Since , we know from point 1 that . Thus we can define an injective embedding .
Operation splitting leaves the set of values from to , as well as from to unchanged. Similarly, there is a bijection between values in and and thus between edges in and . The pattern embedding hence defines an injective map from tree edges in to tree edges in . We extend this map to a map on the trees by induction over the nodes set of . We start by the root map . Using , we can then uniquely define the image of any child node of in , and so forth inductively.
We show now that the map thus defined is injective. Suppose are nodes in such that . By the inductive construction there are paths from the root to and respectively such that their image under are two paths from to . But is a tree, so both paths must be equal. By bijectivity of , it follows , and thus is injective. Finally, the value and operation weights are invariant under pattern embedding and thus are preserved by definition.
This result means that instead of listing all PSGs for every possible subgraph of , it is sufficient to proceed as follows:
- for every pattern , fix a root operation  and construct the rooted
tree dual of the canonical PSG
 - enumerate every possible root operation in ,
 - enumerate every possible sets of anchors in with root ,
 - for each set , find the maximal PSG with anchors in , and take its rooted tree dual ,
 - find all patterns such that .
 
In other words, if AllAnchors is a procedure that enumerates all possible sets
of anchors  in  and MaximalPathSplit computes the maximal PSG  as
presented in the proof of Proposition 4.9, then
AllPathSplits(G) can simply be obtained by calling AllAnchors and then
returning their respective maximal PSGs in :
def AllPathSplits(G: Graph, root: Operation) -> Set[Graph]:
  all_anchors = AllAnchors(G, root)
  return {MaximalPathSplit(G, pi) for pi in all_anchors}
The missing piece: AllAnchors
#
We can now complete the definition of AllPathSplits by defining the
AllAnchors procedure, which enumerates all possible sets of anchors in 
given a root operation .
The procedure is similar to CanonicalAnchors, described in detail in the
previous paragraphs. In addition to the arguments of CanonicalAnchors,
AllAnchors requires a width  argument. It then returns all sets of
at most  operations1 that form the canonical anchors of some
width- subgraph of  with root . The main difference between
CanonicalAnchors and AllAnchors is that the successive recursive calls (line
22 in CanonicalAnchors) are replaced by a series of nested loops (lines 42–48
in AllAnchors) that exhaustively iterate over the possible outcomes for
different subgraphs of . The results of every possible combination of
recursive calls are then collected into a list of anchor sets, which is
returned.
The part of the pseudocode that is without comments is unchanged from
CanonicalAnchors. Using Proposition 4.3, we
know that we can assume that every operation has at most 3 children, and thus 3
neighbours in G, given that the operations equivalent to parent nodes were
removed.
 1def AllAnchors(
 2    G: Graph, root: Operation, w: int,
 3    seen_paths: Set[int] = {}
 4) -> List[(Set[Operation], Set[int], Graph)]:
 5  # Base case: return one empty anchor list
 6  if w == 0:
 7    return [({}, {}, G)]
 8
 9  operations = Operations(ConnectedComponent(G, root))
10  sorted_operations := sort(operations)
11  operations_queue := queue(sorted_operations)
12
13  op := operations_queue.pop()
14  G = RemoveOperation(G, op)
15  while len(LinearPaths(G, op)) == 1 or
16        issubset(LinearPaths(G, op), seen_paths):
17    op = operations_queue.pop() or return [({}, {}, G)]
18    G = RemoveOperation(G, op)
19
20  seen0 = union(seen_paths, LinearPaths(G, op))
21  # There are always at most three neighbours: we
22  # unroll the for loop of CanonicalAnchors.
23  [child1, child2, child3] = Neighbours(G, op)
24  # Iterate over all ways to split w-1 anchors over
25  # the three children and solve recursively
26  all_anchors = []
27  for 0 <= w1, w2, w3 < w with w1 + w2 + w3 == w - 1:
28    for (anchors1, seen1, G1) in
29        AllAnchors(G, child1, w1, seen0):
30      for (anchors2, seen2, G2) in
31          AllAnchors(G1, child2, w2, seen1):
32        for (anchors3, seen3, G3) in
33            AllAnchors(G2, child3, w3, seen2):
34          # Concatenate new anchor with anchors from all paths
35          anchors = union([op], anchors1, anchors2, anchors3)
36          all_anchors.push((anchors, seen3, G3))
37  return all_anchors
We can represent the sequence of recursive calls to AllAnchors as a tree. The
call tree for the graph used as example to illustrate CanonicalAnchors earlier
is given on the next page.
We now show correctness of the procedure. Let us write  for the set
of sets of anchors returned by AllAnchors(G, r, w, {}).
Let be a graph and be a subgraph of of width . Let be a choice of root operation in . We have
A call tree for an execution of AllAnchors on the example graph of the previous figure with . Starting from the root, each node in the tree corresponds to either picking an operation as anchor or not (thus splitting it). Edges are labelled by the values assigned to  for the respective children of the source node. One path from root to leaf leads to no solution (it is impossible to find an unseen linear path from operation . The other paths each lead to a valid set of three anchors.
The proof is by induction over the width  of the subgraph . The idea is to
map every recursive call in CanonicalAnchors to one of the calls to
AllAnchors on lines 29, 31 or 33. All recursive results are concatenated on
line 36, and thus, the value returned by CanonicalAnchors will be one of the
anchor sets in the list returned by AllAnchors.
Proof
Let  be a connected subgraph of  of width . We prove
inductively over  that if CanonicalAnchors$(H,
r,S)$
then there is a graph  such that  such
that
 AllAnchors
for all valid root operations  of  and all subsets of the linear paths of
 in seen_paths. The statement in the proposition directly follows this
claim.
For the base case , CanonicalAnchors will return the anchors
anchors = [op] as defined on line 19: there is only one linear path, and it is
already in seen_paths, thus for every recursive call to CanonicalAnchors,
the while condition on line 12 will always be satisfied until all operations
have been exhausted and empty sets are returned. In AllAnchors, on the other
hand, The only values of w1, w2 and w3 that satisfy the loop condition on
line 27 for  are w1  w2  w3 . As a result, given the w
 base case on lines 6–7, the lines 35 and 36 of AllAnchors are only
executed once, and the definition of anchors on line 36 is equivalent to its
definition in CanonicalAnchors.
We now prove the claim for  by induction. As documented in AllAnchors,
we can assume that every operation has at most 3 children. This simplifies the
loop on lines 21–25 of CanonicalAnchors to, at most, three calls to
CanonicalAnchors.
Consider a call to CanonicalAnchors for a graph , a root
operation  in  and a set  of linear paths. Let ,  and 
be the length of the values returned by the three recursive calls to
CanonicalAnchors of line 22 for the execution of CanonicalAnchors with
arguments ,  and . Let  and  be the three neighbours of
 in . If the child  does not exist, then one can set  and it
can be ignored – the argument below still holds in that case. The definition of
seen0 on line 20 in AllAnchors coincides with the update to the variable
seen_paths on line 18 of CanonicalAnchors; similarly, the updates to G on
lines 14 and 18 of AllAnchors are identical to the lines 11 and 15 of
CanonicalAnchors that update H. Let the updated seen_paths be the set
, the updated G be  and the updated  be , with
.
As every anchor operation reduces the number of unseen linear paths by exactly
one (using the simplifying assumptions of
section 4.2), it must hold that
. Thus, for a call to AllAnchors with the arguments
, ,  and , there is an iteration of the for loop on line 27 of
AllAnchors such that w1 , w2  and w3 . It follows
that on line 29 of AllAnchors, the procedure is called recursively with
arguments . From the induction hypothesis, we obtain that
there is an iteration of the for loop on line 29 in which the values of
anchors1 and seen1 coincide with the values of the new_anchors and
seen_paths variables after the first iteration of the for loop on line 21 of
CanonicalAnchors. Call the value of seen1 (and seen_paths) .
Similarly, call the updated value of G in AllAnchors  and the updated
value of G in CanonicalAnchors . We have, by the induction hypothesis,
that .
Repeating the argument, we obtain that there are iterations of the for loops
on lines 30 and 32 of AllAnchors that correspond to the second and third
recursive calls to CanonicalAnchors on line 22 of the procedure. Finally, the
concatenation of anchor lists on line 36 of AllAnchors is equivalent to the
repeated concatenations on line 25 of CanonicalAnchors, and so we conclude
that the induction hypothesis holds for .
We will see that the overall runtime complexity of AllAnchors can be easily
derived from a bound on the size of the returned list. For this, we use the
following result:
AllAnchorsAllAnchors is in , where
 is a constant.Proof
Let  be an upper bound for the length of the list returned by a call to
AllAnchors for width . For the base case , . The returned
all_anchors list is obtained by pushing anchor lists one by one on line 36. We
can count the number of times this line is executed by multiplying the length of
the lists returned by the recursive calls on lines 28–32, giving us the
recursion relation
Since is meant to be an upper bound, we replace with equality above to obtain a recurrence relation for . This recurrence relation is a generalisation of the well-known Catalan numbers Stanley, 2015. 2015. Catalan Numbers. Cambridge University Press. doi: 10.1017/CBO9781139871495, equivalent to counting the number of ternary trees with internal nodes: a ternary tree with internal nodes is made of a root along with three subtrees with and internal nodes respectively, with . A closed form solution to this problem can be found in Aval, 2008. 2008. Multivariate Fuss–Catalan numbers. Discrete Mathematics 308, 20 (October 2008, 4660–4669). doi: 10.1016/j.disc.2007.08.100:
satisfying the above recurrence relation with equality, where is a constant obtained from the Stirling approximation:
To obtain a runtime bound for AllAnchors, it is useful to identify how much of
 needs to be traversed. If we suppose all patterns have at most depth ,
then it immediately follows that any operation in  that is in the image of a
pattern embedding must be at most a distance  away from an anchor operation.
We can thus equivalently call AllAnchors on a subgraph of  such that no
linear path is longer than . We thus obtain the following runtime.
AllAnchorsFor
patterns with at most width  and depth , the total runtime of AllAnchors
is in
Proof
We restrict Operations on line 9 to only return the first  operations on
the linear path in each direction, starting at the anchor operation: operations
more than distance  away from the anchor cannot be part of a pattern of depth
.
We use the bound on the length of the list returned by calls to AllAnchors of
Proposition 4.10 to bound the runtime. We can ignore the
non-constant runtime of the concatenation of the outputs of recursive calls on
line 35, as the total size of the outputs is asymptotically at worst of the same
complexity as the runtime of the recursive calls themselves. Excluding the
recursive calls, the only remaining lines of AllAnchors that are not executed
in constant time are the while loop on lines 15–18 and the Operations and
sort calls on lines 9–11. Using the same argument as in CanonicalAnchors,
we can ignore the latter two calls by replacing the queue of operations by a
lazy iterator of operations. The next operation given op and the graph G can
always be computed in  time using a depth-first traversal of G.
Consider the recursion tree of AllAnchors, i.e. the tree in which the nodes
are the recursive calls to AllAnchors and the children are the executions
spawned by the nested for loops on line 28–32. This tree has at most
leaves. A path from the root to a leaf corresponds to a stack of recursive calls
to AllAnchors. Along this recursion path, seen_paths set is always strictly
growing (line 35) and the operations removed from G on lines 14 and 18 are all
distinct. For each linear path, at most  operations are traversed. Thus the
total runtime of the while loop (lines 15–18) along a path from root to leaf
in the recursion tree is in . We can thus bound the overall
complexity of executing the entire recursion tree by
.
Every anchor operation is on at least one previously unseen linear path, thus there can be at most operations in the set of anchors. ↩︎