4.6. An automaton for multi-pattern matching
We have shown in the previous sections that graph pattern-matching can be reduced to a problem of tree inclusions, with trees of fixed width . To complete the pattern-matching algorithm, we must provide a fast way to evaluate the subtree relation for many trees representing the set of all patterns we wish to match.
More precisely, for patterns with width , fix a root operation in for each and consider the rooted tree duals of the canonical PSGs , with the canonical anchors. Then given a subject graph , we wish to compute the set
for all anchor sets and root operation in . This
corresponds to the IsSubTree predicate introduced in the sketch of the
algorith in section 4.4.
Instead of considering the trees of PSGs, it will prove easier to consider the contracted PSGs (cPSGs)
Such tree inclusions are equivalent to finding embeddings in the subject graph itself, provided that we keep track of the and weight maps (see section 4.3).
It will be useful to remind ourselves the following properties of contracted PSGs. Every operation of a cPSG (and thus every node in its rooted dual tree) is an anchor operation of the PSG. Per Proposition 4.4, the rooted dual tree of a cPSG is a ternary tree and has exactly nodes. Finally, recall the concept of an open value of a graph, i.e. a value that is missing either a use or define operation (see section 4.2).
Reduction of tree inclusion to string prefix matching #
Now consider two contracted spanning tree reductions and with values and . To simplify notation, define
for some choice of root operations and in and , respectively. We lift the relation on rooted dual trees of PSGs introduced in section 4.5 to rooted dual trees of cPSGs in Such a way that there is an inclusion relation between two rooted dual trees of PSGs if and only if the same relation holds on the rooted duals of cPSGs.
We say that if and only if
- the trees share the same root operation,
- is a subtree of ,
- the map coincides on the common subtree, and
- the map satisfies for all :
where designates the embedding of into given by the tree embedding.
The first three conditions are taken as-is from the relation on non-contracted trees, whilst the fourth condition on the map is specific to contracted trees.
Using Proposition 4.2, there are at most 2 open values for each linear path in the graph, and thus at most open values in a rooted dual tree of a cPSG of width . For each such contracted rooted dual, we can thus define a contracted string tuple given by the values of the map evaluated in the (up to) open values1.
If is the restriction of to the domain of definition of non-open values of a cPSG, the fourth condition for the inclusion relation on rooted dual cPSGs, given above becomes an equality condition when restricted to non-open values. A special case of this property of particular interest to us is stated as the following result. The relation on strings refers to prefix inclusion, i.e. if and only if is a prefix of .
Let and be the contracted string tuples of and respectively. Then if and only if the trees share the same root, are isomorphic, have the same and maps and for all : .
The proof of this follows directly from observing that rooted duals of cPSGs have the same set of nodes and that the restriction to non-open values must satisfy equality.
Why restricting ourselves to trees of the same width ? It is sufficient for our purposes! All patterns are of width by assumption and so are the rooted dual trees of the form , given that .
The string prefix matching problem is a simple computational task that can be generalised to check for multiple string patterns at the same time using a prefix tree. An overview of this problem can be found in appendix A. We can thus obtain a solution for the pattern matching problem for patterns:
As above, let
- be a graph, with a set of operations and a choice of root operation,
- be patterns of width and depth , with choices of root operations and canonical anchors
The set of all pattern embeddings mapping the canonical anchor set to and root to for can be computed in time using at most pre-computed prefix tree of size at most , each constructed in time complexity .
Proof
For each pattern, we consider its canonical spanning tree reduction and construct a multi-dimensional prefix tree (see Appendix ) for each group of patterns that share the same spanning tree reduction.
Given a graph , we can compute the cPSG of for anchors and map its rooted dual tree to the corresponding prefix tree. This can be done in time by using a search tree. We can restrict to a graph of size by truncating the linear paths to at most length, as in the proof of Proposition 4.12. Thus we can assume .
The rest of the proof and the runtime follow from the multi-dimensional prefix tree construction detailed in Appendix ).
Combining everything #
Finally, putting Proposition . and Proposition 4.12 together, we obtain our main result.
Let be patterns with width and depth . The pre-computation runs in time and space complexity
For any subject graph , the pre-computed prefix tree can be used to find all pattern embeddings in time
where is a constant.
Proof
The pre-computation consists of running the CanonicalAnchors procedure on each
of the patterns and then transforming them into a map of prefix trees
using Proposition .. By
Proposition 4.7, CanonicalAnchors runs in
for each pattern, where we used that
for all patterns. The total runtime of prefix construction is thus
The complexity of pattern matching itself on the other hand is composed of two
parts: the computation of all possible anchor sets , and the
execution of the prefix string matcher for each of the trees resulting from
these sets . As AllAnchors must be run for every choice of
root vertex in , the runtime is thus obtained by multiplying i)
with ii) the runtime of the prefix tree matching
(Proposition .), and with iii) ,
i.e. the number of anchor lists returned by AllAnchors
(Proposition 4.10):
where is the bound for the number of anchor lists returned by
AllAnchors. The result follows.
The values can be ordered as usual by using the total lexicographic order on port labels of the tree. ↩︎