
5.4. Exploration and extraction

In the previous section, we proposed a data structure $\mathcal{D}$ that is confluently persistent and can be used to explore the space of all possible transformations of a graph transformation system (GTS). We are now interested in using $\mathcal{D}$ to solve optimisation problems over the space of reachable graphs in the GTS. Following the blueprint of equality saturation (see section 5.2), we proceed in two phases:

  1. Exploration.  Given an input graph $G$, populate $\mathcal{D}$ with events that correspond to rewrites applicable to graphs reachable from $G$,
  2. Extraction.  Given a cost function $f$, extract the optimal graph in $\mathcal{D}$, i.e. the graph that is a flattening of a set of compatible edits $D \subseteq \mathcal{D}$ and minimises $f$.

Each phase comes with its own challenges, which we discuss in this section. We will first look at the exploration phase, which requires a way to find and construct new events $\delta$ that can be added to $\mathcal{D}$. We will then consider the extraction phase in the second part of this section and see that the problem of optimising over the power set $\mathcal{P}(\mathcal{D})$ can be reduced to a boolean satisfiability problem that admits simple cost functions in the use cases of interest.

There is an additional open question that we do not cover in this section and would merit a study of its own: the choice of heuristics that guide the exploration phase to ensure the “most interesting parts” of the GTS rewrite space are explored. We propose a very simple heuristic to this end in the benchmarks of section 5.5, but further investigations are called for.

Exploring the data structure with pattern matching #

We established in the previous section that rewrites that apply on $G$ can equivalently be added as events to $\mathcal{D}$. In other words, a graph $G'$ is reachable from $G$ using the rewrites of a GTS if and only if there is a set of compatible events $D \subseteq \mathcal{D}$ such that $G'$ is the graph obtained from FlattenHistory on input $D$.

To expand $\mathcal{D}$ to a larger set $\mathcal{D}' \supseteq \mathcal{D}$, we must find all applicable rewrites on all graphs within $\mathcal{D}$. A naive solution would iterate over all subsets $D \subseteq \mathcal{D}$, check whether they form a compatible set of events, compute FlattenHistory if they do, and finally run pattern matching on the obtained graph to find the applicable rewrites. We can do better.

The idea is to traverse the set of events in $\mathcal{D}$ using the glueing relations $\mu$ that connect vertices between events. Define the function $\bar{\mu}: V(\mathcal{D}) \to \mathcal{P}(V(\mathcal{D}))$ that is the union of all glueing relations $\mu$ in events in $\mathcal{D}$:

$$\bar\mu(v) = \{\mu_c(v) \mid \delta_c = (V_c, V^-_c, \mu_c) \in P^{-1}(\delta_v)\},$$

where we write $\delta_v$ for the owner of $v$, i.e. the (unique) event $\delta_v \in \mathcal{D}$ such that $v \in V(\delta_v)$. We define the set $\mathcal{E}_D(v)$ of equivalent vertices of $v$ that are compatible with $D$ by applying $\bar\mu$ recursively and filtering out vertices whose owner is not compatible with $D$. It is easiest to formalise this definition using pseudocode for the EquivalentVertices procedure. The vertices in $\mathcal{E}_D(v)$ all belong to descendant events of $\delta_v$.

def EquivalentVertices(
    v: Vertex, events: set[Event]
) -> set[Vertex]:
    """Vertices equivalent to v whose owner events remain compatible with `events`."""
    all_vertices = {v}
    for w in mu_bar(v):
        # Follow the glueing map only if adding w's owner keeps the event set compatible.
        new_events = events | {owner(w)}
        if AreCompatible(new_events):
            all_vertices |= EquivalentVertices(w, new_events)
    return all_vertices

Whilst it looks as though EquivalentVertices does not depend on $\mathcal{D}$, it does so through the calls to mu_bar.
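For concreteness, the helpers used in the pseudocode could be realised as follows. This is a minimal, illustrative sketch only: the Vertex and Event types, the field names, and the global EVENTS list standing in for $\mathcal{D}$ are assumptions made for this example rather than the representation of section 5.3, and AreCompatible performs a naive pairwise check of the deletion sets instead of the ancestor-based check discussed further below.

from dataclasses import dataclass

Vertex = int  # opaque vertex identifiers

@dataclass(eq=False)  # hashed by identity, so events can be stored in sets
class Event:
    vertices: set[Vertex]          # V(delta): vertices owned by this event
    deleted: set[Vertex]           # V^-(delta): parent vertices it deletes
    glueing: dict[Vertex, Vertex]  # mu: parent vertex -> own vertex
    parents: set["Event"]          # P(delta)

EVENTS: list[Event] = []           # stand-in for the data structure D

def owner(v: Vertex) -> Event:
    """The unique event delta_v such that v lies in V(delta_v)."""
    return next(e for e in EVENTS if v in e.vertices)

def mu_bar(v: Vertex) -> set[Vertex]:
    """Images of v under the glueing maps of the children of owner(v)."""
    delta_v = owner(v)
    return {e.glueing[v] for e in EVENTS
            if delta_v in e.parents and v in e.glueing}

def AreCompatible(events: set[Event]) -> bool:
    """Naive check: vertex deletion sets are pairwise disjoint."""
    deltas = list(events)
    return all(a.deleted.isdisjoint(b.deleted)
               for i, a in enumerate(deltas) for b in deltas[i + 1:])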

We use EquivalentVertices to repeatedly extend a set of pinned vertices $\pi \subseteq V(\mathcal{D})$. A set of pinned vertices must satisfy two properties:

  • the set $D_\pi = \{\delta_v \mid v \in \pi\}$ is a set of compatible events,
  • there is no vertex $v \in \pi$ and event $\delta \in D_\pi$ such that $v \in V^-(\delta)$.

As a result, for the flattened graph $G = \mathrm{flat}(D_\pi)$, it always holds that $\pi \subseteq V(G)$. Furthermore, if $G(\pi) \subseteq G$ is the subgraph of $G$ induced by $\pi$, then for any superset of pinned vertices $\pi' \supseteq \pi$, we have $G(\pi) \subseteq G'(\pi')$ where $G' = \mathrm{flat}(D_{\pi'})$. In other words: extending a set of pinned vertices results in an extension of the flattened graph – a very useful property when pattern matching. This property follows from the second property above and the definition of FlattenHistory.
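Continuing the illustrative sketch above (and reusing its hypothetical owner and AreCompatible helpers), the two defining properties of a pinned vertex set can be checked directly:

def is_valid_pinned_set(pi: set[Vertex]) -> bool:
    """Check the two defining properties of a pinned vertex set."""
    d_pi = {owner(v) for v in pi}
    # Property 1: the owning events form a compatible set.
    if not AreCompatible(d_pi):
        return False
    # Property 2: no pinned vertex lies in the deletion set of an event in D_pi.
    return all(v not in delta.deleted for v in pi for delta in d_pi)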

This gives us the following simple procedure for pattern matching:

  1. Start with a single pinned vertex $\pi = \{v\}$.
  2. Construct partial embeddings $P \rightharpoonup G(\pi)$ for patterns $P$.
  3. Pick a new vertex $v$ in $G = \mathrm{flat}(D_\pi)$ but not in $G(\pi)$ (that we would like to extend the domain of definition of our pattern embeddings to).
  4. For all vertices $v' \in \mathcal{E}_D(v)$, build new pinned vertex sets $\pi' = \pi \cup \{v'\}$, filtering out the sets $\pi'$ that are not valid pinned vertex sets.
  5. Repeat steps 2–4 until all pattern embeddings have been found.

Step 1 is straightforward – notice that pattern matching must be started at a vertex in $V(\mathcal{D})$, so finding all patterns will require iterating over all choices of $v$. The pattern embeddings are constructed over iterations of step 2: each iteration can be seen as one step of the pattern matcher – for instance, as presented in chapter 4 – extending the pattern embeddings that can be extended and discarding those that cannot. If all possible pattern embeddings have been discarded, then matching can be aborted for that $\pi$ set.

How step 3 should be implemented depends on the types of graphs and patterns that are matched on. It is straightforward in the case of computation graphs with only linear values, i.e. hypergraphs with hyperedges that have directed, ordered endpoints and vertices that are incident to exactly one incoming and one outgoing edge. In that case, $v$ can always be chosen in such a way as to ensure progress on the next iteration of step 2, i.e. the domain of definition of at least one partial pattern embedding $P \hookrightarrow G(\pi)$ will be extended by one vertex. The text in the blue box below explains this case in more detail.

Step 4 produces all possible extensions of $\pi$ to pinned vertex sets $\pi'$ that include a descendant $v'$ of $v$ (or $v$ itself). All vertices in $\mathcal{E}_D(v)$ are in events compatible with $D$ by definition, so to check that $\pi'$ is a valid pinned vertex set, we only need to check the second property of pinned vertices. Let $P$ be a pattern, let $S$ be the set of all $\pi$ sets under consideration. Step 4 increments the sizes of all pinned vertex sets $\pi \in S$ whilst maintaining the following invariant.

Invariant for step 4.  If there is a superset $D' \supseteq D_\pi$ of compatible events such that $P$ embeds in $G' = \mathrm{flat}(D')$, then there is a superset $\pi' \supseteq \pi$ of vertices such that $P$ embeds in $\mathrm{flat}(D_{\pi'})$.

Finally, step 5 ensures the process is repeated until, for all partial pattern embeddings, either the domain of definition is complete or the embedding of $P$ is not possible. Given that step 4 increments the size of the $\pi$ sets at each iteration, this will terminate as long as the vertex picking strategy of step 3 selects vertices that allow the partial pattern embeddings constructed in step 2 to be extended (or refuted). This is satisfied, for example, in the case of linear minIR graphs, as explained in the box.

Choosing the next vertex to pin in linear minIR (step 3).   Assuming patterns are connected, for any partial pattern embedding $P \hookrightarrow G(\pi)$ there is an edge $e_P \in E(P)$ with no image in $G(\pi)$ but such that at least one endvertex $v_P$ of $e_P$ has an image $v_G$ in $\pi$ – say, $e_P$ is the outgoing edge of $v_P$. Let $v'_P$ be an endvertex of $e_P$ in $P$ that has no image in $G(\pi)$ – and say, it is the $i$-th outgoing endvertex of $e_P$ in $P$.

Then $v_P$ uniquely identifies an edge $e_G$ in $G = \mathrm{flat}(D_\pi)$ – the unique outgoing edge of $v_G$ – which, in turn, uniquely identifies a vertex $v'_G \in V(G)$ – the $i$-th outgoing endvertex of $e_G$. By choosing $v'_G$ in step 3, step 4 will create pinned vertex sets that include all possible vertices equivalent to $v'_G$, which are all vertices that $v_G$ might be connected to through its outgoing edge1. The next iteration of step 2 will then either extend the partial pattern embedding to $v'_P$ or conclude that an embedding of $P$ is not possible.
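As an illustration of the vertex choice described in the box, the sketch below picks $v'_G$ for a toy encoding of linear hypergraphs. The LinearHypergraph class, its field names and the restriction to the outgoing-edge case treated above are all assumptions made for this example.

from dataclasses import dataclass

@dataclass
class LinearHypergraph:
    sources: dict[str, list[int]]  # hyperedge -> ordered incoming endvertices
    targets: dict[str, list[int]]  # hyperedge -> ordered outgoing endvertices

    def out_edge(self, v: int) -> str | None:
        # Linearity: a vertex is a source of at most one hyperedge.
        return next((e for e, vs in self.sources.items() if v in vs), None)

def next_vertex_to_pin(pattern: LinearHypergraph,
                       graph: LinearHypergraph,
                       embedding: dict[int, int]) -> int | None:
    """Pick v'_G by following the outgoing edge of an already matched vertex."""
    for v_p, v_g in embedding.items():
        e_p = pattern.out_edge(v_p)           # outgoing pattern edge of v_P
        if e_p is None:
            continue
        for i, v_p_next in enumerate(pattern.targets[e_p]):
            if v_p_next not in embedding:     # i-th outgoing endvertex still unmatched
                e_g = graph.out_edge(v_g)     # unique outgoing edge of v_G in flat(D_pi)
                return graph.targets[e_g][i]  # its i-th outgoing endvertex
    return None                               # nothing left to extend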

Using the approach just sketched, pattern matching can be performed on the persistent data structure $\mathcal{D}$. The runtime of steps 2 and 3 depends on the type of graphs and patterns that are matched on – these are, however, typical problems that appear in most instances of pattern matching, independently of the data structure $\mathcal{D}$ used here. A concrete approach to pattern matching and results for the graph types of interest to quantum compilation was presented in chapter 4.
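Put together, steps 1 to 5 can be organised as a simple worklist loop. The skeleton below is a sketch only: the callables it receives (step_matcher for the chapter 4 matcher of step 2, pick_vertex for step 3, equivalent_vertices and is_valid_pinned_set for step 4) are assumed to be supplied by the caller and are not specified here.

from typing import Callable

def find_embeddings(
    v0,                                  # step 1: the initial pinned vertex
    initial_partial: list,               # partial pattern embeddings to grow
    step_matcher: Callable,              # step 2: extend/discard embeddings over G(pi)
    pick_vertex: Callable,               # step 3: next vertex of flat(D_pi) to pin
    equivalent_vertices: Callable,       # step 4: the set E_D(v)
    is_valid_pinned_set: Callable,       # step 4: filter invalid pinned sets
) -> list:
    results = []
    worklist = [({v0}, initial_partial)]
    while worklist:
        pi, partial = worklist.pop()
        complete, partial = step_matcher(pi, partial)     # step 2
        results.extend(complete)                          # fully matched embeddings
        if not partial:
            continue                                      # nothing left to extend: abort this pi
        v = pick_vertex(pi, partial)                      # step 3
        for w in equivalent_vertices(v, pi):              # step 4
            new_pi = pi | {w}
            if is_valid_pinned_set(new_pi):
                worklist.append((new_pi, partial))        # step 5: repeat
    return results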

The runtime of step 4 and the overall number of iterations of steps 2–4 required for pattern matching will depend on the number of events in $\mathcal{D}$ (AreCompatible runs in time linear in the number of ancestors), the number of equivalent vertices that successive rounds of step 4 return, and the types of patterns and pattern matching strategies used.

Extraction using SAT #

Moving on to the extraction phase, we are now interested in extracting the optimal graph from $\mathcal{D}$, according to some cost function of interest. Unlike in the “naive” search space of all graphs reachable in the GTS, the optimal solution cannot simply be read out of the persistent data structure $\mathcal{D}$.

We showed in section 5.3 that finding an optimal graph $G'$ that is the result of a sequence of rewrites on an input graph $G$ is equivalent to finding an optimal set of compatible events $D \in \Gamma(\mathcal{D}) \subseteq \mathcal{P}(\mathcal{D})$ – the optimal graph $G'$ is then recovered by taking $G' = \mathrm{flat}(D)$.

There are $2^{|\mathcal{D}|}$ elements in $\mathcal{P}(\mathcal{D})$, which we encode as a boolean assignment problem by introducing a boolean variable $x_\delta$ for all events $\delta \in \mathcal{D}$. The set of events $D$ is then given by

$$D = \{\delta \in \mathcal{D} \mid x_\delta\} \in \mathcal{P}(\mathcal{D}).$$

We can constrain the boolean assignments to compatible sets $D$ by introducing a boolean formula

$$\neg (x_\delta \land x_{\delta'})$$

for all $\delta, \delta' \in \mathcal{D}$ such that their vertex deletion sets intersect, $V^-(\delta) \cap V^-(\delta') \neq \varnothing$. Any assignment of $\{x_\delta \mid \delta \in \mathcal{D}\}$ that satisfies all constraints of this format defines a compatible set of events.

How many such pairs of events $(\delta, \delta')$ are there? By definition of parents, two events $\delta$ and $\delta'$ can only have overlapping vertex deletion sets if they share a parent. Assuming all events have at most $s$ children, ensuring $D$ is a set of compatible events requires at most $O(s^2 \cdot |\mathcal{D}|)$ constraints.

To further restrict to $D \in \Gamma(\mathcal{D})$, i.e. to sets of compatible events $D = \lfloor D \rfloor$ that contain all ancestors, we can add the further constraints: $\delta \in D$ implies $P(\delta) \subseteq D$. This introduces up to $s \cdot |\mathcal{D}|$ implication constraints

$$x_\delta \lor (\neg x_{\delta'}),$$

for all $\delta, \delta' \in \mathcal{D}$ such that $\delta' \in \mathrm{children}(\delta)$.
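To make the encoding concrete, both constraint families can be emitted as CNF clauses in the usual DIMACS convention, where a clause is a list of signed integers. The sketch below assumes the illustrative event objects used earlier in this section (with deleted and parents attributes) and is not tied to any particular solver.

def extraction_cnf(events: list) -> tuple[dict, list[list[int]]]:
    """Variables x_delta and the clauses (1) and (2) as a CNF formula."""
    var = {delta: i + 1 for i, delta in enumerate(events)}  # x_delta -> DIMACS index
    clauses: list[list[int]] = []
    # (1) compatibility: not (x_d and x_d') whenever the deletion sets overlap.
    for i, d1 in enumerate(events):
        for d2 in events[i + 1:]:
            if d1.deleted & d2.deleted:
                clauses.append([-var[d1], -var[d2]])
    # (2) ancestor closure: x_parent or not x_child, for every parent-child pair.
    for delta in events:
        for parent in delta.parents:
            clauses.append([var[parent], -var[delta]])
    return var, clauses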

For any set of events $\mathcal{D}$, the conjunction of all constraints presented above, i.e. the event compatibility constraints (1) and the parent-child relation constraints (2), defines a boolean satisfiability problem (SAT) with variables $x_\delta$. We have shown:

Proposition 5.4 (Extraction as SAT problem)

Consider a GTS with a constant upper bound $s$ on the number of rewrites that may overlap any previous rewrite.

The set of valid sequences of rewrites that can be extracted from a set of events $\mathcal{D}$ in the GTS is given by the set of satisfying assignments of a SAT problem (Cook, 1971; Moskewicz et al., 2001) with $|\mathcal{D}|$ variables and size $O(|\mathcal{D}|)$.

Finding the optimal assignment #

We now have to find the optimal assignment among all satisfying assignments of the SAT problem given above. In the most general case, where the cost function $f$ to be minimised is given as a black-box oracle on the graph $G'$, i.e. on the flattened history of the solution set $D \subseteq \mathcal{D}$, this optimisation problem is hard2.

However, if $f$ can be expressed as a function of the $x_\delta$ instead of the flattened history $G' = \mathrm{flat}(D)$, then the ‘hardness’ can be encapsulated within an instance of an SMT problem (satisfiability modulo theories; Nieuwenhuis & Oliveras, 2006; Barrett & Tinelli, 2018), a well-studied generalisation of SAT problems for which highly optimised solvers exist (de Moura & Bjørner, 2008; Sebastiani & Trentin, 2015). A class of cost functions for which the SMT encoding of the optimisation problem becomes particularly simple are local cost functions:

Definition 5.1 (Local cost function)

A cost function $f$ on graphs is local if for all rewrites $r$ there is a cost $\Delta f_r$ such that for all graphs $G$ that $r$ applies to,

$$f(r(G)) = f(G) + \Delta f_r.$$

The cost $\Delta f_r$ of a rewrite $r$ also immediately defines a cost for the event $\delta = r$ that $r$ defines. We can thus associate a cost $\Delta f_\delta$ with each event $\delta \in \mathcal{D}$, given by the cost of any of the rewrites that $\delta$ defines.
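Since $f(\mathrm{flat}(D)) = f(G) + \sum_{\delta \in D} \Delta f_\delta$ for a local cost function and $f(G)$ is fixed, it suffices to minimise $\sum_{\delta} \Delta f_\delta \, x_\delta$ subject to the constraints of Proposition 5.4. As one possible realisation, the sketch below uses the Z3 Python bindings (package z3-solver); the event attributes deleted and parents and the dictionary delta_f follow the illustrative representation used earlier in this section and are not part of Z3.

# pip install z3-solver
from z3 import And, Bool, If, Implies, Not, Optimize, Sum, is_true, sat

def extract_optimal(events: list, delta_f: dict) -> set:
    """Minimise the total local cost over all valid event sets."""
    x = {d: Bool(f"x_{i}") for i, d in enumerate(events)}
    opt = Optimize()
    # (1) compatibility: overlapping deletion sets exclude each other.
    for i, d1 in enumerate(events):
        for d2 in events[i + 1:]:
            if d1.deleted & d2.deleted:
                opt.add(Not(And(x[d1], x[d2])))
    # (2) ancestor closure: selecting an event selects its parents.
    for d in events:
        for p in d.parents:
            opt.add(Implies(x[d], x[p]))
    # Objective: sum of Delta f_delta over the selected events.
    opt.minimize(Sum([If(x[d], delta_f[d], 0) for d in events]))
    assert opt.check() == sat  # the empty event set is always a valid solution
    model = opt.model()
    return {d for d in events if is_true(model[x[d]])}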

Instances of such local cost functions, often used in the context of the optimisation of computation graphs, are functions of the type

$$f(G) = \sum_{v \in V(G)} w(v)$$

for some vertex weight function $w$ – i.e. cost functions that can be expressed as sums over the costs $w(\cdot)$ associated to individual vertices in $G$3. Indeed, it is easy to see that in this case we can write

$$\begin{aligned}f(r(G)) &= \sum_{v \in V(r(G))} w(v)\\&= \sum_{v\in V(G)} w(v) \underbrace{{}- \sum_{v \in V^-} w(v) + \sum_{v \in V_R} w(v)}_{:=\ \Delta f_r}\\&= f(G) + \Delta f_r,\end{aligned}$$

where $V^-$ and $V_R$ are the vertex deletion set and the replacement graph of $r$, respectively.
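As a small illustration, the per-rewrite cost delta of such a vertex-weight cost function is a one-liner (a sketch; w is the weight function, and the two arguments are the vertex deletion set and the vertices of the replacement graph):

def rewrite_cost_delta(w, v_minus, v_replacement) -> int:
    """Delta f_r: weight added by V_R minus weight removed by V^-."""
    return sum(w(v) for v in v_replacement) - sum(w(v) for v in v_minus)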

As discussed in section 2.2, many of the most widely used cost functions in quantum compilation are local, as the cost of a quantum computation is often estimated by the required number of instances of the most expensive gate type (such as CX gates on noisy devices, or T gates for hardware with built-in fault tolerance protocols).

In these cases, the cost function is integer valued and the extraction problem is indeed often sparse:

Definition 5.2 (Sparse cost function)

The local cost function $f$ is said to be $\varepsilon$-sparse on $\mathcal{D}$ if

$$\big|\{\delta \in \mathcal{D} \mid \Delta f_\delta = 0\}\big| \geq (1 - \varepsilon)\,|\mathcal{D}|.$$

In the case of $\varepsilon$-sparse local cost functions, the SAT problem on $\mathcal{D}$ can be simplified to only include

$$\mathcal{D}_{\neq 0} = \{\delta \in \mathcal{D} \mid \Delta f_\delta \neq 0\}$$

by repeatedly applying the following constraint simplification rules on any $\delta_0 \in \mathcal{D}$ such that $\Delta f_{\delta_0} = 0$:

  • for every parent $\delta_p \in \mathrm{parents}(\delta_0)$ and child $\delta_c \in \mathrm{children}(\delta_0)$, remove the parent-child constraints between $\delta_p$ and $\delta_0$ and between $\delta_0$ and $\delta_c$. Insert in their place a parent-child constraint between $\delta_p$ and $\delta_c$.
  • for every non-compatible sibling event $\delta_s \in \mathcal{D}$, $\delta_s \neq \delta_0$, remove the compatibility constraint between $\delta_0$ and $\delta_s$. Insert in its place a compatibility constraint between $\delta_s$ and $\delta_c$ for all $\delta_c \in \mathrm{children}(\delta_0)$.

This reduces the SAT or SMT problem to a problem with $|\mathcal{D}_{\neq 0}| \leq \varepsilon |\mathcal{D}|$ variables and at most $O(\min(|\mathcal{D}|, \varepsilon^2|\mathcal{D}|^2))$ constraints.
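The two simplification rules can be applied directly on an explicit representation of the constraints. The sketch below is illustrative: the parent-child relation is passed in as plain dictionaries of sets, the compatibility constraints as a set of frozen pairs, and the function name is hypothetical.

def eliminate_zero_cost(events: set, parents: dict, children: dict,
                        incompatible: set, cost: dict) -> None:
    """Remove zero-cost events, rewiring the constraints in place."""
    for d0 in [d for d in events if cost[d] == 0]:
        # Rule 1: replace dp -> d0 -> dc chains by direct dp -> dc constraints.
        for dp in parents[d0]:
            children[dp].discard(d0)
            children[dp] |= children[d0]
        for dc in children[d0]:
            parents[dc].discard(d0)
            parents[dc] |= parents[d0]
        # Rule 2: move incompatibilities with d0 onto the children of d0.
        for pair in [p for p in incompatible if d0 in p]:
            incompatible.discard(pair)
            (ds,) = pair - {d0}
            incompatible |= {frozenset({ds, dc}) for dc in children[d0]}
        events.discard(d0)
        del parents[d0], children[d0]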

With the completion of this section, we have described an equivalent computation on $\mathcal{D}$ for every step of a GTS-based optimisation problem:

  1. a rewrite that can be applied on a graph $G$ can be added as an event to $\mathcal{D}$,
  2. a graph $G'$ that results from a sequence of rewrites can be recovered from $\mathcal{D}$ using FlattenHistory,
  3. the set of all graphs reachable from events in $\mathcal{D}$ can be expressed as a SAT problem; depending on the cost function, the optimisation over that space can then take the form of an SMT problem.

In essence, using the confluently persistent data structure $\mathcal{D}$ we replace a naive, exhaustive search over the space $\mathcal{G}$ of all graphs reachable in the GTS with a SAT (or SMT) problem – solvable using highly optimised dedicated solvers that could in principle handle search spaces with up to millions of possible rewrites (Zulkoski, 2018).


  1. To see this, notice that all vertices equivalent to $v'_G$ are vertices that will be merged with $v'_G$. Hence, they will all be attached to the outgoing edge of $v_G$ at its $i$-th outgoing endvertex.

  2. Hardness can be seen by considering the special case of the extraction problem in which all events are compatible and no two events have a parent-child relation: then there are no constraints on the solution space and the optimisation problem requires finding the minimum of an arbitrary oracle over $2^{|\mathcal{D}|}$ inputs.

  3. A similar argument also applies to cost functions that sum over graph edges, as would be the case in minIR, where operations are modelled as hyperedges.