3.2. Computation graphs and linearity
Computation graphs represent the flow of data between operations in a program, with nodes as operations and edges as data dependencies. They are widely used in machine learning frameworks and GPU optimisations (Bergstra et al., 2011; Zhao et al., 2023), and are conceptually equivalent to the dataflow graphs used in compiler design, which were pioneered by Feo (1990) and Kahn (1976) and are now central to most compiler IRs.
In classical computations, these graph representations of computations are essentially term graphs (Barendregt et al., 1987) – sets of algebraic expressions that are stored as trees, combined with an important optimisation known as term sharing. When identical subexpressions appear multiple times, they can be represented as one computation and referenced from multiple locations, creating a directed acyclic graph rather than a term tree (Plump, 1999). This sharing enables a more efficient representation. It can also be used as a compiler optimisation to identify subexpressions that can be cached and shared across expression evaluations for more efficient execution – a technique known as common subexpression elimination (CSE) (Cocke, 1970).
Each edge of a computation graph corresponds to a unique value: the output of a previous computation that is passed on to new operations. These values flow along the edges of the graph – hence dataflow graph. Values are immutable: they are defined once and then passed as input to further operations, where they can only be consumed, never modified. In compiler speak, programs expressed using such immutable values are said to be in static single assignment (SSA) form (Cytron et al., 1991; Rosen et al., 1988). In SSA:
- Every value is defined exactly once,
- Every value may be used any number of times (including zero).
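To see what SSA renaming does concretely, here is a toy pass for straight-line code – a hedged sketch only, since real SSA construction (as in Cytron et al.) must also handle control flow and φ-nodes, which this omits:

```python
# Toy SSA renaming for straight-line code: each assignment to a variable x
# produces a fresh name x0, x1, ..., so every value is defined exactly once;
# uses refer to the latest version of the variable.

def to_ssa(stmts):
    """stmts: list of (target, (op, *operand_vars)) assignments."""
    version = {}          # variable -> current version number
    out = []
    for target, expr in stmts:
        op, *uses = expr
        renamed = tuple(f"{u}{version[u]}" for u in uses)
        version[target] = version.get(target, -1) + 1
        out.append((f"{target}{version[target]}", (op,) + renamed))
    return out

prog = [("x", ("const",)), ("x", ("inc", "x")), ("y", ("dbl", "x"))]
print(to_ssa(prog))
# [('x0', ('const',)), ('x1', ('inc', 'x0')), ('y0', ('dbl', 'x1'))]
```

The variable `x` is assigned twice, so SSA splits it into the two values `x0` and `x1`; `y0` consumes `x1`, the latest version. After renaming, every value has exactly one definition, while `x0` happens to be used once and could equally be used many times or not at all.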
Quantum computing throws this second pillar of SSA into the bin. Values in quantum computations are the result of computations on quantum data, and as such must obey the no-cloning and no-deleting theorems (section 2.1). We call values subject to these restrictions linear¹. They introduce the following constraint on valid computation graphs:
Every linear value must be used exactly once.
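To make the constraint concrete, a verifier can simply count operand uses and reject any linear value that is consumed more or fewer than once. The graph encoding and names like `q0` below are hypothetical, chosen for this sketch:

```python
# Linearity check over a computation graph: every linear value (e.g. a qubit
# wire) must appear as an operand of exactly one operation.

from collections import Counter

def check_linear(ops, linear_values):
    """ops: list of (op_name, operands). Raises if a linear value is used != once."""
    uses = Counter(v for _, operands in ops for v in operands)
    for v in linear_values:
        if uses[v] != 1:
            raise ValueError(f"linear value {v!r} used {uses[v]} times, expected 1")

# q0 is consumed exactly once by the H gate: valid.
check_linear([("h", ("q0",))], ["q0"])

# Using q0 twice (implicit cloning) or zero times (implicit deletion) is rejected:
try:
    check_linear([("h", ("q0",)), ("x", ("q0",))], ["q0"])
except ValueError as e:
    print(e)   # linear value 'q0' used 2 times, expected 1
```

A use count above one would amount to cloning the quantum state; a count of zero would silently delete it – both forbidden by the theorems above.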
Linear values fundamentally change how transformations of the computation graph must be specified. Whereas compilers for classical data can:
- freely share common subexpressions (term sharing),
- undo term sharing, i.e. duplicate shared terms into independent subterms, and
- delegate the identification and deletion of obsolete code to specialised passes (e.g., dead code elimination (Cytron et al., 1991; Briggs et al., 1994)),
quantum compilers must enforce much stricter invariants on IR transformations – or risk producing invalid programs.
In classical compilers, IR modification APIs (such as MLIR’s PatternRewriter) decouple program transformation from code deletion. Program transformations are specified by copying existing values and introducing new values and operations as needed, while the actual deletion of unused code is deferred to specialised dead code elimination passes. This approach is no longer feasible in the presence of linear values. Computation graphs for quantum computations must adopt proper graph rewriting semantics, in which the explicit deletion of obsolete values and operations is just as much a part of the rewriting data as the new code generation.
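The difference can be illustrated with a linearity-preserving rewrite: cancelling a pair of adjacent Hadamard gates must rewire the consumer and delete both obsolete operations in the same step, since no later pass may clean up dangling linear values. The list-of-triples graph encoding below is a hypothetical illustration, not MLIR’s or any real framework’s API:

```python
# Linear-aware graph rewriting sketch: replacing H(H(q)) with q must
# *explicitly* delete both H operations and rewire their consumer in one
# rewrite step -- there is no dead-code-elimination pass to defer to.

def cancel_double_h(ops):
    """ops: list of [op_name, input_value, output_value] triples."""
    for i, (name, inp, out) in enumerate(ops):
        for j, (name2, inp2, out2) in enumerate(ops):
            if name == name2 == "h" and out == inp2:
                # Rewire: every use of the second H's output now consumes
                # the first H's input, preserving exactly-once usage.
                for op in ops:
                    if op[1] == out2:
                        op[1] = inp
                # Explicit deletion is part of the rewrite data itself.
                return [op for k, op in enumerate(ops) if k not in (i, j)]
    return ops

circuit = [["h", "q0", "q1"], ["h", "q1", "q2"], ["meas", "q2", "c0"]]
print(cancel_double_h(circuit))
# [['meas', 'q0', 'c0']] -- both H ops deleted, the measurement rewired to q0.
```

Note that the rewrite leaves every remaining linear value with exactly one definition and one use; had it only rerouted `q2` without deleting the two H operations, the wire `q1` would be consumed twice and `q2` produced but never used – an invalid program.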
¹ The terminology comes from “linear” logic (Girard, 1987). I apologise for slamming additional semantics on what I recognise is an already very overloaded term.