Related work

4.1. Related work

Our proposed solution can be seen as a specialisation of RETE networks Forgy, 1982Charles L. Forgy. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (Septempter 1982, 17--37). doi: 10.1016/0004-3702(82)90020-0 Varró, 2013Gergely Varró and Frederik Deckwerth. 2013. A Rete Network Construction Algorithm for Incremental Pattern Matching and derivatives Ian, 2003Wright Ian and James A. R. Marshall. 2003. The execution kernel of RC++: RETE*, a faster RETE with TREAT as a special case. International Journal of Intelligent Games and Simulation 2, 1 (Feb 2003, 36-48) Armstr., 2014Dylan Armstrong. 2014. Memory Efficient Stream Reasoning onResource-Limited Devices Mirank., 1987D. P. Miranker. 1987. TREAT: a new and efficient match algorithm for AI production systems to the case of graph pattern matching. The additional structure obtained from restricting our considerations to graphs results in a simplified network design that allows us to derive worst-case asymptotic runtime and space bounds that are polynomial in the parameters relevant to our use case¹ – overcoming a key limitation of RETE.

Another well-studied application of large-scale pattern matching is in the context of stochastic biomolecular simulations Sneddon, 2010Michael W Sneddon, James R Faeder and Thierry Emonet. 2010. Efficient modeling, simulation and coarse-graining of biological complexity with NFsim. Nature Methods 8, 2 (December 2010, 177--183). doi: 10.1038/nmeth.1546 Bachman, 2011John A Bachman and Peter Sorger. 2011. New approaches to modeling complex biochemistry. Nature Methods 8, 2 (January 2011, 130--131). doi: 10.1038/nmeth0211-130, particularly the Kappa project Danos, 2004Vincent Danos and Cosimo Laneve. 2004. Formal molecular biology. Theoretical Computer Science 325, 1 (Septempter 2004, 69--110). doi: 10.1016/j.tcs.2004.03.065. Stochastic simulations depend on performing many rounds of fast pattern-matching for continuous Monte Carlo simulations Yang, 2008Jin Yang, Michael I. Monine, James R. Faeder and William S. Hlavacek. 2008. Kinetic Monte Carlo method for rule-based modeling of biochemical networks. Physical Review E 78, 3 (Septempter 2008, 031910). doi: 10.1103/physreve.78.031910. However, unlike our use case, the procedure typically does not need to scale well to a large number of patterns. In Danos, 2007Vincent Danos, Jérôme Feret, Walter Fontana and Jean Krivine. 2007. Scalable Simulation of Cellular Signaling Networks, Danos et al. introduced a pre-computation step to accelerate matching by establishing relations between patterns that activate or inhibit further patterns. This idea was later expanded upon and formalised in categorical language in Boutil., 2017Pierre Boutillier, Thomas Ehrhard and Jean Krivine. 2017. Incremental Update for Graph Rewriting. The ideas presented in Boutil., 2017Pierre Boutillier, Thomas Ehrhard and Jean Krivine. 2017. Incremental Update for Graph Rewriting are similar to ours; their formalism has the advantage of being more general but does not present any asymptotic complexity bounds and suffers from similar worst-case complexities as RETE.

A similar problem has also been studied in the context of multiple-query optimisation for database queries Sellis, 1988Timos K. Sellis. 1988. Multiple-query optimization. ACM Transactions on Database Systems 13, 1 (March 1988, 23–52). doi: 10.1145/42201.42203 Ren, 2016Xuguang Ren and Junhu Wang. 2016. Multi-Query Optimization for Subgraph Isomorphism Search. Proceedings of the VLDB Endowment 10, 3 (November 2016, 121–132). doi: 10.14778/3021924.3021929, but it has limited itself to developing caching strategies and search heuristics for specific use cases. Finally, using a pre-compiled data structure for pattern matching was already proposed in Messmer, 1999Bruno T. Messmer and Horst Bunke. 1999. A decision tree approach to graph and subgraph isomorphism detection. Pattern Recognition 32, 12 (December 1999, 1979--1998). doi: 10.1016/S0031-3203(98)90142-X. However, with a $n^{\Theta(m)}$ space complexity – $n$ is the input size and $m$ the pattern size – it does not scale to large input graphs, even for small patterns.

RETE networks have been shown to have exponential worst-case space (and thus time) complexity Rakib, 2018Abdur Rakib and Ijaz Uddin. 2018. An Efficient Rule-Based Distributed Reasoning Framework for Resource-bounded Systems. Mobile Networks and Applications 24, 1 (October 2018, 82--99). doi: 10.1007/s11036-018-1140-x, although performance in practical use cases can vary widely Uddin, 2016Ijaz Uddin, Hafiz Mahfooz Ul Haque, Abdur Rakib and Mohamad Rafi Segi Rahmat. 2016. Resource-Bounded Context-Aware Applications: A Survey and Early Experiment. ↩︎

←Return to chapter