SPLC 2020 Challenge

A BDD for Linux? The Knowledge Compilation Challenge for Variability

Thomas Thüm, University of Ulm, Germany
What is the number of valid configurations for Linux? How to generate uniform random samples for Linux? Can we create a binary decision diagram for Linux? It seems that the product-line community tries hard to answer such questions for Linux and other configurable systems. However, attempts are often not published due to the publication bias (i.e., unsuccessful attempts are not published). As a consequence, researchers keep trying by potentially spending redundant effort. The goal of this challenge is to guide research on these computationally complex problems and to foster the exchange between researchers and practitioners.

Overview

What is the holy grail of the product-line community? A binary decision diagram for Linux.

Linux is an operating system with thousands of configuration options. These options cannot be arbitrarily combined, as every option typically comes with constraints with respect to several other options. Constraints are specified in KConfig, but can be translated into feature models or propositional logic. Whenever we analyze the Linux kernel for errors, ignoring those constraints would lead to false positives. That is, tools would report errors for invalid configurations, which cannot be used to compile a kernel. Hence, these constraints are crucial for any kind of analysis of Linux.

A binary decision diagram (short BDD) is data structure representing a propositional formula. While there are multiple representations of propositional formulas, BDDs can have the advantage of reducing NP-complete problems into more tractable problems (aka. knowledge compilation). For instance, checking whether a formula represented as a BDD is satisfiable is an operation with constant effort. While operations on BDDs might scale well, the downside of BDDs is that their construction can be intractable. The main reason for the scalability challenge is that the variable ordering heavily influences the size of the BDD and they tend to explode for most variable orderings.

Why do we argue that a BDD for Linux is the holy grail of the product-line community?

First, it seems to be a challenging task. To the best of our knowledge, no one has been able to create a BDD for Linux so far. In recent work, we tried to create a BDD for hundreds of large feature models and failed for 98\% of them (i.e., 100\% of models having more than 550 features). Furthermore, even though other knowledge compilation techniques scaled to many large feature models, not a single knowledge compilation tool scaled to Linux when we tried to compute the number of valid configurations. Similarly, t-wise sampling algorithms typically do not scale to Linux.

Second, there is a considerable amount of research that is based on the translation of the feature model into a BDD. In the past two decades, researchers proposed the use of BDDs to count the number of valid configurations, to compute feature-model slices and differences, for interactive product configuration, to check whether product-line artifacts are consistent, to parse preprocessor-based product lines, to simplify preprocessor annotations, and to lift test-suite generation, data-flow analyses, or model checking to product lines. If we aim to apply that research to Linux or similarly complex configuration spaces, it is an open question whether BDDs can be created for them.

Third, there are many advantages of having a BDD. While it is widely accepted that satisfiability solving scales well for product lines, the potential advantage of a BDD is that one-time effort for the construction can pay-off when the BDD is later employed in follow-up analyses. The potential is amplified by several factors. First, a feature model is typically changed less frequently than implementation artifacts. As a consequence, a new BDD only needs to be created when the feature model changes. Second, for every new revision of the product line there are several analyses needed of which each typically is reduced to numerous satisfiability problems. It is likely that a large portion of the satisfiability problems can be solved more efficiently using a BDD.

What is the goal of this challenge?

While our claim about the holy grail is focused on BDDs and Linux, there is a more general challenge behind this specific one. Linux is just one example of a large-scale configuration space. A BDD is just one example of a knowledge compilation technique. The goal of this challenge is to promote the problem of knowledge compilation for large-scale configuration spaces. In particular, this challenge is not only focused on software configuration, but also configurable systems and product configuration. Furthermore, any representation of the configuration space that invests offline computations (i.e., compilation) in favor of faster online computations would fall into the scope of this challenge.

What is the problem of the current situation? Many product-line researchers seem to address very similar problems for various reasons, but largely without documenting their failed attempts. Other researchers are most likely repeating the same mistakes again. The goal of this challenge is to make this problem explicit, to give researchers a forum to discuss attempts, and to exchange ideas on possible solutions.

Solutions

On the scalability of building binary decision diagrams for current feature models

Tobias Heß, Chico Sundermann and Thomas Thüm
SPLC 2021 Solution
Binary decision diagrams (BDD) have been proposed for numerous product-line analyses. These analyses typically exploit properties unique to decision diagrams, such as negation in constant time and space. Furthermore, the existence of a BDD representing the configuration space of a product line removes the need to employ SAT or #SAT solvers for their analysis. Recent work has shown that the performance of state-of-the-art BDD libraries is significantly lower than previously reported and hypothesized. In this work, we provide an assessment of the state-of-the-art of BDD scalability in this domain and explain why previous results on the scalability of BDDs do not apply to more recent product-line instances.
Solution paper
Video presentation of the solution

Discussion

No discussion section for the moment.