SPLC 2018 Challenge

Feature Location Benchmark with ArgoUML SPL

Jabier Martinez, Sorbonne Université, CNRS, LIP6, France
Nicolas Ordoñez, Universidad Nacional de Colombia, Colombia
Xhevahire Tërnava, Sorbonne Université, CNRS, LIP6, France
Tewfik Ziadi, Sorbonne Université, CNRS, LIP6, France
Jairo Aponte, Universidad Nacional de Colombia, Colombia
Eduardo Figueiredo, Universidade Federal de Minas Gerais, Brazil
Marco Tulio Valente, Universidade Federal de Minas Gerais, Brazil
Feature location is a traceability recovery activity that identifies the implementation elements associated with a characteristic of a system. Besides its relevance for the maintenance of a single system, feature location in a collection of systems has received a lot of attention as a first step in re-engineering system variants (created through clone-and-own) into a Software Product Line (SPL). In this context, the objective is to unambiguously identify the boundaries of a feature inside a family of systems, so that reusable assets can later be created from these implementation elements. Among all the case studies in the SPL literature, the variants derived from ArgoUML SPL stand out as the most used. However, the use of different settings, or the omission of relevant information (e.g., the exact configurations of the variants or the way the metrics are calculated), makes it difficult to reproduce or benchmark the different feature location techniques, even when the same ArgoUML SPL is used. To foster research on feature location, we provide a set of common scenarios using ArgoUML SPL and a set of utilities to compute metrics from the results of existing and novel feature location techniques.

Solutions

A Graph-Based Feature Location Approach Using Set Theory

Richard Müller and Ulrich Eisenecker
SPLC 2019 Solution
The ArgoUML SPL benchmark addresses feature location in Software Product Lines (SPLs), where single features as well as feature combinations and feature negations have to be identified. We present a solution for this challenge using a graph-based approach and set theory. The results are promising. Set theory makes it possible to define exactly which parts of the feature locations can be computed and what precision and recall can be achieved. This has to be complemented by a reliable identification of feature-dependent class and method traces as well as refinements. The application of our solution to one scenario of the benchmark supports this claim.
Solution paper
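
The set-theoretic idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only assumes that each variant is known by its enabled features and by the set of implementation elements it contains. LOGGING and COGNITIVE are used as example feature names, and the element names (Core, Logger, Critic, CriticLogger) as well as the function locate are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A variant is a pair: (enabled features, implementation elements it contains).
Variant = Tuple[FrozenSet[str], FrozenSet[str]]


def locate(features: Set[str], variants: List[Variant]) -> Set[str]:
    """Return the elements found in every variant that enables all of
    `features` and in no variant that lacks at least one of them."""
    with_all = [set(elems) for feats, elems in variants if features <= feats]
    others = [set(elems) for feats, elems in variants if not features <= feats]
    if not with_all:
        # No variant enables the requested features: nothing can be located.
        return set()
    common = set.intersection(*with_all)
    elsewhere = set().union(*others)
    return common - elsewhere


# Hypothetical variants, for illustration only.
variants: List[Variant] = [
    (frozenset(), frozenset({"Core"})),
    (frozenset({"LOGGING"}), frozenset({"Core", "Logger"})),
    (frozenset({"COGNITIVE"}), frozenset({"Core", "Critic"})),
    (
        frozenset({"LOGGING", "COGNITIVE"}),
        frozenset({"Core", "Logger", "Critic", "CriticLogger"}),
    ),
]

logging_only = locate({"LOGGING"}, variants)
interaction = locate({"LOGGING", "COGNITIVE"}, variants)
```

With these variants, the single feature resolves to the element that appears exactly when LOGGING is enabled, and the feature combination resolves to the element that appears only when both features are enabled. How much can be located this way depends on which configurations are available in a scenario.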

Comparison-Based Feature Location in ArgoUML Variants

Gabriela Karoline Michelon, Lukas Linsbauer, Wesley Klewerton Guez Assunção and Alexander Egyed
SPLC 2019 Solution
Identifying and extracting parts of a system’s implementation for reuse is an important task for re-engineering system variants into Software Product Lines (SPLs). An SPL is an approach that enables systematic reuse of existing assets across related product variants. The re-engineering process to adopt an SPL from a set of individual variants starts with the location of features and their implementation, which are then extracted, migrated into an SPL, and reused in new variants. Therefore, feature location is of fundamental importance to the successful adoption of SPLs. Despite its importance, existing feature location techniques struggle with huge, complex, and numerous system artifacts. This is the scenario of ArgoUML-SPL, which stands out as the most used case study for the validation of feature location approaches. In this paper we apply an automated feature location technique to the ArgoUML feature location challenge.
Solution paper

Discussion

References outside the SPLC Challenge track

Many references to the case study predate the Challenge paper.
The ArgoUML SPL case study has its own entry in the ESPLA Catalog (a catalog of case studies on extractive SPL adoption).

Daniel Cruz et al. VAMOS 2019
A Literature Review and Comparison of Three Feature Location Techniques using ArgoUML-SPL
Among the contributions of this work, some relate directly to the use of the ArgoUML-SPL benchmark:
  • A characterization of generated variants regarding the textual information that they contain. This information is relevant as this is the primary source for text-based information retrieval techniques.
  • A comparison of three text-based information retrieval techniques (Paragraph Vectors, Latent Dirichlet Allocation, and Latent Semantic Indexing) that use only the feature names as queries, together with pre-processing of the documents.
  • The results suggest that Latent Semantic Indexing (LSI) outperforms the other two techniques in this benchmark. However, precision, recall, and F-measure have very low values (0.16, 0.19, and 0.079, respectively). This suggests that text-based information retrieval techniques should be combined with other techniques to form hybrid feature location techniques, and that LSI seems to be a good candidate for the text-based information retrieval part.
  • The implementation of the techniques is publicly available: https://github.com/DVSCross/TextualIRFeaturesImpl
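
The three metrics quoted above can be sketched as follows. This is a minimal illustration, not the benchmark's own utilities: it assumes that results and ground truth are given as sets of element names per feature and that scores are averaged over features. The function names, feature names, and element names are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Score one feature: elements reported by a technique vs. the ground truth."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_scores(
    results: Dict[str, Set[str]], ground_truth: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Average precision, recall, and F1 over all ground-truth features.
    A feature with no reported elements scores 0 on all three metrics."""
    scores = [
        precision_recall_f1(results.get(feature, set()), truth)
        for feature, truth in ground_truth.items()
    ]
    count = len(scores)
    return (
        sum(s[0] for s in scores) / count,
        sum(s[1] for s in scores) / count,
        sum(s[2] for s in scores) / count,
    )


# Hypothetical data, for illustration only.
ground_truth = {"A": {"a1", "a2", "a3", "a4"}, "B": {"b1"}}
results = {"A": {"a1"}, "B": {"b1", "x", "y", "z"}}

avg_precision, avg_recall, avg_f1 = average_scores(results, ground_truth)
```

In this example the averaged F1 is lower than the harmonic mean of the averaged precision and recall, because F1 is computed per feature before averaging. Whether a similar effect explains the values reported in the study above is an assumption; the source does not say how the averages were computed.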