Jabier Martinez, Sorbonne Université, CNRS, LIP6, France
Nicolas Ordoñez, Universidad Nacional de Colombia, Colombia
Xhevahire Tërnava, Sorbonne Université, CNRS, LIP6, France
Tewfik Ziadi, Sorbonne Université, CNRS, LIP6, France
Jairo Aponte, Universidad Nacional de Colombia, Colombia
Eduardo Figueiredo, Universidade Federal de Minas Gerais, Brazil
Marco Tulio Valente, Universidade Federal de Minas Gerais, Brazil
Feature location is a traceability recovery activity that identifies the implementation elements associated with a characteristic of a system. Besides its relevance for the maintenance of a single system, feature location in a collection of systems has received considerable attention as a first step in re-engineering system variants (created through clone-and-own) into a Software Product Line (SPL). In this context, the objective is to unambiguously identify the boundaries of a feature inside a family of systems, to later create reusable assets from these implementation elements. Among all the case studies in the SPL literature, the variants derived from ArgoUML SPL stand out as the most used. However, the use of different settings, or the omission of relevant information (e.g., the exact configurations of the variants or the way the metrics are calculated), makes it difficult to reproduce or benchmark the different feature location techniques, even when the same ArgoUML SPL is used. To foster research on feature location, we provide a set of common scenarios using ArgoUML SPL, together with utilities to compute metrics from the results of existing and novel feature location techniques.
Prior to this challenge paper, the case study had already been referenced many times.
The ArgoUML SPL case study has its own entry in the ESPLA catalog (a catalog of case studies on extractive SPL adoption).
Daniel Cruz et al., "A Literature Review and Comparison of Three Feature Location Techniques using ArgoUML-SPL," VaMoS 2019.
Among the contributions of this work, several are directly related to their use of the ArgoUML-SPL benchmark:
A characterization of the generated variants regarding the textual information that they contain. This information is relevant because it is the primary source for text-based information retrieval techniques.
A comparison of three text-based information retrieval techniques (Paragraph Vectors, Latent Dirichlet Allocation, and Latent Semantic Indexing), using only the feature names as queries and applying document pre-processing.
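To illustrate the Latent Semantic Indexing part of such a comparison, the following is a minimal, self-contained sketch using raw term counts and a toy corpus; the documents, query, and latent dimensionality are hypothetical and do not reproduce the cited study's actual pipeline or pre-processing:

```python
import numpy as np

def lsi_rank(docs, query, k=2):
    """Rank documents against a query with Latent Semantic Indexing."""
    vocab = sorted({t for d in docs for t in d.split()})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-document matrix of raw term counts (rows: terms, cols: docs)
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.split():
            A[index[t], j] += 1
    # Truncated SVD: keep the k largest singular triplets
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Fold the query vector into the k-dimensional latent space
    q = np.zeros(len(vocab))
    for t in query.split():
        if t in index:
            q[index[t]] += 1
    q_k = np.diag(1 / sk) @ Uk.T @ q
    # Cosine similarity between the query and each document (columns of Vtk)
    sims = [float(q_k @ Vtk[:, j] /
                  (np.linalg.norm(q_k) * np.linalg.norm(Vtk[:, j]) + 1e-12))
            for j in range(len(docs))]
    return sims

# Hypothetical "documents" (e.g., identifiers extracted from classes)
docs = [
    "activity diagram node edge",
    "sequence diagram message lifeline",
    "logging logger log level",
]
# Feature name used as the query, as in the feature-name-only setting
sims = lsi_rank(docs, "activity diagram", k=2)
# The diagram-related documents rank above the unrelated logging one
```

In a real feature location setting, each document would be the pre-processed text of one implementation element (class or method), and the ranking would be thresholded or inspected to decide which elements belong to the feature.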
The results suggest that Latent Semantic Indexing (LSI) outperforms the other two techniques in this benchmark. However, precision, recall, and F-measure have very low values (0.16, 0.19, and 0.079, respectively). This suggests that text-based information retrieval techniques should be combined with other techniques to form hybrid feature location approaches, and that LSI seems to be a good candidate for the text-based information retrieval part.
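For reference, precision, recall, and F-measure in feature location are typically computed per feature by comparing the set of retrieved implementation elements against a ground truth. A minimal sketch (the element names below are hypothetical, and reported benchmark figures may additionally be averaged across features):

```python
def feature_location_metrics(retrieved, ground_truth):
    """Precision, recall, and F-measure for one feature's located elements."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    tp = len(retrieved & ground_truth)  # correctly located elements
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Hypothetical feature: elements located by a technique vs. the ground truth
p, r, f = feature_location_metrics(
    {"ClassA", "ClassB", "ClassC"},            # retrieved
    {"ClassB", "ClassC", "ClassD", "ClassE"},  # ground truth
)
# p ≈ 0.67, r = 0.50, f ≈ 0.57
```

With per-feature averaging, the overall F-measure need not equal the harmonic mean of the averaged precision and recall, which is one reason reporting the exact calculation method matters for reproducibility.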