Programming languages are evolving rapidly, with functionally enabled languages emerging as a strong, viable option for fast development turnaround. The EU-funded SEFUNC (Software engineering properties of functionally enabled languages) project explored the advantages of these new languages. It looked at whether the functional features of modern programming languages improve developer productivity and reduce code complexity.
Since empirical software engineering studies require the acquisition and processing of data from software repositories, the project developed new tools to evaluate existing applications written in functionally enabled languages. It built a large-scale repository mining operation to retrieve the data available from GitHub, a popular project hosting, mirroring and collaboration platform.
To achieve its aims, the project team designed GHTorrent, a scalable and queryable offline mirror of this data that relies on distributed data collection. The team used the mirror to study pull-based development and to visualise language ecosystems. After collection, the data is offered back to the user community through the project's web page, with more than 2 terabytes of data available in two database formats. This enables researchers to conduct comprehensive, population-level quantitative studies in areas such as software ecosystems, distributed collaboration and repository mining.
Importantly, the project team also explored pull-based development, a new paradigm for distributed software development. It performed the first large-scale quantitative analysis of how the pull-based development model works by extracting data from 300 large projects, comprising 170 000 pull requests. This revealed the factors that affect the decision to merge a pull request and the time taken to process it, providing key data to improve the efficiency of distributed collaborative projects.
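To give a feel for the two quantities discussed above, the decision to merge and the time to process, the following is a minimal sketch in Python. It is not the project's analysis code: the `PullRequest` record layout, the function names and the four sample records are all hypothetical and do not reflect the GHTorrent schema or the study's data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record layout for illustration only,
    # not the GHTorrent schema.
    project: str
    opened_at: datetime
    closed_at: datetime
    merged: bool


def merge_rate(prs):
    """Fraction of pull requests that ended up merged."""
    return sum(pr.merged for pr in prs) / len(prs)


def median_hours_to_process(prs):
    """Median time between opening and closing a pull request, in hours."""
    return median(
        (pr.closed_at - pr.opened_at).total_seconds() / 3600 for pr in prs
    )


# Invented sample data; real studies would load many thousands of records.
SAMPLE = [
    PullRequest("demo/a", datetime(2013, 1, 1, 9), datetime(2013, 1, 1, 12), True),
    PullRequest("demo/a", datetime(2013, 1, 2, 9), datetime(2013, 1, 3, 9), False),
    PullRequest("demo/b", datetime(2013, 1, 5, 8), datetime(2013, 1, 5, 9), True),
    PullRequest("demo/b", datetime(2013, 1, 6, 8), datetime(2013, 1, 6, 18), True),
]

if __name__ == "__main__":
    print(f"merge rate: {merge_rate(SAMPLE):.2f}")
    print(f"median hours to process: {median_hours_to_process(SAMPLE):.1f}")
```

The median is used rather than the mean in this sketch because processing times tend to be skewed by a few very long-lived requests; that choice is an assumption made here, not a statement about the project's method.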
Notably, the project won the 'Best data showcase award' at the 2013 Mining Software Repositories conference for its innovative use of distributed crawling and the sharing of valuable data with the community. Project work has led to the publication of numerous papers on the topic, which have been cited at conferences. Its results are expected to contribute to advancing software engineering and the rapid emergence of functionally enabled languages.