Awesome Empirical Software Engineering
A curated repository of software engineering repository mining data sets
Thank you dspinellis & contributors
Software-artifact infrastructure repository; Java, C, C++, and C# software together with test suites and fault data.
About 20 datasets related to software engineering research.
Collaborative collection and analysis of free/libre/open source project data.
A Database of Real Faults and an Experimental Infrastructure to Enable Controlled Experiments in Software Engineering Research
The Bug Catalog of the Maven Ecosystem
Generating the Blueprints of the Java Ecosystem (MSR Data Paper 2015)
Multi-extract and Multi-level Dataset of Mozilla Issue Tracking History
A Data Set of OCL Expressions on GitHub
Continuous Unix commit history from 1970 until today
Graph-based dataset of commit history of 8,431 real-world Android apps.
Collection of models and metrics from Eclipse JDT Core, PDE UI, Equinox Framework, Lucene, Mylyn, and their histories.
Code reviews of OpenStack, LibreOffice, AOSP, Qt, Eclipse.
Collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: Make, Grep, Findutils, and Coreutils.
Activity such as commits, stars, prices, and market cap of over 200 cryptocurrency projects on GitHub over time. Raw, historic data is also available.
Collection of stacktraces of Exceptions encountered by users of the Eclipse IDE, as retrieved by the AERI reporting system.
All the spreadsheets and emails used in the paper 'Enron's Spreadsheets and Related Emails: A Dataset and Analysis'.
Bug Dataset of 15 Java open-source projects characterized by static source code metrics.
GitHub data accessible through Google's BigQuery platform.
Collection of grammars of DSLs and GPLs, some extracted from metamodels and document schemata.
The Linux Kernel 4.21 Call Graphs produced using CScout.
Snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database.
Data set containing a collection of engineered software projects from GHTorrent.
Graph of the development history and file metadata of >80 million software projects from various forges (GitHub, GitLab, Debian, PyPI, Google Code, etc.) in a deduplicated and unified representation.
StaMinA (STAte Machine INference Approaches) data are used to benchmark techniques for learning deterministic finite state machines (FSMs).
Anonymized dump of all user-contributed content on the Stack Exchange network.
Provides free and easy-to-use Travis CI build analyses.
Data about various aspects of Debian (e.g. packages, bugs, maintainers) in the same SQL database.
A library for mining of path-based representations of code (and more)
A multi-language tokenizer for extracting identifiers from source code.
A tool for mining commits and diffs from Git repositories to automatically extract code change pattern instances and features with AST analysis.
Collect and view OSS cryptocurrency development.
Database smell detector
Detects smells and computes metrics of Java code
An agile tool to analyze Git repositories
This project mines Maven Central and creates a global dependency graph.
Send Sir Perceval on a quest to retrieve and gather data from software repositories.
Smell detection tool for Puppet code
Python Framework to analyse Git repositories
C Quality Metrics
Calculate the score of a repository based on best engineering practices.
A vulnerability patch gathering tool
Domain-specific language and infrastructure that eases mining software repositories.
Compute source code metrics and detect a variety of implementation, design, and architecture smells for C#.
Free/Libre/Open Source tools for Software Development Analytics.