Sponsors

Microsoft

Radboud University

TU Delft

Become a sponsor

Donations

  • [17 Nov 2015] Microsoft donated $98,000 in Azure credits
  • [30 Oct 2016] Google donated $1000 in Google Cloud credits

The project would also like to thank the anonymous donors for their generosity. GHTorrent will become a better project thanks to you!

Papers using GHTorrent

This list includes a subset of the researchers who have used GHTorrent for research or teaching. If you use the dataset, please consider adding your details by following these steps:

  • Add information about your organization and yourself to this file on GitHub. Describe in a few lines how you used GHTorrent; links are welcome. Please keep institution names in alphabetical order.

  • If you would like to link your publications that reference GHTorrent, add a BibTeX record for each to this file on GitHub. You can then reference them in this file.

Inria/Mines Nantes/LINA/AtlanMod

  • Jordi Cabot: Research on usage of issue labels in GitHub.
    1. Cabot, J., Cánovas Izquierdo, J. L., Cosentino, V., & Rolandi, B. (2015). Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects. In Proceedings of the 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (pp. 479–483).
    2. Cánovas Izquierdo, J. L., Cosentino, V., Rolandi, B., Bergel, A., & Cabot, J. (2015). GiLA: GitHub Label Analyzer. In Proceedings of the 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (pp. 550–554).

NUDT/Trustie

  • Yue Yu: Research on reviewer recommendation and pull request latency. Used GHTorrent to extract the research dataset.
    1. Yu, Y., Wang, H., Yin, G., & Ling, C. X. (2014). Reviewer Recommender of Pull-Requests in GitHub. In Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution (pp. 609–612). IEEE. doi:10.1109/ICSME.2014.107
    2. Yu, Y., Wang, H., Filkov, V., Devanbu, P., & Vasilescu, B. (2015). Wait For It: Determinants of Pull Request Evaluation Latency on GitHub. In Proceedings of the 12th Working Conference on Mining Software Repositories. IEEE.

Radboud University Nijmegen/DS

  • Georgios Gousios: Maintenance, qualitative research on pull requests, pull request prioritization, developer profiles.
    1. Gousios, G., Zaidman, A., Storey, M.-A., & Deursen, A. van. (2015). Work Practices and Challenges in Pull-Based Development: The Integrator’s Perspective. In Proceedings of the 37th International Conference on Software Engineering.
    2. Hauff, C., & Gousios, G. (2015). Matching GitHub developer profiles to job advertisements. In Proceedings of the 12th International Conference on Mining Software Repositories.
    3. van der Veen, E., Gousios, G., & Zaidman, A. (2015). Automatically Prioritizing Pull Requests. In Proceedings of the 12th International Conference on Mining Software Repositories.

TU Delft/SERG

  • Georgios Gousios: Initial design and implementation. Project hosting. Lean GHTorrent. Research on pull requests. Project openness reports.
    1. Gousios, G., & Spinellis, D. (2012). GHTorrent: GitHub’s Data from a Firehose. In Proceedings of the 9th Working Conference on Mining Software Repositories (pp. 12–21). IEEE. doi:10.1109/MSR.2012.6224294
    2. Gousios, G. (2013). The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (pp. 233–236). Retrieved from http://www.gousios.gr/bibliography/G13.html
    3. Gousios, G., Pinzger, M., & Deursen, A. van. (2014). An Exploratory Study of the Pull-based Software Development Model. In Proceedings of the 36th International Conference on Software Engineering (pp. 345–355). New York, NY, USA: ACM. doi:10.1145/2568225.2568260
    4. Gousios, G., & Zaidman, A. (2014). A Dataset for Pull-based Development Research. In Proceedings of the 11th Working Conference on Mining Software Repositories (pp. 368–371). New York, NY, USA: ACM. doi:10.1145/2597073.2597122
    5. Gousios, G., Vasilescu, B., Serebrenik, A., & Zaidman, A. (2014). Lean GHTorrent: GitHub Data on Demand. In Proceedings of the 11th Working Conference on Mining Software Repositories (pp. 384–387). New York, NY, USA: ACM. doi:10.1145/2597073.2597126

TU Eindhoven/SET

  • Bogdan Vasilescu: Integration of GitHub and Stack Overflow data. Research on productivity of GitHub developers. Sentiment analysis of GitHub discussions. Lean GHTorrent. Continuous integration in GitHub.
  • Alexander Serebrenik: Research on productivity of GitHub developers. Sentiment analysis of GitHub discussions. Research on continuous integration in GitHub.
    1. Vasilescu, B., Filkov, V., & Serebrenik, A. (2013). Stack Overflow and GitHub: Associations between software development and crowdsourced knowledge. In Proceedings of the 2013 ASE/IEEE International Conference on Social Computing (pp. 188–195). IEEE. doi:10.1109/SocialCom.2013.35
    2. Gousios, G., Vasilescu, B., Serebrenik, A., & Zaidman, A. (2014). Lean GHTorrent: GitHub Data on Demand. In Proceedings of the 11th Working Conference on Mining Software Repositories (pp. 384–387). New York, NY, USA: ACM. doi:10.1145/2597073.2597126
    3. Pletea, D., Vasilescu, B., & Serebrenik, A. (2014). Security and Emotion: Sentiment Analysis of Security Discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (pp. 384–387). ACM.
    4. Vasilescu, B., van Schuylenburg, S., Wulms, J., Serebrenik, A., & van den Brand, M. G. J. (2014). Continuous integration in a social-coding world: Empirical evidence from GitHub. In Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution, Early Research Achievements (pp. 401–405). IEEE.

University of California, Davis/DECAL

  • Bogdan Vasilescu: Research on effects of diversity in GitHub teams.
    1. Vasilescu, B., Posnett, D., Ray, B., van den Brand, M. G. J., Serebrenik, A., Devanbu, P., & Filkov, V. (2015). Gender and tenure diversity in GitHub teams. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems. ACM.
    2. Vasilescu, B., Filkov, V., & Serebrenik, A. (2015). Perceptions of Diversity on GitHub: A User Survey. In Proceedings of the 8th International Workshop on Cooperative and Human Aspects of Software Engineering. IEEE.

University of Victoria/SEGAL

  • Kelly Blincoe: Research on Implicit Coordination and its impact on productivity.
  • Eirini Kalliamvakou: Research on collaborative development using decentralized workflows and GitHub. Used GHTorrent to extract pull request information for studying the potential perils of mining GitHub.
    1. Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D. M., & Damian, D. (2014). The Promises and Perils of Mining GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (pp. 92–101). ACM.

API keys contributors

The following people contributed GitHub OAuth API keys, which allowed the data collection process to keep up with GitHub's tenfold growth since the GHTorrent project started. If you would like to contribute an API key, please follow the process specified here.

Bram Adams, Maryi Arciniegas Méndez, Syed Arefinul Haque, Efthimia Aivaloglou, Alberto Bacchelli, Moritz Beller, Matthieu Bizien, Erik Bowers, Frederic Gingras, Roberta de Souza Coelho, Victor Costan, Ayushi Dalmia, Jos Demmers, Arie van Deursen, Niel Ernst, Joe Fleming, Georgios Gousios, Samarendra M Hedaoo, Mark Hills, Arun Kalyanasundaram, Syafiq Kamarul Azman, Lindsey Lanier, Pablo Loyola, Yao Lu, Mahdi Moqri, Graeme Nathan, Matteo Orrù, Gustavo Pinto, Dominic Safaric, Jasmine Sandhu, Alexander Serebrenik, Diomidis Spinellis, Simon Symeonidis, Chris Thompson, Peter Tröger, Bogdan Vasilescu, Marko Vit, Meike Wiemann, Yue Yu, Alexey Zagalsky, Andy Zaidman, Nosheen Zaza