GHTorrent on the Google cloud
GHTorrent can be accessed through Google Cloud services. Accessing the data requires a Google Cloud account. Reasonable use is free of charge and, in the case of BigQuery, should no longer require a credit card (Pub/Sub still does). You can check what Google considers reasonable at any given moment here.
Google BigQuery contains an up-to-date import of the latest GHTorrent MySQL dump.
Google Pub/Sub exposes real-time streams of GitHub activity.
Both services can be accessed through the Web, the command line (after installing the Google Cloud command-line utilities) or through various programming languages.
With BigQuery, you can query GHTorrent's MySQL dataset using an SQL-like language (BigQuery now also supports standard SQL). More importantly, you can join the dataset with other open datasets hosted on BigQuery (e.g. GitHub's own project data, Reddit, TravisTorrent, etc.).
To get the most popular programming languages by number of bytes written, run the following:
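A minimal sketch of such a query run from Python, assuming the `google-cloud-bigquery` client library and default Google Cloud credentials. The dataset name `ghtorrent-bq.ght` and the `project_languages` columns `language` and `bytes` are assumptions based on the GHTorrent MySQL schema; check the current dataset name in the BigQuery console. The table may keep several snapshots per project (the MySQL schema has a `created_at` column), in which case this naive sum counts some bytes more than once.

```python
"""Sketch: rank programming languages by total bytes in GHTorrent on BigQuery."""

# Assumed fully qualified dataset name; verify it in the BigQuery console.
DATASET = "ghtorrent-bq.ght"


def languages_by_bytes_sql(dataset: str = DATASET, limit: int = 10) -> str:
    """Build a standard-SQL query ranking languages by summed bytes."""
    return (
        "SELECT language, SUM(bytes) AS total_bytes\n"
        f"FROM `{dataset}.project_languages`\n"
        "GROUP BY language\n"
        "ORDER BY total_bytes DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str):
    """Run a query with the BigQuery client library and return the rows."""
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # default credentials and project
    return list(client.query(sql).result())


# Usage (needs Google Cloud credentials):
# for row in run_query(languages_by_bytes_sql()):
#     print(row["language"], row["total_bytes"])
```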
To get the user with the most Java commits in the Netherlands in June 2016, do the following:
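A minimal sketch of one way to write this query, again assuming the `google-cloud-bigquery` library and a dataset named `ghtorrent-bq.ght`. The table and column names (`commits.author_id`, `commits.project_id`, `commits.created_at`, `users.country_code`, `projects.language`) follow the GHTorrent MySQL schema but are assumptions here, as is the lowercase country code `nl`. "Java commits" is taken to mean commits to projects whose main language is Java. Dates are inlined as string literals, so the builder rejects quotes.

```python
"""Sketch: top committer by country, project language and date range."""

# Assumed dataset name; tables follow the GHTorrent MySQL schema.
DATASET = "ghtorrent-bq.ght"


def top_committer_sql(country_code: str, language: str, start: str, end: str,
                      dataset: str = DATASET) -> str:
    """Build a standard-SQL query for the user with the most commits.

    Counts commits authored by users in `country_code` to projects whose
    main language is `language`, with created_at in [start, end).
    """
    for value in (country_code, language, start, end):
        # Values are inlined as string literals, so refuse quotes/backslashes.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsafe literal: {value!r}")
    return (
        "SELECT u.login, COUNT(*) AS num_commits\n"
        f"FROM `{dataset}.commits` AS c\n"
        f"JOIN `{dataset}.users` AS u ON c.author_id = u.id\n"
        f"JOIN `{dataset}.projects` AS p ON c.project_id = p.id\n"
        f"WHERE u.country_code = '{country_code}'\n"
        f"  AND p.language = '{language}'\n"
        f"  AND c.created_at >= '{start}' AND c.created_at < '{end}'\n"
        "GROUP BY u.login\n"
        "ORDER BY num_commits DESC\n"
        "LIMIT 1"
    )


def run_query(sql: str):
    """Run a query with the BigQuery client library and return the rows."""
    from google.cloud import bigquery  # lazy import, needs credentials

    return list(bigquery.Client().query(sql).result())


# Usage (needs Google Cloud credentials):
# rows = run_query(top_committer_sql("nl", "Java", "2016-06-01", "2016-07-01"))
```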
See also some queries by Felipe Hoffa.
Pub/Sub allows subscribers to receive events describing what is happening on GitHub (or at least GHTorrent's interpretation of what is happening on GitHub) in almost real time. To start receiving events, subscribe a client to one of the available topics.
The service is complementary to GHTorrent's own streaming interface, though less fine-grained. As with GHTorrent streaming, the contents of the streams are generated by following the live MongoDB server replication stream. See the code here.
To subscribe to a topic, e.g. commits, run the following:
gcloud beta pubsub subscriptions create my_commits_subscription --topic projects/ghtorrent-bq/topics/commits
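The same subscription can be created from Python. A minimal sketch, assuming the `google-cloud-pubsub` client library and default credentials; `my-gcp-project` is a hypothetical project ID. The subscription is created in your own project, while the topic path `projects/ghtorrent-bq/topics/commits` is the one used in the command above.

```python
"""Sketch: create a pull subscription to a GHTorrent Pub/Sub topic."""

# Project that owns the GHTorrent topics, as in the gcloud example.
GHTORRENT_PROJECT = "ghtorrent-bq"


def topic_path(topic: str, project: str = GHTORRENT_PROJECT) -> str:
    """Full resource name of a GHTorrent topic."""
    return f"projects/{project}/topics/{topic}"


def subscription_path(project: str, subscription: str) -> str:
    """Full resource name of a subscription in *your* project."""
    return f"projects/{project}/subscriptions/{subscription}"


def create_subscription(project: str, subscription: str, topic: str = "commits"):
    """Create a pull subscription to a GHTorrent topic (needs credentials)."""
    from google.cloud import pubsub_v1  # lazy import: pip install google-cloud-pubsub

    with pubsub_v1.SubscriberClient() as subscriber:
        return subscriber.create_subscription(
            request={
                "name": subscription_path(project, subscription),
                "topic": topic_path(topic),
            }
        )


# Usage, with a hypothetical project ID:
# create_subscription("my-gcp-project", "my_commits_subscription")
```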
To start receiving events from the command line, run:
gcloud beta pubsub subscriptions pull --auto-ack --max-messages 5 -- my_commits_subscription
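A minimal Python sketch of the same pull, assuming the `google-cloud-pubsub` library and an existing subscription; the project ID is hypothetical. Like `--auto-ack`, it acknowledges the messages right after pulling them, so they are not redelivered even if later processing fails. The payload format is not documented here; `decode_payload` assumes UTF-8 JSON and falls back to the raw text.

```python
"""Sketch: pull and acknowledge up to N messages from a subscription."""

import json


def decode_payload(data: bytes):
    """Decode a message body, assuming (unverified) it is UTF-8 JSON."""
    text = data.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: return the raw text


def pull_and_ack(project: str, subscription: str, max_messages: int = 5):
    """Synchronously pull messages, acknowledge them, return decoded bodies."""
    from google.cloud import pubsub_v1  # lazy import, needs credentials

    path = f"projects/{project}/subscriptions/{subscription}"
    with pubsub_v1.SubscriberClient() as subscriber:
        response = subscriber.pull(
            request={"subscription": path, "max_messages": max_messages}
        )
        received = response.received_messages
        if received:
            # Ack immediately, mirroring gcloud's --auto-ack behaviour.
            subscriber.acknowledge(
                request={"subscription": path, "ack_ids": [m.ack_id for m in received]}
            )
        return [decode_payload(m.message.data) for m in received]


# Usage, with a hypothetical project ID:
# for event in pull_and_ack("my-gcp-project", "my_commits_subscription"):
#     print(event)
```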
The available topics are the following: