Streaming updates from GHTorrent
To obtain access, please send us your public key as described here.
ssh -L 5672:streamer.ghtorrent.org:5672 firstname.lastname@example.org
This will create a local port 5672 to which you can connect your AMQP client. No shell is allocated for security reasons.
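Before pointing an AMQP client at the tunnel, it can help to confirm that the forwarded port accepts connections. The following is a minimal sketch using only Ruby's standard library; the helper name tunnel_open? and the 2-second default timeout are illustrative choices, not part of the GHTorrent service.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port,
# false if the connection is refused, unreachable, or times out.
# (Hypothetical helper; name and defaults are assumptions.)
def tunnel_open?(host = '127.0.0.1', port = 5672, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Usage:
#   puts(tunnel_open? ? 'Tunnel is up' : 'Tunnel is down')
```

A successful check only shows that the local end of the tunnel is listening; it does not verify AMQP credentials.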
Our queue server, RabbitMQ, implements the AMQP protocol. Some familiarity with AMQP is necessary before using the streaming service. The RabbitMQ getting started page is a good starting point, with examples in many languages.
The streaming service uses topic exchanges and, consequently, message-based routing (see here for details). To start receiving messages, a client needs to:
- connect to the server
- declare a queue
- bind the declared queue to the ght-streams exchange with a routing key
The following examples are in Ruby.
Connecting to the server
Assuming your connection works as described above, port 5672 should be
listening on localhost. Connect to it and declare the exchange (if you
declare other exchanges, you will receive no messages, as no script
posts to them).
require 'bunny'

conn = Bunny.new(:host => '127.0.0.1', :port => 5672,
                 :username => 'streamer', :password => 'streamer')
conn.start

ch = conn.create_channel
exchange = ch.topic('ght-streams', :durable => true)
Declaring a queue
You can declare as many queues as you want (within reasonable limits). To
make the queue unique, we ask you to prefix your queue name with your
username (for example, gousiosg_queue). You should also make your queue
non-persistent, to avoid consuming server resources when your program
is not running.
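A queue declaration along these lines could look as follows. This is a sketch rather than the service's official example: the helper name declare_stream_queue and its suffix parameter are hypothetical, ch is assumed to be the channel created in the connection example, and combining :durable => false with :auto_delete => true is one reasonable reading of "non-persistent".

```ruby
# ch:       a Bunny channel, e.g. the one returned by conn.create_channel
# username: your streaming-service username, used as the queue name prefix
# suffix:   free-form part of the queue name (hypothetical parameter)
def declare_stream_queue(ch, username, suffix = 'queue')
  name = "#{username}_#{suffix}"
  # Non-durable: the queue does not survive a broker restart.
  # Auto-delete: the broker removes it once its last consumer goes away.
  ch.queue(name, :durable => false, :auto_delete => true)
end

# Usage:
#   q = declare_stream_queue(ch, 'gousiosg')   # declares gousiosg_queue
```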
Binding queues to routing keys
All messages posted to the ght-streams exchange have an attached routing key.
This allows clients to declare queues that selectively receive only
the messages they are interested in. The routing key is structured as
three dot-separated parts. The prefix denotes the type of the updated item:
evt: Denotes a GitHub event, as received by GHTorrent
ent: Denotes an update in a MongoDB collection
The second part of the key denotes the updated item; its value depends on
the prefix. The permitted values are the following:
For evt prefixes, it is the name of a public GitHub event, shortened and lower-cased.
For ent prefixes, it is the name of the MongoDB collection that was updated.
The third part of the routing key denotes the update action. The full set
of values applies only to ent type messages; evt type messages are only
marked as insert. The allowed values are:
insert: An insertion of a record to a MongoDB collection
delete: A deletion of a record from a MongoDB collection
update: An update to a MongoDB record
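To avoid typos when composing keys, the three parts can be assembled and checked in code. The sketch below is illustrative only: the routing_key helper and the constant names are assumptions, and the second part is not validated because the permitted item names are not enumerated here.

```ruby
# Prefixes and actions as listed in the text above.
PREFIXES = %w[evt ent].freeze
ACTIONS  = %w[insert delete update].freeze

# Builds a three-part routing key (or binding pattern).
# '*' is accepted in any position as a single-word wildcard.
def routing_key(prefix, item, action)
  unless prefix == '*' || PREFIXES.include?(prefix)
    raise ArgumentError, "unknown prefix: #{prefix}"
  end
  unless action == '*' || ACTIONS.include?(action)
    raise ArgumentError, "unknown action: #{action}"
  end
  [prefix, item, action].join('.')
end

# Usage:
#   routing_key('evt', 'fork', '*')   # builds "evt.fork.*"
```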
Let's see some example routing keys:
ent.repos.insert: This will retrieve all new inserts to the repos collection
evt.fork.*: This will retrieve all fork events
ent.*.update: This will retrieve all updates on MongoDB collections
*.*.insert: This will retrieve all new events and all MongoDB inserts
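Binding a queue with one of these keys and consuming from it might look like the sketch below. The helper name stream_events is hypothetical; q is assumed to be a queue declared on the channel and exchange the ght-streams topic exchange from the connection example. In AMQP topic exchanges, * matches exactly one dot-separated word.

```ruby
# q:        a Bunny queue declared on the channel
# exchange: the ght-streams topic exchange
# pattern:  a routing key pattern such as 'evt.fork.*'
def stream_events(q, exchange, pattern)
  q.bind(exchange, :routing_key => pattern)
  # :block => true keeps the calling thread waiting for deliveries.
  q.subscribe(:block => true) do |delivery_info, _properties, payload|
    # Hand the concrete routing key and the raw message body to the caller.
    yield delivery_info.routing_key, payload
  end
end

# Usage:
#   stream_events(q, exchange, 'evt.fork.*') do |key, payload|
#     puts "#{key}: #{payload}"
#   end
```

A queue can be bound several times with different patterns if you want to receive more than one kind of message on it.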
Things to consider
- Queues are configured to be garbage collected when the client that declared them has been disconnected.
- Messages have a pre-configured Time-To-Live equal to 1 minute. If your client is not fast enough, they will be discarded. For this reason, we recommend client-side buffering of unprocessed messages.
- All exchanges not named ght-streams are deleted every 5 minutes.
- All queues not prefixed with username_ are deleted every 5 minutes.
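The client-side buffering recommended above can be sketched with Ruby's thread-safe Queue: the subscribe block only enqueues each payload, and a separate worker thread does the slow processing. The helper name start_worker is an assumption, and the buffer here is unbounded, so a real client may want to cap its size or persist it.

```ruby
# Starts a thread that drains the buffer and passes each payload
# to the given block. The loop ends once the buffer has been
# closed and emptied (pop then returns nil).
def start_worker(buffer, &process)
  Thread.new do
    while (payload = buffer.pop)
      process.call(payload)
    end
  end
end

# Usage (q is a bound Bunny queue):
#   buffer = Queue.new
#   worker = start_worker(buffer) { |payload| puts payload }
#   q.subscribe(:block => true) do |_info, _props, payload|
#     buffer << payload   # return quickly so the consumer keeps up
#   end
#   buffer.close          # on shutdown: let the worker finish
#   worker.join
```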