LinuxCon Keynote Q&A: The OSS Behind a Tweet


The roster of keynote speakers for this year’s LinuxCon North America and CloudOpen events is mind-blowing. I’ve got my favorites marked on my schedule and am excited to share a recent conversation I had with one of them. Twitter’s Open Source Manager Chris Aniszczyk gives us a teaser for his keynote, details the open source projects Twitter is using, including Linux, and shares his favorite Tweets of all time.

You are keynoting at the upcoming LinuxCon/CloudOpen about “The Open Source Technology Behind a Tweet.” Can you give us a teaser of what we’re going to hear from you on that topic?

Aniszczyk: On the surface, Twitter is a simple real-time service where the unit of currency is 140-character messages called Tweets. Beneath the surface, however, over 400,000,000 Tweets are sent every day, at an average steady state of 4,500 Tweets a second. At this scale, you have to deal with some interesting real-time engineering problems. In the keynote, I will address why Twitter tends to favor open source software and how we tackle some of these problems. The talk will revolve around what happens behind the scenes when you send a Tweet, tracing the life of a Tweet from our backend to the eventual frontend. In the end, I expect the audience to leave with a better appreciation of open source technology and of what happens behind the scenes when a humble Tweet appears in their timeline.
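As a rough sanity check on those figures, spreading 400,000,000 Tweets evenly over the 86,400 seconds in a day gives an average in the same ballpark as the quoted 4,500 Tweets a second. The snippet below simply does that division; it assumes a perfectly uniform rate, which real traffic, with its daily peaks and event-driven spikes, does not follow.

```python
# Average Tweet rate implied by the daily volume quoted above.
# Assumes a uniform rate across the whole day (a simplification).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_day = 400_000_000  # daily figure quoted in the interview

average_rate = tweets_per_day / SECONDS_PER_DAY
print(f"{average_rate:,.0f} Tweets per second")  # prints "4,630 Tweets per second"
```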

What role does open source software play at Twitter? Is it a tool, a philosophy, a no-brainer or all of the above? Why?

Aniszczyk: In my opinion, it’s a no-brainer, as open source software allows us to customize and tweak code to meet our fast-paced engineering needs as our service and community grow. When we plan new engineering projects at Twitter, we always make sure to measure our requirements against the capabilities of open source offerings, and we prefer to consume open source software whenever it makes sense. As a result, much of Twitter is now built on open source software, and the open source way has become a pervasive part of our culture. On top of that, we benefit from the positive cycle of teaching and learning within open source communities.

Here are a few concrete examples of open source software we consume:

  • MySQL is heavily used for primary storage of Tweets; we develop our MySQL fork in the open to collaborate with the upstream community.
  • Cassandra, Hadoop, Lucene, Pig and a variety of Apache projects are used within our infrastructure to power services such as analytics and search. We also contribute back to these projects and have sponsored the Apache Software Foundation.
  • Memcached is used heavily in our caching infrastructure to scale our ever-growing traffic; we recently open sourced Twemcache, which was heavily inspired by the Memcached code base.
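To make the caching role concrete, here is a minimal sketch of the cache-aside read pattern commonly used with Memcached-style caches: check the cache, fall back to primary storage on a miss, then populate the cache. This is an illustration under assumptions, not a description of Twitter’s actual code path. The `CacheAsideStore` class and `get_tweet` method are hypothetical, and plain dictionaries stand in for a real cache client and a MySQL-backed store.

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Hypothetical cache-aside read path.

    `primary` stands in for durable storage (e.g. a MySQL-backed Tweet store)
    and `cache` stands in for a Memcached/Twemcache client; both are dicts here.
    """

    def __init__(self, primary: Dict[int, str]) -> None:
        self.primary = primary           # stand-in for the primary store
        self.cache: Dict[int, str] = {}  # stand-in for the cache tier
        self.hits = 0
        self.misses = 0

    def get_tweet(self, tweet_id: int) -> Optional[str]:
        # 1. Try the cache first.
        if tweet_id in self.cache:
            self.hits += 1
            return self.cache[tweet_id]
        # 2. On a miss, read from the primary store.
        self.misses += 1
        text = self.primary.get(tweet_id)
        # 3. Populate the cache so later reads are served from it.
        if text is not None:
            self.cache[tweet_id] = text
        return text


store = CacheAsideStore({1: "hello world"})
store.get_tweet(1)  # miss: read from primary, then cached
store.get_tweet(1)  # hit: served from the cache
print(store.hits, store.misses)  # prints "1 1"
```

A production cache tier would also need expiry and invalidation on writes, which this sketch deliberately leaves out.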

On top of that, we produce a variety of open source software too:

  • Iago is a load generator that we created to help us test services before they encounter production traffic. Iago provides us with capabilities that are uniquely suited for Twitter’s environment and the precise degree to which we need to test our services.
  • Zipkin is a distributed tracing system that we created to help us gather timing data for all the disparate services involved in managing a request to the Twitter API.
  • Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop by taking advantage of built-in integration with Scala and the JVM.
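The core idea behind a distributed tracing system like Zipkin is that each unit of work in a request is recorded as a timed span that carries a shared trace ID plus a link to its parent span, so the timings can later be stitched into a tree. The sketch below illustrates that idea within a single process. The `Tracer` and `Span` classes and the span names are hypothetical and are not Zipkin’s API; a real system also propagates the IDs between services over the network and reports finished spans to a collector.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span that invoked this one (None for the root)
    name: str
    duration_ms: float


class Tracer:
    """Hypothetical single-process tracer: one instance per request."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []  # finished spans, in completion order
        self._stack: List[str] = []  # ids of currently open spans

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        span_id = uuid.uuid4().hex[:16]
        parent_id = self._stack[-1] if self._stack else None
        self._stack.append(span_id)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(Span(self.trace_id, span_id, parent_id, name, elapsed_ms))


tracer = Tracer()
with tracer.span("api_request"):          # root span for the whole request
    with tracer.span("write_tweet"):      # hypothetical downstream call
        time.sleep(0.01)
    with tracer.span("fanout_to_timelines"):
        time.sleep(0.005)

for s in tracer.spans:
    print(f"{s.name:<22} parent={s.parent_id} {s.duration_ms:.1f} ms")
```

Because spans are recorded when they close, the children appear before the root, and the root’s duration covers the time spent in both children.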

How does Twitter look at Linux as it builds out its technology and its business? What advantages does Linux provide a company like Twitter?

Aniszczyk: Linux powers the majority of Twitter and serves as our technology backbone. We have tens of thousands of machines, running all types of services, on a customized version of Linux. We prefer Linux because the flexibility to customize it for our needs lets us innovate faster. We also love the large and mature development community moving the state of the kernel forward. For example, according to the latest Linux Development Report, more than 7,800 developers from 800 different companies have contributed to make Linux better for everyone.

In terms of specifics, we use a few different versions to see what works best in production, but as of today, we are mainly on the 2.6.39 release. We customize the kernel by adding patches such as enhanced core dump functionality, UnionFS support, and the ability to set the TCP congestion window on a per-socket basis.

In the future, we are looking to further customize the kernel for our production environment and to contribute some of that work upstream. If this type of work interests you, might I remind you that we are looking for Linux developers to join the flock, and we would love to hear from you.

What’s your favorite all-time tweet?

Aniszczyk: This question isn’t fair without more context; I have so many favorite Tweets! I’ll give you two, though. The first one is technical and comes from the @DEVOPS_BORAT parody account. Anyone who has learned Git recently should be able to relate to it.

The second Tweet comes from Kevin Durant (@KDTrey5), whom I loved to watch when he played for the University of Texas at Austin. I just loved the serendipity of it all: Durant ended up playing an impromptu pick-up game of flag football with one of his followers.

Thanks to Chris for taking time to answer a few questions as we prepare for next month’s big event. Don’t forget to register by July 28 at the $500 rate.