Measuring the Value of Corporate Open Source Contributions

Brian Warner is the Senior Open Source Strategist for the Samsung Open Source Group.

This is part 1 of a series on the value of an open source software development group for companies that rely on open source technology. See Part 2: Hiring Open Source Maintainers is Key to Stable Software Supply Chain.

If you’ve worked in a corporate development environment, you certainly understand that metrics are everything. If you’re doing development, you are probably familiar with the feeling that metrics aren’t perfect. I can’t count how many times I’ve heard, “Well, I’m measured on X because it generates a number, but let me tell you the real story…”

Certain things are both meaningful and easy to measure, such as the number of conference talks accepted and presented, internal training sessions delivered, or colleagues mentored. But what do you do about code?

What Does it Mean to Measure the Value of Your Open Source Contributors?

As hard as it is to measure an individual developer’s code contributions using a standardized set of statistics, it can be even harder to measure the value of a company’s open source contribution strategy. This is one of those things that everybody knows has value (we all know it’s a lot), but how do you quantify it?

One of the first things we have to get comfortable with is the arbitrary nature of valuing contributions. A single line that fixes a buffer overflow is arguably more valuable than a single line of documentation, but by how much? We can argue this endlessly, but it very quickly becomes a problem of bikeshedding, where arguments about measurements become a distraction from development itself. Realistically, aside from giving people like me something to think about, there’s not a lot of value in arguing the semantics…

However, one valuable measurement is how much of your code has landed in an upstream project. In our case, this is the single most important metric for the members of the Samsung Open Source Group. As a result, we specifically hire high-performance maintainers and committers. I’ll dive more deeply into this in the next article in this series.

The Methodology We Use

There’s a fantastic tool, written by Jon Corbet of LWN.net and Greg Kroah-Hartman, called gitdm. Jon is famously modest about it, but the value of this tool cannot be overstated. In essence, it parses git log output and extracts meaningful statistics like who added/removed how many lines of code, what company they were from, etc. If you feed it a specific range of time or versions, it’ll tell you who did what.
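To make that concrete, here is a minimal Python sketch of the kind of aggregation gitdm performs. It is not gitdm itself, and the repository path and tag range are hypothetical placeholders, but it shows how `git log --numstat` output can be rolled up into per-company patch and line counts, here keyed naively by the author’s email domain:

```python
#!/usr/bin/env python3
# Minimal sketch of a gitdm-style aggregation (NOT gitdm itself): tally
# commits and added/removed lines by the author's email domain, a crude
# stand-in for "which company the author works for".

import subprocess
from collections import defaultdict

def contribution_stats(repo, rev_range):
    """Return {email_domain: {"commits": n, "added": n, "removed": n}}."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--no-merges", "--numstat",
         "--pretty=format:--%ae", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout

    stats = defaultdict(lambda: {"commits": 0, "added": 0, "removed": 0})
    domain = None
    for line in log.splitlines():
        if line.startswith("--"):                  # marker line: new commit, author email
            domain = line[2:].split("@")[-1].lower()
            stats[domain]["commits"] += 1
        elif line.strip() and domain is not None:  # numstat line: "added<TAB>removed<TAB>path"
            added, removed, _path = line.split("\t", 2)
            if added != "-":                       # "-" marks a binary file, no line counts
                stats[domain]["added"] += int(added)
                stats[domain]["removed"] += int(removed)
    return dict(stats)

if __name__ == "__main__":
    # Hypothetical repository path and revision range, purely for illustration.
    results = contribution_stats("/path/to/upstream/repo", "v4.7..v4.8")
    for domain, s in sorted(results.items(), key=lambda kv: -kv[1]["commits"]):
        print(f"{domain:30} {s['commits']:5} patches  +{s['added']}/-{s['removed']}")
```

gitdm goes much further (it maps email addresses to employers via configuration files, handles Signed-off-by lines, and so on), but the underlying data source is the same: the project’s own git history.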

A while back I wrote a front-end script called gitalize (which I’m fully comfortable admitting is a bit of a hack) that calls gitdm recursively on an arbitrary number of repositories and allows you to slice up the analysis over periods of time. This is great for seeing trends in the data, and with a bit of graph work it’s pretty easy to benchmark your contributions against others in your company, or other companies at large.
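The actual gitalize script isn’t shown here, but the time-slicing idea is simple enough to sketch. The repository paths, quarterly windows, and the “@samsung.com” author filter below are hypothetical; the point is that one count per repository per window is all you need to plot a trend line:

```python
# Illustration of slicing contribution counts into time windows across
# several repositories. Not the actual gitalize script; the repo paths,
# date windows, and author filter are hypothetical placeholders.

import subprocess

REPOS = ["/src/linux", "/src/gstreamer", "/src/efl"]   # hypothetical local checkouts
WINDOWS = [("2016-01-01", "2016-04-01"),
           ("2016-04-01", "2016-07-01"),
           ("2016-07-01", "2016-10-01")]

def commits_in_window(repo, since, until, author):
    """Count non-merge commits by matching authors in [since, until)."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--count", "--no-merges",
         f"--since={since}", f"--until={until}", f"--author={author}", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())

for since, until in WINDOWS:
    total = sum(commits_in_window(r, since, until, "@samsung.com") for r in REPOS)
    print(f"{since} .. {until}: {total} patches landed")
```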

How to Measure the Value of Your Open Source Contributors

There are two key metrics for measuring the contributions of our developers: patch count and lines committed.

For patch count, we count neither patches generated nor patches sent; rather, we count the patches that actually land upstream in a project’s repositories. At first glance this might look like another arbitrary metric: just because we can measure it doesn’t mean it’s useful. But there’s more to it.

In open source, etiquette is very important when sending code. It needs to be sent as a small, understandable series of patches. Each change needs to be either entirely obvious or explained well by the author. While it may not be perfect or final, it must not introduce security or stability issues. Finally, and most importantly, it must pass peer review.

By measuring and incentivizing our team to improve the number of patches that land upstream, we are implicitly saying that the behaviors that get code accepted upstream, whatever those behaviors may be for your particular project, are the ones we value in Samsung’s Open Source Group. It just so happens that generating many small patches is better community behavior than sending a few huge ones. Essentially, the better you play within your contributor community, the higher your accepted patch count will be, and we want to reward that.
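Measuring what actually landed means looking at the upstream tree itself rather than at what was mailed out. As a rough illustration of the “sent vs. landed” distinction (not necessarily how any particular team tracks it), `git cherry` can compare a local branch of sent work against the upstream branch by patch-id; the repo path and branch names here are placeholders:

```python
# Rough illustration of "sent vs. landed": `git cherry` compares a local
# branch against an upstream branch by patch-id. Lines beginning with "-"
# have an equivalent commit upstream (landed); "+" means not (yet) landed.
# The repository path and branch names are hypothetical placeholders.

import subprocess

def landed_and_pending(repo, upstream="origin/master", local="our-patches"):
    out = subprocess.run(
        ["git", "-C", repo, "cherry", upstream, local],
        capture_output=True, text=True, check=True,
    ).stdout
    landed  = [line[2:] for line in out.splitlines() if line.startswith("- ")]
    pending = [line[2:] for line in out.splitlines() if line.startswith("+ ")]
    return landed, pending

landed, pending = landed_and_pending("/path/to/checkout")
print(f"{len(landed)} patches landed upstream, {len(pending)} still pending")
```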

Our second metric is the number of lines of code that are committed. While this is far from a perfect measure, it is generally recognized that productive coders produce a lot of code.

Taken together, these two metrics do a pretty good job of providing an aggregate view of productivity, impact, and good OSS project citizenship. We know there will always be nuances that can’t possibly be captured by statistics, but these two strike the best balance between satisfying corporate metrics requirements and letting our people stay focused on what they do best.

For us this is critical because, at the end of the day, productivity, meaning patches landed upstream, is what matters most. Stay tuned for the next post in this series, which will cover what these measurements have told us about the success of the Open Source Group here at Samsung.

This article is republished with permission from the Samsung Open Source Group Blog.

Read more: Hiring Open Source Maintainers is Key to Stable Software Supply Chain.