Maintainer confidential: Opportunities and challenges of the ubiquitous but under-resourced Yocto Project

By Richard Purdie

Maintainers are an important topic of discussion. I’ve read a few perspectives, but I’d like to share mine as one of the lesser-known maintainers in the open source world.

Who am I, and what do I do? I have many job titles and, in many ways, wear many hats. I’m the “architect” for the Yocto Project and the maintainer and lead developer for both OpenEmbedded-Core and BitBake. I’m the chair of the Yocto Project Technical Steering Committee (TSC) and a member of the OpenEmbedded TSC. I am also a Linux Foundation Fellow, representing a rare “non-kernel” perspective. The fellowship was partly a response to an industry-wide desire for me to work in a position of independence for the good of the projects and communities I work with rather than any one company.

The different roles I’ve described hint at the complexities that are part of the everyday work of maintaining a large open source project. To many, it could look like a labyrinth of relationships, directions, and decisions to balance.

What the Yocto Project is

Before I tell you more about what I do, I should explain what the Yocto Project does. Most people realize Linux is all around us but haven’t thought much about how it gets there or how such systems are developed and maintained. There is much more to a Linux system than just a kernel, and there are many use cases where a traditional desktop Linux distribution isn’t appropriate. In simple terms, the Yocto Project allows people to develop custom Linux (and non-Linux) systems in a maintainable way.

For a sense of scale, around 65% of the world’s internet traffic runs through devices from a specific manufacturer, and they have hundreds of millions of devices in the field. Those devices have software derived from the Yocto Project. The copy of Linux in Windows, “Windows Subsystem for Linux”, was originally derived from the Yocto Project. Alongside the main operating system, most servers have a baseboard management controller (BMC), which looks after the server’s health. The OpenBMC project provides that software and builds on the Yocto Project. A similar situation exists for cars using Automotive Grade Linux, which derives from the Yocto Project as well. The Comcast RDK is an open source UI software stack built using the project and is widely used on media devices such as set-top boxes, and LG’s webOS TV operating system is also built with the Yocto Project. We’ve even had a Yocto Project-built system orbiting Mars!

Those examples are just the tip of the iceberg. We only know some of the places the project is in use because, it being open source, users don’t have to tell us. The Yocto Project feeds into things all around us. The fact that people don’t know about it is a sign we’ve done a good job, but a low profile can also mean it misses out on recognition and resourcing.

The premise of the Yocto Project is to allow companies to share this work and have one good shared toolset to build these custom systems in a maintainable, reproducible, and scalable way.

How we got here

Now, we come to my role in this. I’m the crazy person who thought this project was possible and said so to several companies just over a decade ago. Then, with the support of some of them, many very talented developers, and a community, I took some existing open source projects and grew and evolved them to solve the problem, or at least go a significant way to doing so! 

The project holds the principle of shared contributions and collaboration, resulting in a better toolset than any individual company or developer could build. Today, I keep this all working.

It may sound like a solved problem, but as anyone working with a Linux distribution knows, open source is continually changing, hardware is continually changing, and the “distro” is where all this comes together. We must work to stay current and synchronized with the components we integrate. 

The biggest challenge for us now is that we are a victim of our own success. The original company sponsorship of developers to work on Yocto understandably scaled back, and many of those developers moved on to other companies. In those companies, they’re often now focused on internal projects/support, and the core community project feels starved of attention. It takes time to acquire the skillsets we need to maintain the core, as the project is complex. Everyone is hoping someone else will help the project core.

I’m often asked what features the project will have in its next release. My honest answer is that I don’t know, as nobody will commit to contributions in advance. Most people focus on their own products or projects, and they can’t get commitment from their management to spend time on features or bug fixing for the core, let alone agree to any timescale to deliver them. This means I can’t know when or if we will do things.

A day in my life as the Yocto Project architect 

I worked for a project member company until 2018, which generously gave me time to work on the project. Times change, and rather than moving on to other things, I took what was a rather risky decision at the time: to seek funding directly from the project, as I feared for its future. Thankfully it did work out, and I’ve continued working on it.

There are other things the project now funds. This includes our “autobuilder” infrastructure, a huge automated test matrix to find regressions. Along with the autobuilder and admin support to keep it alive, the project also funds a long-term support (LTS) release maintainer (we release an LTS every two years), documentation work, and some help in looking after incoming patch testing with the autobuilder, integrating new patches and features. 

There are obvious things in my day-to-day role, such as reviewing patches, merging the ones that make sense, and giving feedback on those with issues. Less obvious things include needing to debug and fix problems with the autobuilder. 

Sadly, no one else can keep the codebase that supports our test matrix alive. The scale of our tests is extensive, with 30+ high-power worker machines running three builds at a time, targeting the common 32- and 64-bit architectures with different combinations of core libraries, init systems, and so on. We test under qemu and see a lot of “intermittent” failures in the runtime testing where something breaks, often under high load or sometimes once every few months. Few people are willing to work on these kinds of problems, but, left unchecked, the number of them makes our testing useless as you can’t tell a real failure from the random, often timing-related ones. I’m more of a full-time QA engineer than anything else!
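
As a rough illustration of why a matrix like this is hard to keep green, here is a minimal Python sketch. The architecture, C library, and init system names are illustrative assumptions rather than the autobuilder’s real configuration, and the 1% flake rate is invented purely for the arithmetic. Only the worker count and the three builds per worker come from the description above.

```python
from itertools import product

# Illustrative axes only; the real autobuilder configuration is not given here.
ARCHITECTURES = ["x86", "x86-64", "arm", "arm64"]  # common 32- and 64-bit targets
C_LIBRARIES = ["glibc", "musl"]                     # assumed "core library" choices
INIT_SYSTEMS = ["sysvinit", "systemd"]              # assumed init system choices

WORKERS = 30           # "30+ high-power worker machines"
BUILDS_PER_WORKER = 3  # "running three builds at a time"


def build_matrix():
    """Return every (architecture, libc, init system) combination to test."""
    return list(product(ARCHITECTURES, C_LIBRARIES, INIT_SYSTEMS))


def concurrent_slots(workers=WORKERS, per_worker=BUILDS_PER_WORKER):
    """Number of builds that can run at the same time across all workers."""
    return workers * per_worker


def chance_of_spurious_failure(flake_rate, num_builds):
    """Probability that at least one of num_builds fails purely at random,
    assuming each build flakes independently with probability flake_rate."""
    return 1 - (1 - flake_rate) ** num_builds


if __name__ == "__main__":
    matrix = build_matrix()
    print(f"{len(matrix)} combinations, {concurrent_slots()} concurrent build slots")
    # A hypothetical 1% flake rate per build, just to show how quickly it adds up.
    risk = chance_of_spurious_failure(0.01, len(matrix))
    print(f"chance of at least one random failure per run: {risk:.1%}")
```

Even with a small per-build flake rate, the chance that a full run contains at least one random failure climbs quickly as the matrix grows, which is why intermittent failures left unchecked make it hard to spot the real ones.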

Bug fixing is also an interesting challenge. The project encourages reporting bugs and has an active team to triage them. However, we need help finding people interested in looking into and fixing identified issues. There are challenges in finding people with both the right skills and time availability. Where we have trained people, they generally move on to other things or end up focused on internal company work. The only developer time I can commit is my own.

Security is a hot topic. We do manage to keep versions of software up to date, but we don’t have a dedicated security team; we rely on the teams that some project users have internally. We know what we should do; it is just unfortunate that nobody wants to commit time to do it. We do the best we can. People love tracking metrics, but few are willing to do the work to create them or keep them going once established.

Many challenges arise from having had a decent-sized team of developers working on the project, with specific maintainers for different areas, and then scaling back to the point where the only resource I can control is my own time. Many of the tools we developed now sit abandoned, or are patched up only on an emergency basis, because we lack the developer resources to do even basic maintenance.

Beyond the purely technical, there are also collaboration and communication activities. I work with two TSCs, the project member organizations, people handling other aspects of the project (advocacy, training, finance, website, infrastructure, etc.), and developers. These meetings add up quickly to fill my calendar. If we need backup coverage in any area, we don’t have many options besides my time to fall back on.

The challenges of project growth and success

Our scale also means patch requirements are more demanding now. When the number of people using the project was small, the impact of breaking things was more limited, allowing a little more freedom in development. Now, if we accept a change and something breaks, it becomes an instant emergency, and I’m generally expected to resolve it. When patches come from trusted sources, help will often be available to address the regressions as part of an unwritten bond between developers and maintainers. This can intimidate new contributors; they can also find our testing requirements too difficult.

We did have tooling to help new contributors—and also the maintainers—by spotting simple, easily detected errors in incoming patches. This service would test and then reply to patches on the mailing list with pointers on how to fix the patches, freeing maintainer time and helping newcomers. Sadly, such tools require maintenance, and we lost the people who knew how to look after this component, so it stopped working. We formed plans to bring it back and make the maintenance easier, but we’ve struggled to find anyone with the time to do it. I’ve wondered if I should personally try to do it; however, I just can’t spend the chunk of time needed on one thing like that, as I would neglect too many other things for too long.
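
To make the idea concrete, here is a minimal Python sketch of the kind of check such a service could run before replying on the list. The two rules shown (a subject length limit and a Signed-off-by trailer) and every name in the code are hypothetical illustrations; they are not a description of what the project’s tooling actually checked.

```python
import re

# Hypothetical limit, chosen for illustration only.
MAX_SUBJECT_LENGTH = 72

# Matches a trailer such as "Signed-off-by: A Developer <dev@example.com>".
SIGNED_OFF_BY = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)


def check_patch(subject, body):
    """Return a list of human-readable problems found in a patch email."""
    problems = []
    if not subject.strip():
        problems.append("The subject line is empty.")
    elif len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"The subject line is {len(subject)} characters long; "
            f"please keep it under {MAX_SUBJECT_LENGTH}."
        )
    if not SIGNED_OFF_BY.search(body):
        problems.append("No Signed-off-by line was found in the commit message.")
    return problems


def format_reply(problems):
    """Build the text of a reply pointing the contributor at what to fix."""
    if not problems:
        return "No simple problems were detected in this patch."
    lines = ["Thanks for the patch. A few things need fixing before review:"]
    lines += [f"  - {problem}" for problem in problems]
    return "\n".join(lines)


if __name__ == "__main__":
    issues = check_patch("example: fix a typo", "Fix a typo in the help text.\n")
    print(format_reply(issues))
```

Checks like these are cheap to run on every incoming patch, which is what lets the tool save maintainer time while giving newcomers quick feedback.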

I wish this were an isolated issue, but there are other components many people and companies rely upon that are also in a perilous state. We have a “layer index,” which allows people to search the ecosystem to find and share metadata and avoid duplicating work. Nobody is willing and able to spend time to keep it running. It limps along; we do our best to patch up issues, but we all know that, sooner or later, something will go badly wrong, and we will lose it. People rely on our CROPS container images, but they have no maintainer.

I struggle a lot with knowing what to do about these issues. They aren’t a secret; the project members know, the developers know, and I raise them in status reports, in meetings, and wherever else I can. Everyone prefers to work on other things as long as these components “kind of work” or aren’t impacting someone badly. Should I feel guilty and try to fix these things, risking burnout and giving up a social life, so I have enough time to do so? I shouldn’t, and I can’t ask others to do that, either. Should I just let these things crash and burn, even if the work in rebuilding would be much worse? That will no longer be a choice at some point, and we are slowly losing components.

Over the holiday period, I also realized that project contributions have changed. Originally, many people contributed in their spare time, but many are now employed to work on it and use it daily as part of their job. There have been more contributions during working hours than on weekends or holidays. During the holiday period, some key developments were proposed by developers having “fun” during their spare time. Had I not responded to these, helping with wider testing, patch review, and feedback, they likely would have stalled and failed, as people would no longer have had the time once the holidays were over. The contributions were important enough that I strongly felt I should support them, so I did, the cost being that I didn’t get so much of a break myself.

As you read this blog and get a glimpse of my day, I want you to leave with an understanding that all projects, large and small, have their own challenges, and Yocto isn’t alone. 

I love the project; I’m proud of what we, the companies and the community, have done with it together. Growth and success have their downsides, though, and we see some issues I never expected. I am confident that the project can and will survive one way or another, come what may, as I’ve infused survival traits into its DNA.

Where the Yocto Project is going

There is also the future-looking element. What are the current trends? What do we need to adapt to? How can we improve our usability, particularly for new users? There is much to think about.

Recently, after I raised concerns about feature development, the project asked for a “five-year plan” showing what we could do in that timeframe. It took a surprising amount of work to pull together the ideas and put cost/time estimates against them, and I put a lot of time into that. Sadly, the result doesn’t yet have funding. I keep being asked when we’ll get features, but there needs to be more willingness to fund the development work needed before we even get to the question of which developers would actually do it!

One question that comes up a lot is the project’s development model. We use an “old school” patches-on-a-mailing-list workflow, similar to the kernel. New developers complain that we should have GitHub workflows so they can make point-and-click patch submissions. I have made submissions to other projects that way, and I can see the attraction of it. Equally, it does depend a lot on your review requirements. We want many people to see our patches, not just one person, and we greatly benefit from that comprehensive peer review. There are benefits in what we do, and being repeatedly told to change, without an understanding of the reasons for and benefits of staying the course, is unhelpful and wears thin over time! Our developer/maintainer base is used to mailing list review, and changing that would likely result in one person looking at patches, to the detriment of the project. Maintainers like myself also have favored processes and tools, and changing them would likely cause productivity issues, at least for a while.

Final thoughts: The future?

Governments are asking some good questions about software and security, but there are also very valid concerns about the lifecycle of hardware and sustainability issues. What happens to hardware after the original manufacturer stops supporting it? Landfill? Can you tell if a device contains risky code?

The project has some amazing software license and SBoM capabilities, and we collaborate closely with SPDX. We’re also one of the few build environments that can generate fully reproducible binaries and images down to the timestamps for all the core software components straight out of the box.

Combining these technologies, you can have open and reproducible software for devices. That means you can know the origin of the code on the device, you can rebuild it to confirm that what it runs is really what you have instructions/a manifest for, and if—or, in reality, when—there is a security issue, you have a path to fixing it. There is the opportunity for others to handle software for the device if the original provider stops for whatever reason, and devices can avoid landfill.
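
The rebuild-and-confirm step can be pictured with a small Python sketch. It is not the project’s own reproducibility tooling, just a generic comparison of two output directories by content hash. The function names and directory layout are assumptions made for illustration, and it compares file contents only, not filesystem metadata.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its content digest."""
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a: Path, build_b: Path) -> list[str]:
    """List relative paths that are missing from one tree or differ in content."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result from differing_files for the shipped image and an independent rebuild would suggest the two are byte-for-byte identical, which is the property that lets a third party confirm what a device really runs.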

I dream of a world where most products allow for this level of traceability, security, and sustainability, and I believe it would drive innovation to a new level. I know a build system that could help it become a reality!

Get involved to help the Yocto Project community grow

Basic survival isn’t my objective or idea of success. I’d love to see more energy, engagement, and collaboration around new features, to establish that security team, and to see the project playing a more prominent role in the broader FOSS ecosystem.

Help can take different forms. If you already use the Yocto Project, say so publicly, or let us list you as a user! We’re open to developer help and new contributors too, be it features, bug fixing, or as maintainers.

The project is also actively looking to increase its number of member companies. That helps us keep doing what we’re doing today, but it might also let us fund development in the critical areas where we need it and allow us to keep things running as the ecosystem has grown to expect. Please contact us if you’re interested in project membership to help this effort.

About the author: Richard Purdie is the Yocto Project architect and a Linux Foundation Fellow.