An update on the SPDX python-tools


Authors: Armin Tänzer armin.taenzer@tngtech.com, Meret Behrens meret.behrens@tngtech.com, Nicolaus Weidner nicolaus.weidner@tngtech.com, and Maximilian Huber maximilian.huber@tngtech.com

Progress with the SPDX Python tools

Discussions about the development and direction of the SPDX Python tools often happen in the weekly meetings or in smaller groups and are not always visible to interested parties. This blog post aims to fill that gap with a condensed account of what has been done and what is to come. It is intended to be the first in a series of such posts. Large parts of the work described here were made possible by an OpenSSF-sponsored project conducted by TNG.

The initial cleanup

Because the Python tools had been only nominally maintained for about a year, a considerable backlog had piled up in both open pull requests (PRs) and open issues. While not the most exciting part of working on the Python tools, finishing PRs and triaging issues was an essential first step in bringing them up to speed.

Over the past two months, 48 PRs were closed, out of which 21 had been open for up to several years. In some cases, the original contributors finished their contributions after a review; in others, we took over and finished the work they started. Some of these PRs were small, and others were large and conflicting – and it’s a relief for everyone that it’s no longer necessary to scan 10+ PRs for possible conflicts or overlaps before making a small change.

On the side of open issues, the number was reduced from 51 to… 52. To put these numbers into perspective, though: 25 “old” issues (created before September) and 19 “new” issues were closed. Many of the new issues were discovered while working on the tools and will be tackled in time, along with the remaining older issues. They are not considered a priority right now and will be easier to resolve after some much-needed refactoring (more on that later).

SPDX v2.3 support

For a long time, a big issue with the Python tools was their lack of support for SPDX v2.3 – or, as a matter of fact, for several aspects of v2.2 and v2.1 as well. We are now at a stage where all v2.3, v2.2, and v2.1 properties are supported (see caveats below) – testers welcome! Since the Python tools currently have no version-specific handling, validation was updated to follow the v2.3 specification. Introducing version-specific handling is the main motivation for refactoring the tools.

Caveats:

  • RDF support: When we discussed the relative importance of the different serialization formats, JSON came out as the clear winner, followed by tag/value because of its widespread usage. While quite exhaustive in its specification, RDF can be cumbersome to use, and its steep learning curve dissuades many potential users from choosing the format. For these reasons, we decided to postpone implementing full RDF support for v2.3 and instead focus on making the Python tools future-proof.
  • License expressions: Full support for license expressions is a long-standing open issue that has not been resolved yet. We are considering using the nexB license expression project for this.
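
To illustrate why license expressions need more than plain string handling, here is a minimal sketch that extracts the license identifiers from an expression. It is only an illustration: the function names are hypothetical, it is not the API of the nexB project or of the Python tools, and it does not check the expression grammar (operator placement, balanced parentheses).

```python
import re
from typing import List, Set

# Operators defined for SPDX license expressions.
OPERATORS = {"AND", "OR", "WITH"}
PARENTHESES = {"(", ")"}
# A token is a parenthesis or a run of identifier characters.
TOKEN_PATTERN = re.compile(r"\(|\)|[A-Za-z0-9.+:-]+")


def tokenize(expression: str) -> List[str]:
    """Split an expression into tokens, rejecting unexpected characters."""
    tokens = TOKEN_PATTERN.findall(expression)
    # If anything besides whitespace was skipped, the input is malformed.
    if "".join(tokens) != re.sub(r"\s+", "", expression):
        raise ValueError(f"unexpected characters in expression: {expression!r}")
    return tokens


def license_ids(expression: str) -> Set[str]:
    """Collect license identifiers, skipping exceptions that follow WITH."""
    ids = set()
    previous = None
    for token in tokenize(expression):
        is_identifier = token not in OPERATORS and token not in PARENTHESES
        if is_identifier and previous != "WITH":
            ids.add(token)
        previous = token
    return ids


expression = "MIT AND (Apache-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0)"
print(sorted(license_ids(expression)))
```

Even this small example has to distinguish licenses from exceptions, which hints at why a dedicated library is attractive for full support.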

Preparing for the future

Just release it already

The last release on PyPI happened more than three years ago, and many changes have been implemented since then. It’s about time for a new release, and there already is a shiny new release candidate for 0.7.0. We would like the release to be bug-free, so more testers are always welcome! Feel free to try and break things, and find out whether your personal use case is covered.

Apart from making bug fixes and new features available, this release will allow us to get started on some major breaking changes:

Move fast and break things

As foreshadowed several times in this blog post, a major refactoring of large parts of the Python tools is coming up. The main goals are:

  • making the Python tools easily maintainable and extensible
  • providing a well-documented and easy-to-use API for library users
  • preparing the tools for inclusion of the SPDX v3 specification once it is released
  • allowing version-specific handling of SPDX documents – in particular, version-specific validation

A rough sketch of the planned architecture is included in this issue.
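
As a rough idea of what version-specific validation could look like, here is a minimal sketch that dispatches to a different validator depending on the SPDX version. All names (Package, validate_package, the VALIDATORS table) are hypothetical and not taken from the planned architecture, and the rule that differs between versions is chosen for illustration only; the respective specification is the authority on what each version requires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Package:
    """Hypothetical, heavily simplified package (not the tools' data model)."""
    spdx_id: str
    name: str
    license_concluded: Optional[str] = None


def validate_common(package: Package) -> List[str]:
    """Checks assumed to apply to every supported version."""
    messages = []
    if not package.spdx_id.startswith("SPDXRef-"):
        messages.append(f"{package.spdx_id}: SPDX ID must start with 'SPDXRef-'")
    return messages


def validate_v2_2(package: Package) -> List[str]:
    messages = validate_common(package)
    # Illustrative rule: treat the concluded license as mandatory in v2.2.
    if package.license_concluded is None:
        messages.append(f"{package.spdx_id}: license_concluded is required in SPDX-2.2")
    return messages


def validate_v2_3(package: Package) -> List[str]:
    # Illustrative rule: the concluded license may be omitted in v2.3.
    return validate_common(package)


# One validator per supported version; adding a version means adding an entry.
VALIDATORS: Dict[str, Callable[[Package], List[str]]] = {
    "SPDX-2.2": validate_v2_2,
    "SPDX-2.3": validate_v2_3,
}


def validate_package(package: Package, spdx_version: str) -> List[str]:
    """Return validation messages for the given version (empty if valid)."""
    try:
        validator = VALIDATORS[spdx_version]
    except KeyError:
        raise ValueError(f"unsupported SPDX version: {spdx_version}") from None
    return validator(package)


package = Package(spdx_id="SPDXRef-Package", name="example")
print(validate_package(package, "SPDX-2.2"))  # one message about license_concluded
print(validate_package(package, "SPDX-2.3"))  # no messages
```

Keeping the shared checks in one function and the differences in small per-version functions is just one possible design; it keeps each version's rules visible in a single place.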

Work on the refactoring has already started on a dedicated feature branch (to avoid colliding with the release): A first version of the new data model is finished (issue), and issues covering the validation layer and JSON parsing are underway.

We look forward to hearing from you

And this is where you come in: We’d like to establish a vibrant community around the python-tools, so send us your feedback over on GitHub! Let us know what you think about existing features and upcoming changes/PRs and what features you still dearly miss. Of course, we also appreciate every contribution you’d like to bring in yourself, whether that may be actual code, test files, or special use cases that we still need to consider! If you want to talk directly to us, feel free to join our weekly sync meeting every Thursday at 4:30 pm GMT (5 pm GMT on every first Thursday of the month). Or email us via the SPDX tech mailing list to arrange a discussion at a different time.
