On October 30, 2019, we officially open sourced Amundsen, our solution to solve metadata catalog and data discovery challenges. Ten months later, Amundsen joined the Linux foundation AI (LFAI) as its incubation project.
In almost every modern data-driven company, each interaction with the platform is powered by data. As data resources are constantly growing, it becomes increasingly difficult to understand what data resources exist, how to access them, and what information is available in those sources without tribal knowledge. Poor understanding of data leads to bad data quality, low productivity, duplication of work, and most importantly, a lack of trust in the data. The complexity of managing a fragmented data landscape is not just a problem unique to Lyft, but a common one that exists throughout the industry.
In a nutshell, Amundsen is a data discovery and metadata platform for improving the productivity of data analysts, data scientists, and engineers when interacting with data. By indexing the data resources (tables, dashboards, users, etc.) and powering a page-rank style search based on usage patterns (e.g. highly-queried tables show up earlier than less-queried tables), these customers are able to address their data needs faster.