Luis Camacho Caballero is working on a project to preserve endangered South American languages by porting them to computational systems through automatic speech recognition using Linux-based systems. He was one of 14 aspiring IT professionals to receive a 2016 Linux Foundation Training (LiFT) scholarship, announced last month.
Luis, who is from Peru, has been using Linux since 1998, and appreciates that it is built and maintained by a large number of individuals working together to increase knowledge. Through his language preservation project, he hopes to have the first language, Quechua, the language of his grandparents, completed by the end of 2017, and then plans to expand to other Amazonian languages.
Linux.com: Can you tell me more about Quechua, the language of your parents and grandparents?
Luis Camacho Caballero: Quechua was the lingua franca of the South American Andes between the 5th and 16th centuries. It’s strongly associated with Inca culture (1300–1550 AD) but is clearly older than that. It is still alive and used by about 8 million people across Ecuador, Peru, and Bolivia. However, it’s at risk of extinction because, in practice, the only language supported by the government is Spanish. Don’t misunderstand: there is, of course, a national agency for heritage preservation, but it hasn’t gained momentum yet. The process of substitution is running faster and stronger than the preservation initiatives.
It’s a shame that I speak just a bit. You can get a taste of Quechua in these funny clips: 1, 2 and 3, and even hear some famous songs here: Heaven, The Way You Make Me Feel, and a bonus track.
Linux.com: What is your process for recording and digitizing the language?
Luis: It’s a hard process. Basically, it is composed of two parts: building a text/voice corpus and the language processing itself.
In regard to the first part, the challenges are 1) linking both corpora to get an exact match between voice and text, and 2) making the corpora more useful through part-of-speech tagging, or POS-tagging, in which information about each word’s part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
For the automatic speech recognition (ASR) part itself, we are testing artificial intelligence algorithms, looking for the one that best matches the features of the Quechua language.
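As a rough illustration of the first part of that process, a linked and POS-tagged text/voice corpus entry could be represented along the following lines. This is only a minimal sketch, not the project's actual format: the class names, the tag labels, the file name, and the timings are all hypothetical, and the two-word phrase ("hatun wasi", roughly "big house") is used purely as an example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One word of the transcript, tagged and linked to the audio."""
    text: str
    pos: str         # part-of-speech tag, e.g. "NOUN" (hypothetical tag set)
    start_s: float   # where the word starts in the recording, in seconds
    end_s: float     # where the word ends in the recording, in seconds


@dataclass
class Utterance:
    """A recording plus its time-aligned, POS-tagged transcript."""
    audio_file: str
    tokens: List[Token]

    def transcript(self) -> str:
        return " ".join(t.text for t in self.tokens)

    def tagged(self) -> str:
        # Conventional word/TAG rendering of a POS-tagged sentence.
        return " ".join(f"{t.text}/{t.pos}" for t in self.tokens)

    def is_aligned(self) -> bool:
        # Words must appear in order, not overlap, and have positive duration.
        prev_end = 0.0
        for t in self.tokens:
            if t.start_s < prev_end or t.end_s <= t.start_s:
                return False
            prev_end = t.end_s
        return True


# Hypothetical entry; the file name and timings are made up for illustration.
utt = Utterance(
    audio_file="example_0001.wav",
    tokens=[
        Token("hatun", "ADJ", 0.00, 0.42),
        Token("wasi", "NOUN", 0.42, 0.85),
    ],
)

print(utt.transcript())
print(utt.tagged())
print(utt.is_aligned())
```

Keeping the timings on each word, rather than only on the whole sentence, is one way to express the "exact match between voice and text" mentioned above; a real corpus might align at the phoneme or sentence level instead.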
Linux.com: How did you get involved in this work?
Luis: Ever since I was first exposed to English ASR, maybe six years ago, I have known that I had to do ASR for Quechua. It’s my contribution to preserving my heritage.
Linux.com: Is this a hobby, or a job for you?
Luis: Nowadays I am with PUCP. I wrote a proposal and, fortunately, it was funded by the Peruvian Science Foundation, so I have resources for developing this project until Christmas 2017. Part of my job is networking with all the stakeholders and looking for more funds until we reach a complete ASR system, one at the same level as those for well-supported languages like English.
Linux.com: How do you plan to use your LiFT scholarship?
Luis: Linux is a wonderful platform; almost all language computational portability technology is developed on Linux. I haven’t decided yet which course best fits my current Linux support needs.
Linux.com: How will the scholarship help you?
Luis: I think the scholarship will help me in at least two ways: 1) getting in touch with the most renowned expert Linux trainers and 2) gaining valuable knowledge that would otherwise be expensive or inaccessible.
Interested in learning more about starting your IT career with Linux? Check out our free ebook “A Brief Guide To Starting Your IT Career In Linux.”