The project context
This project is part of a Ph.D. thesis in Artificial Intelligence at Bircham International University (BIU), Madrid (a distance learning program), on the theme "Agent-based Multimodal translation in Natural Language Processing". BIU is a Top 10 World Ranking Distance Learning University. We expect the Ph.D. not to extend beyond the 2018-2020 period … The thesis involves building a training set for English, French and Baoule, from which the source and reference sentences of each corpus will be taken for Machine Translation tasks. Baoule is new to the field: it has no Wikipedia and no existing prepared (paired) corpora for translation purposes.
What is the interest of such a project?
The automatic induction (inference) of a grammar from a sample of sentences of a language is still an open problem in Artificial Intelligence (AI) [1: Introduction to Artificial Intelligence, Mariusz Flasiński, p. 22, para. 3]. Human languages are also diverse: about 6,000 to 7,000 languages are spoken worldwide [2: Minh-Thang Luong's thesis on Neural Machine Translation, Stanford University, December 2016, p. 1]. Yet only about three hundred of them (roughly 4 to 5 percent) have an official Wikipedia that could serve as a reliable corpus for Natural Language Processing (NLP) tasks. Contributions are therefore still needed if all spoken languages are to be taken into account. This project is also an opportunity to help build a richer corpus for Baoule, ready for machine learning tasks. The resulting work will be available to any machine translation project, and the technique will continue to be improved afterwards.
What is Language Pairing?
First, let us define Artificial Intelligence as a discipline for creating machines, agents or systems that think and act like humans [3: Artificial Intelligence: A Modern Approach, Stuart J. Russell & Peter Norvig, p. 2, Figure 1.1]. In AI, great importance is attached to natural language and to an intelligent system's ability to use it [4: Introduction to Artificial Intelligence, Mariusz Flasiński, p. 7, para. 1]. This gave rise to Computational Linguistics, the scientific study of language from a computational perspective, which aims to provide computational models of various kinds of linguistic phenomena. Machine Translation (MT) is a sub-field of Computational Linguistics that investigates the use of software to automatically translate text or speech from one language to another. MT should not be confused with Computer-Aided Translation (CAT), also called Machine-Aided Human Translation (MAHT), in which a human performs the translation with the support of computer tools. Nor should it be confused with Interactive Machine Translation (IMT), a sub-field of CAT in which the computer predicts the text the user is about to type, much like a smartphone's predictive keyboard.
To produce high-quality translations for MT tasks, it is important to use reliable Language Pairs produced by an expert in language pair translation [5: https://www.ulatus.com/translation-blog/the-importance-of-language-pairs-in-academic-and-professional-translation/]. A Language Pair is a dataset of sentences in two distinct languages (bilingual lexical data) in which each sentence in one language (the source) is paired with its translation in the other (the reference). Language Pairing is the production of such Language Pairs. Several language pairing services exist on the internet, such as Wagner Consulting International, Ulatus and Apertium.
Apertium offers open-source translation pairs via Wikipedia and GitHub, where several language pair patterns can be found, such as the …
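A Language Pair, as defined above, can be pictured as a list of aligned (source, reference) sentences. The minimal Python sketch below assumes that representation. The class and method names (`LanguagePair`, `add`, `swapped`) are hypothetical and do not describe the project's actual storage format. The English–French sentences are only illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class LanguagePair:
    """Hypothetical sketch: sentences in one language aligned with their translations in another."""
    source_lang: str
    target_lang: str
    pairs: list[tuple[str, str]] = field(default_factory=list)

    def add(self, source: str, reference: str) -> None:
        # A sentence without its counterpart cannot be used as an aligned training example.
        if not source.strip() or not reference.strip():
            raise ValueError("both source and reference must be non-empty")
        self.pairs.append((source.strip(), reference.strip()))

    def swapped(self) -> "LanguagePair":
        # The translation is reciprocal, so the same pairs can be read in the other direction.
        return LanguagePair(self.target_lang, self.source_lang,
                            [(ref, src) for src, ref in self.pairs])


en_fr = LanguagePair("en", "fr")
en_fr.add("Good morning.", "Bonjour.")
en_fr.add("Thank you.", "Merci.")
fr_en = en_fr.swapped()
print(fr_en.source_lang, fr_en.pairs[0])  # fr ('Bonjour.', 'Good morning.')
```

Storing pairs as plain tuples keeps the data easy to export to the aligned source/reference files that MT toolkits commonly expect. Building an English–Baoule or French–Baoule pair would use the same structure.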
Where can enough data be found to integrate a new language?
All choices here are made on the basis of the scientific method [6: Artificial Intelligence: A Modern Approach, Stuart J. Russell & Peter Norvig, section 1.3.8, AI adopts the scientific method (1987-present)]. The third language, Baoule, does not yet have an official Wikipedia. Obtaining an official Wikipedia for a new language requires several validation steps, and the process can take several years. Baoule was verified as eligible on January 26, 2017, but it is still in the Wikimedia Incubator. We are waiting for the final decision, which can only come after valid wiki page translations: an active test project and a translation of the MediaWiki interface into Baoule. We do not know when Baoule will be added to the official Wikipedias.
For now, as the first contributor to the Wikimedia Incubator for the Baoule community, we have provided useful help tools for Baoule translation on a Welcome page. Anyone who wishes can use them to translate existing pages or create new Baoule pages. This matches the current policy on enabling translation into a language, which requires at least one person willing to translate into that language. This Language Pairing Interface nevertheless remains important for any kind of expression that does not meet the Wikimedia criteria. You can, for example, pair some personal statements. Even a single pairing proposition is useful to the application and helps diversify the corpus.
In the meantime, we propose two main strategies for obtaining a reliable corpus for this language pairing project:
1- A translation of the Gospels, or of the entire Bible, into Baoule exists. It was produced by dozens of experts in language pair translation, and very few translation projects bring together so many specialists. It is therefore a very reliable source for Baoule, including for creating language pairs.
2- Other sources are the Baoule dictionary and open-ended language pairing through this interface. These pairings will be validated on the advice of native Baoule speakers. This is the main reason we created this Language Pairing Interface: to provide an additional translation source for diversity and greater efficiency.
How can you contribute to this project?
As this is a language project, anyone who wishes can contribute by adding a new translation or by proposing a better translation than an existing one. Ultimately, this will improve the performance of the final application. That application will be useful for Machine-Aided Human Translation (MAHT) tasks and will also help translate the MediaWiki interface into Baoule. To contribute, please use the Language Pairing Menu. The Language Pairs Views tab lets you browse existing English–French–Baoule translation pairs. The Make a Pairing Proposition tab lets you propose an entirely new translation, or an alternative translation of an existing pair, with comments in the Observation field. If you are not logged in, your contribution is anonymous; otherwise, your name is saved as the contributor. Please keep ethics in mind when proposing language pairs: avoid toxic comments and other offensive statements, and let us be respectful.
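The contribution rules described above can be sketched as a small record type. A proposition is either new or refers to an existing pair. It carries an optional Observation comment and falls back to an anonymous contributor when the user is not logged in. This is only an illustration under stated assumptions. Every field and function name (`PairingProposition`, `make_proposition`, `existing_pair_id`, `validated`) is hypothetical, and so is the rule that at least two of the three languages must be filled in. The `"<Baoule translation>"` string is a placeholder, not real Baoule text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PairingProposition:
    """Hypothetical record for one proposal made through the pairing form."""
    english: str
    french: str
    baoule: str
    observation: str = ""                   # free-text comment (the Observation field)
    contributor: str = "anonymous"          # a name is stored only for logged-in users
    existing_pair_id: Optional[int] = None  # set when proposing an alternative to an existing pair
    validated: bool = False                 # to be set after review by native Baoule speakers


def make_proposition(english: str, french: str, baoule: str,
                     observation: str = "",
                     user_name: Optional[str] = None,
                     existing_pair_id: Optional[int] = None) -> PairingProposition:
    """Build a proposition; assumes at least two languages are needed to form a pair."""
    texts = [english.strip(), french.strip(), baoule.strip()]
    if sum(1 for t in texts if t) < 2:
        raise ValueError("a pairing needs at least two non-empty translations")
    # No logged-in user (None or blank name) means the contribution is anonymous.
    contributor = user_name.strip() if user_name and user_name.strip() else "anonymous"
    return PairingProposition(*texts, observation=observation.strip(),
                              contributor=contributor,
                              existing_pair_id=existing_pair_id)


anon = make_proposition("Thank you.", "Merci.", "<Baoule translation>",
                        observation="Polite form")
named = make_proposition("Thank you.", "Merci.", "<Baoule translation>",
                         user_name="Jean", existing_pair_id=12)
print(anon.contributor, named.contributor)  # anonymous Jean
```

Both records start with `validated=False`, reflecting the idea that open-ended proposals only enter the reliable corpus after native speakers have reviewed them.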
Who is the author of this project?
I am KOUASSI Konan Jean-Claude, a computer scientist (a grade B3 civil servant in Côte d'Ivoire since February 2013) and a Ph.D. student in Artificial Intelligence at Bircham International University, Madrid, Spain, through distance learning. As a civil servant, I am permitted to pursue such a degree under Article 59 of the Statut Général de la Fonction Publique of Côte d'Ivoire (Law No. 92-570 of 11 September 1992). I am a developer, a network engineer (5-year local program) and a Machine Learning Engineer (Udacity), with more than 12 years of experience in computer science. I am interested in cognitive computing research and am personally and actively involved in fields such as Artificial Intelligence, Deep Learning, Machine Intelligence, Intelligent Interactive Systems, Machine Vision and related domains. I believe these technologies will help us contribute to building a better world.