EasyTranslate – Uber for translation
EasyTranslate is the Uber for translation, collaborating with more than four thousand linguists and providing language solutions that facilitate the interpretation of over 60 million words annually. Passionate about language and aiming to provide the services of the best and most dedicated specialists in the translation industry, EasyTranslate has since 2010 become the fastest growing communication agency in Northern Europe, with offices in London, Paris, Oslo, Stockholm, Hamburg, Amsterdam, Zürich, and Vienna. This summer it set up headquarters in a large new harbour-front office space in Copenhagen.
In our drive for continuous innovation, we would like you to contribute to our development, not only with regard to the present challenge but, in the longer term, potentially by becoming part of our global translation universe, applying aspects of machine learning and natural language processing to design highly scalable solutions that seamlessly match our clients’ requests.
To facilitate matching input source documents with the best qualified translators, we aim to automate the description of documents by applying hierarchical LDA topic modeling to infer the contents and general categories of the texts, so that we can select linguists with expert knowledge of the specific domains. Based on the topic model, we would like to create high-level representations of texts that allow us to assess how similar the original source and translated target documents are across different languages. Similarly, we would like to describe linguists by the topic distributions of the documents they have previously translated. Adding a score for each linguist based on customer feedback might then allow us to automatically find the best translator for a new but similar document based on its LDA distribution over topics.
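As a rough illustration of what "how similar" could mean for two topic distributions, the sketch below computes the Hellinger distance between them. The choice of Hellinger distance, the four-topic vectors, and all numbers are assumptions made for illustration only. It also assumes that source and target documents have already been mapped into one shared topic space, which the description above leaves open.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical topic distributions over four topics (made-up numbers).
source_doc = [0.70, 0.20, 0.05, 0.05]
target_doc = [0.65, 0.25, 0.05, 0.05]
unrelated_doc = [0.05, 0.05, 0.20, 0.70]

# A translation should stay close to its source; an unrelated text should not.
print(hellinger(source_doc, target_doc))
print(hellinger(source_doc, unrelated_doc))
```

Other measures, such as Jensen-Shannon divergence or cosine similarity, would fit the same role; which one works best for translation matching would have to be tested on real documents.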
Our challenge to you
Based on a selection of EasyTranslate source documents, create a prototype hierarchical LDA topic model with the Gensim Python framework, using English Wikipedia as the corpus, to retrieve high-dimensional distributions over topics and infer top-level generic document categories for the English input texts.
Provide an interface to the retrieved LDA topic representations that makes it possible to assess and compare how similar the source texts are in document space. Likewise, the interface should support representing linguists by LDA topic representations and inferred high-level categories (e.g. Computers, Software), defined by their previously translated source texts, target document directions (e.g. English > Danish, Danish > English), and a scoring function based on customer satisfaction feedback related to the translation of each document.
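One way such an interface might look is sketched below. Everything in it is hypothetical: the class names `TranslatedDocument` and `LinguistProfile`, the feedback scale of 0 to 1, the use of cosine similarity, and the ranking rule (similarity multiplied by mean feedback) are illustrative assumptions, not a prescribed design.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TranslatedDocument:
    topics: list      # topic distribution of the source text
    direction: tuple  # (source language, target language), e.g. ("en", "da")
    feedback: float   # customer satisfaction, assumed to lie in [0, 1]


@dataclass
class LinguistProfile:
    name: str
    history: list = field(default_factory=list)

    def topic_profile(self, direction):
        """Mean topic distribution of past documents in the given direction."""
        docs = [d for d in self.history if d.direction == direction]
        if not docs:
            return None
        n_topics = len(docs[0].topics)
        return [sum(d.topics[i] for d in docs) / len(docs) for i in range(n_topics)]

    def mean_feedback(self, direction):
        """Average customer feedback for the given direction (0.0 if none)."""
        scores = [d.feedback for d in self.history if d.direction == direction]
        return sum(scores) / len(scores) if scores else 0.0


def cosine(p, q):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_linguists(doc_topics, direction, linguists):
    """Rank linguists by topic similarity weighted by mean feedback (assumed rule)."""
    ranked = []
    for linguist in linguists:
        profile = linguist.topic_profile(direction)
        if profile is None:
            continue  # no experience in this language direction
        score = cosine(doc_topics, profile) * linguist.mean_feedback(direction)
        ranked.append((score, linguist.name))
    return sorted(ranked, reverse=True)


# Made-up linguists and documents over three topics.
linguists = [
    LinguistProfile("linguist_a", [TranslatedDocument([0.8, 0.1, 0.1], ("en", "da"), 0.90)]),
    LinguistProfile("linguist_b", [TranslatedDocument([0.1, 0.1, 0.8], ("en", "da"), 0.95)]),
    LinguistProfile("linguist_c", [TranslatedDocument([0.8, 0.1, 0.1], ("da", "en"), 1.00)]),
]

ranking = rank_linguists([0.7, 0.2, 0.1], ("en", "da"), linguists)
for score, name in ranking:
    print(name, round(score, 3))
```

Keeping the language direction on each document, rather than on the linguist, lets one linguist hold separate topic profiles and feedback averages per direction.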