Client's motivation for launching the project: the client wanted to extend its product with a new feature, the ability to find translations of a scientific article across the most widely used languages.
What we had initially: Antiplagiat had no functionality for finding translations of scientific articles, so the task was to add it as a new feature.
Project goals: to build a topic model that solves two problems: semantic search for translations of scientific articles, and classification of scientific articles by scientific heading.
MIL Team's solution: drawing on its experience in topic modelling and microservice architecture, the team built a service that finds translations of scientific articles and determines their scientific headings. The service can be launched in a virtual machine.
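To illustrate how a topic model can support translation search, here is a minimal sketch. It assumes that a multilingual topic model maps an article and its translation to similar topic distributions, so the translation can be found as the nearest neighbour in topic space. The function names, document identifiers, and topic vectors are hypothetical and do not come from the project; the real service may rank candidates differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_translation(query_topics, candidates):
    """Return (doc_id, score) of the candidate closest to the query in topic space."""
    best_id, best_score = None, -1.0
    for doc_id, topics in candidates.items():
        score = cosine(query_topics, topics)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


# Hypothetical topic distributions (3 topics) for a Russian query article
# and three English candidate articles.
query_ru = [0.70, 0.20, 0.10]
candidates_en = {
    "en_paper_a": [0.10, 0.10, 0.80],
    "en_paper_b": [0.65, 0.25, 0.10],
    "en_paper_c": [0.30, 0.60, 0.10],
}

best_id, best_score = find_translation(query_ru, candidates_en)
print(best_id, round(best_score, 3))
```

Because the comparison happens in the shared topic space rather than on words, the query and the candidates do not need to be in the same language.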
Tools for building the model:
- A parallel corpus of scientific articles from the library website;
- A parallel corpus of Wikipedia articles in 100 languages;
- Labels linking articles to scientific headings in different rubricators (UDC, OECD).
Project results:
- a topic model of scientific headings;
- a virtual machine on which the model can run.
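As an illustration of how a topic model of scientific headings can classify an article, the sketch below uses a common approach in topic modelling: heading labels are treated as an extra modality, and the probability of a heading for a document is computed as p(c|d) = Σ_t p(c|t) · p(t|d). Whether the project used exactly this scheme is an assumption; the function name, headings, and all numbers are hypothetical.

```python
def heading_probabilities(phi_heading_given_topic, theta_doc):
    """Compute p(heading | doc) = sum over topics of p(heading | topic) * p(topic | doc)."""
    return {
        heading: sum(p_ct * p_td for p_ct, p_td in zip(row, theta_doc))
        for heading, row in phi_heading_given_topic.items()
    }


# Hypothetical p(heading | topic) for 3 topics; each column sums to 1.
phi = {
    "UDC 51 Mathematics": [0.8, 0.1, 0.2],
    "UDC 53 Physics": [0.1, 0.7, 0.3],
    "UDC 57 Biology": [0.1, 0.2, 0.5],
}

# Hypothetical p(topic | doc) for one article.
theta = [0.6, 0.3, 0.1]

probs = heading_probabilities(phi, theta)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

The same topic distribution p(t|d) that drives translation search can be reused here, so one model serves both tasks.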
Technology stack: gRPC, Python, sklearn, BigARTM