Dialogue clustering system
Project: automating contact center analytics through semantic clustering of dialogues
Project with MC NTT

Automating contact center operators' responses requires a taxonomy of the issues customers contact them about. Such a taxonomy makes it possible to classify incoming requests and then process them. When working with many contact centers that cover a variety of topics, analysts need a system for rapidly analyzing a corpus of dialogues. The task was to create a tool that automatically builds ready-made taxonomies from dialogue corpora.


Contact center analysts need to understand quickly what a dialogue corpus contains so that they can automate their work sooner. Building such a taxonomy entirely by hand is very time-consuming, so the process itself calls for automation.
The team's solution
We asked the partner for a labeled sample of synonymous dialogues, which let us compare different models and tune their parameters for this specific task.
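As an illustration of how such a labeled sample can be used to compare models, the sketch below assumes the labels arrive as pairs of dialogue IDs marked synonymous or not; this format and the function `pairwise_scores` are hypothetical, not the partner's actual data or the team's metric. It computes pairwise precision (how many labeled pairs a model groups together are truly synonymous) and recall (how many synonymous pairs end up in the same cluster).

```python
from typing import Dict, List, Tuple

# Hypothetical label format: (dialogue_id_a, dialogue_id_b, is_synonymous).
LabeledPair = Tuple[str, str, bool]


def pairwise_scores(
    clusters: Dict[str, int], pairs: List[LabeledPair]
) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F1 of a clustering against labeled pairs."""
    tp = fp = fn = 0
    for a, b, synonymous in pairs:
        same_cluster = clusters[a] == clusters[b]
        if same_cluster and synonymous:
            tp += 1  # paraphrases correctly grouped together
        elif same_cluster:
            fp += 1  # different issues wrongly merged
        elif synonymous:
            fn += 1  # paraphrases split across clusters
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


labels = [
    ("d1", "d2", True),
    ("d1", "d3", False),
    ("d3", "d4", True),
    ("d2", "d4", False),
]
# Cluster assignments produced by two candidate models (toy data).
model_a = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
model_b = {"d1": 0, "d2": 1, "d3": 0, "d4": 0}
print("model_a", pairwise_scores(model_a, labels))  # model_a (1.0, 1.0, 1.0)
print("model_b", pairwise_scores(model_b, labels))  # model_b (0.5, 0.5, 0.5)
```

In this toy case `model_a` keeps every synonymous pair together without merging distinct issues, while `model_b` both splits one paraphrase pair and merges two unrelated dialogues. When tuning a parameter such as the number of topics, one would keep the setting that scores best on the labeled pairs.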
We tested several approaches to the problem: various neural-network methods for paraphrase search and hierarchical multimodal topic models. The topic models performed better.
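To make the topic-model idea concrete, the sketch below implements plain PLSA, the unregularized special case of ARTM, in pure Python. It is a stand-in, not the team's model: the project's hierarchical multimodal models were presumably built with BigARTM and TopicNet from the technology stack, which add regularizers, modalities and topic hierarchies on top of this EM loop. The function names, the toy corpus and the rule "assign each dialogue to its most probable topic" are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def plsa(
    docs: List[List[str]], num_topics: int, iterations: int = 50, seed: int = 0
) -> Tuple[Dict[str, List[float]], List[List[float]]]:
    """Fit PLSA by EM; returns phi[w][t] = p(w|t) and theta[d][t] = p(t|d).

    Every document must contain at least one token.
    """
    rng = random.Random(seed)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({w for doc in docs for w in doc})

    # Random Phi, normalized so each topic is a distribution over words.
    phi = {w: [rng.random() for _ in range(num_topics)] for w in vocab}
    for t in range(num_topics):
        total = sum(phi[w][t] for w in vocab)
        for w in vocab:
            phi[w][t] /= total
    # Uniform Theta: every dialogue starts with equal topic weights.
    theta = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iterations):
        n_wt = {w: [0.0] * num_topics for w in vocab}
        n_td = [[0.0] * num_topics for _ in docs]
        # E-step: split each count n_dw across topics in proportion to p(t|d,w).
        for d, cnt in enumerate(counts):
            for w, n_dw in cnt.items():
                joint = [phi[w][t] * theta[d][t] for t in range(num_topics)]
                p_wd = sum(joint)
                for t in range(num_topics):
                    share = n_dw * joint[t] / p_wd
                    n_wt[w][t] += share
                    n_td[d][t] += share
        # M-step: normalize expected counts back into distributions.
        for t in range(num_topics):
            n_t = sum(n_wt[w][t] for w in vocab)
            for w in vocab:
                phi[w][t] = n_wt[w][t] / n_t
        for d, row in enumerate(n_td):
            n_d = sum(row)
            theta[d] = [x / n_d for x in row]
    return phi, theta


def cluster_by_topic(theta: List[List[float]]) -> List[int]:
    """Hard-assign each dialogue to its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


dialogues = [
    "card blocked cannot pay with card",
    "my card is blocked at the atm",
    "change my internet tariff",
    "which tariff for mobile internet",
    "card blocked cannot pay with card",
]
docs = [text.split() for text in dialogues]
phi, theta = plsa(docs, num_topics=2)
labels = cluster_by_topic(theta)
print(labels)  # dialogues 0 and 4 are identical, so they share a label
```

Initializing Θ uniformly rather than randomly means identical dialogues receive identical topic distributions and therefore always fall into the same cluster. Taking the argmax of each row of Θ is only the simplest way to turn topic distributions into clusters.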
The final solution was packaged in a Docker container that implements the business logic required by the partner.
Results
- Reduced workload for analysts
- Faster identification of new categories
- Detection of new intents in the incoming request stream
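One simple way to surface candidate new intents is sketched below: each request becomes a bag-of-words vector, it is compared by cosine similarity with the centroid of every known cluster, and requests whose best similarity falls below a threshold are flagged for an analyst. The bag-of-words features, the function names and the 0.3 threshold are assumptions; in a topic-model pipeline the vectors could instead be rows of Θ.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(clusters: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Summed word counts per known cluster (same direction as the mean)."""
    return {
        name: sum((Counter(text.split()) for text in texts), Counter())
        for name, texts in clusters.items()
    }


def flag_new_intents(
    requests: List[str], known: Dict[str, Counter], threshold: float = 0.3
) -> List[str]:
    """Return requests that are not similar enough to any known cluster."""
    flagged = []
    for text in requests:
        vec = Counter(text.split())
        best = max((cosine(vec, c) for c in known.values()), default=0.0)
        if best < threshold:
            flagged.append(text)
    return flagged


known = centroids({
    "card_blocked": ["my card is blocked", "card blocked at atm"],
    "tariff_change": ["change my tariff", "switch internet tariff"],
})
stream = ["card blocked again", "how do i change tariff", "refund for cancelled delivery"]
print(flag_new_intents(stream, known))  # ['refund for cancelled delivery']
```

Lowering the threshold flags fewer requests; raising it to 0.5 in this example would also flag `how do i change tariff`, whose similarity to the tariff cluster is about 0.47.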
Challenges addressed
- A model robust to shifts in subject matter
- Model stability as the size of the text corpus changes
- Typo correction, including cases with highly specific vocabulary
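For typo correction with highly specific vocabulary, one option is to derive the dictionary from the dialogue corpus itself, as in the sketch below: tokens seen at least `min_count` times are trusted, and each rarer token is replaced by the most frequent trusted token within Levenshtein distance `max_dist`. The thresholds, function names and toy corpus are assumptions for illustration, not the project's actual correction method.

```python
from collections import Counter
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via one DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def build_corrections(
    tokens: List[str], min_count: int = 3, max_dist: int = 1
) -> Dict[str, str]:
    """Map rare tokens to the most frequent trusted token within max_dist."""
    freq = Counter(tokens)
    vocab = [w for w, c in freq.items() if c >= min_count]
    corrections = {}
    for word, count in freq.items():
        if count >= min_count:
            continue  # frequent enough to trust, even if domain-specific
        candidates = [v for v in vocab if levenshtein(word, v) <= max_dist]
        if candidates:
            corrections[word] = max(candidates, key=lambda v: freq[v])
    return corrections


tokens = ("esim card blocked " * 3 + "my carf is blockd").split()
corrections = build_corrections(tokens)
print(corrections)  # {'carf': 'card', 'blockd': 'blocked'}
cleaned = [corrections.get(t, t) for t in tokens]
```

Because the dictionary comes from the corpus, a domain term such as `esim`, which a general-purpose spell checker might flag, is kept as long as it is frequent. Plain Levenshtein distance counts a swap of adjacent letters (`crad` for `card`) as two edits; a Damerau-Levenshtein variant would count it as one.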
Team
- Project Manager: Alexey Goncharov
- Team Lead: Artem Popov
- Researchers: Daria Polyudova, Evgenia Veselova, Viktor Bulatov
- Scientific advisor: Konstantin Vorontsov
Technology stack
TopicNet, BigARTM, Flask, Python, PyTorch, gensim, UMAP