TopicNet
An open-source project for automated multimodal hierarchical topic modeling
Training scenarios
The library implements reproducible training scenarios. Choose the scenario that best fits your task and quickly build your first topic model.
Balanced models
TopicNet addresses the problem of building topic models on imbalanced collections. It provides a regularizer that improves topic models trained on such collections.
Logging experiments
A convenient tool for logging and reproducing experiments saves the most valuable information about each run and lets you use it to select the best models.
Out-of-the-box prototype
With just a few lines of code, you can build your first model on your own data. We have lowered the barrier to entry into topic modeling and made the library easier to use.
Support for custom metrics
Users can define metrics for their own tasks, and these metrics can be logged while the model is being trained.
Viewing results
New functionality lets you view information about a trained model, so you can interpret the results in a few steps and analyze any errors that occurred.
Suitable for both developers and professional researchers
Our solution is a library for automated topic modeling.

On the one hand, the library provides what a developer needs: an automated model-building pipeline, support for imbalanced data, and selection of the optimal number of topics. All of this lets you use the library "out of the box".

On the other hand, the library offers functionality for researchers: you can use complex scenarios for preparing and training models, and plug in your own quality criteria and your own regularizers as stages of model training. Experiment logging and convenient viewing of modeling results make the library a highly convenient tool for building topic models.
80%
Share of interpretable topics
With ready-made, optimized training scenarios, you can increase the share of interpretable topics "out of the box".
40%
Reduction in development time
A large set of optimization tools lets you focus on selecting the optimal model and shortens development time.
Applied problems
Solved with topic modeling
Exploratory search in a closed collection
Topic models let you quickly recover the structure of a collection and build an interpretable vector representation of each document, narrowing the search space for a query.
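To illustrate how a per-document topic vector can narrow a search, here is a minimal sketch that ranks documents by cosine similarity between their topic distributions p(t | d) and the query's topic distribution. It assumes the vectors have already been produced by some trained topic model; it is not TopicNet's API, and the names `narrow_search`, `doc_thetas`, and `query_theta`, as well as the toy vectors, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two topic-probability vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def narrow_search(query_theta, doc_thetas, top_k=2):
    """Return ids of the top_k documents whose topic vectors are closest to the query's."""
    ranked = sorted(
        doc_thetas,
        key=lambda doc_id: cosine(query_theta, doc_thetas[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical topic vectors p(t | d) over three topics (assumed model output).
doc_thetas = {
    "doc_a": [0.90, 0.10, 0.00],
    "doc_b": [0.10, 0.80, 0.10],
    "doc_c": [0.85, 0.05, 0.10],
}
# Hypothetical topic vector inferred for a query about topic 0.
query_theta = [1.0, 0.0, 0.0]
candidates = narrow_search(query_theta, doc_thetas, top_k=2)
print(candidates)  # ['doc_a', 'doc_c']
```

Only the short candidate list then needs a full-text comparison with the query; because each coordinate is a topic, the reason a document was retrieved stays interpretable.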
Taxonomy of a text collection
Understanding the structure of a collection is necessary for further automating request processing. Information models of a collection let you quickly understand the contents of a collection of dialogs.
Customer segmentation and profiling
Data on user actions can be analyzed with matrix decomposition to extract interpretable behavior profiles.
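The matrix decomposition behind such profiles can be sketched with plain PLSA: a users-by-actions count matrix is factored into p(action | profile) (`phi`) and p(profile | user) (`theta`), fitted with the EM algorithm. This is a minimal illustration under that assumption only; it is not TopicNet's API and omits the regularizers the library adds, and the function names and toy data are hypothetical.

```python
import random


def normalize(row):
    """Scale a non-negative row so it sums to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, num_topics, iterations=100, seed=0):
    """Factorize counts[d][w] ~ sum_t theta[d][t] * phi[t][w] with EM (plain PLSA).

    phi[t][w]   = p(w | t): how strongly action w characterizes profile t
    theta[d][t] = p(t | d): mixture of profiles for user d
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Random positive initialization of both factors.
    phi = [normalize([rng.random() + 1e-3 for _ in range(num_words)])
           for _ in range(num_topics)]
    theta = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
             for _ in range(num_docs)]

    for _ in range(iterations):
        phi_acc = [[0.0] * num_words for _ in range(num_topics)]
        theta_acc = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior p(t | d, w) is proportional to phi[t][w] * theta[d][t].
                weights = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(weights)
                if z == 0:
                    continue
                for t in range(num_topics):
                    share = n_dw * weights[t] / z
                    phi_acc[t][w] += share
                    theta_acc[d][t] += share
        # M-step: renormalize the expected counts into probability distributions.
        phi = [normalize(row) for row in phi_acc]
        theta = [normalize(row) for row in theta_acc]
    return phi, theta


# Hypothetical toy data: rows are users, columns are counts of four action types.
user_actions = [
    [5, 4, 0, 0],
    [6, 3, 0, 1],
    [0, 0, 7, 5],
    [1, 0, 4, 6],
]
phi, theta = plsa(user_actions, num_topics=2)
# Assign each user to the profile with the largest weight in theta.
profiles = [max(range(len(row)), key=row.__getitem__) for row in theta]
print(profiles)
```

Each row of `phi` can be read as a behavior profile by listing its highest-probability actions, while `theta` gives every user a soft mixture of profiles rather than a single hard cluster label.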
Analysis of news flow dynamics
Temporal topic models let you track how topics develop over time in a collection, and automated selection of hierarchically related topics helps you understand the structure of a news collection.