Multi-label text classification is a challenging task in natural language processing: it consists of assigning a subset of labels to a given document. One challenge is the large number of classes; for example, the Wikipedia dataset is annotated with hundreds of thousands of tags. Another challenge is that the labels follow a power-law distribution. Legal documents often come in the form of long texts; however, most current state-of-the-art models handle only a fixed-size context. In this research, we aim to improve on the current state of the art for this task and to exploit hierarchical information to enhance the quality of the model.
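To make the task definition concrete, the following is a minimal sketch of how label subsets can be encoded as multi-hot target vectors and how label frequencies can be ranked to inspect their skew. The toy corpus, the label names, and the helper `to_multi_hot` are hypothetical and chosen only for illustration; they are not taken from the datasets or the model discussed in this work.

```python
from collections import Counter

# Hypothetical toy corpus: each document carries a subset of labels.
documents = {
    "doc1": {"tax", "trade"},
    "doc2": {"tax"},
    "doc3": {"tax", "agriculture", "trade"},
    "doc4": {"tax", "fisheries"},
}

# Fixed label vocabulary, sorted for a stable column order.
labels = sorted(set().union(*documents.values()))


def to_multi_hot(doc_labels, vocabulary):
    """Encode a label subset as a 0/1 vector over the vocabulary."""
    return [1 if label in doc_labels else 0 for label in vocabulary]


# One target vector per document; several positions may be 1 at once.
targets = {
    name: to_multi_hot(doc_labels, labels)
    for name, doc_labels in documents.items()
}

# Label frequencies, most common first. In real corpora a few head
# labels dominate while most labels are rare.
label_counts = Counter(
    label for doc_labels in documents.values() for label in doc_labels
)
ranked = label_counts.most_common()
```

In this toy setting a single label occurs in every document while two labels occur only once, which mirrors on a very small scale the imbalance that a power-law label distribution produces.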
Шахин З. (scientific supervisor: Муромцев Д.И.) Multi-label text classification for legal documents (Многофакторная тематическая классификация правовых документов) // Collection of abstracts of the Congress of Young Scientists. Electronic edition. St. Petersburg: ITMO University, [2020]. URL: https://kmu.itmo.ru/digests/article/4598