Polylingual Text Classification in the Legal Domain

Authors: Teresa Gonçalves - Paulo Quaresma
Position: Auxiliary Professor at the Department of Computer Science of the University of Évora - Associate Professor at the same Department.
SUMMARY: 1. Introduction – 2. Concepts and Tools – 2.1. Automatic Text
Classification – 2.2. Support Vector Machines – 3. Polylingual Approach to Text
Classification – 3.1. Combining Monolingual Classifiers – 3.2. Using Polylingual
Classifiers – 4. Experiments – 4.1. Dataset Description – 4.2. Experimental Setup –
4.3. Monolingual Experiments – 4.4. Monolingual Combiner Experiments –
4.5. Polylingual Experiments – 5. Conclusions and Future Work
Current information technologies and Web-based services need to manage,
select and filter increasing amounts of textual information. Text classification
lets users browse the texts that interest them more easily by navigating class
hierarchies. This paradigm is very effective both for filtering information and
for developing online end-user services.
Since the number of documents involved in these applications is large,
efficient and automatic approaches to classification are necessary. A Machine
Learning approach can be used to build the classifiers automatically.
The construction process can be seen as a supervised learning problem:
the algorithm receives a relatively small set of labelled documents and
generates the classifier. Several algorithms have been applied, such as decision
trees, linear discriminant analysis and logistic regression, the naïve Bayes
algorithm and Support Vector Machines (SVM). Besides being grounded in a
well-founded learning theory that describes their mechanics, SVM are known to
be computationally efficient, robust and accurate for text classification.
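The supervised setting described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary task, a plain bag-of-words representation, and a linear SVM trained with the Pegasos subgradient method, chosen here only because it fits in a few lines of standard-library Python. The toy documents, the two categories and the names bag_of_words, train_linear_svm and predict are all hypothetical.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-frequency vector (token -> count)."""
    return Counter(text.lower().split())


def train_linear_svm(docs, labels, epochs=200, lam=0.01, seed=0):
    """Train a binary linear SVM (no bias term) with the Pegasos method.

    docs: list of strings; labels: list of +1 / -1 values.
    Returns a sparse weight vector (token -> weight).
    """
    rng = random.Random(seed)
    vectors = [bag_of_words(d) for d in docs]
    weights = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            x, y = vectors[i], labels[i]
            margin = y * sum(weights.get(tok, 0.0) * c for tok, c in x.items())
            # Regularisation step: shrink all weights by (1 - eta * lam) = 1 - 1/t.
            scale = 1.0 - 1.0 / t
            for tok in weights:
                weights[tok] *= scale
            # Hinge-loss step: move towards the example if its margin is violated.
            if margin < 1:
                for tok, c in x.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * y * c
    return weights


def predict(weights, text):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(weights.get(tok, 0.0) * c for tok, c in bag_of_words(text).items())
    return 1 if score > 0 else -1


# Hypothetical labelled set: +1 = contract law, -1 = tax law.
train_docs = [
    "breach of contract damages",
    "contract termination clause",
    "income tax deduction",
    "corporate tax rate",
]
train_labels = [1, 1, -1, -1]

weights = train_linear_svm(train_docs, train_labels)
print(predict(weights, "contract clause"))  # 1
print(predict(weights, "tax rate"))         # -1
```

A real system would replace the whitespace tokenizer with proper preprocessing and weighting, and would handle more than two classes; the sketch only shows how a small labelled set yields a classifier that can then label unseen texts.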
Because of the globalization trend, an organization or individual often
generates, acquires and archives the same document written in different
languages (i.e., polylingual documents); moreover, many countries adopt
multiple official languages. If these polylingual documents are already
organized into existing categories, one would like to use this set of
pre-classified documents as training data to build models that classify newly
arrived polylingual documents.
For multilingual text classification, some prior studies address the
challenge of cross-lingual text classification. However, prior research has not
T. Gonçalves is Auxiliary Professor at the Department of Computer Science of the
University of Évora; P. Quaresma is Associate Professor at the same Department.
