Polylingual Text Classification in the Legal Domain
Authors | Teresa Gonçalves - Paulo Quaresma |
Position | Auxiliary Professor at the Department of Computer Science of the University of Évora - Associate Professor at the same Department. |
Pages | 203-216 |
TERESA GONÇALVES, PAULO QUARESMA ∗
SUMMARY: 1. Introduction – 2. Concepts and Tools – 2.1. Automatic Text Classification
– 2.2. Support Vector Machines – 3. Polylingual Approach to Text Classification
– 3.1. Combining Monolingual Classifiers – 3.2. Using Polylingual Classifiers
– 4. Experiments – 4.1. Dataset Description – 4.2. Experimental Setup
– 4.3. Monolingual Experiments – 4.4. Monolingual Combiner Experiments
– 4.5. Polylingual Experiments – 5. Conclusions and Future Work
1. INTRODUCTION
Current Information Technologies and Web-based services need to manage,
select and filter increasing amounts of textual information. Text
classification allows users to browse the texts that interest them more
easily by navigating class hierarchies. This paradigm is very effective both
in filtering information and in developing online end-user services.
Since the number of documents involved in these applications is large,
efficient and automatic approaches are necessary for classification. A
Machine Learning approach can be used to automatically build the classifiers.
The construction process can be seen as a problem of supervised learning:
the algorithm receives a relatively small set of labelled documents and
generates the classifier. Several algorithms have been applied, such as
decision trees, linear discriminant analysis, logistic regression, the naïve
Bayes algorithm and Support Vector Machines (SVM). Besides being grounded in
a well-founded learning theory that describes their mechanics, SVM are known
to be computationally efficient, robust and accurate for text classification.
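
The supervised-learning view described above can be illustrated with a minimal sketch. It is not the authors' method: to stay free of external libraries it uses a multinomial naïve Bayes classifier (one of the algorithms listed above) rather than an SVM. The class labels "judicial" and "fiscal", the four training sentences, and the function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labelled_docs):
    """Build a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # token frequencies per class
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, tiny labelled training set (illustrative only).
TRAINING_SET = [
    ("the court dismissed the appeal", "judicial"),
    ("the judge sentenced the defendant", "judicial"),
    ("the tax rate on income was raised", "fiscal"),
    ("income tax returns are due in april", "fiscal"),
]

model = train(TRAINING_SET)
```

Calling `classify(model, "income tax increase")` on a new, unlabelled text then returns one of the training labels. A realistic setting would use far more documents and a proper feature representation; the sketch only shows the pipeline of labelled documents in, classifier out.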
Because of the globalization trend, an organization or individual often
generates, acquires and archives the same document written in different
languages (i.e., polylingual documents); moreover, many countries adopt
multiple official languages. If these polylingual documents are organized
into existing categories, one would like to use this set of pre-classified
documents as training documents to build models to classify newly arrived
polylingual documents.
For multilingual text classification, some prior studies address the
challenge of cross-lingual text classification. However, prior research has not
∗ T. Gonçalves is Auxiliary Professor at the Department of Computer Science of the
University of Évora; P. Quaresma is Associate Professor at the same Department.