4. Text mining

The research team has significant experience in developing Natural Language Processing (NLP) modules and integrating them into a robust text analysis and information extraction platform for the Greek language. The group also continually develops large text collections that are annotated and used as training material for several NLP analyzers. One of these resources is the Greek Dependency Treebank, a corpus of 100K words annotated at the levels of syntax and semantics, which has been used for training data-driven dependency parsers. Another resource developed by members of the team is the Greek Event Annotation Corpus, which consists of texts annotated for events and spatiotemporal expressions according to widely adopted standard schemas such as TimeML. In this context, the group needs to closely follow recent developments and the current state of the art in the relevant fields. The focus lies in the following areas:

  • Dependency parsing (DP), with the aim of facilitating knowledge extraction and harvesting. DP is one of the most promising and rapidly growing paradigms in syntactic analysis, mainly because dependency structures extend naturally to semantic representations and are better suited to languages with free or flexible word order, such as Greek. Effort should be invested in recent approaches to dependency parsing, such as integrating graph-based and transition-based models and exploiting large collections of automatically parsed data as training material.
  • Innovative techniques for event/fact recognition and spatiotemporal anchoring of events. Recognizing events and grounding them in space and time is of crucial importance for applications that need to extract and aggregate information from large collections of unstructured data. Within this research axis, we will focus on coupling our temporal expression recognizer with a similar module for spatial expressions. We also plan to study the factuality of events, distinguishing situations that have happened from those that have not.
  • Algorithmic methods based on Local Grammars for the development of semantic-syntactic computational lexica, with particular consideration for the case of emotive predicates. Semantic-syntactic lexica stand at the heart of parsing and event/fact recognition algorithms and techniques. Emotive predicates notoriously exhibit idiosyncratic semantic and syntactic behaviour; it is widely accepted that only state-of-the-art techniques such as Local Grammars, which can both capture the detail of the environments where these predicates occur and support thematic classification of predicates, can be trusted to yield reliable results for the development of conceptually organised lexica for NLP exploitation.
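
As an illustration of the transition-based family of dependency parsers mentioned above, the following is a minimal sketch of the arc-standard transition system. It is not the group's parser: the function name `arc_standard`, the English example sentence and the hand-written transition sequence are assumptions made purely for illustration. A real data-driven parser would predict each transition with a classifier trained on a treebank.

```python
from typing import List, Tuple


def arc_standard(words: List[str], transitions: List[str]) -> List[Tuple[int, int]]:
    """Apply a sequence of arc-standard transitions to a sentence.

    Tokens are numbered from 1; index 0 is an artificial ROOT node.
    Returns the dependency arcs as (head, dependent) index pairs.
    The transition sequence is supplied by hand here (an assumption);
    a trained parser would predict it step by step.
    """
    stack = [0]                                # starts with ROOT only
    buffer = list(range(1, len(words) + 1))    # tokens still to be read
    arcs: List[Tuple[int, int]] = []

    for action in transitions:
        if action == "SHIFT":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-topmost item becomes a dependent of the topmost.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The topmost item becomes a dependent of the one below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs


# Hypothetical example sentence and transition sequence.
words = ["She", "reads", "books"]
transitions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]

for head, dep in arc_standard(words, transitions):
    head_word = "ROOT" if head == 0 else words[head - 1]
    print(f"{head_word} -> {words[dep - 1]}")
```

Each arc links a head directly to its dependent with no intermediate phrase nodes, which is one reason such structures map readily onto predicate-argument representations and tolerate flexible word order.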

Related news

The Institute for Language and Speech Processing and the “Athena” Research Center organize an Open Event in Xanthi

“25 Years of Research and Innovation in Language, Culture and Content Technologies” is the title of the Open Event organized by the Institute for Language and Speech Processing (ILSP), one of the institutes of the “Athena” Research Center, in Xanthi on Monday, 25 May 2015 (18:00 – 21:00), at the Elisso hotel. The academic/scientific, business, educational and wider public of Xanthi [...]

Posted in Activities, Info Days, Language learning and learning disabilities, Multimedia processing, Multimodal communication, Open Days, Text mining

Visit from OFAI researchers

On 7-8 May 2015, Martin Gasser and Jan Schulter, researchers at OFAI, visited ILSP/Athena RIC. Martin Gasser gave a presentation with the title “Applications of Score Performance Matching Technology”. The presentation focussed on the alignment of audio data to the score, which is a central problem when studying different performances of classical music pieces. He [...]

Posted in Activities, Multimedia processing, Scientific Presentations, Text mining, Visits

Info Day: Language and Content Processing Technologies @ ILSP / “Athena” RIC

The “Athena” Research Center is pleased to take part in the Athens Science Festival 2015, from 17 to 22 March at Technopolis, City of Athens. The Athens Science Festival aims to make the sciences more accessible to the general public and to provide stimuli and incentives for people of all ages to discover science [...]

Posted in Activities, Info Days, Language learning and learning disabilities, Multimedia processing, Multimodal communication, Priority Research Axes, Project news, Text mining

A visit to the Intelligent Music Processing and Machine Learning Group of the Austrian Research Institute for Artificial Intelligence (OFAI)

Assistant Researcher Angelos Gkiokas from the ILSP spent three months visiting the Intelligent Music Processing and Machine Learning Group of the Austrian Research Institute for Artificial Intelligence (OFAI). The aim was to transfer know-how from OFAI to ILSP in the field of Music Information Retrieval, as well as to establish a closer research collaboration between the two institutes. [...]

Posted in Activities, Multimedia processing, Scientific Presentations, Text mining, Visits, Workshops

[Scientific Presentation] Integration of Multiword Expression Recognition in Parsers, Matthieu Constant (Univ. Marne-la-Vallée)

December 12, 2014. Matthieu Constant, Associate Professor in Computer Science at the Université Paris-Est Marne-la-Vallée, France. Automatic linguistic analysis faces two major problems inherent in natural languages: ambiguity and multiword expressions (MWE). Whereas the literature abounds in analyzers that try to deal with ambiguity, few studies have tackled the integration of MWE recognition. As [...]

Posted in Activities, Project news, Scientific Presentations, Text mining