Extracting real content from the glut of online information

Filtering information from the Internet and proprietary networks can be difficult and time-consuming. This is increasingly so for those working with the large body of published scientific and technical information that supports research and development. The growing online news and information service market is adding to the confusion, thanks to the sheer number of online newspapers, news agencies, search engines, etc.

To counter this problem, the TREVI project has developed a next-generation technology that helps bridge the gap between the quantity of information available and the time users have to process it. Current approaches to harnessing the information surplus do little more than index keywords. TREVI, in contrast, extracts and presents the concepts and relationships contained within a collection of documents, thus raising productivity without increasing overheads.

The TREVI system includes a new text extraction and classification processor based on robust linguistic capabilities for syntactic, morphological and semantic processing of raw text. The resulting analysis detects relevant entities (e.g. person, location, company names) and events in texts for intelligent text classification. The system is open and object-oriented, and follows an agent-based paradigm for text enrichment.
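The agent-based enrichment idea described above can be illustrated with a minimal sketch. It is not TREVI's implementation: the class names, the tiny gazetteer and the keyword lists are all hypothetical, and simple lookups stand in for the linguistic processing the real system performs. Each agent receives a document, adds its own annotations and hands the document on to the next agent.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    entities: list = field(default_factory=list)    # (surface form, entity type)
    categories: list = field(default_factory=list)  # predefined class labels


# Hypothetical gazetteer; a real system would rely on linguistic analysis.
GAZETTEER = {
    "Rome": "location",
    "Acme": "company",
    "Jane Doe": "person",
}

# Hypothetical keyword sets for two predefined classes.
CLASS_KEYWORDS = {
    "finance": {"shares", "market", "profit"},
    "shipping": {"cargo", "port", "vessel"},
}


class EntityAgent:
    """Annotates the document with gazetteer entries found in its text."""

    def process(self, doc):
        for name, kind in GAZETTEER.items():
            if re.search(r"\b" + re.escape(name) + r"\b", doc.text):
                doc.entities.append((name, kind))
        return doc


class ClassifierAgent:
    """Assigns every class whose keyword set overlaps the document's words."""

    def process(self, doc):
        tokens = set(re.findall(r"[a-z]+", doc.text.lower()))
        for label, keywords in CLASS_KEYWORDS.items():
            if tokens & keywords:
                doc.categories.append(label)
        return doc


def enrich(text, agents):
    """Run the text through each agent in turn and return the enriched document."""
    doc = Document(text)
    for agent in agents:
        doc = agent.process(doc)
    return doc


result = enrich(
    "Acme said its shares rose after the Rome market opened.",
    [EntityAgent(), ClassifierAgent()],
)
print(result.entities)
print(result.categories)
```

Because every agent exposes the same `process` method, further enrichment steps (for example event detection) could be added to the list without changing the others, which is the kind of openness the agent-based paradigm is meant to provide.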

TREVI focuses on the converging information market (information providers and brokers, database-driven information services), and especially on news agencies. It also addresses the growing meta-information market on the Internet (search engines, portals, community sites, etc.).

The demonstration shows the stand-alone version of the TREVI prototype, which categorises information with respect to predefined classes, stores natural language texts (English and Spanish) and then publishes them in a suitable framework. The input texts come from a press agency, a financial information provider, a shipping company and an online medical information provider.

Contact Maria Vittoria MARABELLO, Itaca s.r.l.
tel +39 6 43 56 00 11
fax +39 6 43 56 00 31
email mv.marabello@itaca.it
website www.itaca.it

