Automatic building up documents taxonomy through metadata analysis

In cooperation with CNIPA (the Italian Authority for the use of ICT in the Public Administration), we studied and developed a new solution for effective access to legal data, especially law texts, norms and rules. Such information, represented in structured XML-based documents, is also available at the section or paragraph level. We are experimenting with this kind of system in the area of civil status data, because a vertical research project, structured on a semantic level, allows the collection of information and the building of a body of uniform rules. The system is based on statistical similarity relationships, and it lets the user also consider information which, even if not immediately returned as a result of the query, may still be relevant to the user's information needs, because the system discovers new information and relationships within the set of documents. The system provides the usual functionalities of ad hoc retrieval of laws, sections and paragraphs of interest, implemented by means of XML-retrieval techniques. In addition, given the text of a certain law, section or paragraph, it applies document similarity algorithms to derive the set of laws, sections and paragraphs which probably treat the same subject. Furthermore, by performing suitable text parsing, the system extracts from each document all explicit references to other laws (and even the references to sections and paragraphs). In this way the system is able, in response to a given query, to return not only all laws (and the corresponding sections and paragraphs) which may be relevant to the specified subject, but also, for each returned law, a set of laws (sections, paragraphs) which are related to it either explicitly (by means of an explicit reference in the text) or implicitly (by statistical similarity).
These items are then ranked by applying a suitable, user-tunable function of both explicit (in a link-analysis style) and implicit references. By iteratively applying the same approach to each considered law, section or paragraph, the user is able to browse within the given document corpus, moving according to the presence of significant (explicit or implicit) relationships among text items. This search technology employs a new class of database designed for exploring information, not just managing transactions; it lets users prioritize and personalize their choices, rather than directing them down a classification path. Users can thus find what they are looking for and also discover new information and relationships.
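
The ranking idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the "Law <number>" citation pattern, the bag-of-words cosine similarity, the binary explicit-reference signal and the weighted sum controlled by `alpha` are all assumptions chosen for illustration, since the abstract only states that the ranking function is "suitable" and user-tunable.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: identifiers and texts are invented for illustration.
CORPUS = {
    "law-1": "Registration of births and marriages in the civil status registry. See Law 3.",
    "law-2": "Registration of births, marriages and deaths in the municipal civil status registry.",
    "law-3": "Rules on taxation of imported goods.",
}

# Assumed citation pattern ("Law <number>"); real legal citations are far more varied.
REFERENCE = re.compile(r"\bLaw\s+(\d+)\b", re.IGNORECASE)


def explicit_references(text):
    """Return the identifiers of the laws explicitly cited in the text."""
    return {f"law-{number}" for number in REFERENCE.findall(text)}


def term_vector(text):
    """Bag-of-words term frequencies over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(doc_id, corpus, alpha=0.5):
    """Rank the other documents by alpha * explicit + (1 - alpha) * implicit.

    explicit is 1.0 when the source text cites the other document, else 0.0;
    implicit is the cosine similarity of the two texts. The weighted sum is
    an assumed stand-in for the user-tunable function in the abstract.
    """
    source = corpus[doc_id]
    cited = explicit_references(source)
    source_vector = term_vector(source)
    scores = {}
    for other_id, other_text in corpus.items():
        if other_id == doc_id:
            continue
        explicit = 1.0 if other_id in cited else 0.0
        implicit = cosine(source_vector, term_vector(other_text))
        scores[other_id] = alpha * explicit + (1 - alpha) * implicit
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this sketch, calling `related("law-1", CORPUS, alpha=1.0)` ranks purely by explicit citation, while `alpha=0.0` ranks purely by textual similarity. Calling `related` again on any returned identifier corresponds to the iterative browsing step.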

Talamo, M., Gambosi, G., Aversa, A., Bonazzoli, S. (2009). Automatic building up documents taxonomy through metadata analysis. In Archiving 2009 (pp. 160-163). Arlington, VA: Society for Imaging Science and Technology.

Talamo, Maurizio; Gambosi, Giorgio; Aversa, A.; Bonazzoli, S.

2009-05-01

Conference name: Archiving 2009
Conference location: Arlington, Virginia
Conference year: 2009
Conference organizer(s): IS&T
Conference relevance: International
Publication date: May 2009
Disciplinary sector of the contribution: INF/01 - Informatica (Computer Science)
Content language: English
Type: Conference contribution
All authors: Talamo, M; Gambosi, G; Aversa, A; Bonazzoli, S
Appears in types: 02 - Conference contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/14719
