Relations between Ontologies - Introduction to ontologies and semantic web - tutorial
Index Terms — Ontology, Semantic Web, Web Services.
This site introduces ontologies and the Semantic Web, with emphasis on how ontologies are defined and used in Semantic Web applications today, and on how they describe the relations between Web resources in order to enable interoperation. The Semantic Web is a general term for a collection of technologies that strive to provide web content carrying meaningful, machine-processable data, as opposed to markup that only describes how content is presented.
What are Vocabularies Used For? The role of vocabularies on the Semantic Web is to help data integration when, for example, ambiguities exist in the terms used in different data sets, or when a bit of extra knowledge may lead to the discovery of new relationships.
Consider, for example, the application of ontologies in the field of health care. Medical professionals use them to represent knowledge about symptoms, diseases, and treatments. Pharmaceutical companies use them to represent information about drugs, dosages, and allergies.
Combining this knowledge from the medical and pharmaceutical communities with patient data enables a whole range of intelligent applications such as decision support tools that search for possible treatments; systems that monitor drug efficacy and possible side effects; and tools that support epidemiological research. Another type of example is to use vocabularies to organize knowledge. Libraries, museums, newspapers, government portals, enterprises, social networking applications, and other communities that manage large collections of books, historical artifacts, news reports, business glossaries, blog entries, and other items can now use vocabularies, using standard formalisms, to leverage the power of linked data.
How complex a vocabulary an application needs depends on the application. Some applications may decide not to use even small vocabularies and to rely instead on the logic of the application program. Others may choose to use very simple vocabularies, like the one described in the examples section below, and let a general Semantic Web environment use that extra information to identify the terms. Still other applications need agreement on common terminologies, without any rigor imposed by a logic system.
As a result, the relationships between entities in the specialization layer are reflected in the generalization layer, which forms an RDFS knowledge base. Figure 1 shows the representation of the specialization and generalization layers. The first is commonly referred to as the ABox, or Assertional Box. For example, the representation of the sentence "Heitor Villa-Lobos was born in Rio de Janeiro" would use the particularization, instantiation, or specification of the class 'person'.
The second is called the TBox, or Terminological Box, which contains the domain abstractions that enable inferences about the data model. In this layer, the relationships between classes and properties introduce semantics into the data model, which leads to an ontology and translates into the computing world Dahlberg's ideas about the extension and intension of concepts, which are the basis of ontologies in Information Science.
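This ABox/TBox split can be sketched with plain triples. The class names, the example individual, and the single inference rule below are illustrative assumptions, not a description of any particular RDFS engine:

```python
# Sketch of the ABox/TBox split using plain triples (illustrative names).
# TBox: terminological knowledge (classes and their hierarchy).
tbox = {
    ("Composer", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
}
# ABox: assertional knowledge about individuals.
abox = {
    ("VillaLobos", "rdf:type", "Composer"),
    ("VillaLobos", "bornIn", "RioDeJaneiro"),
}

def infer_types(abox, tbox):
    """Propagate rdf:type along rdfs:subClassOf (one RDFS entailment rule)."""
    inferred = set(abox)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type":
                for sub, rel, sup in tbox:
                    if rel == "rdfs:subClassOf" and sub == o and (s, p, sup) not in inferred:
                        inferred.add((s, p, sup))
                        changed = True
    return inferred

kb = infer_types(abox, tbox)
# The generalization layer lets us conclude the individual is also a Person and an Agent.
print(("VillaLobos", "rdf:type", "Person") in kb)  # True
```

The TBox axioms never mention Villa-Lobos, yet the assertion in the ABox inherits them, which is exactly the generalization the text describes.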
Figure 1: created by the author.
The semantics of the elements of the RDFS knowledge base is based on their properties and values. With these Semantic Web tools, information systems can go a long way with a little semantics. However, the difficulty of predicting relationships involving conflict or incompatibility still remains.
For example, RDFS cannot express disjoint classes, which means that, in RDFS, it is impossible to determine whether there are inconsistencies. In addition, under the Open World Assumption (OWA), what is stated in the database is what is known; everything else is unknown. Similarly, there is no unique name assumption, i.e., two different names may refer to the same entity. Finally, there is no comprehensive specification of entities and relationships unless inference rules are added in a more abstract layer that can set limits and introduce generalized restrictions on the database.
OWL extends RDF and RDFS and adds more vocabulary for describing groups of things, such as classes, facts about these classes, relationships between classes and instances, and characteristics of these relationships. It is focused on the processing of Web content and is intended to be read by computer applications. Moreover, it enables the creation of rules, axioms, and inferences that allow deductions using logical tools (W3C).
Information retrieval
There have been undeniable advances in information retrieval in recent years due to the Web, the popularization of Graphical User Interfaces (GUI), and inexpensive mass storage devices.
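One of the additions is disjointness between classes (owl:disjointWith), which makes the inconsistencies that RDFS cannot detect checkable. The entities and the deliberately contradictory assertion below are invented for illustration:

```python
# Minimal illustration of why class disjointness makes inconsistency detectable.
disjoint = {("Person", "Organization")}  # axiom: the two classes share no instances
types = {
    "VillaLobos": {"Person"},
    "W3C": {"Organization", "Person"},   # deliberately contradictory assertion
}

def inconsistent(types, disjoint):
    """Return entities typed by two classes declared disjoint."""
    bad = []
    for entity, classes in types.items():
        for a, b in disjoint:
            if a in classes and b in classes:
                bad.append(entity)
    return bad

print(inconsistent(types, disjoint))  # ['W3C']
```

Under plain RDFS, the second assertion about "W3C" would simply be accepted; the disjointness axiom is what turns it into a detectable contradiction.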
In addition, the continuous optimization of search engines, which improves users' experience, has made the Web the standard and preferred source of information, especially after the launch of the Google search engine by Brin and Page, which tries to respond to the challenges of designing a system that gathers Web documents and keeps them updated according to the rate of growth of the Web.
Baeza-Yates and Ribeiro-Neto propose a distinction between the user task and the logical view of the document. The user task implies specifying terms that convey the semantics of the user's need and that meet the user's information needs when browsing the retrieved documents. The logical view of the document refers to a sequence of transformations aimed at representing documents through a set of index terms or keywords; this is justified because, although full texts are the most complete logical view of a document, their use implies a high computational cost.
On the other hand, a small set of categories provides the most concise logical view of a document, but its use leads to poor-quality retrieval. From traditional information retrieval to the Web of today, there has been a significant change in the user profile. Professionals trained to perform queries on well-structured and well-known collections have been replaced by ordinary people who tend to ignore or disregard the heterogeneity of the contents, the query languages, and any conceptual foundation of Information Systems.
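The index-term logical view described above can be sketched in a few lines; the tokenizer and the stopword list below are simplified assumptions, not any specific system's pipeline:

```python
# Logical view of a document: reduce full text to a set of index terms.
import re

STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def index_terms(text):
    """Tokenize, lowercase, and drop stopwords to obtain index terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

doc = "The risk of loss is a threat to the assets of the entity"
print(sorted(index_terms(doc)))  # ['assets', 'entity', 'loss', 'risk', 'threat']
```

The transformation trades completeness (the full text) for a compact representation that is cheap to match against queries.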
This has led to increased complexity in the infrastructure involved in the entire information management process.
Semantic search
Activities involving the Semantic Web have been widely studied, and many proposals have been made in an attempt to create a Web of distributable, machine-readable data.
Since the concept of the semantic web was introduced, many problems have been solved, but more complex ones are still approached differently by different researchers, which contributes to a more generalized view of the semantic web, discussed below.
Semantic portals, such as those discussed by Maedche et al., return ontology instances rather than documents, and no relevance ranking is provided. In some systems, according to Contreras et al., links to documents that reference the instances are added in the user interface next to each returned instance in the query answer.
The relevance ranking issue was addressed by Rocha et al. The authors proposed a semantic network in which the relation instances have semantic labels and numerical weights.
The query terms are mapped to the semantic network nodes, and the order of the search results is determined according to the relevance provided by the associated weights.
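The weighted-relation ranking can be sketched as follows, in the spirit of Rocha et al.; the network, its relation labels, and the weights are invented for illustration:

```python
# Ranking by weighted relations in a semantic network (illustrative data).
network = {
    "risk": [("threat", "relatedTo", 0.9), ("asset", "threatens", 0.6)],
    "threat": [("attack", "subTypeOf", 0.8)],
}

def ranked_neighbors(query_node, network, min_weight=0.0):
    """Order nodes reachable in one hop by the weight of the connecting relation."""
    hits = [(w, n) for n, _, w in network.get(query_node, []) if w >= min_weight]
    return [n for w, n in sorted(hits, reverse=True)]

print(ranked_neighbors("risk", network))  # ['threat', 'asset']
```

Mapping a query term to a node and ordering its neighbors by edge weight is the minimal version of the relevance ranking the text describes.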
Guha and McCool, Guha et al., and Vallet et al. describe further semantic search systems. Seeking to overcome the limitations of specific organizational ontologies, Fernandez et al. took an important step towards the design of semantic retrieval technologies for the open Web. Given the flexibility of the semantic web, retrieved objects can represent people, companies, cities, proteins, or anything that has been published without the predefined categorization found in traditional search engines.
Moreover, this system must scale to large amounts of data and must be robust enough to deal with heterogeneity, noise, unreliability, and possible conflicts of data collected from a large number of sources. The exploitation of metadata associated with semantic web documents can increase the precision of information retrieval systems.
W3C Semantic Web FAQ
The model uses semantic representation rather than keywords. The documents are described through concepts and instances clustered in "semantic cases" that represent the user interest. In order to achieve more precise results, the matching and similarity models compare the same "semantic cases" of queries and documents.
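A minimal sketch of case-by-case matching follows; the case names, the overlap measure (Jaccard), and the averaging are assumptions standing in for the model's actual similarity functions:

```python
# Matching queries and documents per "semantic case" rather than over a flat bag of words.
def case_similarity(query_cases, doc_cases):
    """Average Jaccard overlap computed separately for each semantic case."""
    scores = []
    for case in query_cases:
        q, d = set(query_cases[case]), set(doc_cases.get(case, []))
        scores.append(len(q & d) / len(q | d) if q | d else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

query = {"topic": ["risk"], "entity": ["bank"]}
doc = {"topic": ["risk", "credit"], "entity": ["bank"]}
print(round(case_similarity(query, doc), 2))  # 0.75
```

Comparing like with like (topic against topic, entity against entity) is what distinguishes this from plain keyword overlap.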
In an attempt to interconnect semantic web with WWW, various processes have been proposed, especially lately. Despite the growth of structured databases to levels that enable various searches, Heath and Bizer mention that the gap between text and structured data remains a barrier to the popularization of semantic web and to the use of tools designed for this environment.
Some initiatives, such as those introduced in the studies by Navigli et al. and the model proposed by Buitelaar et al., seek to bridge this gap. Given the large volume of Web content, it is impossible to develop solutions without the help of machines. Therefore, in order to automate lexicon construction, Walter et al. aim to induce the creation of a lexicon from the knowledge represented in ontologies to feed the originally proposed model. Finally, the integration between the semantic web and the WWW will make it possible to obtain appropriate structured and unstructured information about the user profile.
The new generation of IRS will be able to search indistinctly either in databases of formal knowledge containing ontological structures that are not understandable to people, or in textual databases that are not understandable to intelligent computer programs, and still provide good-quality results.
Thus, only with this free and unrestricted communication between the two worlds will the claimed potential of the semantic web be available to the common user.
Semantic information retrieval model proposal
In the present study, we propose the semi-automatic construction of a lexical database in Portuguese for the financial risk domain, which, for the purposes of this study, is called RiscoLex.
This database was created based on an ontology of risk and its corresponding corpus, as described below. Figure 2 shows the top-level view of the financial risk domain. The various concepts are linked by relationships that capture the opposing forces between threats to and the protection of the assets of an entity. Each dimension of this diagram gives rise to increasingly specific concepts. The set of concepts must be interpreted by following the arrows that establish the type of relationship between one concept and another.
Figure 2: adapted from Gresser et al.
The collection of texts about financial risk contained 2, documents in Portuguese, in various formats known to most users. The reason was the ability to find different types of lexicalizations or ontology properties that enable better generalization of patterns. Several computational resources were used for processing.
Lexicalization Approach
In order to represent the linguistic information, the principles defined by McCrae et al. were followed.
This model was designed to develop a standard RDF format of linguistic information, which includes declarative specifications of a machine readable lexicon that captures morphological, syntactic, and semantic aspects of the lexical items related to an ontology.
Semantic similarity was determined using lexical resources that are structured in groups of semantically related lexical items and that can be used freely because they are in the public domain. Combined, these resources are the key sources for the selection of semantically related lexicons in the domain of interest, financial risk. The proposal for the construction of RiscoLex is to extract the labels of the classes and properties of the ontology; identify and retrieve their synonyms and the morphosyntactic features of each term; convert them into RDF format; and provide the lexical database in the Lemon model.
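The shape of a Lemon-style lexical entry can be sketched as triples linking a word form to the ontology class it lexicalizes. The prefixed names follow the OntoLex-Lemon vocabulary, but the entry itself and the `onto:FinancialRisk` class are invented for illustration:

```python
# Sketch of a Lemon/OntoLex-style lexical entry as triples (illustrative entry).
entry = "lex:risco"
triples = [
    (entry, "rdf:type", "ontolex:LexicalEntry"),
    (entry, "ontolex:canonicalForm", "lex:risco_form"),
    ("lex:risco_form", "ontolex:writtenRep", '"risco"@pt'),
    (entry, "lexinfo:partOfSpeech", "lexinfo:noun"),
    # The sense links the word to the ontology entity it lexicalizes.
    (entry, "ontolex:sense", "lex:risco_sense"),
    ("lex:risco_sense", "ontolex:reference", "onto:FinancialRisk"),
]

def references_of(lexical_entry, triples):
    """Follow entry -> sense -> ontology reference."""
    senses = [o for s, p, o in triples if s == lexical_entry and p == "ontolex:sense"]
    return [o for s, p, o in triples if s in senses and p == "ontolex:reference"]

print(references_of(entry, triples))  # ['onto:FinancialRisk']
```

The key design point of the model is visible here: morphosyntactic information lives in the lexicon, while meaning is delegated to the ontology through the sense's reference.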
Figure 3 shows the steps of the RiscoLex generation process (created by the author). The approach includes the proposal of one or more lexical entries for each class and property of the ontology. The first step involves the extraction of the labels of the ontology and of additional information, such as synonyms and syntactic features, from external resources.
Subsequent steps characterize frequent terms, which are therefore preferred in the domain and in the Portuguese language, and, since in natural language it is common to use more than one word to convey the same meaning, they seek to find the greatest possible number of synonyms for the terms in the list.
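A later step filters polysemous terms with a Lesk-style overlap, which can be sketched as follows; the toy English glosses and the sense labels stand in for the actual Portuguese resources:

```python
# Simplified Lesk disambiguation: pick the sense whose gloss overlaps most
# with the context in which the term appears (toy glosses, illustrative only).
def lesk(word, context, glosses):
    context = set(context)
    best, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

glosses = {
    "risco": {
        "financial": "possibility of monetary loss on an investment",
        "drawing": "a line or sketch traced on a surface",
    }
}
print(lesk("risco", ["loss", "investment", "market"], glosses))  # 'financial'
```

A domain corpus supplies the context words, so senses irrelevant to financial risk are discarded.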
Linguistic ontologies for the Portuguese language were used in this task, and the Lesk approach was used to treat polysemous terms and to keep those that are more relevant to the domain.
Riscolex and information retrieval
Traditional IRS rely on keywords or descriptors to index documents, but this is not enough: if the query term does not match the keywords, the document will not be retrieved.
For example, if the query term is perigo (danger) and, in the representation of the document, the synonym risco (risk) is used, no mechanism based on measuring similarity between terms will retrieve that document.
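A minimal sketch of synonym-based expansion shows how this failure is avoided; the two-entry lexicon below merely stands in for RiscoLex:

```python
# Query expansion via a synonym lexicon: "perigo" retrieves a document
# indexed only under "risco" (tiny illustrative lexicon and collection).
lexicon = {"perigo": {"risco", "ameaça"}, "risco": {"perigo", "ameaça"}}

documents = {
    "doc1": {"risco", "crédito"},
    "doc2": {"mercado", "câmbio"},
}

def retrieve(term, documents, lexicon):
    """Match the query term or any of its lexicon synonyms against descriptors."""
    expanded = {term} | lexicon.get(term, set())
    return sorted(doc for doc, terms in documents.items() if expanded & terms)

print(retrieve("perigo", documents, lexicon))  # ['doc1']
```

Without the expansion step, the same query would return nothing, which is exactly the mismatch described above.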
A corpus and an ontology represent the same domain to different users: in general, there is no correspondence between the labels available in the ontological entities and the document descriptors. In this case, RiscoLex, linked to the ontology, provides the lemma and the synonyms for the descriptors.
If the descriptor is present neither in RiscoLex nor in the ontology, it can be semi-automatically inserted in both of them, which emphasizes the dynamic nature of knowledge. Moreover, the comprehensiveness of the indexing system can be increased through inferences that make semantic meanings explicit. The inclusion of an ontology to support the IRS provides more meanings through inference engines; in this case, the ontology can be seen as a dynamic extension of the document descriptors.
Therefore, they inherit all of their attributes and restrictions through axioms, without these being explicitly expressed.
An overview
In the Semantic Information Retrieval Model (SIRM), it is assumed that the ontology has been constructed and associated with the textual information sources that include the concepts to be represented. In addition, it is assumed that, although this search is restricted to the financial risk domain, the model can be applied to any other domain in which there is structured and unstructured information that can represent the concepts comprised by the domain.
The domain is represented by ontologies and corpora. On the one hand, ontological entities represent concepts, and inference engines automatically infer non-explicit information.
On the other hand, descriptors describe document contents, and people interact using natural language to infer unexpressed meanings. The two complement each other in the task of providing information, but in different, or even incompatible, formats. Documents and ontological entities are indexed together.
This modeling option facilitates the interaction with the end user, who keeps searching in the same way as in traditional search engines. Additionally, the final result is at least as good as that of the traditional approach. Figure 4 illustrates the information retrieval process with the addition of the semantic module. The user interacts in the traditional way to submit the query.
The query processing standardizes the terms for the search. The lexicon-ontological knowledge includes the ontology and the RiscoLex. The corpus characterizes the database containing the documents to be retrieved.
The joint indexation of the databases involved provides the lexical-semantic index, which is used in the retrieval and ranking of retrieved documents to be presented to the user.
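The joint indexation can be sketched as a single inverted index shared by document descriptors and ontology labels; the collection, the labels, and the `onto:FinancialRisk` identifier are illustrative assumptions:

```python
# Sketch of the joint lexical-semantic index: document descriptors and
# ontology labels share one inverted index, so one query can hit both.
from collections import defaultdict

def build_index(documents, ontology_labels):
    """Map each term to the documents and ontological entities it describes."""
    index = defaultdict(set)
    for doc, terms in documents.items():
        for t in terms:
            index[t].add(doc)
    for entity, label in ontology_labels.items():
        index[label].add(entity)
    return index

documents = {"doc1": {"risco", "crédito"}}
ontology_labels = {"onto:FinancialRisk": "risco"}
index = build_index(documents, ontology_labels)
print(sorted(index["risco"]))  # ['doc1', 'onto:FinancialRisk']
```

Because both kinds of objects live in the same index, the ranking stage can interleave documents and ontological entities in one result list, which is the behavior the model aims for.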
The semantic annotation process is therefore essential to link documents to the semantic space created by the domain ontology. NLP is the main tool for document identification, comparison, and annotation; however, to minimize possible ambiguity effects, it is complemented by human validation.
Results
The first result to be highlighted is the creation of RiscoLex, the first lexical database in Brazilian Portuguese built with the Lemon model, which differs from others in that its interpretation of language is restricted to a well-defined domain.
Additionally, the ontology, as a resource for natural language interpretation, puts the lexical database at the center of the interpretation process.
In this line, the level of representational granularity, at which the meaning of natural language is captured, is not driven by language but by the semantic distinctions made in an ontology. Thus, these distinctions are relevant only in the context of a specific domain.
Another result is the construction of the first ontology for risk management in Portuguese. Difficulties in building this type of resource have often been reported in the academic literature.