Generic ontology of datatypes



- Especially in data mining research, it is impossible to efficiently (semi-)automatically connect parts of workflows (such as data preprocessing and data mining), analyze the research results, and communicate the research outputs without a machine-processable representation of datatypes and their properties.
- OntoDT defines the semantics (i.e., the meaning) of the key entities and represents the knowledge about datatypes in a machine-friendly way.
- The OntoDT ontology is based on the latest revised version of the ISO/IEC 11404 standard for datatypes [23].
- In Section 2, we present the background related to the development of the OntoDT ontology.
- For example, a definition of the term attribute (‘Defines an attribute’) is circular and does not explain how an attribute is different from e.g.
- For example, users can define datatypes such as start of the project, beginning of the project, start date.
- All these datatypes are of the same type, the date datatype, and the data encoded with these custom datatypes have the same semantic meaning.
- Organization System (SKOS) [57], which defines the semantics of the data.
- OntoDT adopts a similar approach for defining the semantic meaning of the data and also for modeling units.
- These terms do not correspond to conventional datatypes; rather, they are labels for capturing the semantic meaning of the data.
- One cannot apply data mining algorithms to data of the type experimental measurement without specifying the datatype (e.g., numerical datatype).
- Observations are distinguished at the level of the entity (e.g., location, time), and the characteristics of an entity (e.g., height, name, color) are classified as data.
- Unfortunately, the representations and semantic meanings of the term datatype across these resources are not consistent.
- The design of the OntoDT ontology follows best practices in ontology engineering, such as the OBO Foundry principles, which are widely accepted in the biomedical domain [60].
- OntoDT is developed to be complementary to and integrated with state-of-the-art ontologies for representing scientific knowledge.
- Next, the terminology of the ontology (its classes and relations) is specified using a first-order logical language (e.g., description logics).
- In the OntoDT ontology, the datatype class is modeled as a subclass of the OBI: data representational model class.
- In addition, OntoDT models datatype properties as subclasses of the quality class and connects them using the has-quality relation.
- In Fig. 1a, we present the structure of the datatype class and in Fig. 1b the OWL Manchester syntax of the class definition.
- The value space specification class is modeled in OntoDT as a subclass of the OntoDM: specification entity class.
- A monadic operation specifies an operation that maps a value of a given datatype into a value of the given datatype, or into a value of the boolean datatype.
- A dyadic operation specifies an operation that maps a pair of values of a given datatype into a value of the given datatype, or into a value of the boolean datatype.
- A datatype property is defined as a quality that specifies the intrinsic properties of the data units represented by the datatype, regardless of the properties of their representations in computer systems.
- Cardinality denotes the notion of cardinality of the value space.
- Finally, boundedness is a property that denotes the boundaries of the value space.
- In Fig. 1c, we present the representation of the integer datatype in OntoDT and in Fig. 1d we present the OWL Manchester syntax of the integer datatype class definition.
- The integer datatype is a subclass of the numeric ordered primitive datatype class and.
- In OntoDT, an extended datatype (named ‘subtype’ in the ISO standard) is defined as an IAO: data representational model that is derived from an existing datatype by restricting the value space to a subset of the base datatype, while maintaining all operations (see Fig.
- An extended datatype is defined by a subtype generator that represents the relationship between the value spaces of the base type and the extended datatype.
- for the base datatype, and this is the reason we do not represent them simply as subclasses of the datatype class.
- The positive integer datatype is an extended datatype of the integer datatype obtained by limiting the value space with a lower bound of zero (see Fig.
- For example, to define an instance of the real datatype, we additionally need to specify the radix and the factor, which, taken together, describe the precision to which values of the datatype are distinguishable.
- Both radix and factor are represented as subclasses of the value expression class.
- The instances of the discrete datatype differ from each other in the discrete-value-list specification.
- Here, we show a representation of the datatype representing the Iris-class attribute, as an instance of the discrete datatype class (see Fig.
- In Fig. 4b, we present the OWL Manchester syntax of the Iris class datatype instance.
- On the one hand, at the second level of the taxonomy with respect to the order property, we distinguish between numeric ordered primitive datatype and complex datatype.
- Next, it defines a construction procedure which creates a new value space from the value space of the element datatypes.
- Representation of the discrete datatype class and instance in OntoDT.
- Each of the datatypes from a collection of datatypes, to which the datatype generator is applied, is called a parametric datatype.
- An aggregate generator is a datatype generator that specifies the algorithmic procedure applied to the value spaces of the component datatypes to yield the value space of the aggregate datatype, and a set of characterizing operations specific to the generator.
- The aggregate specific properties are independent of the component datatype properties.
- They are defined as qualities of the aggregate generator.
- In Fig. 5a, we present an example of the record datatype (also called a tuple datatype).
- The values of the record datatype are heterogeneous aggregations of values of the component datatypes.
- Each field component contains a unique identifier of the component and its datatype.
- In Fig. 5b, we present the OWL Manchester syntax of the record datatype class.
- In Section 6.1, we presented an example of datatype representation of the class attribute of the Iris dataset, which is an instance of a primitive datatype.
- Additionally, the Iris-tuple datatype has a specification of the component datatypes.
- ‘sepal length’) and denotes the datatype of the component (e.g., real(f:def,r:def), where f:def and r:def represent the fraction and radix parameters needed to define an instance of a real datatype class).
- The results of the evaluation are summarized in Tables A.1–A.4 of Appendix A.
- In this section, we present three use cases of the OntoDT ontology.
- It describes the datatype of the underlying data and is connected to the OntoDT datatype class via the is-about relation.
- The meaning of the labels is presented in the legend.
- At the first level of the taxonomy of datasets, we have the unlabeled dataset (a dataset that has only descriptive data and is usually used for clustering and pattern discovery tasks) and the labeled dataset (a dataset that has both descriptive and output/target data and is usually used for predictive modeling tasks).
- At the second level of the unlabeled dataset taxonomy, we distinguish between a feature-based unlabeled dataset and a structure-based unlabeled dataset.
- At the second level of the labeled dataset taxonomy, we distinguish between a labeled dataset with primitive output and a labeled dataset with structured output.
- If we focus only on labeled datasets, the taxonomy can be further extended based on the datatypes of the descriptive and output parts of the data.
- Finally, we showed how the structure of the OntoDT taxonomy of datatypes can be used to produce a taxonomy of datasets.
- A dataset specification includes information about the datatype of the data examples by using relations to the classes from OntoDT.
- All different variants of the same dataset were grouped under the same dataset class (as instances of that class).
- We represent all six variants of this dataset as separate dataset instances of the EDM dataset class, as each dataset instance is characterized by a different datatype.
- In Fig. 7, we present an example annotation of one dataset instance of the EDM dataset, which has continuous descriptive attributes and two continuous target attributes.
- Each labeled dataset instance is described by a labeled dataset record datatype, which is a subclass of the record datatype with the distinctive feature that it contains only two field components, one describing the datatype on the description side and one the datatype on the target side.
- In that sense, the dset:EDM-MCT dataset instance is described by the labeled dataset record datatype instance containing an instance of the record of real datatype in both descriptive and target field components.
- By using reasoning, some of the knowledge that was implicitly encoded in the ontology was made explicit.
- The transitivity of the is-a relation is one example of the implicit knowledge built into the ontology.
- For example, if we query the inferred ontology for datasets that have some homogeneous aggregate datatype on the output/target side, the answer contains all datasets whose target datatypes are subclasses of the homogeneous aggregate datatype class, down to the lowest levels of the taxonomy.
- The annotations of the datasets are available on the ontology web page.
- This can be done by directly extending the OntoDT datatype taxonomy and defining the semantic meaning of the domain datatypes by linking them to the corresponding entities in domain ontologies.
- For example, we can define an amino-acid sequence datatype as a subclass of the character sequence datatype class (which is a sequence datatype having characters as its base type).
- BioXSD does not support arbitrary datatypes and it does not provide a clear framework for the representation of the semantic meaning of the data.
- OntoDT is fully interoperable with OBO bio-ontologies because it was developed by following the OBO Foundry recommendations (see Section 4) and therefore it fully supports the representation of the semantic meaning of the data by the corresponding entities defined in domain-specific bio-ontologies.
- The semantic meanings of the terms sequence and nucleotide are crucial for capturing the semantic meaning of the data of the datatype sequence.
- We represent the bio-sequence datatype class as a subclass of the character sequence datatype class with the defined.
- In order to define the nucleotide and amino acid sequence datatypes, we define two subclasses of the character datatype class: nucleotide character datatype and amino acid character datatype.
- We represent the bio-sequence record datatype class as a subclass of the record datatype class (see Fig.
- In OntoDT, we model the bio-sequence field component class as a role of the bio-sequence datatype (defined previously).
- In a similar way, we can define other datatypes from BioXSD as subclasses or instances of the OntoDT datatypes.
- This use case demonstrates that OntoDT provides a logically consistent representation of bioinformatics datatypes from BioXSD and enables an accurate representation of the semantic meanings of the data of the specified datatypes.
- OntoDT adopts a modular approach where not only the information about units of measurement but also other operational information and the semantic meaning of the underlying data are captured and maintained separately.
- Following best practices, OntoDT clearly separates the semantic meaning of the data from the data itself and its structure.
- A modular approach for the recording of information is flexible, extensible, and also reduces the complexity of the underlying representational model.
- The design approach employed (see Section 4) facilitates a seamless integration of the relevant resources.
- The ontology has been constructed by following best practices in ontology design so that it is complementary and can be easily integrated with other state-of-the-art ontologies for science.
- We envision several dimensions of further development of OntoDT that would overcome the current limitations of the ontology.
- First, we want to further establish the connection with domain ontologies and represent domain dependent semantic datatypes for different domains (e.g., biology, ecology, economics) using the OntoDT ontology and the semantics of the domain entities from domain ontologies.
- We would like to acknowledge the support of the European Commission through the projects: MAESTRA – Learning from Massive, Incompletely annotated, and Structured data (grant number FP7-ICT-612944) and HBP – The Human Brain Project (grant number FP7-ICT-604102).
- We would also like to acknowledge the support of the Engineering and Physical Sciences Research Council UK (grant number EP/K030469/1).
- 5. Modularity: OntoDT is part of the OntoDM ontology, which also contains the OntoDM-core and the OntoDM-KDD subontologies.
- 10. Inverse relations: Most of the imported relations from RO, OBI, and IAO have defined inverse relations.
- 12. Instantiability: More extensive population of the ontology with instances is planned for the future.
- 2. Use of annotation properties: We reuse the metadata defined by the OBI consortium (http://obi-ontology.org/page/OBI_Minimal_metadata) to provide additional semantic annotation of the classes and relations.
- 5. Ontology term IDs: The IDs of the ontology terms combine an ontology module ID and a multiple-digit code.
- 1. Definitions: Most of the OntoDT classes have textual definitions that are taken from ISO/IEC 11404. The source of the definitions is properly referenced in the annotations.
- 4. Users of the ontology: The ontology is reused by the OntoDM-core ontology.
- Taylor, The SSN ontology of the W3C semantic sensor network incubator group, Web Semant.: Sci., Serv
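
The notions summarized above (a datatype with a value space and properties, an extended datatype produced by a subtype generator, a discrete datatype instance, and a record datatype with field components) can be illustrated with a small Python sketch. This is not OntoDT itself, which is an OWL ontology; all class and function names below are hypothetical. Two further assumptions: the lower bound of zero for the positive integer datatype is treated as exclusive, and the Iris field names other than 'sepal length' and the three class labels come from the commonly distributed Iris dataset rather than from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable


@dataclass
class Datatype:
    """Hypothetical stand-in for a datatype: name, value space, properties."""
    name: str
    contains: Callable[[Any], bool]  # value space given as a membership test
    properties: Dict[str, str] = field(default_factory=dict)


def is_int(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


integer = Datatype("integer", is_int, {"order": "ordered", "numeric": "numeric"})
real = Datatype("real", lambda v: isinstance(v, float),
                {"order": "ordered", "numeric": "numeric"})


def extend(base: Datatype, name: str, restriction: Callable[[Any], bool]) -> Datatype:
    """Subtype-generator sketch: restrict the base value space to a subset,
    carrying the base datatype's properties over unchanged."""
    return Datatype(name,
                    lambda v: base.contains(v) and restriction(v),
                    dict(base.properties))


def discrete(name: str, values: Iterable[str]) -> Datatype:
    """Discrete datatype instance, identified by its value list."""
    allowed = frozenset(values)
    return Datatype(name, lambda v: v in allowed)


def record(name: str, fields: Dict[str, Datatype]) -> Datatype:
    """Record datatype: each field component has an identifier and a datatype."""
    def contains(v: Any) -> bool:
        return (isinstance(v, dict)
                and v.keys() == fields.keys()
                and all(fields[k].contains(v[k]) for k in fields))
    return Datatype(name, contains)


# Assumption: the bound of zero is exclusive.
positive_integer = extend(integer, "positive integer", lambda v: v > 0)

iris_class = discrete("Iris-class",
                      ["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

iris_tuple = record("Iris-tuple", {
    "sepal length": real, "sepal width": real,
    "petal length": real, "petal width": real,
    "class": iris_class,
})

example = {"sepal length": 5.1, "sepal width": 3.5,
           "petal length": 1.4, "petal width": 0.2, "class": "Iris-setosa"}
print(iris_tuple.contains(example))
```

The sketch keeps the extended datatype as a separate object derived from its base rather than as a Python subclass, loosely mirroring the text's point that extended datatypes are not modeled simply as subclasses of the datatype class.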

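The reasoning example in the summary (querying for datasets whose target datatype falls under the homogeneous aggregate datatype class) relies on the transitivity of the is-a relation. A minimal sketch of that idea follows, using a plain dictionary in place of an OWL reasoner. The small hierarchy and the dataset names are illustrative assumptions, not the actual OntoDT taxonomy or its dataset annotations.

```python
from typing import Dict, List

# Hypothetical is-a edges (child -> parent); an assumed fragment of a taxonomy.
IS_A: Dict[str, str] = {
    "homogeneous aggregate datatype": "aggregate datatype",
    "sequence datatype": "homogeneous aggregate datatype",
    "character sequence datatype": "sequence datatype",
    "record datatype": "aggregate datatype",
}


def ancestors(cls: str) -> List[str]:
    """Follow is-a edges upward; this makes the transitive closure explicit."""
    result = []
    while cls in IS_A:
        cls = IS_A[cls]
        result.append(cls)
    return result


def datasets_with_target(target_class: str, datasets: Dict[str, str]) -> List[str]:
    """Return datasets whose target datatype is target_class or any
    direct or indirect subclass of it."""
    return sorted(name for name, dtype in datasets.items()
                  if dtype == target_class or target_class in ancestors(dtype))


# Hypothetical dataset annotations: dataset name -> target datatype class.
datasets = {
    "ds_sequences": "character sequence datatype",
    "ds_records": "record datatype",
}

print(datasets_with_target("homogeneous aggregate datatype", datasets))
```

Without following the is-a chain, the query would miss `ds_sequences`, whose target datatype is two levels below the queried class in this assumed hierarchy.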