
Grid Computing P15



- In addition to computing resource scheduling, Data Grids address the problems of storage and data management, network-intensive data transfers and data access optimization, while maintaining high reliability and availability of the data (see References [2, 3] and references therein).
- The Open Grid Services Architecture (OGSA) [1, 4] builds upon the anatomy of the Grid [5], where the authors present an open Grid Architecture, and define the technologies and infrastructure of the Grid as ‘supporting the sharing and coordinated use of diverse resources in dynamic distributed Virtual Organizations (VOs)’.
- In this article the application of the OGSA is discussed with respect to Data Grids.
- We then investigate OGSA’s benefits and possible weaknesses in view of the Data Grid problem.
- In conclusion, we address what we feel are some of the shortcomings and open issues that expose potential areas of future development and research.
- All the Data Grid discussion presented in this chapter draws from existing concepts in the literature, but is strongly influenced by the design and experiences with data management in the context of the EU Data Grid project [10].
- However, the usage of the Web services paradigm to build a Grid infrastructure was always considered in the architecture and has led to the definition of the Web Service Discovery Architecture (WSDA) [13].
- 15.1.1 The vision.
- In this section we summarize the Data Grid problem and the vision of the Grid as presented in the literature.
- The concept of the Virtual Organization (VO) gives us the first necessary semantics by which to address these issues systematically and to motivate these properties in detail.
- The concept of the VO is not a new one.
- In the vision of the Grid, end users in a VO can use the shared resources according to the rules defined by the VO and the resource providers.
- VOs can be dynamic, that is, they may change in time both in scope and in extension by adjusting the services that are accessible by the VO and the constituents of the VO.
- however, the user is not necessarily interested in the exact location of the data (TRANSPARENT).
- of the VO in question, while at the same time maintaining the manageability of the data and its accessibility (SECURE, COORDINATED, MANAGEABLE).
- 15.2 THE OGSA APPROACH.
- We try to map each requirement to the properties that grid systems are supposed to have on the basis of the requirements given in Section 15.2.
- AVAILABLE , FLEXIBLE 8 Support upgradability without disruption of the.
- On the basis of the services defined, higher-level services are envisaged such as data management services, workflow management, auditing, instrumentation and monitoring, problem determination and security protocol mapping services.
- Figure 15.1 The concept space of the GGF data area (from the GGF data area website).
- 15.3.1 The data.
- In most of the existing architectures, the data management services are restricted to the handling of files.
- The following is a nondefinitive list of the kinds of data that are dealt with in Data Grids.
- Files: For many of the VOs the only access method to data is file-based I/O: data is kept only in files, and there is no need for any other kind of data granularity.
- For example, depending on the QoS requirements on file access, the files can be secured through one of the many known mechanisms (Unix permissions, Access Control Lists (ACLs), etc.).
- The data identified by a GDH may correspond to a database, a table, a view or even to a single row in a table of the RDBMS.
- 15.3.1.1 The Grid data handle GDH.
- The common requirement for all these kinds of data is that the logical identifier of the data, the GDH, be globally unique.
- The assignment and the checking of the GDH can be performed by the Data Registry (DR) (see below).
- One of the fundamental differences between Grids and the Web is that for Web services, ultimately, there always is a human being at the end of the service chain who can take corrective action if some of the services are erroneous.
- 15.3.1.2 The Grid data reference.
- The elements of the GDR include physical location, available access protocols, data lifetime, and possibly other metadata such as size, creation time, last update time, created by, and so on.
- 15.3.1.3 The data registry.
- The SDEs of this service will hold most of the additional information on the GDR to GDH mapping, but there may be other more dedicated services for very specific kinds of GDH or GDR metadata.
- To give an example, in Data Grids that only deal with files – as is currently the case within the EU DataGrid project – the GDH corresponds to the LFN and the GDR is a combination of the physical file name and the transport protocol needed to access the file, sometimes also called Transport File Name (TFN) [16–18].
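The file-only case described above can be illustrated with a minimal sketch of a Data Registry. It assumes that a GDH is a plain string (an LFN) and that a GDR carries a transport protocol and a physical file name; the class names, field names, and the `gsiftp` scheme and host used in the demo are illustrative assumptions, not part of any EU DataGrid interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GridDataReference:
    """One physical instance of the data (hypothetical field names)."""
    protocol: str        # transport protocol needed to access the file
    physical_name: str   # physical file name, including the storage host
    size_bytes: int = 0  # optional technical metadata

    @property
    def tfn(self) -> str:
        """Transport File Name: protocol combined with the physical file name."""
        return f"{self.protocol}://{self.physical_name}"


class DataRegistry:
    """Holds GDH-to-GDR mappings and enforces GDH uniqueness (sketch only)."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[GridDataReference]] = {}

    def register_gdh(self, gdh: str) -> None:
        # A GDH must be globally unique, so a second registration is refused.
        if gdh in self._replicas:
            raise ValueError(f"GDH already registered: {gdh}")
        self._replicas[gdh] = []

    def add_gdr(self, gdh: str, gdr: GridDataReference) -> None:
        # A GDR can only be attached to a GDH that is already known.
        if gdh not in self._replicas:
            raise KeyError(f"unknown GDH: {gdh}")
        self._replicas[gdh].append(gdr)

    def lookup(self, gdh: str) -> List[GridDataReference]:
        # Return a copy so callers cannot modify the registry state.
        return list(self._replicas[gdh])


if __name__ == "__main__":
    registry = DataRegistry()
    lfn = "lfn:/example-vo/run42/events.dat"  # illustrative LFN
    registry.register_gdh(lfn)
    registry.add_gdr(lfn, GridDataReference("gsiftp", "se1.example.org/data/events.dat"))
    for replica in registry.lookup(lfn):
        print(replica.tfn)
```

Keeping one GDH mapped to a list of GDRs reflects the idea that a single logical identifier may have several physical instances; a real registry would also need persistence and distribution, which this sketch leaves out.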
- 15.3.2 The functionality and the services.
- Lifetime management: The ability to specify the lifetime of the data directly (timestamps and expiration times), or indirectly (through data status flags like permanent, durable, volatile).
- In all existing Data Grids, data management is one of the cornerstones of the architecture.
- The data management services need to be very flexible in order to accommodate the diverse QoS requirements of the VOs and their peculiarities with respect to different kinds of data (see previous section) and data access.
- Because of the multitude of different kinds of data, it is essential that the data can be described and validated in the Data Grid framework.
- This functionality may be part of the DR or set up as a separate data registration service.
- Materialization of virtual data: This functionality of the data management services will materialize virtual data according to a set of materialization instructions.
- After materialization, physical copies of the data exist and the corresponding catalogs need to be updated and a GDR assigned.
- GDH assignment and validation: The uniqueness of the GDH can only be assured by the data management services themselves.
- The VOs may have their own GDH generation and validation schemes but in order to be certain, those schemes should be pluggable and complementary to the generic GDH validation scheme of the Data Grid.
- In the case of the location of data based on metadata, there will be higher-level services, probably databases and not just registries, that can execute complex queries to find data (see Chapter 14).
- The replica selection functionality of the data management services can select the optimal replica for this purpose.
- Replica load balancing: On the basis of access patterns of frequently used data, this service can initiate replication to improve load balancing of the Data Grid.
- The strategy to be chosen is again dependent on the QoS requirements of the VO in question: how much latency is allowed, requirements on consistent state between replicas, and so on.
- Application metadata are metadata managed by the Grid on behalf of the users and are treated simply as data by the Grid system.
- The GDR should contain most of the necessary technical metadata on the given physical instance of the data.
- There is, however, a need for metadata services that store both technical and application metadata on all the other aspects of the data management services.
- The details of the security infrastructure are not worked out yet, but these and other security metadata will certainly have to be stored in the Grid.
- Subscription configurations: This is metadata to be used by the subscription service, defining the source and destination of the automated replication process.
- Data security is another one of the cornerstones of Data Grids.
- Most of the data management services need to be monitored and controlled by other services or users.
- The status of the services needs to be published, statistics need to be kept for optimization and accounting purposes, services need to be deployed or terminated upon request, and so on.
- One possibility is to set up services in the scheme of the Grid monitoring architecture that follows a consumer-producer model [27].
- In this section we investigate the steps that need to be taken to provide OGSA versions of the services described above.
- Transactions, which are very important in Data Grids, may be implemented more easily in higher-level services by building upon the functionality of the factories.
- Figure 15.2 Elements of the data storage, transfer, management, and metadata services and their logical layering.
- One attractive possibility is to build the replica manager with the factory interface to many of its fellow high-level services so that it can instantiate any of the services, providing the functionality upon request.
- It also does not make sense to add the registry functionality to any other service because of the vital role of the registry.
- Model 1: A very natural possibility is to keep a set of factories with a very long lifetime (possibly equal to the lifetime of the VO) that create the necessary services to assure the QoS requirements of the VO and keep them alive themselves by using the OGSA lifetime-extension mechanism.
- The preferred model depends on the applications and the usage patterns of the VOs.
- In the case of Models 2 and 3, it is important that applications do not set very long lifetimes on the services, since in the case of the failure of the applications the services would remain too long.
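The reasoning behind short lifetimes combined with periodic extension can be sketched as follows. This is a simplified illustration of soft-state lifetime management under stated assumptions, not the OGSA interface itself: the class, the method names, and the use of plain numeric timestamps passed in by the caller are all hypothetical.

```python
from typing import List


class SoftStateService:
    """A service instance that terminates unless its lifetime is extended."""

    def __init__(self, name: str, now: float, lifetime: float) -> None:
        self.name = name
        self.termination_time = now + lifetime

    def is_expired(self, now: float) -> bool:
        return now >= self.termination_time

    def extend_lifetime(self, now: float, lifetime: float) -> None:
        """Keepalive request: move the termination time forward, never back."""
        if self.is_expired(now):
            raise RuntimeError(f"{self.name} has already expired")
        self.termination_time = max(self.termination_time, now + lifetime)


def reap(services: List[SoftStateService], now: float) -> List[SoftStateService]:
    """Return only the services that are still alive at time `now`."""
    return [s for s in services if not s.is_expired(now)]


if __name__ == "__main__":
    # A client creates a service with a short lifetime and renews it while running.
    svc = SoftStateService("replica-manager", now=0.0, lifetime=60.0)
    svc.extend_lifetime(now=50.0, lifetime=60.0)   # client is still alive
    # The client then fails and sends no further keepalives.
    print(len(reap([svc], now=100.0)))  # still within the extended lifetime
    print(len(reap([svc], now=200.0)))  # expired and cleaned up
```

With a short lifetime, a failed client leaves an orphaned service behind for at most one lifetime interval; with a very long lifetime the same failure would keep the service alive long after it is needed, which is the concern raised for Models 2 and 3.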
- Managing the data lifetime is orthogonal to managing the lifetime of the Data Grid services.
- Nevertheless, the GSH of the service that is associated with accessing persistent data need not change even if the implementation of the service has changed radically.
- The GDR may also change radically, the schema of the data it.
- Information on migration of the data might be kept in the metadata catalogs as well if it is desirable.
- If the site has only storage capabilities, it will need most of the data management services discussed above.
- This can only happen by having reference implementations and deployments of OGSA-compliant Grid middleware that will eventually expose the strengths and weaknesses of the architecture.
- The current state of OGSA is analyzed by addressing each of the.
- As we have mentioned before, the robustness of the system also needs to be ensured in a scalable manner.
- The ability to set up VOs that fulfill many different QoS requirements is highlighted as one of the most desirable properties of Grids.
- There might be the need to define a QoS namespace to be able to query this property of services more explicitly in the WSDL description of the GSR.
- Each service also needs to declare its own internal QoS metrics and to give a value in a specific instance in the case in which different instances of the same service can be set up such that the given metrics can change.
- Hence our introduction of the concept of Grid Data References and the Data Registry.
- Is there the need for a Grid service that deals with these issues, or should each of the services have an interface addressing this, making it part of the GridService base interface? By deferring this problem to a later time, the design decision is made that security needs to be dealt with at the protocol layer or by the higher-level services.
- An issue is how to delegate rights to automated Grid services that need to use the resources on behalf of the user even if the user did not initiate their usage explicitly.
- Security will have to be dealt with very soon within OGSA since it will depend on the success of the underlying security framework.
- Interoperability is explicitly mentioned as a requirement and is one of the driving concepts behind OGSA.
- It resembles the concepts of the Grid monitoring architecture [27], but without the possibility of registering notification sources with the target (sink) of the notification event.
- As mentioned before, users of many services and services that want to interoperate need to get hold of the service descriptions to discover which services meet their needs or which services are still missing to achieve a given QoS.
- The Registry needs to be searched to find the GSHs of the services that fulfill the user requirements – formulated in a query if necessary.
- The HandleMap can then be contacted to retrieve the detailed description of the services in question.
- By holding a GSH one can get at the corresponding (WSDL) description and the HandleMap is bound to the HTTP(S) protocol to assure availability of the description without another necessary discovery step.
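The two-step lookup described here, first searching the Registry for matching GSHs and then resolving each GSH through the HandleMap, can be sketched as below. The in-memory classes, the dictionary-based service data, the predicate-style query, and the handle and description strings are simplifying assumptions for illustration; they do not reproduce the OGSA interfaces or the HTTP(S) binding.

```python
from typing import Callable, Dict, List


class Registry:
    """Maps each GSH to searchable service data (hypothetical schema)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}

    def register(self, gsh: str, service_data: Dict[str, str]) -> None:
        self._entries[gsh] = dict(service_data)

    def find(self, predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
        """Return the GSHs whose service data satisfy the query predicate."""
        return sorted(gsh for gsh, data in self._entries.items() if predicate(data))


class HandleMap:
    """Resolves a GSH to its current GSR (here just a description string)."""

    def __init__(self) -> None:
        self._references: Dict[str, str] = {}

    def bind(self, gsh: str, gsr: str) -> None:
        self._references[gsh] = gsr

    def resolve(self, gsh: str) -> str:
        return self._references[gsh]


def discover(registry: Registry, handle_map: HandleMap,
             predicate: Callable[[Dict[str, str]], bool]) -> Dict[str, str]:
    """Step 1: query the Registry for GSHs. Step 2: resolve each via the HandleMap."""
    return {gsh: handle_map.resolve(gsh) for gsh in registry.find(predicate)}


if __name__ == "__main__":
    registry, handle_map = Registry(), HandleMap()
    registry.register("gsh:replica-catalog-1", {"type": "replica-catalog"})
    registry.register("gsh:storage-1", {"type": "storage"})
    handle_map.bind("gsh:replica-catalog-1", "<wsdl for replica-catalog-1>")
    handle_map.bind("gsh:storage-1", "<wsdl for storage-1>")
    print(discover(registry, handle_map, lambda d: d["type"] == "storage"))
```

The sketch takes the Registry and HandleMap as given arguments, so it does not address how a client obtains the handle of the relevant Registry in the first place.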
- This whole mechanism, however, leads us back to the service discovery problem: How do we get the GSH of the relevant registries in the first place? There has been significant effort in the P2P community to provide robust and scalable protocols addressing this issue, like Chord [29].
- How is it possible to get the handles of the registries that we can query to get a set of services that we might be interested in, that is, how do we find the registry or registries relevant to a given query?
- How is a query formulated to do so? OGSA considers using XQuery [30, 31] to query the Service Data of the Registry, but then we need SDEs defining QoS.
- We have introduced the notion of the GDH, which is the unique logical identifier of the data and the GDR pointing to a physical instance and describing the data if necessary – in analogy to the GSH and GSR in OGSA.
- One of the real issues is how the data is actually located.
- We have introduced the notion of the Data Registry, which holds the GDH to GDR mappings.
- (2001) The data grid:.
- (2001) The anatomy of the grid.
- (2001) An Overview of the Web Services Inspection Language, http://www.ibm.com/developerworks/webservices/library/ws-wsilover..
- of the Intl.
- The physiology of the grid, An Open Grid Services Architecture for Distributed Systems Integration.
- (2001) The D0 experiment data grid – SAM, Proceedings of the Second International Workshop on Grid Computing GRID2001.
- Proceedings of the 19th IEEE Symposium on Mass Storage Systems, 2002.
