
A GPU Accelerated High Performance Cloud Computing Infrastructure for Grid Computing Based Virtual Environmental Laboratory


Abstract

- Meanwhile, the huge amount of observed/modelled data, and the need to store, process, and refine it, often make the use of high performance parallel computing the only effective solution to ensure the usability of numerical applications, as in the field of atmospheric/oceanographic science, where the development of the Earth Simulator supercomputer [65] is just the leading edge.
- Moreover, this technology offers several invaluable tools in ensuring security, performance, and availability of the applications.
- The use of grid computing technologies is often limited to computer science specialists, because of the complexity of the grid itself and of its middleware.
- The aim of our virtual laboratory is to bridge the gap between the technology push of the high performance cloud computing and the pull of a wide range of scientific experimental applications.
- The rest of the chapter is organized as follows: section 2 describes the design and implementation of the application orchestration component.
- The need for a high performance file system distributed among virtual machine instances is the main motivation for section 4.
- Finally, the conclusions and future directions in this field are reported in section 6, where the principal guidelines for improving the usability and the availability of the described software components are drawn.
- The growth of many-core technologies could seem to make one of the traditional goals of grid computing technology obsolete.
- We feel that this view is not correct and that the growth of available computing power will continue to follow the growth of the need for it, and vice versa.
- This suggests that computing grids have to be elastic, in the sense of being able to allocate, for the needed amount of time, as much computing power as the experiment requires.
- This process could be hampered by the grid software and hardware infrastructure complexity, but, above all, it is strictly related to the application that has to be grid-enabled.
- The user must know the requirements of the application in terms of CPU, memory, disk, and architecture.
- The application must be functionally decomposed, and each component has to be exposed to the grid by means of a wrapping web service.
- The fast development of web service technology, the poor performance of job submission, and the complexity of programming and management called for a different approach.
- The application package is the key component of our implementation and is the blueprint of the job execution environment.
- Each application runs in a private custom environment, so that several instances of the same software can be concurrently executed.
- Within the cloud paradigm, server time and network storage can be provisioned to the user in an on-demand self-service fashion without requiring human interaction.
- According to current demand, computing resources (both physical and virtual) are dynamically assigned and reassigned to serve all users that generally have no control or knowledge over the exact location of the provided resources.
- Any kind of software, including operating systems and applications can be deployed and then run on the best matching computing resource that is provided for rent to the user.
- The highest level of the stack is the Software as a Service (SaaS): one can use the provider’s applications running on a cloud infrastructure from various client devices through a client interface such as a Web browser (e.g., web-based email).
- Cloud-enabled software has to take full advantage of the cloud paradigm either by being service oriented, with a focus on statelessness, low coupling, modularity, and semantic interoperability, or by being encapsulated in a virtual machine image and executed as a virtual machine instance.
- Our Virtual Machine Compiler (VMc) produces an XML description of the virtual machine needed to run a generic piece of software in a virtualized environment with the minimal resources required to achieve a given performance.
- Once installed and configured by the user, the application is executed in a full featured virtual machine under the control of the VMc, which evaluates the disk and memory needs and, above all, the needed services and libraries.
- Finally, the application environment is cloned on the virtual machine image and made ready to be executed within a software appliance.
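As a rough illustration of the kind of inspection such a compiler could perform (this is not the actual VMc, and the XML element names are invented for the sketch), the following C++ program reads the /proc entries of a running Linux application to collect the shared libraries it has loaded and its peak memory usage, and emits a minimal machine description:

```cpp
#include <fstream>
#include <iostream>
#include <set>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: vmc-sketch <pid>\n"; return 1; }
    std::string pid = argv[1];

    // Collect the shared objects mapped by the application: a rough stand-in for
    // the "needed services and libraries" that have to be cloned into the image.
    std::set<std::string> libs;
    std::ifstream maps("/proc/" + pid + "/maps");
    for (std::string line; std::getline(maps, line); ) {
        auto slash = line.find('/');
        if (slash != std::string::npos && line.find(".so") != std::string::npos)
            libs.insert(line.substr(slash));
    }

    // Peak virtual memory size (kB) as a rough memory requirement.
    long peak_kb = 0;
    std::ifstream status("/proc/" + pid + "/status");
    for (std::string line; std::getline(status, line); )
        if (line.rfind("VmPeak:", 0) == 0)
            std::istringstream(line.substr(7)) >> peak_kb;

    // Emit an XML description in the spirit of the VMc output (element names invented).
    std::cout << "<virtual-machine>\n";
    std::cout << "  <memory unit=\"kB\">" << peak_kb << "</memory>\n";
    std::cout << "  <libraries>\n";
    for (const auto& l : libs)
        std::cout << "    <library path=\"" << l << "\"/>\n";
    std::cout << "  </libraries>\n</virtual-machine>\n";
    return 0;
}
```

The real VMc additionally evaluates disk needs and the required services before cloning the application environment into the virtual machine image.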
- Our custom software solution provides domain scientists with a fully configurable grid and cloud computing tool, which minimizes the impact of the computing infrastructure.
- To the user of the virtual laboratory, the view has completely changed: cloud elasticity permits computing resources to be considered as potentially infinite.
- It is independent of the underlying cloud software and interacts with the other components through abstractions such as Images, Instances, Locations, and Sizes.
- The virtual network file system allows the head node user to execute commands on the working nodes through secure shell and to perform the other operations typical of the distributed memory parallel programming ecosystem.
- Groups of computing nodes belonging to the same virtual cluster can be allocated on the same real machine, so MPI optimizations for virtual networks can be exploited in the best way.
- The greatest advantage of applying workflow technology in e-Science is the integration of the multiple teams involved in managing different parts of an experiment, promoting inter- and cross-organization collaborations [48].
- With the advent of cloud technology applied to computational sciences, the potential importance of scientific workflows has increased: the scientist has the power to allocate all the needed computing capacity without caring about its actual availability.
- The main role of the JFM is the submission of jobs, or the creation of virtual machine instances, on the proper computing resource, abstracting the Processing Unit in our virtual laboratory component stack.
- As previously stated, the jobs are formally described with an extension of the JSDL, providing syntactic and semantic tools for job description in terms of virtual machine instances.
- In the virtual laboratory an experiment is an aggregate of Processing Units connected in a directed acyclic graph, each drawing a computing step as a node of the experiment graph.
- In particular, the Job component takes charge of the list of jobs to be completed before the processing unit is executed and the list of jobs to be submitted when the PU's work eventually ends, as sketched below.
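A minimal sketch of this structure (type names invented; the Processing Unit names are purely illustrative and would come from the JFDL description of the experiment): each PU records the jobs it must wait for and the jobs it triggers, and a simple topological traversal submits a PU only once all of its predecessors have completed.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct ProcessingUnit {
    std::string name;
    std::vector<std::string> runAfter;    // jobs that must complete before this PU starts
    std::vector<std::string> submitNext;  // jobs to submit when this PU ends
};

// Topological traversal of the experiment DAG: a PU is submitted only when all
// of its predecessors have completed.
void runExperiment(const std::map<std::string, ProcessingUnit>& dag,
                   const std::function<void(const ProcessingUnit&)>& submit) {
    std::map<std::string, int> pending;   // number of unfinished predecessors per PU
    std::queue<std::string> ready;
    for (const auto& [name, pu] : dag) {
        pending[name] = static_cast<int>(pu.runAfter.size());
        if (pu.runAfter.empty()) ready.push(name);
    }
    while (!ready.empty()) {
        const ProcessingUnit& pu = dag.at(ready.front());
        ready.pop();
        submit(pu);                       // e.g. instantiate a virtual machine and run the job
        for (const auto& next : pu.submitNext)
            if (--pending[next] == 0) ready.push(next);
    }
}

int main() {
    // Illustrative four-step experiment: a weather run feeding a wave model and a
    // chemistry model, followed by a final post-processing step.
    std::map<std::string, ProcessingUnit> dag = {
        {"weather", {"weather", {},               {"wave", "chem"}}},
        {"wave",    {"wave",    {"weather"},      {"post"}}},
        {"chem",    {"chem",    {"weather"},      {"post"}}},
        {"post",    {"post",    {"wave", "chem"}, {}}},
    };
    runExperiment(dag, [](const ProcessingUnit& pu) { std::cout << "submit " << pu.name << "\n"; });
    return 0;
}
```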
- One of the most successful GPU based acceleration systems is provided by nVIDIA and relies on the CUDA programming paradigm and tools [16].
- This happens because of the communication between virtual and real machines, of the communication between guest and host on the real machine side, and of the issues related to the vendor specific interface (on the virtual machine side).
- The use of the GPGPUs massive parallel architecture in scientific computing is still relegated to HPC clusters.
- A key property of the proposed system is its ability to execute CUDA kernels with an overall performance similar to that obtained by real machines with direct access to accelerators.
- This architectural scheme makes evident an important design choice: the GPU virtualization is independent of the hypervisor.
- This is particularly relevant for the GPU accelerators because the input/output data of a CUDA kernel can be large and span many pages of contiguous memory.
- Xen [21] is a hypervisor that runs directly on top of the hardware through a custom Linux kernel.
- Moreover, it is application transparent and implements an automatic discovery of the supported virtual machines [73].
- Unfortunately, most of the promising vmChannel features are not yet fully implemented [11].
- The use of the TCP/IP stack to permit communication between virtual machines and a real machine is a common feature in virtualization.
- Nevertheless, the use of TCP/IP could be interesting in some deployment scenarios where the size of the computing problem, the network performance and, above all, the GPU acceleration justify the general overhead.
- The front-end, using the communicator component, packs the library function invocation and sends it to the back-end.
- Then the back-end executes the CUDA operation, retrieves the results and sends them to the front-end using the communicator.
- Finally, the front-end interacts with the CUDA library by terminating the GPU operation and providing results to the calling program.
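The round trip can be pictured with the sketch below; the wire format, the Buffer helper and the Communicator interface are illustrative assumptions rather than the actual gVirtuS classes. The front-end serialises the routine name and its marshalled arguments, sends the buffer, and blocks until the back-end replies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Buffer {                            // flat, byte-oriented marshalling buffer
    std::vector<uint8_t> data;
    void add(const void* p, std::size_t n) {
        const uint8_t* b = static_cast<const uint8_t*>(p);
        data.insert(data.end(), b, b + n);
    }
    template <typename T> void add(const T& v) { add(&v, sizeof(v)); }
    void addString(const std::string& s) {
        uint32_t n = static_cast<uint32_t>(s.size());
        add(n);
        add(s.data(), n);
    }
};

// The communicator is only assumed to expose blocking send/receive primitives;
// Unix sockets, TCP/IP or a shared-memory channel would all fit behind it.
struct Communicator {
    virtual void send(const Buffer& request) = 0;
    virtual Buffer receive() = 0;
    virtual ~Communicator() = default;
};

// Front-end side of a forwarded call: pack the routine name and its already
// marshalled parameters, send the request and wait for the back-end's reply.
Buffer forwardCall(Communicator& comm, const std::string& routine, const Buffer& args) {
    Buffer request;
    request.addString(routine);            // e.g. "cudaMemcpy"
    request.add(args.data.data(), args.data.size());
    comm.send(request);
    return comm.receive();                 // exit code plus output parameters
}

// In-memory loopback used only to exercise the sketch; a real back-end would
// execute the CUDA routine and marshal its results instead of echoing the request.
struct Loopback : Communicator {
    Buffer last;
    void send(const Buffer& request) override { last = request; }
    Buffer receive() override { return last; }
};

int main() {
    Loopback comm;
    Buffer args;
    args.add(0);                           // a single int parameter, e.g. a device ordinal
    Buffer reply = forwardCall(comm, "cudaSetDevice", args);
    return reply.data.empty() ? 1 : 0;
}
```

Because the front-end only sees the blocking send/receive pair, a transport based on Unix sockets, TCP/IP or a shared-memory channel can sit behind the same interface, which is consistent with the hypervisor-independent design stated above.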
- The actual performance (and benefits) depend on the application, the problem size and the network bandwidth.
- According to the underlying idea of high performance cloud computing applications, we implemented the virtual accelerator in a hypervisor independent fashion and in a fully configurable way.
- Methods for efficient sharing of the GPU by multiple guest VMs have to be implemented at the same run level.
- We implemented a wrapper stub CUDA library with the same interface as the nVIDIA CUDA library.
- This component exposes Unix Sockets on virtual machine instances thanks to a QEMU device connected to the virtual PCI bus.
- On the host side, the back end mediates all accesses to the GPU and it is responsible for executing the CUDA calls received from the front end and for returning the computed results.
- In our implementation there is no mapping between guest memory and device memory: the gVirtuS back-end acts on behalf of the running guest virtual machine, so device memory pointers are valid on the real host machine and in the guest as well.
- In this way the front-end and back-end sides interact in an effective and efficient way, because the device memory pointers are never de-referenced on the host side of the CUDA enabled software (with gVirtuS, that host side is the guest virtual machine side): CUDA kernels are executed on the back-end side, where the pointers are fully consistent.
- The front end prepares a package containing the name of the function to be invoked and the related parameters.
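The following sketch shows how a single runtime entry point could be interposed; the ExecuteRoutine helper and the handle encoding are assumptions made for the example, not the real gVirtuS front-end code. The key point from the text is visible: the device pointer returned to the guest is an opaque value that is only stored and passed back, never de-referenced.

```cpp
#include <cstddef>
#include <cstdint>

using cudaError_t = int;                 // stand-in for the real CUDA error type
constexpr cudaError_t cudaSuccess = 0;

// Placeholder so the sketch is self-contained: the real front-end helper would
// marshal the call over the communicator and return the back-end's reply.
static cudaError_t ExecuteRoutine(const char* /*routine*/, void* out, std::size_t out_size,
                                  const void* /*in*/, std::size_t /*in_size*/) {
    static uint64_t next = 0x1000;       // fake "device addresses", for illustration only
    if (out && out_size == sizeof(uint64_t)) {
        next += 0x1000;
        *static_cast<uint64_t*>(out) = next;
    }
    return cudaSuccess;
}

// The stub keeps the signature of the real cudaMalloc, so existing binaries can
// link against it unchanged.
extern "C" cudaError_t cudaMalloc(void** devPtr, std::size_t size) {
    uint64_t handle = 0;                 // device pointer as produced on the back-end (host) side
    cudaError_t rc = ExecuteRoutine("cudaMalloc", &handle, sizeof(handle),
                                    &size, sizeof(size));
    // The value is meaningful in the back-end's address space only; the guest
    // merely stores it and passes it back in later calls (cudaMemcpy, kernel launches, ...).
    *devPtr = reinterpret_cast<void*>(static_cast<uintptr_t>(handle));
    return rc;
}

int main() {
    void* d = nullptr;
    return (cudaMalloc(&d, 1024) == cudaSuccess && d != nullptr) ? 0 : 1;
}
```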
- Nevertheless, in the high performance cloud computing context this could be extremely limiting: using shared memory, an MPI communication channel could transparently implement a latency-free network among virtual computing nodes without any change to the distributed memory parallel software.
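To make the point concrete, a plain MPI ping-pong such as the sketch below would run unchanged on the virtual cluster; only the channel underneath it changes, and a shared-memory channel between co-resident virtual nodes would directly lower the measured latency.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }   // needs at least two virtual nodes

    const int iters = 1000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)   // half the round-trip time, averaged over the iterations
        std::printf("one-way latency: %.2f us\n", 1e6 * (t1 - t0) / (2.0 * iters));
    MPI_Finalize();
    return 0;
}
```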
- The evaluation process results show that the gVirtuS GPU virtualization and the related sharing system allow an effective exploitation of the computing power of the GPUs [45].
- High performance parallel I/O.
- In the figure we show the components of the parallel I/O architecture and, on the margins, we depict how the logical components map onto physical nodes.
- As described in the previous section, the CUDA calls of the applications running on virtual machines are intercepted and forwarded in a zero-copy fashion to the host machine, where they are processed by the CUDA run time.
- The main tasks of the module are buffer management and asynchronous data staging, including write-back and prefetching modules.
- The client-side cache management tier absorbs the writes of the applications and hides the latency of accessing the I/O nodes over the interconnect networks.
- The communication with the compute nodes is decoupled from the file system access, allowing for a full overlap of the two operations.
- Running I/O nodes on virtual machines brings benefits when the load is highly variable, as starting or shutting down virtual machines can help adapt the cost of operating the infrastructure.
- The objective of the I/O node-side file cache module is to provide the management of a file cache close to the storage and to offer efficient transfer methods between the virtual machines and the storage system.
- The I/O node-side file cache management tier provides an agile and elastic proxy layer that allows the performance of the file systems to scale up with the load generated by the applications.
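The write-back behaviour of these cache tiers can be sketched as follows (class and member names are invented for the example): application writes are absorbed into an in-memory queue and a background thread flushes the dirty blocks to the file system, so the writer never waits on storage.

```cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct Block { std::string path; long offset; std::vector<char> data; };

class WriteBackCache {
public:
    WriteBackCache() : flusher_(&WriteBackCache::flushLoop, this) {}
    ~WriteBackCache() {                    // drains the queue before shutting down
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        flusher_.join();
    }
    // Called on the application's path: returns as soon as the block is cached.
    void write(Block b) {
        { std::lock_guard<std::mutex> lk(m_); dirty_.push(std::move(b)); }
        cv_.notify_one();
    }
private:
    void flushLoop() {                     // background flush, off the critical path
        std::unique_lock<std::mutex> lk(m_);
        while (!done_ || !dirty_.empty()) {
            cv_.wait(lk, [&] { return done_ || !dirty_.empty(); });
            while (!dirty_.empty()) {
                Block b = std::move(dirty_.front());
                dirty_.pop();
                lk.unlock();               // do the slow file I/O without holding the lock
                std::ofstream out(b.path, std::ios::in | std::ios::out | std::ios::binary);
                if (!out) out.open(b.path, std::ios::out | std::ios::binary);
                out.seekp(b.offset);
                out.write(b.data.data(), static_cast<std::streamsize>(b.data.size()));
                lk.lock();
            }
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Block> dirty_;
    bool done_ = false;
    std::thread flusher_;
};

int main() {
    WriteBackCache cache;
    cache.write({"/tmp/demo.bin", 0, std::vector<char>(4096, 'x')});  // returns immediately
}
```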
- The architecture presented in the previous sections scales both with the computation and I/O requirements of the applications but at the cost of a hierarchical organization of file caches.
- The first level of file caching is deployed on the virtual machines of the applications and has the role of improving the performance of file system accesses by overlapping GPGPU and processor core computation with the I/O transfers between computational virtual machines and I/O nodes.
- The application I/O forwarding tier transfers the application write request to the client-side cache management tier.
- On the I/O node a write-back module is in charge of caching the file blocks received from the compute nodes and flushing them to the file system over the storage network.
- Our prefetching solution leverages the characteristics of the stream processing model of the GPGPU applications [32].
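A minimal sketch of this idea, with invented helper names and an illustrative input path, is the classic double-buffering pattern: because the stream model exposes the input of the next kernel in advance, the read of chunk i+1 can be issued asynchronously while chunk i is being processed.

```cpp
#include <fstream>
#include <future>
#include <string>
#include <vector>

// Read one fixed-size chunk at the given index from the input file.
static std::vector<char> readChunk(const std::string& path, long index, std::size_t size) {
    std::vector<char> buf(size);
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(index) * static_cast<std::streamoff>(size));
    in.read(buf.data(), static_cast<std::streamsize>(size));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

static void processChunk(const std::vector<char>&) { /* stand-in for the GPGPU kernel */ }

int main() {
    const std::string path = "/tmp/input.dat";    // illustrative input file
    const std::size_t chunk = 1 << 20;
    const long chunks = 8;

    auto next = std::async(std::launch::async, readChunk, path, 0L, chunk);
    for (long i = 0; i < chunks; ++i) {
        std::vector<char> current = next.get();    // data prefetched during the previous step
        if (i + 1 < chunks)                        // start fetching the next input right away
            next = std::async(std::launch::async, readChunk, path, i + 1, chunk);
        processChunk(current);                     // overlap computation with the next read
        if (current.empty()) break;                // end of file
    }
    return 0;
}
```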
- The virtual laboratory.
- The scientist can configure and run his experiment by selecting and assembling each component from a palette or by submitting a JFDL file to the Virtual Laboratory Engine.
- We have used this application as a test bed to evolve our laboratory to the current state of the art of a fully elastic e-Science approach.
- It is a multi-scale model primarily designed to produce forecasts of the concentration of ozone, aerosols and other pollutants.
- Currently, the ocean modeling components of our virtual laboratory primarily consist of the WW3 (WaveWatch III) [69] sea-wave propagation model.
- WW3 is a third generation wave model developed at NOAA/NCEP as an evolution of the WAM model [34].
- Previously, for shallow water simulation, we grid-enabled SWAN [31] so that it could be coupled with WW3 and provide a more reliable simulation of the waves in the surf zone.
- The POM grid enabling process required a complete revision of the code in order to implement user provided cases.
- It enables a virtual machine instance, running in a high performance computing cloud, to properly exploit the computing power of the nVIDIA CUDA system.
- We have designed a scalable parallel I/O architecture based on a hierarchy of caches deployed on the computing virtual machines and on the I/O system backend, close to the storage.
- The multiple level I/O pipelining in a data staging approach maps suitably to the characteristics of the stream computing model, which exposes the inputs and outputs of the kernel computation, an important piece of information that can be leveraged by parallel I/O scheduling strategies.
- [13] libcloud: a unified interface to the cloud.
- [31] Booij, N., Holthuijsen, L.H., Ris, R.C.: The SWAN wave model for shallow water.
- The Anatomy of the Grid: Enabling Scalable Virtual Organizations.
- pPOM: A nested, scalable, parallel and Fortran 90 implementation of the Princeton Ocean Model.
- In Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface (Espoo, Finland, September).
- In Proceedings of the 3rd ACM Workshop on System-Level Virtualization for High Performance Computing.
- On the Use of Cloud Computing for Scientific Workflows.
- MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface.
- In Proceedings of the 2001 ACM/IEEE Conference on Supercomputing (Denver, Colorado, November).
- Wang: The Weather Research and Forecast Model: Software Architecture and Performance. To appear in Proceedings of the 11th ECMWF Workshop on the Use of High Performance Computing in Meteorology, 25-29 October 2004, Reading, U.K.
- In Proceedings of the Third International Workshop on Use of P2P, Grid and Agents for the Development of Content Networks (Boston, MA, USA, June).
- In Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (May).
- In Proceedings of the 16th International Symposium on High Performance Distributed Computing (Monterey, California, USA, June).
- A break in the clouds: towards a cloud definition.
- In Proceedings of the 17th International Symposium on High Performance Distributed Computing (Boston, MA, USA, June).