
Using Open Source Desktop Grids in Scientific Computing and Visualization


Summary

- The flexible, object-oriented structure and the modularity of the system make it easy to improve the system and to extend it to other programming languages.
- QADPZ supports parallel programs running on the desktop grid by providing an API in C/C++ that implements a subset of the MPI standard.
- Depending on the application, the user can choose among different algorithms, which both reduce the communication overhead imposed by large data transfers and preserve the privacy of the data.
- The free availability of the source code allows flexible installation and modification according to the individual needs of research projects and institutions.
- The extensions and refinements to the original master-worker paradigm mainly aim to reduce the time lost to communication delays and to increase the time during which the workers perform computations.
- The most important of them are as follows: pulling or pushing of work units, pipelining of work-units at the worker, sending several work-units at a time, an adaptive number of workers, an adaptive timeout interval for work units, multithreading, redundant computation of the last work-units to reduce the time to finish, overlapped communication and computation at both the workers and the master, and compression to reduce the size of messages (Constantinescu, 2008).
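As an illustration of the last technique, message compression, here is a minimal sketch. It assumes the general-purpose DEFLATE algorithm from Python's standard `zlib` module; the source does not say which compression algorithm QADPZ uses, and the helper names `compress_message` and `decompress_message` are hypothetical.

```python
import zlib

def compress_message(payload: bytes, level: int = 6) -> bytes:
    """Compress a message body before sending it (assumed: zlib/DEFLATE)."""
    return zlib.compress(payload, level)

def decompress_message(blob: bytes) -> bytes:
    """Restore the original message body on the receiving side."""
    return zlib.decompress(blob)

# Repetitive payloads, such as blocks of grid values, compress well.
work_unit = b"0.000 " * 10_000
packed = compress_message(work_unit)
assert decompress_message(packed) == work_unit
print(len(work_unit), "->", len(packed), "bytes")
```

Compression pays off only when the time saved on the network exceeds the CPU time spent compressing and decompressing. That trade-off is why being able to choose the algorithm per application matters.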
- It starts with a justification for our endeavour to build a new DG system, then presents the main capabilities of QADPZ and the improvements to the master-worker model implemented in the system.
- The former, usually called simply a desktop grid (or local DG, or LAN-based DG), refers to a grid infrastructure confined to an institutional boundary, where the spare processing capacity of an enterprise’s desktop PCs is used to support the execution of the enterprise’s applications.
- Volunteers are typically members of the general public who own Internet-connected PCs.
- Several aspects of the project/volunteer relationship are worth noting (BOINC, 2010):
- (3) volunteers must trust that projects will provide applications that will not damage their computers or invade their privacy, will be truthful about both the work performed by their applications and the use of the resulting intellectual property, and will follow proper security practices, so that the projects cannot be used as vehicles for malicious activities.
- Moreover, there is no need for screensaver graphics; in fact, it may be desirable for the computation to be completely invisible to the PCs’ users and outside their control.
- When the user needs the computer again, the screen saver instantly gets out of the way, and it resumes its analysis only when the computer is idle again.
- of the SETI process.
- No piece depends on any other, which makes large-scale deployment of clients and computations over the Internet very easy.
- BOINC is open-source software that can be used for both volunteer computing and desktop grid computing.
- The main requirement of the application is that it must be divisible into a large number (thousands or millions) of jobs that can be executed independently.
- BOINC stats, 2010), which is more than the processing power of the fastest existing supercomputer (Jaguar, a Cray XT5), with a sustained processing rate of 1.759 PetaFLOPS (Wikipedia, 2010).
- The research areas of the problems to be solved vary from mathematics (e.
- The company's aim has been to test the security of its own products and to demonstrate the vulnerability of the encryption schemes it considered inadequate (Hayes, 1998).
- The distributed.net project focuses on very few specialized computing challenges.
- Condor also considers several layers of priority values: the priority assigned to the resource request ad by the user, the priority of the user who submitted that ad, and the preference of the machines in the pool for certain types of ads over others (Condor, 2010).
- This section briefly describes the QADPZ system, starting with a justification for our endeavour to develop a new DG platform, and continuing with the main QADPZ capabilities and the improvements to the master-worker conceptual model implemented in the system.
- 3.1 Why develop a new desktop grid system?
- One reason is that many of the existing systems at the time when the QADPZ project started were highly specialized in a very limited set of computationally challenging problems, and hence they did not allow the execution of a general application.
- Moreover, at the time of development, the source code of the existing desktop grid systems was generally not available, which made it difficult to extend them or to analyse any new, non-standard application.
- In addition, most of the existing systems had a complicated deployment procedure that required high-level, privileged access to the desktop computers; this made such systems very hard to use on a larger scale and also complicated the subsequent maintenance of the computers, e.g.
- Therefore, we have created an open architecture able to evolve in pace with the needs and challenges of the continuously changing real world.
- The main functional capabilities of the system concern resource sharing and management, job management, heterogeneity, simple installation and maintenance, support for parallel programming, network support, autonomy, performance measurements, multi-project use, and on-line/off-line support (Constantinescu, 2008;.
- The most important resources that need to be shared are the idle computational cycles of the desktop machines that contribute to the system.
- though, the owners of the desktop computers that share resources keep control of them and are able both to define usage policies and to withdraw resources at will.
- It is the responsibility of the user who submits the jobs to provide the appropriate binary files for execution on different platforms.
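Because the user supplies one binary per platform, each slave has to pick the executable that matches its own operating system and architecture. A minimal sketch follows. The `BINARIES` table, the file names and `select_binary` are hypothetical; only the standard `platform.system()` and `platform.machine()` calls are real.

```python
import platform

# Hypothetical table supplied by the user who submits the job:
# one executable per (operating system, machine architecture) pair.
BINARIES = {
    ("Linux", "x86_64"): "solver-linux-x86_64",
    ("Windows", "AMD64"): "solver-win64.exe",
    ("Darwin", "arm64"): "solver-macos-arm64",
}

def select_binary(binaries, system=None, machine=None):
    """Return the executable for this slave's platform, or None if the user provided none."""
    system = system or platform.system()     # e.g. "Linux", "Windows", "Darwin"
    machine = machine or platform.machine()  # e.g. "x86_64", "AMD64", "arm64"
    return binaries.get((system, machine))

print(select_binary(BINARIES, "Linux", "x86_64"))  # solver-linux-x86_64
```

A `None` result means the slave cannot run this job. A scheduler would then simply not assign the job's tasks to that slave.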
- The system is easy to install on a large number of computers in a network, and the subsequent maintenance of the installed programs is minimal.
- QADPZ is able to provide information about its performance, which can be used to make better use of the available resources.
- In contrast, interactive jobs provide real-time feedback on the execution: the user can inspect partial results and interact with the running application.
- The QADPZ user interface allows monitoring and controlling the behaviour of the system, and the programming interface allows different user applications to interact with the system.
- When a user logs into a slave computer, this is detected automatically, and the task is killed or moved to another slave to minimize the disturbance to the regular computer users (Constantinescu, 2008).
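The kill-or-move policy can be sketched as follows. The `Slave` record and `on_user_activity` are hypothetical names, and "moving" is modelled as restarting the task on an idle slave with no user logged in, because the source does not say whether QADPZ checkpoints task state.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slave:
    name: str
    user_logged_in: bool = False
    task: Optional[str] = None

def on_user_activity(slaves: List[Slave], busy: Slave, migrate: bool = True) -> Optional[Slave]:
    """React to a user logging into `busy`: stop its task, then optionally
    restart that task on another idle slave. Returns the new host, or None
    if the task was killed."""
    busy.user_logged_in = True
    task, busy.task = busy.task, None      # stop the task on the owner's machine
    if not migrate or task is None:
        return None
    for candidate in slaves:
        if candidate is not busy and candidate.task is None and not candidate.user_logged_in:
            candidate.task = task          # restart from scratch (no checkpoint assumed)
            return candidate
    return None                            # no idle slave: the task is effectively killed

a, b, c = Slave("a", task="t1"), Slave("b"), Slave("c", user_logged_in=True)
print(on_user_activity([a, c, b], a).name)  # b
```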
- The smallest independent execution unit of the QADPZ is called a task.
- In this model, a number of worker processes are available, each capable of performing any one of the steps in a particular computation.
- The master keeps a record of all the work units of the computation that it is assigned to perform.
- As each work unit is completed by one of the workers, the master records the result.
- The program works in the same way irrespective of the number of workers available - the master just gives out a new work unit to any worker who has completed the previous one (Constantinescu, 2008;.
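The basic master-worker loop can be sketched with threads and queues standing in for networked workers. This is a shared-memory illustration of the model, not QADPZ code; `run_master_worker` is a hypothetical name. Any idle worker takes the next work unit, the master records each result by index, and the outcome does not depend on the number of workers.

```python
import queue
import threading

def run_master_worker(work_units, compute, num_workers):
    """Minimal master-worker: idle workers take the next work unit;
    the master records each result under the work unit's index."""
    todo, done = queue.Queue(), queue.Queue()
    for index, unit in enumerate(work_units):
        todo.put((index, unit))

    def worker():
        while True:
            try:
                index, unit = todo.get_nowait()
            except queue.Empty:
                return                      # no work left for this worker
            done.put((index, compute(unit)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = [None] * len(work_units)      # the master's record of results
    while not done.empty():
        index, value = done.get()
        results[index] = value
    return results

print(run_master_worker([1, 2, 3, 4, 5], lambda x: x * x, num_workers=3))
# [1, 4, 9, 16, 25]
```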
- In static decomposition, the master generates all the work-units at the beginning of the computation; in dynamic decomposition, the computation starts with a small number of work-units, and new work-units are created later, depending on the results of already executed work-units.
- Distribution of work-units to the workers can be of two types: static or dynamic.
- In the former case, the master decides on the distribution of work at the start of the computation by assigning the work-units to the workers; in the latter, the distribution of work-units among workers varies as the computation proceeds.
- QADPZ uses an improved version of the master-worker model, based on an algorithm with dynamic decomposition of the problem and a dynamic number of workers.
- The improvements raise the performance of the original model by increasing the time workers spend on computation and decreasing the time lost to communication delays.
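Dynamic decomposition can be illustrated with adaptive quadrature, where the result of one work unit decides whether new work units are created. The example problem (adaptive Simpson integration) is our own choice; the source does not use it. A static decomposition would instead split the interval into a fixed set of sub-intervals at the start.

```python
import math
from collections import deque

def simpson(f, a, b):
    """Simpson's rule on [a, b]."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def adaptive_integrate(f, a, b, tol=1e-8):
    """Dynamic decomposition: start with one work unit (the whole interval) and
    create new work units from the results of already executed ones."""
    pending = deque([(a, b, tol)])          # work units not yet executed
    total, executed = 0.0, 0
    while pending:
        lo, hi, eps = pending.popleft()
        executed += 1
        mid = (lo + hi) / 2
        whole = simpson(f, lo, hi)
        halves = simpson(f, lo, mid) + simpson(f, mid, hi)
        if abs(halves - whole) < 15 * eps:  # standard adaptive Simpson test
            total += halves                 # result accepted
        else:                               # result spawns two new work units
            pending.append((lo, mid, eps / 2))
            pending.append((mid, hi, eps / 2))
    return total, executed

value, units = adaptive_integrate(math.sin, 0.0, math.pi)
print(round(value, 6), units)
```

In a master-worker setting, each entry of `pending` would be handed to a worker, and the number of work units is not known in advance.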
- This is achieved with techniques such as pulling or pushing of work units, pipelining of work-units at the worker, sending several work-units at a time, an adaptive number of workers, an adaptive timeout interval for work units, multithreading, redundant computation of the last work-units to reduce the time to finish, overlapped communication and computation at the workers and the master, and compression to reduce the size of messages (Constantinescu, 2008).
- If this communication time is comparable to the time needed to execute a work-unit, the efficiency of the worker is greatly reduced.
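One of these techniques, the adaptive timeout interval, can be sketched as follows. The rule used here (mean of recent work-unit durations plus `k` standard deviations) and the class name `AdaptiveTimeout` are assumptions for illustration; the source does not specify the formula QADPZ uses. A work unit that exceeds the timeout would be reissued to another worker.

```python
import statistics

class AdaptiveTimeout:
    """Hypothetical adaptive timeout: wait about as long as recent work units
    took, plus a safety margin, before reissuing a unit."""

    def __init__(self, initial: float, k: float = 3.0, window: int = 50):
        self.initial = initial   # used until enough history exists
        self.k = k               # margin, in standard deviations
        self.window = window     # number of recent durations to remember
        self.history = []

    def record(self, seconds: float) -> None:
        self.history.append(seconds)
        self.history = self.history[-self.window:]

    def timeout(self) -> float:
        if len(self.history) < 2:
            return self.initial
        mean = statistics.mean(self.history)
        spread = statistics.stdev(self.history)
        return mean + self.k * spread

t = AdaptiveTimeout(initial=60.0)
for duration in [10.0, 12.0, 11.0, 13.0]:
    t.record(duration)
print(round(t.timeout(), 2))
```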
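The effect can be quantified with an idealized model (our assumption, not a formula from the source). Without overlap, a worker waits for every round trip, so each work unit costs t_exec + t_comm. If the next unit is transferred while the current one runs, the steady-state cost per unit is max(t_exec, t_comm).

```python
def worker_efficiency(t_exec: float, t_comm: float, overlapped: bool = False) -> float:
    """Fraction of wall-clock time a worker spends computing (idealized model).

    Without overlap, one work unit costs t_exec + t_comm. With communication
    overlapped with computation, it costs max(t_exec, t_comm).
    """
    per_unit = max(t_exec, t_comm) if overlapped else t_exec + t_comm
    return t_exec / per_unit

print(worker_efficiency(1.0, 1.0))                   # 0.5
print(worker_efficiency(1.0, 1.0, overlapped=True))  # 1.0
```

When communication takes as long as computation, the worker is idle half the time. Hiding the transfer behind the computation, as pipelining does, removes that idle time.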
- The user is the initiator of the transaction.
- The master will then send more work-units to the worker.
- This moves all decisions about initiating work-unit transfers to the master, allowing better control and monitoring of the overall computation.
- One way to do that is to use work-units pipelining at the worker, thus making sure that the worker has a new work-unit available when it finishes the processing of the current work-unit.
- At the beginning, the master sends more than one work-unit to the worker; then, after each result it receives, it sends another work-unit to be queued on the worker.
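The pipelining scheme just described can be simulated sequentially for a single worker. In this sketch, `pipelined_schedule` and the pipeline `depth` parameter are hypothetical names. The master sends `depth` units up front and then one new unit per result, so the worker's local queue is refilled while it computes.

```python
from collections import deque

def pipelined_schedule(work_units, depth=2):
    """Sequential simulation of work-unit pipelining at one worker.
    Returns the ordered log of send/compute/result events."""
    to_send = deque(work_units)
    worker_queue = deque()
    log = []

    def send():
        if to_send:
            unit = to_send.popleft()
            worker_queue.append(unit)
            log.append(f"send {unit}")

    for _ in range(depth):              # initial burst: more than one work unit
        send()
    while worker_queue:
        unit = worker_queue.popleft()
        log.append(f"compute {unit}")   # the next unit, if any, is already queued
        log.append(f"result {unit}")
        send()                          # one new unit per received result
    return log

print(pipelined_schedule(["w1", "w2", "w3"]))
```

With `depth=1` the scheme degenerates to plain request/response, and the worker waits for every transfer.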
- The master controls the total number of workers used for computation, since it is the one sending out work-units to the workers.
- The number of workers is automatically reduced if the efficiency of the computation decreases.
- We employ a heuristic-based method that uses historical data about the behaviour of the application.
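A heuristic of this kind might look like the sketch below. The thresholds, the three-sample smoothing window and the shrink/grow steps are all assumptions chosen for illustration; the source does not describe QADPZ's actual heuristic. Efficiency here means useful compute time divided by total time, averaged over workers.

```python
def adjust_workers(current, efficiency_history, low=0.6, high=0.85,
                   min_workers=1, max_workers=64):
    """Hypothetical heuristic for the number of workers used by the master:
    shrink when recent efficiency falls below `low`, grow by one when it
    stays above `high`, otherwise keep the current number."""
    if not efficiency_history:
        return current
    recent = efficiency_history[-3:]          # smooth over the last few samples
    average = sum(recent) / len(recent)
    if average < low:
        current = max(min_workers, current - max(1, current // 4))
    elif average > high:
        current = min(max_workers, current + 1)
    return current

print(adjust_workers(8, [0.5, 0.4, 0.5]))     # 6
```

Shrinking in larger steps than growing reacts quickly when communication starts to dominate, while new workers are added cautiously.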
- As Richard Hamming observed many years ago, “the purpose of computing is insight, not numbers”.
- Scientists and engineers develop software systems that implement the mathematical models of the studied systems and run these programs with various sets of input parameters.
- The ability of scientists to visualize complex computations and simulations is essential to ensure the integrity of the analysis, to provide insight, and to communicate that insight to others.
- Some of the domains and directions in which Scientific Computation and Visualization are able to give valuable insight are listed here: engineering, computational fluid dynamics, finite element analysis, electronic design automation, simulation, medical imaging, geospatial, RF propagation, meteorology, hydrology, data fusion, ground water modeling, oil and gas exploration and production, finance, data mining/OLAP, and numerical simulations.
- Further on in this section, we present some of the scientific computing and visualization experiments we have performed to prove the viability of the QADPZ system: a real-world problem:
- Analysis of these specific problems is based on the prediction of the flow circulation and transport of different materials, either suspended in water or moving along the free surface or the bottom.
- This grid is assumed to be detailed enough to describe the main flow field of the Trondheim fjord (Constantinescu, 2008).
- Shading the discretized cells according to the value of the scalar data.
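Shading cells by a scalar value amounts to mapping each value to a colour. Below is a minimal sketch that assumes a linear grey-scale map with optional fixed limits; the colour map actually used for the fjord visualizations is not specified in the source, and `shade` is a hypothetical name.

```python
def shade(values, vmin=None, vmax=None):
    """Map scalar cell values linearly to 8-bit grey levels (vmin -> 0, vmax -> 255)."""
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax
    span = vmax - vmin
    shades = []
    for v in values:
        t = 0.0 if span == 0 else (v - vmin) / span
        t = min(1.0, max(0.0, t))      # clamp values outside [vmin, vmax]
        shades.append(round(255 * t))
    return shades

print(shade([0.0, 0.25, 1.0, 2.0], vmin=0.0, vmax=1.0))  # [0, 64, 255, 255]
```

Fixing `vmin` and `vmax` across time steps keeps the shading comparable between frames of an animation.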
- Figure: (a) map of the Trondheim fjord; (b) grid of the Trondheim fjord.
- The image is created from a white-noise texture that is then convolved along the integral lines (streamlines) of the given vector field.
- The local nature of the LIC algorithm makes it well suited to a parallel implementation, which could, in principle, compute all pixels simultaneously.
- The performance of the system is evaluated by comparing it with the same simulation run on a typical dedicated cluster environment.
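The following sketch shows Line Integral Convolution in its simplest form. It uses a box kernel and unit-length Euler steps along the streamline, both simplifications assumed here, and it is not the implementation used in the experiments. Each output pixel depends only on the noise along its own streamline, which is why the two outer loops could be distributed across workers.

```python
import math
import random

def lic(field, width, height, length=10, seed=0):
    """Line Integral Convolution with a box kernel and unit-length Euler steps.

    field(x, y) returns the vector (vx, vy) at a point. Each output pixel is the
    average of white-noise values sampled along the streamline through it.
    """
    rng = random.Random(seed)
    noise = [[rng.random() for _ in range(width)] for _ in range(height)]
    image = [[0.0] * width for _ in range(height)]
    for py in range(height):                 # pixels are independent:
        for px in range(width):              # these loops are parallelizable
            total, count = noise[py][px], 1  # sample the starting pixel once
            for direction in (1.0, -1.0):    # trace forward, then backward
                x, y = px + 0.5, py + 0.5    # start at the pixel centre
                for _ in range(length):
                    vx, vy = field(x, y)
                    norm = math.hypot(vx, vy)
                    if norm == 0.0:
                        break                # critical point: stop tracing
                    x += direction * vx / norm
                    y += direction * vy / norm
                    ix, iy = math.floor(x), math.floor(y)
                    if not (0 <= ix < width and 0 <= iy < height):
                        break                # streamline left the image
                    total += noise[iy][ix]
                    count += 1
            image[py][px] = total / count
    return image

def vortex(x, y):
    """Circular flow around the centre of a 32 x 32 image."""
    return (-(y - 16.0), x - 16.0)

img = lic(vortex, 32, 32)
```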
- To solve the incompressible Navier–Stokes equations we have used a version of the well–.
- Implementation of the flow solver has been done in C++ using the object oriented numerical library Diffpack.
- An MPI library with a subset of the most used MPI calls has been implemented on top of QADPZ’s communication protocol.
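The layering idea, MPI-style point-to-point calls built on top of a generic message transport, can be sketched as below. `MiniComm` is a hypothetical toy. In-process queues stand in for QADPZ's communication protocol, which is not shown, and ranks are passed explicitly rather than being implicit as in real MPI. `send` never blocks here, whereas a real `MPI_Send` may.

```python
import queue
import threading

class MiniComm:
    """Toy point-to-point layer (size, send, recv) over a message transport.
    The transport here is in-process queues keyed by (dest, source, tag)."""

    def __init__(self, size):
        self.size = size
        self._boxes = {}
        self._lock = threading.Lock()

    def _box(self, key):
        with self._lock:                 # create each mailbox exactly once
            if key not in self._boxes:
                self._boxes[key] = queue.Queue()
            return self._boxes[key]

    def send(self, obj, source, dest, tag=0):
        self._box((dest, source, tag)).put(obj)

    def recv(self, dest, source, tag=0):
        return self._box((dest, source, tag)).get()   # blocks like MPI_Recv

def ring(comm, rank, out):
    """Each rank passes its number to the next rank around a ring."""
    comm.send(rank, source=rank, dest=(rank + 1) % comm.size)
    out[rank] = comm.recv(dest=rank, source=(rank - 1) % comm.size)

comm = MiniComm(4)
out = [None] * 4
threads = [threading.Thread(target=ring, args=(comm, r, out)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [3, 0, 1, 2]
```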
- First, we used the original implementation of the solver’s software (which uses MPICH as a parallel communication protocol) to run the cluster version of the simulation.
- Second, we recompiled the solver using the QADPZ-MPI library to create the distributed computing version of the simulation.
- The solver was run using exactly the same computers of the cluster (i.e.
- Simulations were run at two different times of the day: during the night, when network traffic in the LAN is minimal, and during working hours, when LAN traffic is significant.
- The advantages of using our system are as follows: installing the slave on the computational nodes is extremely easy, since only one executable file and one configuration file are needed.
- Upgrade of the slaves is done automatically from the master, without any administrator intervention.
- Also, there is no need for a shared file system, since each slave downloads the necessary files by itself.
- In the Fig.
- Scientific breakthroughs are enabled by insight, and better visualization of a problem provides a better understanding of the underlying science and often leads to the discovery of something profoundly new and unexpected.
- Advanced capabilities for visualization may prove to be as critical as the existence of the supercomputers themselves for scientists and engineers, and also for specialists in other domains.
- Then the gaming industry made a breakthrough: under pressure from gamers who demanded ever more graphical power, it developed very high-performance graphics cards at very low cost (commodity hardware).
- The core idea of the work presented in this chapter has been to provide a desktop grid computing framework and to prove its viability by testing it in some Scientific Computing and Visualization experiments.
- Another feature to be added with high priority is handling of the restart of the master computer.
- In the current version of the system, each job needs a separate client process; we are currently considering extending the client functionality so that a single client instance can optionally connect to multiple masters and handle multiple jobs.
- Future developments of the system include a more complete implementation of the MPI layer and a complete library of collective communication routines.
- The current QADPZ implementation is limited to a small subset of the MPI standard.
- Dynamic balancing of the workload can be used for tackling this issue.
- The current user interface to the system is based on C.
- Possible extensions of the system would be different interfaces for other languages, e.g.
- This can easily be done, since the messages exchanged between the components of the system are based on an open XML specification.
- The current implementation of the system assumes a single central master node.
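To show why an XML-based protocol eases bindings for other languages, here is a sketch that builds and parses a message with Python's standard `xml.etree.ElementTree`. The element and attribute names (`message`, `type`, `job`, `task`) and the helper functions are hypothetical. The actual QADPZ XML specification is not reproduced in this text.

```python
import xml.etree.ElementTree as ET

def make_message(msg_type, **fields):
    """Build a message; the element and attribute names are illustrative only."""
    root = ET.Element("message", type=msg_type)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def parse_message(text):
    """Return (message type, {field name: text}) from an XML message."""
    root = ET.fromstring(text)
    return root.get("type"), {child.tag: child.text for child in root}

msg = make_message("task-request", job="fjord-sim", task=7)
print(msg)
print(parse_message(msg))
```

Any language with an XML parser could produce or consume the same messages, so a new client interface would not need to link against the C library.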
Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 1993), pp.
EDGeS: bridging Desktop and Service Grids, Proceedings of the 2nd Iberian Grid Infrastructure Conference (IBERGRID 2008), pp.
A Desktop Grid Computing Approach for Scientific Computing and Visualization, PhD Thesis, Norwegian Univ.
Extending the EGEE Grid with XtremWeb-HEP Desktop Grids, Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp.
Building a Condor Desktop Grid and its exercises.
Leveraging HTC for UK eScience with Very Large Condor Pools: Demand for Transforming Untapped Power into Results, Proceedings of the UK e-Science Programme All Hands Meeting (AHM 2004), pp.
Desktop Grid for High Energy Physics, available at http://www.xwhep.org, accessed September 2010.