What is CHESS Hybrid Cloud Platform? 

Easy to use, with computing available at any time

CHESS Hybrid Cloud Platform is a high performance computing (HPC) platform for integrating in-house and public cloud resources. It provides the benefits associated with on-demand cloud solutions while guaranteeing data security.  

CHESS Hybrid Cloud Platform provides unified, efficient, and low-cost management of complex data clusters. On a single platform and using a single set of standards, it deploys enterprise applications efficiently across a multi-cloud environment and greatly speeds up application processing.

Benefits

Support for Hybrid Use of In-House HPC and Cloud Resources

The local cluster can be linked with the public cloud to quickly build a hybrid cloud. Public cloud computing resources can be scaled flexibly according to demand, providing virtually unlimited cloud resources.

Private Cluster Management Tool

A private cluster management tool that provides rapid deployment, centralised management, and unified scheduling.

One-Stop Service

Provides users with optimised and customised application software and cluster management software; one-click customisation of the user’s cluster; online use of data files, application software, computing resources, and storage resources; online monitoring and other functions; and 24/7 online service.

Multi-Architecture Support

Supports x86 and ARM64 servers.

Support Various AI Open Source Frameworks

TensorFlow, Caffe and others

GPU Monitoring and Scheduling

Supports single-card and multi-card GPU sharing.

Functions

With its modular design, CHESS lets users freely combine modules according to their needs. Modules include deployment, cluster management, cluster arrangement, monitoring, job scheduling, hybrid cloud, statistics and billing, and Web portal.

Deployment Module

The deployment module helps system administrators deploy the operating system and software applications, efficiently and conveniently.

  • Batch installation, rapid deployment;
  • Elastic extension and dynamic scaling of nodes;
  • System backup and restore functions;
  • System imaging and customised software packages for different nodes;
  • Unified deployment of operating system, management software and application environments.

Cluster Management Module

This module provides node management, parallel commands, remote power on/off, and other functions. NFS shared directory management, operation logs, and power on/off records are available through the web-based interface.

  • Node role management
    The role of a node can be switched by selecting the corresponding letter (M/I/E/T) in the role column.
  • Node status and node operation
    View node information, including online status and whether job submission is allowed, and perform single or batch node operations (delete, power on/off, restart, create image, restore node, SSH, VNC, etc.).
  • Shared directory management
    Shared directories can be created via the web interface. One can edit mount points without complex NFS shared file system configurations.
  • Operating system imaging
    Node system image management provides one-click recovery of the operating system.
  • Cluster operation log queries
    Logs of cluster operations can be queried, including the operation content, time, result, user, etc.

Monitoring Module

The system administrator can monitor the physical cabinet view, the cluster, node operation, and resource usage. The module also supports web page and email alarms and configurable alarm thresholds.

  • Intuitive cluster monitoring
    The physical cabinet view shows each node's position and status information, including server load, online status, CPU temperature, etc.
  • Cluster/node performance status monitoring
    Real-time monitoring of cluster and node CPU, memory, swap partition, network, disk, load, and other performance indicators.
  • File system usage
    Lists the cluster shared directories, the mount points under each shared directory, and run status details.
  • Fault notification
    When a node fails, or when CPU load, memory usage, or another indicator is too high, a notification is sent via short message service (SMS) or email. Notification history is kept for later reference.
  • Alarm threshold setting
    Alarm thresholds can be customised for different scenarios.
  • Performance analysis
    Node performance parameters can be set and displayed in real time, based on specific time ranges.
  • GPU card monitoring
    The performance of each GPU card can be monitored.
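
To make the fault notification and alarm threshold features concrete, the following is a minimal sketch of a threshold check. It is not CHESS code: the `Threshold` class, the `check_node` function, and the metric names `cpu_pct` and `memory_pct` are all hypothetical, chosen only to illustrate how a sampled value might be compared against a customised limit.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu_pct"
    limit: float  # an alarm is raised when the sampled value exceeds this


def check_node(node, sample, thresholds):
    """Return one alarm message per metric whose sampled value exceeds its limit."""
    alarms = []
    for t in thresholds:
        value = sample.get(t.metric)
        # Metrics missing from the sample are skipped rather than treated as faults.
        if value is not None and value > t.limit:
            alarms.append(f"{node}: {t.metric}={value} exceeds threshold {t.limit}")
    return alarms


# Illustrative thresholds and one sample from a hypothetical node.
thresholds = [Threshold("cpu_pct", 90.0), Threshold("memory_pct", 85.0)]
alarms = check_node("node01", {"cpu_pct": 97.5, "memory_pct": 40.0}, thresholds)
print(alarms)
```

In a real deployment the resulting messages would be handed to whatever SMS or email channel the platform is configured to use; that delivery step is left out here.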

Job Scheduling Module

This module optimises the cluster system hardware and software resources, reducing job response time and supporting multiple job submission templates. It simplifies cluster resource management, providing a clear view of the node CPU usage and configuration of the resource manager. One can also edit/delete/compress scripts directly via the web interface.

  • Unified job management interface
    View the status, queue, and owner of submitted jobs from the job management list, and delete or stop jobs.
  • Compute node configuration
    View the number of cores and CPU utilisation of each node in the cluster, monitor node job submission, modify node properties, and control node resources.
  • Scheduling policy
    Provides resource reservation, a backfill algorithm, dynamic priority, fair sharing, quota management, system diagnosis, system monitoring and statistics, and other functions. It supports QoS and preemption strategies and policy-based scheduling; jobs gain access to cluster resources based on their priority.
  • User group policies
    User group policies include the maximum number of jobs, maximum number of processors, maximum memory, maximum hard disk, maximum wall time, priority, etc.
  • Resource reservation
    Computing resources can be reserved for users, ensuring that a job has computing resources available at a specific time.
  • Multiple application templates, flexible job submission
    Job templates simplify the submission of similar jobs; a new template can be created for any application.
    Job submission options: command line, web interface, application integration interface, job script, and executable file submission. Common applications can also be set as templates.
  • Comprehensive file management
    The intuitive web interface allows users to create, edit, upload, download, copy, cut, paste, compress, and decompress files.
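
The backfill idea named under the scheduling policy can be sketched in a few lines. This is a simplified, textbook-style backfill under stated assumptions, not the scheduler that CHESS actually uses: the `Job` class, the `backfill` function, and the example workload are hypothetical, nodes are reduced to a single pool of cores, and requested wall times are assumed to be accurate.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int      # cores requested
    walltime: int   # requested run time in minutes
    priority: int   # a higher value is scheduled first


def backfill(queue, free_cores, running):
    """Return the names of jobs to start now.

    running is a list of (minutes_until_finish, cores) for jobs already running.
    """
    started = []
    running = list(running)
    pending = sorted(queue, key=lambda j: j.priority, reverse=True)

    # 1. Start jobs in priority order until the next one does not fit.
    while pending and pending[0].cores <= free_cores:
        job = pending.pop(0)
        free_cores -= job.cores
        running.append((job.walltime, job.cores))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reserve the earliest time at which the blocked job can start.
    head = pending.pop(0)
    available = free_cores
    reserved_at = float("inf")  # stays infinite if the job can never fit
    for finish, cores in sorted(running):
        available += cores
        if available >= head.cores:
            reserved_at = finish
            break
    spare = max(available - head.cores, 0)  # cores the blocked job will not need

    # 3. Backfill: a lower-priority job may start now if it fits in the idle
    #    cores and either finishes before the reservation or uses only spare cores.
    for job in pending:
        if job.cores > free_cores:
            continue
        if job.walltime <= reserved_at:
            free_cores -= job.cores
            started.append(job.name)
        elif job.cores <= spare:
            free_cores -= job.cores
            spare -= job.cores
            started.append(job.name)
    return started


# Hypothetical workload: 8 idle cores, and one running job that frees 8 cores in 60 minutes.
queue = [
    Job("A", cores=4, walltime=30, priority=10),
    Job("B", cores=14, walltime=120, priority=8),
    Job("C", cores=2, walltime=200, priority=5),
    Job("D", cores=2, walltime=90, priority=3),
    Job("E", cores=2, walltime=30, priority=1),
]
started = backfill(queue, free_cores=8, running=[(60, 8)])
print(started)
```

In this example job B is too large to start and receives a reservation 60 minutes out. Jobs C and E are backfilled because they do not delay that reservation, while job D is held back because it would still occupy cores B needs when its reservation begins.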

Hybrid Cloud Module

The hybrid cloud module combines local servers and public cloud resources into a unified HPC cluster system and application environment. It can be expanded as needed and deployed flexibly, greatly improving computing power and speeding up application processing.

  • Hybrid cloud node management
    Through the web-based interface, one can manage the basic information of each node, including hostname, MAC address, IP address, role, specification, status, creation time, etc.
  • Cloud node provisioning
    Public cloud resources can be provisioned as required, with flexible pricing that includes monthly or yearly subscription and pay-per-use. Once the provisioning request is completed, the relevant information can be viewed on the hybrid cloud node.
  • Cloud node operation
    Node management operations include startup, shutdown, forced shutdown, restart, forced restart, and release.
  • Cloud storage management
    Job data can be read from and written to shared NAS storage, which can also be created and mounted directly on public cloud nodes. Such storage can be removed as needed.

Statistics and Billing

Reporting options include rich data resource statistics, a report overview, individual report details, etc.; PDF, HTML, and Excel formats are supported.

  • Cluster computing resource usage statistics
    Generate reports on cluster CPU, memory, swap partition, and storage usage, and on completed, running, and waiting jobs.
  • Resource consumption statistics and flexible rate setting
    Charge rates can be set flexibly and combined with each user's (or group's) CPU usage time and run time to generate bills.
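
As a rough illustration of rate-based billing, the sketch below multiplies each user's CPU usage by a per-group rate. The rate table, the group names, the `bill` function, and the usage records are all hypothetical; the actual CHESS billing model may weigh CPU time and run time differently.

```python
from decimal import Decimal

# Hypothetical rates, in currency units per CPU core-hour, keyed by user group.
RATES = {"research": Decimal("0.05"), "default": Decimal("0.08")}


def bill(usage, rates=RATES):
    """Sum charges per user from (user, group, cpu_core_hours) records."""
    totals = {}
    for user, group, core_hours in usage:
        # Groups without their own rate fall back to the default rate.
        rate = rates.get(group, rates["default"])
        charge = rate * Decimal(str(core_hours))
        totals[user] = totals.get(user, Decimal("0")) + charge
    return totals


# Illustrative usage records, not real data.
usage = [
    ("alice", "research", 120),
    ("bob", "engineering", 40),
    ("alice", "research", 30),
]
bills = bill(usage)
for user, amount in sorted(bills.items()):
    print(user, amount)
```

`Decimal` is used instead of floating point so that currency amounts are not affected by binary rounding.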

Web Portal

The Web portal provides comprehensive facilities and management functions, including the monitoring, job scheduling, reporting, and hybrid cloud modules.

  • Access control
    Via the Web interface, the administrator can set user access rights and assign function modules to users for the cluster management, job scheduling, cluster monitoring, and report statistics modules.
  • Service management
    The interface also provides service monitoring: view service status, time, CPU utilisation, and memory usage, and start, terminate, or monitor service items.
  • Users and groups
    The Web interface provides functions to create, edit, and delete users and groups; one can also view the groups of each user, change the group password, etc.

Industrial Applications

CHESS Hybrid Cloud Platform can be widely used in aerospace, automotive, electronics, education, scientific research, petroleum, meteorology, life sciences, manufacturing, artificial intelligence, and other domains with high computational demands.

  • Manufacturing: Ansys, Fluent, Abaqus, CFX, Numeca, etc.
  • Computational chemistry: VASP, GROMACS, LAMMPS, NAMD, Gaussian, Materials Studio, etc.
  • Meteorological applications: MM5, Grapes, CESM, WRF, wrf-chem, etc.
  • Biomedical applications: Anaconda, Bioconda, bwa, FastQC, etc.
  • Scientific computing: Matlab, R, Mathematica, etc.
  • Artificial intelligence: TensorFlow, Caffe, etc.

Contact us to discover how we can grow your business.
Call Us:
(Mainland China)
(Hong Kong)