HPC and data infrastructure at Jülich Supercomputing Centre

Figure 1: Development of the compute infrastructure at JSC since 2004.

The Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich provides a world-class supercomputing infrastructure for simulation and data science. Since 2004, JSC has followed a dual-architecture approach that ensures the availability of complementary architectures for general-purpose as well as highly scalable workloads. Following its Modular Supercomputing architecture, JSC now combines these systems as individual Modules into tightly coupled supercomputers to enable fine-grained work sharing for complex applications such as coupled Earth system models. Accordingly, JSC plans to extend the recently installed JUWELS Cluster with a highly scalable compute Module in the near future. Due to developments in the processor market, this Module will likely be based on a processor technology that will require modernization and significant code modifications of the existing ESM applications before they can use it efficiently.
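As a purely illustrative sketch of such fine-grained work sharing across Modules, the following mpi4py fragment shows how a coupled application could split its MPI ranks into two components, with one half assumed to run on a general-purpose Cluster Module and the other on a highly scalable Module, and exchange a coupling field between them. The 50/50 rank split, the message tag, and the placeholder component steps are assumptions for illustration; in practice the mapping of ranks to Modules is controlled by the batch system and the coupling is handled by the ESM coupler.

# Minimal, hypothetical sketch (not JSC or ESM code); run with at least 2 ranks.
from mpi4py import MPI

world = MPI.COMM_WORLD
rank, size = world.Get_rank(), world.Get_size()

on_cluster = rank < size // 2                         # assumed: first half of ranks on the Cluster Module
comp = world.Split(color=int(on_cluster), key=rank)   # per-component communicator within one Module

if on_cluster:
    # Placeholder for the less scalable component (e.g., an ocean model step).
    coupling_field = [float(rank)] * 4
    if comp.Get_rank() == 0:
        world.send(coupling_field, dest=size // 2, tag=7)   # hand coupling data to the other Module
else:
    if comp.Get_rank() == 0:
        coupling_field = world.recv(source=0, tag=7)        # receive coupling data from the Cluster side
    # Placeholder for the highly scalable component (e.g., an atmosphere model step).
    comp.Barrier()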

JSC’s compute services are complemented by a large storage infrastructure designed to meet the varying bandwidth and capacity requirements of classical simulation workloads as well as emerging learning and analysis use cases. To take full advantage of the different storage technologies on the market, each with its own performance and price characteristics, the centralized storage infrastructure is organized in multiple tiers. Each tier is characterized by its capacity, which dictates data retention times, and by its performance level. The archival tier uses tape technology and is optimized for high capacity at low access speed; it is therefore primarily intended for the storage of cold data. With the XCST layer, which will become operational at the end of 2018, JSC introduces an additional capacity-oriented multi-purpose storage tier. The XCST augments the archive layer with disk-based storage that enables quicker, though still low-bandwidth, access to large data sets and thus bridges the gap between the cold archival storage and the fast HPC filesystems. The XCST will also enable new data sharing and community-access schemes: a virtual machine hosting infrastructure will be implemented that allows communities to provide custom services based on the data, and JSC intends to offer an object-store API for the XCST in the near future.

The next layer in the storage pyramid, the large-capacity storage tier, provides HPC-focused filesystems that balance capacity and bandwidth requirements. The well-known scratch filesystem – used for intermediate storage of large simulation input and output data – is located on this layer. In the future, JSC plans to add a bandwidth-optimized high-performance storage tier that will leverage non-volatile memory technologies to enable faster application checkpointing and to accelerate data-intensive applications in simulation and learning. By organizing the storage infrastructure in tiers with different characteristics, the needs of different scientific use cases can be met efficiently. However, managing data distribution and movement across the tiers will require an optimized data management scheme.
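As a concrete but purely illustrative example of such data movement between tiers, the following Python fragment sketches how a large result file could be staged from the scratch filesystem onto the XCST and later retrieved by a community service. It assumes the planned object-store API is S3-compatible, which is not confirmed by the source; the endpoint URL, bucket name, file paths, and credentials are hypothetical placeholders.

# Hypothetical sketch, assuming an S3-compatible interface to the XCST.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://xcst.example.fz-juelich.de",  # placeholder endpoint
    aws_access_key_id="COMMUNITY_KEY",                  # placeholder credentials
    aws_secret_access_key="COMMUNITY_SECRET",
)

# Stage a large simulation result from the scratch filesystem onto the
# capacity-oriented XCST tier so that it can be shared with a community.
s3.upload_file("/scratch/project/run42/output.nc", "esm-results", "run42/output.nc")

# Later, a community service (e.g., hosted on the planned VM infrastructure)
# can fetch the data set again without touching the fast HPC filesystems.
s3.download_file("esm-results", "run42/output.nc", "/tmp/output.nc")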

Figure 2: Storage tiers in the JSC supercomputing facility.