Workspaces/Advanced Features/Filesystems

From bwHPC Wiki
Latest revision as of 17:32, 2 December 2025

Multiple Filesystem Locations

Most users don't need special filesystem options. On all clusters, workspaces are created on the default high-performance filesystem without any options - this works for standard I/O workloads.

Do I Need the -F Option?

For standard I/O (large files, sequential access):

  • All clusters: Just use ws_allocate myWs 30 (no -F needed)
  • The default filesystem handles standard workloads well

For special workloads (AI/ML, many small files, random I/O):

  • NEMO2: Default Weka filesystem works great - no -F needed
  • bwUniCluster 3.0: Use -F ffuc for the flash filesystem (KIT/HoreKa users only)
  • Other clusters: Use $TMPDIR or default workspace

Check Available Filesystems

  $ ws_list -l                             # List available filesystems

If only one filesystem is listed, you're all set - just use ws_allocate without -F.
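
The check above can be scripted. The following is a minimal sketch, not an official tool: count_filesystems is a hypothetical helper, and it assumes that ws_list -l prints one filesystem name per line, possibly after a header line ending in a colon. Verify the actual output format on your cluster before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count filesystem names in `ws_list -l`-style output
# read from stdin.
# Assumption: one name per line; blank lines and header lines that end
# in ':' are skipped.
count_filesystems() {
    grep -v -e '^[[:space:]]*$' -e ':[[:space:]]*$' | wc -l | tr -d '[:space:]'
}

# Only run the check where the workspace tools are installed.
if command -v ws_list >/dev/null 2>&1; then
    n=$(ws_list -l | count_filesystems)
    if [ "$n" -le 1 ]; then
        echo "Single filesystem: use ws_allocate without -F"
    else
        echo "$n filesystems listed: -F may apply"
    fi
fi
```

On a machine without the workspace tools the guarded part is skipped, so the function can be tried with sample text piped in.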

When -F Option is Available

Works on cluster    bwUC 3.0   BinAC2   Helix   JUSTUS 2   NEMO2
-F option           ✓          ✗        ✗       ✗          ✗

Only bwUniCluster 3.0 offers multiple filesystems via the -F option.

Cluster-Specific Information

NEMO2

Default Weka filesystem (no -F needed):

  • Excellent for all workloads - standard I/O, small files, random access
  • Handles AI/ML training, compilation, and general workloads efficiently
  • Just use: ws_allocate myWs 30

bwUniCluster 3.0

Default Lustre filesystem (no -F needed):

  • Best for standard I/O: large files, sequential access
  • Suited to general-purpose workloads
  • Use: ws_allocate myWs 30

Flash filesystem with -F ffuc:

  • SSD-based storage for special workloads
  • Shared between bwUniCluster 3.0 and HoreKa (KIT/HoreKa users only)
  • Use for: AI/ML datasets, many small files, random I/O, compilation
  • Use: ws_allocate -F ffuc myWs 30

Other Clusters (BinAC2, Helix, JUSTUS 2)

  • Single default filesystem (no -F option available)
  • Good for all standard workloads
  • For special workloads with many small files, consider using $TMPDIR
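
Staging data into $TMPDIR, as suggested above, might look like the following sketch. stage_to_tmpdir is a hypothetical helper name, and the fallback to /tmp when $TMPDIR is unset is an assumption made only so the function can be tried off-cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a directory into node-local scratch and print
# the path of the copy.
# Assumption: falls back to /tmp if $TMPDIR is not set.
stage_to_tmpdir() {
    local src="$1"
    local dest
    dest="${TMPDIR:-/tmp}/$(basename "$src")"
    # Copy the directory contents so the result is the same whether or not
    # the destination already exists.
    mkdir -p "$dest" && cp -r "$src/." "$dest/" && echo "$dest"
}

# Example use inside a job script (workspace name and subdirectory are
# illustrative):
#   DATA=$(stage_to_tmpdir "$(ws_find myWs)/dataset")
#   ./my_program --input "$DATA"
```

Because $TMPDIR is local to one node, this pattern only fits single-node jobs; data shared across nodes stays in the workspace.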

Simple Decision Guide

Your Workload                NEMO2                 bwUniCluster 3.0
Standard I/O (large files)   ws_allocate myWs 30   ws_allocate myWs 30
AI/ML training               ws_allocate myWs 30   ws_allocate -F ffuc myWs 30
Many small files             ws_allocate myWs 30   ws_allocate -F ffuc myWs 30
Random I/O                   ws_allocate myWs 30   ws_allocate -F ffuc myWs 30
Compilation/builds           ws_allocate myWs 30   ws_allocate -F ffuc myWs 30
Single-node temporary        Use $TMPDIR, not workspaces (both clusters)
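
The decision guide can also be expressed as a small shell function. This is a sketch only: recommend_ws_cmd and its cluster and workload keywords are hypothetical names chosen for illustration, and the function prints a suggested command rather than running it. Note that ffuc is limited to KIT/HoreKa users.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command suggested by the decision guide.
# Keywords (bwUniCluster3.0, ai-ml, small-files, random-io, build,
# single-node-temp) are illustrative, not part of the workspace tools.
recommend_ws_cmd() {
    local cluster="$1" workload="$2" name="${3:-myWs}" days="${4:-30}"
    if [ "$workload" = "single-node-temp" ]; then
        echo "use \$TMPDIR, not a workspace"
        return 0
    fi
    case "$cluster:$workload" in
        bwUniCluster3.0:ai-ml|bwUniCluster3.0:small-files|bwUniCluster3.0:random-io|bwUniCluster3.0:build)
            # Flash filesystem (KIT/HoreKa users only)
            echo "ws_allocate -F ffuc $name $days" ;;
        *)
            # Default filesystem on every cluster
            echo "ws_allocate $name $days" ;;
    esac
}

recommend_ws_cmd bwUniCluster3.0 small-files
recommend_ws_cmd NEMO2 ai-ml
```

Any combination not listed in the case statement falls through to the default allocation, which mirrors the guide's advice that most users need no -F option.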

Quick Reference by Data Type

Data Type                   Where to Store
Large files, standard I/O   Default workspace (no -F) on all clusters
AI/ML datasets              NEMO2: default workspace; bwUniCluster 3.0: -F ffuc
Many small files            NEMO2: default workspace; bwUniCluster 3.0: -F ffuc
Random I/O patterns         NEMO2: default workspace; bwUniCluster 3.0: -F ffuc
Single-node temporary       Always $TMPDIR, not workspaces
Multi-node shared data      Default workspace on all clusters
Compilation/builds          NEMO2: default workspace; bwUniCluster 3.0: -F ffuc or $TMPDIR

For quota information, see Quotas & Limits.