Workspace

From bwHPC Wiki
Revision as of 12:01, 18 November 2025 by M Janczyk (talk | contribs) (table with site features)

The workspace tools provide temporary scratch space, called workspaces, on central file storage for your calculations. Workspaces are meant to hold data for a limited time, usually longer than a single job run.

Important

  • No Backup: Data in workspaces is not backed up and will be automatically deleted after expiration
  • Time-limited: Every workspace has a limited lifetime (typically 30-100 days depending on cluster)
  • Automatic Email Reminders: You will receive email notifications before expiration
  • Backup Important Data: Copy important results to appropriate permanent storage before expiration (location depends on your cluster/site policies)

Quick Start - Most Common Commands

  Task                                 Command
  Create workspace for 30 days         ws_allocate myWs 30
  Create group-writable workspace      ws_allocate -G groupname myWs 30
  List all your workspaces             ws_list
  Find workspace path (for scripts)    ws_find myWs
  Check which expire soon              ws_list -R
  Extend workspace by 30 days          ws_extend myWs 30
  Delete/release workspace             ws_release myWs
  Restore released workspace           ws_restore -l, then ws_restore oldname newname

Create Workspace

To create a workspace you need to specify a name and lifetime in days:

  $ ws_allocate myWs 30

This returns:

  Workspace created. Duration is 720 hours. 
  Further extensions available: 3
  /work/workspace/scratch/username-myWs-0

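Important: Running the same ws_allocate command a second time is safe: it returns the path of the existing workspace instead of creating a new one. In batch jobs, however, prefer ws_find to locate a workspace you created beforehand (see Using Workspaces in Batch Jobs).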

Capture the path in a variable:

  $ WORKSPACE=$(ws_allocate myWs 30)
  $ cd "$WORKSPACE"

For all options and advanced usage, see the Advanced Features guide.
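
The capture pattern above can be made a little more defensive. The following is a minimal sketch, not part of the workspace tools: the function name get_workspace is hypothetical, and it assumes that ws_find prints nothing for an unknown workspace and that ws_allocate writes only the path to standard output, as the capture example in this section implies.

```shell
#!/bin/bash
# Hypothetical helper: print the path of a workspace, creating it if needed.
# Assumption: ws_find prints an empty string for unknown workspaces and
# ws_allocate writes only the path to stdout.
get_workspace() {
    local name=$1 days=$2 path
    path=$(ws_find "$name" 2>/dev/null) || path=""
    if [ -z "$path" ]; then
        path=$(ws_allocate "$name" "$days") || return 1
    fi
    [ -n "$path" ] || return 1
    printf '%s\n' "$path"
}

# Usage (on a login node):
#   WORKSPACE=$(get_workspace myWs 30) && cd "$WORKSPACE"
```

Looking up the workspace first means ws_allocate is only called when nothing exists under that name yet.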

List Your Workspaces

To see all your workspaces:

  $ ws_list

Shows:

  • Workspace ID
  • Workspace location
  • Available extensions
  • Creation date and remaining time

Useful options:

  • ws_list -R - Sort by remaining time (see what expires soon)
  • ws_list -s - Short format (only names, good for scripts)
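
The short format is convenient for loops. As a rough sketch, assuming ws_list -s prints one workspace name per line (the function name list_workspace_paths is made up for this example), the following prints each workspace name together with its path:

```shell
#!/bin/bash
# Print "name<TAB>path" for every workspace you own.
# Assumption: `ws_list -s` prints one workspace name per line.
list_workspace_paths() {
    local name path
    ws_list -s | while IFS= read -r name; do
        [ -n "$name" ] || continue
        path=$(ws_find "$name") || path=""
        printf '%s\t%s\n' "$name" "${path:-<not found>}"
    done
}

# Usage:
#   list_workspace_paths
```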

Find Workspace Path

Get the path to a workspace for use in scripts:

  $ ws_find myWs

Returns:

  /work/workspace/scratch/username-myWs-0

In scripts:

  $ cd "$(ws_find myWs)"
  $ WORKSPACE=$(ws_find myWs)

Extend Workspace Lifetime

Extend a workspace before it expires:

  $ ws_extend myWs 30              # Extend by 30 days from now

Or use:

  $ ws_allocate -x myWs 30         # Alternative command

Note: Each extension consumes one of your available extensions (typically 3-100 depending on cluster).
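
If several workspaces belong to one project, they can be extended in one go. This is only a sketch: extend_workspaces is a hypothetical helper, and it assumes that ws_extend returns a non-zero exit status when an extension fails (for example when no extensions are left).

```shell
#!/bin/bash
# Extend each named workspace by the same number of days and report failures.
# Usage: extend_workspaces DAYS NAME [NAME...]
# Assumption: ws_extend exits non-zero when the extension is refused.
extend_workspaces() {
    local days=$1 name failed=0
    shift
    for name in "$@"; do
        if ws_extend "$name" "$days"; then
            echo "extended: $name"
        else
            echo "FAILED:   $name" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   extend_workspaces 30 myWs myProject
```

Keep in mind that every successful call uses up one extension per workspace.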

Release (Delete) Workspace

Feature support by cluster (bwUC 3.0, BinAC2, Helix, JUSTUS 2, NEMO2): ws_release --delete-data (immediate deletion)

When you no longer need a workspace:

  $ ws_release myWs

What happens:

  • Workspace becomes inaccessible
  • Data is kept for a grace period (can be restored, see below)
  • Real deletion happens later (typically during nighttime)

To free quota immediately:

  $ ws_release --delete-data myWs  # Immediate deletion (WARNING: cannot be recovered!)

Or with older workspace tools:

  $ WSDIR=$(ws_find myWs) && [ -n "$WSDIR" ] && rm -rf "$WSDIR"    # Delete data first (with safety check)
  $ ws_release myWs                                                # Then release
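
The two-step variant for older tools can be wrapped in a small function so that the safety check is never forgotten. This is a sketch under the same assumptions as the commands above; the name purge_and_release is hypothetical, and the deleted data cannot be recovered.

```shell
#!/bin/bash
# Delete a workspace's data, then release it (for tools without --delete-data).
# WARNING: the data cannot be recovered afterwards.
purge_and_release() {
    local name=$1 dir
    dir=$(ws_find "$name") || dir=""
    # Refuse to continue unless ws_find returned an existing directory.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "workspace '$name' not found, nothing deleted" >&2
        return 1
    fi
    rm -rf -- "$dir" && ws_release "$name"
}

# Usage:
#   purge_and_release myWs
```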

Restore Workspace

Feature support by cluster (bwUC 3.0, BinAC2, Helix, JUSTUS 2, NEMO2): ws_restore

If you released a workspace by accident or need to recover an expired one, you can restore it within a grace period:

(1) List restorable workspaces:

  $ ws_restore -l

(2) Create a new target workspace:

  $ ws_allocate restored 60

(3) Restore the released or expired workspace into the new target workspace:

  $ ws_restore username-myWs-0 restored

Note: Use the full name from ws_restore -l (including username and timestamp), not the short name from ws_list.

For detailed restore options, see the Advanced Features guide.
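
The three restore steps can be combined as sketched below. The function name restore_workspace is hypothetical, and the sketch assumes that ws_restore -l prints the full names of restorable workspaces as plain text. On some installations ws_restore may ask for interactive confirmation, in which case the last step still needs your input.

```shell
#!/bin/bash
# Restore a released or expired workspace into a freshly allocated target.
# Usage: restore_workspace FULL_NAME TARGET_NAME DAYS
# FULL_NAME must be copied from the output of `ws_restore -l`.
restore_workspace() {
    local full_name=$1 target=$2 days=$3
    # Only proceed if the name appears in the list of restorable workspaces.
    if ! ws_restore -l | grep -qF -- "$full_name"; then
        echo "'$full_name' is not listed by ws_restore -l" >&2
        return 1
    fi
    ws_allocate "$target" "$days" >/dev/null || return 1
    ws_restore "$full_name" "$target"
}

# Usage:
#   restore_workspace username-myWs-0 restored 60
```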

Share Workspace

Feature support by cluster (bwUC 3.0, BinAC2, Helix, JUSTUS 2, NEMO2):
  • -g option (group-readable)
  • -G option (group-writable)

You can share workspaces with team members.

Important: Not all sharing options are available on all clusters. ACL-based methods like ws_share require filesystem support and may not work everywhere. If one method doesn't work, try an alternative approach.

Group-readable workspace (read-only for group):

  $ ws_allocate -g myWs 30

Group-writable workspace (read-write for group, recommended):

  $ ws_allocate -G projectgroup myWs 30

Recommended approach:

  • Use -g or -G flags during workspace creation
  • For read-only sharing: use -g
  • For collaborative work (read-write): use -G groupname
  • Set groupname in ~/.ws_user.conf if you always work with the same group

For advanced sharing options (ACL-based, read-only, less common), see the Advanced Features guide.
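
Before creating a group-writable workspace it can help to confirm that you really belong to the group. The sketch below uses the standard id command for that check; allocate_group_workspace is a hypothetical wrapper, and projectgroup in the usage line is a placeholder group name.

```shell
#!/bin/bash
# Allocate a group-writable workspace only if you are a member of the group.
# Usage: allocate_group_workspace GROUP NAME DAYS
allocate_group_workspace() {
    local group=$1 name=$2 days=$3
    # `id -nG` prints the names of all groups of the current user.
    if ! id -nG | tr ' ' '\n' | grep -qxF -- "$group"; then
        echo "you are not a member of group '$group'" >&2
        return 1
    fi
    ws_allocate -G "$group" "$name" "$days"
}

# Usage:
#   allocate_group_workspace projectgroup myWs 30
```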

Command Overview

The workspace tools consist of several commands:

  • ws_allocate - Create or extend a workspace
  • ws_list - List all your workspaces
  • ws_find - Find the path to a workspace
  • ws_extend - Extend the lifetime of a workspace
  • ws_release - Release (delete) a workspace
  • ws_restore - Restore an expired or released workspace
  • ws_register - Create symbolic links to workspaces

All commands support -h or --help to show detailed usage information.

Using Workspaces in Batch Jobs

Recommended approach: Create your workspace manually before submitting jobs, then reference it in your job scripts using ws_find.

(1) Create workspace once (on login node):

  $ ws_allocate myProject 60

(2) Use in job scripts with ws_find:

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --time=24:00:00

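# Find existing workspace; abort if it does not exist
WORKSPACE=$(ws_find myProject)
[ -n "$WORKSPACE" ] || { echo "Workspace myProject not found" >&2; exit 1; }

# Change to workspace
cd "$WORKSPACE" || exit 1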

# Your computation here
./my_program --input input.dat --output results.dat

Warning: Avoid calling ws_allocate directly in job scripts that run frequently. Calling it repeatedly with the same workspace name is safe (it returns the existing workspace), but job scripts should not create workspaces unnecessarily. Create workspaces manually when needed, then use ws_find in your job scripts to locate them.
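
To catch a missing workspace before a job ever enters the queue, the check can also be done at submission time. A minimal sketch, assuming Slurm's sbatch is the submission command as in the job script above; submit_with_workspace is a hypothetical helper name.

```shell
#!/bin/bash
# Submit a job script only if the workspace it relies on exists.
# Usage: submit_with_workspace WORKSPACE_NAME JOB_SCRIPT [SBATCH_OPTIONS...]
submit_with_workspace() {
    local name=$1 script=$2 path
    shift 2
    path=$(ws_find "$name") || path=""
    if [ -z "$path" ] || [ ! -d "$path" ]; then
        echo "workspace '$name' not found; create it with ws_allocate first" >&2
        return 1
    fi
    # Remaining arguments are passed to sbatch as options.
    sbatch "$@" "$script"
}

# Usage (on a login node):
#   submit_with_workspace myProject job.sh --time=24:00:00
```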

Advanced Features

For detailed information about advanced workspace features, configuration options, and less frequently used commands, see the separate Workspaces/Advanced_Features guide.

Topics covered in the advanced guide include:

  • Complete command reference with all options
  • Multiple filesystem locations
  • Detailed options for ws_allocate, ws_list, ws_find, ws_extend
  • Email and calendar reminders configuration
  • Group workspaces and cooperative usage
  • Advanced sharing with ws_share (ACL-based, read-only)
  • Setting permissions (ACLs and Unix permissions)
  • Deleting and restoring workspaces in detail
  • Cluster-specific limits and quotas
  • Checking workspace quotas
  • Registering workspace links

Best Practices and Recommendations

For All Users

  1. Set up ~/.ws_user.conf - Configure default reminder timing, duration, and groupname to avoid typing them repeatedly (see example configuration)
  2. Email reminders are automatic - Notifications are sent automatically using your identity provider email; only use -r to customize reminder timing if needed
  3. Custom email only if needed - Only use -m option to override the email address from your identity provider
  4. Use ws_register - Create symbolic links to your workspaces in a convenient directory: ws_register ~/workspaces
  5. Create workspaces manually - Create workspaces on the login node before submitting jobs, then use ws_find in your job scripts
  6. Track your workspaces - Regularly run ws_list -R to see which workspaces will expire soon
  7. Backup important data - Workspaces are temporary and not backed up - copy results to appropriate permanent storage (check your cluster/site policies for backup locations)
  8. Clean up regularly - Release workspaces you no longer need to keep filesystems organized

For Short-term Jobs (hours to days)

  1. Use default or short durations (1-7 days)
  2. Consider using a single workspace for a series of related jobs
  3. Use ws_find in job scripts to locate the workspace

For Long-term Campaigns (weeks to months)

  1. Request maximum allowed duration
  2. Email reminders are sent automatically; optionally customize reminder timing with -r option
  3. Use ws_list -R regularly to monitor remaining time
  4. Plan data archival to appropriate permanent storage before expiration (check cluster/site policies)
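
Archiving before expiration can be as simple as packing the workspace into one tar file. The following is a sketch only: archive_workspace is a hypothetical helper, and the destination directory stands for whatever permanent storage your site provides.

```shell
#!/bin/bash
# Pack the contents of a workspace into a compressed tar archive.
# Usage: archive_workspace WORKSPACE_NAME DEST_DIR
# DEST_DIR is a placeholder for the permanent storage at your site.
archive_workspace() {
    local name=$1 dest=$2 src archive
    src=$(ws_find "$name") || src=""
    if [ -z "$src" ] || [ ! -d "$src" ]; then
        echo "workspace '$name' not found" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "destination '$dest' does not exist" >&2
        return 1
    fi
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"
    # -C changes into the workspace so the archive holds relative paths.
    tar -czf "$archive" -C "$src" . && printf '%s\n' "$archive"
}

# Usage:
#   archive_workspace myProject /path/to/permanent/storage
```

A single archive file is also friendlier to most permanent storage systems than many small files.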

For Collaborative Work

  1. Use ws_allocate -G groupname for shared write access (recommended)
  2. Set groupname in ~/.ws_user.conf if you always work with the same group
  3. Use ws_allocate -g for read-only sharing within group
  4. Document the workspace location for your team members
  5. For advanced sharing scenarios, see the Advanced Features guide

For Managing Multiple Filesystems

  1. Note: Most clusters have only one default filesystem - the -F option is rarely needed
  2. Use ws_list -l first to check if multiple filesystems are available on your cluster
  3. Use the -F option only if you need a specific filesystem for performance or capacity reasons (see filesystem options)
  4. bwUniCluster 3.0 filesystems:
    • Default Lustre filesystem: Standard workspace location, best for large files and sequential I/O
    • Flash filesystem (ffuc): SSD-based storage for KIT/HoreKa users, shared between bwUniCluster 3.0 and HoreKa
    • Use flash filesystem for workloads with many small files, random I/O, AI/ML training, or compilation
    • Balance load: use -F ffuc when appropriate to reduce load on the default filesystem
  5. General guidelines:
    • Flash-based filesystems (SSD/NVMe): Use for many small files, low-latency requirements, random I/O
    • Standard Lustre/parallel filesystems: Best for large files and sequential I/O patterns

For Different Data Types

  1. Large sequential I/O: Use standard workspace filesystem (Lustre best for very large files, Weka excellent for both large and small)
  2. Many small files or random access: Use flash-based workspace filesystem like Weka (NEMO2) or bwUniCluster ffuc, or stage to $TMPDIR
  3. Data read multiple times on single node: Copy to $TMPDIR at job start for best performance
  4. Temporary data for single node: Always use $TMPDIR, not workspaces
  5. Multi-node temporary data: Use workspaces; $TMPDIR is not suitable for data that several nodes must access
  6. AI/ML training data: Use Weka (NEMO2) or flash filesystems for best performance, or stage to $TMPDIR for repeated access
  7. Compilation/build directories: Use flash-based filesystems (Weka, ffuc) or $TMPDIR for better performance
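
Staging to $TMPDIR, as recommended for data that is read repeatedly on a single node, could look like the sketch below. run_staged is a hypothetical helper; it assumes that $TMPDIR points to node-local storage inside a job and falls back to /tmp when the variable is unset.

```shell
#!/bin/bash
# Copy input from a workspace to $TMPDIR, run a command there,
# and copy the result file back to the workspace.
# Usage: run_staged WORKSPACE_DIR INPUT_FILE OUTPUT_FILE COMMAND [ARGS...]
run_staged() {
    local ws=$1 input=$2 output=$3 workdir
    shift 3
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX") || return 1
    cp "$ws/$input" "$workdir/" || return 1
    # Run the command inside the staging directory.
    ( cd "$workdir" && "$@" ) || return 1
    cp "$workdir/$output" "$ws/" || return 1
    rm -rf -- "$workdir"
}

# Usage inside a job script (give the program as an absolute path,
# because the command runs inside the staging directory):
#   run_staged "$(ws_find myProject)" input.dat results.dat \
#       /path/to/my_program --input input.dat --output results.dat
```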

For Quota Management

  1. Delete data before releasing if you need immediate quota relief: WSDIR=$(ws_find workspace) && [ -n "$WSDIR" ] && rm -rf "$WSDIR" then ws_release workspace
  2. Use ws_release --delete-data (newer versions) for immediate deletion
  3. Remember: released workspaces may still count toward quota during grace period