NEMO2/Workspaces

From bwHPC Wiki
Latest revision as of 17:37, 12 May 2026

Note: This is the updated Workspaces guide for NEMO2. For other clusters please use: Workspace.

Workspace tools provide temporary storage on NEMO's fast parallel filesystem (Weka). They are meant for data that needs to persist longer than a single job, but not permanently.

For advanced features such as the user config (~/.ws_user.conf), reminders, quotas, and workspace handover, see Advanced Features.

What are Workspaces?

Use workspaces for:

  • Jobs generating intermediate data
  • Data shared between multiple compute nodes
  • Multi-step workflows

Don't use workspaces for:

  • Permanent storage (use HOME or project directories)
  • Single-node temporary files (use $TMPDIR instead)
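The split between workspaces (shared across nodes and jobs) and $TMPDIR (node-local) is easiest to see in a job script. Below is a minimal sketch of two helpers. The function names stage_in and stage_out, the workspace name, and the subdirectory layout are hypothetical. The sketch assumes that ws_find prints the workspace path on stdout, as in the examples on this page, and that $TMPDIR points to node-local scratch inside a job.

```bash
#!/usr/bin/env bash
# Hypothetical job-script helpers: shared data lives in a workspace,
# node-local scratch work happens in $TMPDIR.

# Copy <subdir> from workspace <ws_name> into $TMPDIR for fast local access.
stage_in() {
    local ws_name=$1 subdir=$2 ws
    ws=$(ws_find "$ws_name") || return 1
    if [[ -z $ws ]]; then
        echo "stage_in: workspace '$ws_name' not found" >&2
        return 1
    fi
    if [[ -z ${TMPDIR:-} ]]; then
        echo "stage_in: TMPDIR is not set" >&2
        return 1
    fi
    if [[ ! -d "$ws/$subdir" ]]; then
        echo "stage_in: $ws/$subdir does not exist" >&2
        return 1
    fi
    cp -a "$ws/$subdir" "$TMPDIR/"
}

# Copy <results> (a file or directory, usually in $TMPDIR) back into the
# workspace so that other nodes and later jobs can read it.
stage_out() {
    local ws_name=$1 results=$2 ws
    ws=$(ws_find "$ws_name") || return 1
    if [[ -z $ws ]]; then
        echo "stage_out: workspace '$ws_name' not found" >&2
        return 1
    fi
    cp -a "$results" "$ws/"
}
```

In a job script you might call stage_in myProject inputs before the computation and stage_out myProject "$TMPDIR/results" afterwards. Both names are only placeholders.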

Important - Read First

  • No Backup: Data is not backed up and will be automatically deleted after expiration
  • Time-limited: The maximum lifetime is 100 days, and a workspace can be extended up to 100 times
  • Email Reminders: You receive email notifications before expiration
  • Backup Important Data: Copy results to permanent storage before expiration

Command Overview

  • ws_allocate - Create or extend workspace
  • ws_list - List your workspaces
  • ws_find - Find workspace path (for scripts)
  • ws_extend - Extend workspace lifetime
  • ws_release - Release (delete) workspace
  • ws_restore - Restore expired/released workspace
  • ws_register - Create symbolic links

All commands support -h for help.
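A job script can check up front that these commands are available instead of failing halfway through a run. The following is a small sketch: check_ws_tools is a hypothetical name, and the command list is simply the one above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which workspace commands cannot be found,
# e.g. to fail early at the top of a job script.
check_ws_tools() {
    local cmd
    local -a missing=()
    for cmd in ws_allocate ws_list ws_find ws_extend ws_release ws_restore ws_register; do
        # command -v succeeds for executables on PATH (and shell functions)
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "missing workspace tools: ${missing[*]}" >&2
        return 1
    fi
}
```

Usage: check_ws_tools || exit 1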

Quick Start

Task                                             Command
Create workspace (100 days)                      ws_allocate myWs 100
Create group workspace                           ws_allocate -G groupname myWs 100
List all workspaces                              ws_list
See what expires soon                            ws_list -Rr
Find path (for scripts)                          ws_find myWs
Extend by 100 days                               ws_extend myWs 100
Release workspace (deleted at next nightly run)  ws_release myWs
Restore expired workspace (30-day grace period)  ws_restore -l, then ws_restore oldname newname

Creating Workspaces

Create a workspace with a name and lifetime in days:

  $ ws_allocate myWs 100

This prints the path of the workspace, for example:

  /work/classic/$USER-myWs

Capture path in variable:

  $ WORKSPACE=$(ws_allocate myWs 100)
  $ cd "$WORKSPACE"

Important: Running the same command again is safe; it returns the path of the existing workspace.
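The capture pattern above can be wrapped in a helper that also checks the result. This is a sketch and is not part of the workspace tools. ensure_workspace is a hypothetical name. The helper assumes ws_allocate prints only the path on stdout, which the $(...) capture above relies on. The 1 to 100 day range is taken from the limits stated on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workspace tools): allocate or re-fetch a
# workspace and print its path, failing loudly if no usable path comes back.
ensure_workspace() {
    local name=$1 days=$2 path
    # 1..100 days mirrors the maximum lifetime stated on this page;
    # 10# forces base-10 so inputs like "08" are not read as octal
    if ! [[ $days =~ ^[0-9]+$ ]] || (( 10#$days < 1 || 10#$days > 100 )); then
        echo "ensure_workspace: days must be between 1 and 100" >&2
        return 2
    fi
    # ws_allocate prints the path on stdout; re-running returns the existing path
    path=$(ws_allocate "$name" "$days") || return 1
    if [[ -z $path || ! -d $path ]]; then
        echo "ensure_workspace: no usable path for '$name'" >&2
        return 1
    fi
    printf '%s\n' "$path"
}
```

Usage: WORKSPACE=$(ensure_workspace myWs 100) || exit 1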

Listing Workspaces

  $ ws_list                                # List all workspaces
  $ ws_list -Rr                            # Sort by remaining time, soonest first
  $ ws_list -g                             # Show group workspaces
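For scripts, you can build a name-to-path table of all your workspaces from ws_list and ws_find. This sketch assumes that ws_list -s prints one workspace name per line, so check ws_list -h on NEMO2 before relying on it. list_workspace_paths is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<name><TAB><path>" for every workspace you own.
# Assumes `ws_list -s` prints one workspace name per line.
list_workspace_paths() {
    local name path
    while IFS= read -r name; do
        [[ -z $name ]] && continue            # skip blank lines
        path=$(ws_find "$name") || continue   # skip names ws_find cannot resolve
        printf '%s\t%s\n' "$name" "$path"
    done < <(ws_list -s)
}
```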

Extending Workspaces

  $ ws_extend myWs 100                      # Extend by 100 days from now

Alternative: ws_allocate -x myWs 100

Each extension consumes one of your available extensions (100 total).
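When several workspaces need extending, a loop saves typing. Keep in mind that every successful call still consumes one extension for that workspace. extend_all is a hypothetical name, and the 100-day cap simply mirrors the maximum lifetime stated on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extend several workspaces by the same number of days.
# Returns 2 for an invalid day count, 1 if any extension failed, 0 otherwise.
extend_all() {
    local days=$1 name rc=0
    shift
    if ! [[ $days =~ ^[0-9]+$ ]] || (( 10#$days < 1 || 10#$days > 100 )); then
        echo "extend_all: days must be between 1 and 100" >&2
        return 2
    fi
    for name in "$@"; do
        if ws_extend "$name" "$days"; then
            echo "extended $name by $days days"
        else
            echo "extend_all: failed to extend $name" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, extend_all 100 myWs otherWs attempts both extensions even if the first one fails, and its exit status reports whether everything succeeded.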

Releasing Workspaces

  $ ws_release myWs

The workspace becomes inaccessible immediately and is permanently deleted at the next nightly expirer run. Do not rely on recovering a released workspace.
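Because a released workspace should be treated as gone, copy anything you want to keep before calling ws_release. The following is a minimal sketch. It assumes the destination is a permanent location you are allowed to write to, and which location is appropriate depends on site policy. archive_and_release is a hypothetical name, and the copy uses plain cp -a.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a workspace's contents to permanent storage,
# then release the workspace only if the copy succeeded.
archive_and_release() {
    local name=$1 dest=$2 src
    src=$(ws_find "$name") || return 1
    if [[ -z $src || ! -d $src ]]; then
        echo "archive_and_release: workspace '$name' not found" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src/." copies the contents (including dotfiles), not the directory itself
    if ! cp -a "$src/." "$dest/"; then
        echo "archive_and_release: copy failed, NOT releasing '$name'" >&2
        return 1
    fi
    ws_release "$name"
}
```

Usage: archive_and_release myWs <permanent-dir>/myWs. cp -a preserves timestamps and permissions. For very large workspaces, a resumable tool such as rsync may be the better choice.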

Restoring Workspaces

Recover workspaces that expired naturally (reached end of lifetime) within the 30-day grace period:

  $ ws_restore -l                          # (1) List restorable workspaces
  $ ws_allocate restored 100               # (2) Create target workspace
  $ ws_restore username-myWs-0 restored    # (3) Restore

Important: Use the full name from ws_restore -l (including username and timestamp), not the short name shown by ws_list.

Released workspaces (via ws_release) can also be restored, but only until the next nightly expirer run. After that they are permanently deleted.
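Steps (2) and (3) above can be combined into one helper. This is only a sketch. restore_workspace is a hypothetical name. ws_restore may ask for interactive confirmation, so run the helper from a login shell rather than from a batch job.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the target workspace, then restore into it.
# <full_id> must be the full name printed by `ws_restore -l`.
restore_workspace() {
    local full_id=$1 target=$2 days=${3:-100}
    if [[ -z $full_id || -z $target ]]; then
        echo "usage: restore_workspace <full_id> <target> [days]" >&2
        return 2
    fi
    # step (2): create (or re-fetch) the target workspace; discard the printed path
    ws_allocate "$target" "$days" >/dev/null || return 1
    # step (3): restore the expired/released workspace into the target
    ws_restore "$full_id" "$target"
}
```

Usage: restore_workspace username-myWs-0 restored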

Sharing Workspaces

Group workspace (recommended)

  $ ws_allocate -g myWs 100                # Group-readable (read-only for group)
  $ ws_allocate -G projectgroup myWs 100   # Group-writable (recommended for teams)

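Any group member can run ws_list -g to see these workspaces. For group-writable workspaces (created with -G), group members can also extend them on the owner's behalf with ws_allocate -x -u owner myWs 100, where owner is the creator's username. Using -G also allows a smooth handover when team members leave; see Workspace Handover.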
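Before creating a group-writable workspace, you may want to confirm that you belong to the group, so that a typo in the group name is caught early. The following is a sketch. in_group and allocate_group_ws are hypothetical names, and group membership is read with the standard id -nG.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: create a group-writable (-G) workspace, but only if
# you are actually a member of that Unix group.
in_group() {
    local g
    for g in $(id -nG); do          # all groups of the current user, by name
        [[ $g == "$1" ]] && return 0
    done
    return 1
}

allocate_group_ws() {
    local group=$1 name=$2 days=${3:-100}
    if ! in_group "$group"; then
        echo "allocate_group_ws: you are not a member of group '$group'" >&2
        return 1
    fi
    ws_allocate -G "$group" "$name" "$days"
}
```

Usage: WORKSPACE=$(allocate_group_ws projectgroup myWs 100) || exit 1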

Set default group in ~/.ws_user.conf:

  groupname: projectgroup
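If you prefer to manage this setting from a script, the sketch below replaces any existing groupname line in the config file and appends the new value. Other settings are left untouched. set_default_group is a hypothetical name. The helper treats the file as simple "key: value" lines, matching the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set (or replace) the default group in ~/.ws_user.conf.
set_default_group() {
    local group=$1 conf=${2:-$HOME/.ws_user.conf} tmp
    if [[ -z $group ]]; then
        echo "usage: set_default_group <group> [conf]" >&2
        return 2
    fi
    tmp=$(mktemp) || return 1
    # keep every line except an existing groupname entry ...
    if [[ -f $conf ]]; then
        grep -v '^groupname:' "$conf" > "$tmp"
    fi
    # ... then append the new one and move the result into place
    printf 'groupname: %s\n' "$group" >> "$tmp"
    mv "$tmp" "$conf"
}
```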

Share after creation

If you didn't use -g/-G at creation, share read-only with ws_share:

  $ ws_share share myWs alice bob          # Grant read access
  $ ws_share list myWs                     # Show who has access
  $ ws_share unshare myWs alice            # Remove access
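For larger teams, you can keep usernames in a plain text file and pass them to ws_share in one call. This is a sketch. share_with_team and the file format (one username per line, # for comments) are hypothetical. Only the ws_share share syntax comes from the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: grant read access to every user listed in a text file
# (one username per line, '#' starts a comment) using ws_share.
share_with_team() {
    local ws=$1 team_file=$2 line
    local -a users=()
    # "|| [[ -n $line ]]" also processes a last line without trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%%#*}               # drop comments
        line=${line//[[:space:]]/}     # drop whitespace
        [[ -n $line ]] && users+=("$line")
    done < "$team_file"
    if (( ${#users[@]} == 0 )); then
        echo "share_with_team: no usernames found in $team_file" >&2
        return 1
    fi
    ws_share share "$ws" "${users[@]}"
}
```

Usage: share_with_team myWs team.txt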

Advanced sharing: Sharing guide for ACL-based per-user permissions.