Workspaces/Advanced Features/Best Practices
Best Practices and Recommendations
For All Users
- Set up ~/.ws_user.conf - Configure default reminder timing, duration, and groupname to avoid typing them repeatedly (see Reminders & Configuration)
- Email reminders are automatic - Notifications are sent to the email address from your identity provider; use -r only if you want to change the reminder timing
- Custom email only if needed - Use the -m option only to override the email address from your identity provider
- Use ws_register - Create symbolic links to your workspaces in a convenient directory (see ws_register)
- Create workspaces manually - Create workspaces on the login node before submitting jobs, then use ws_find in your job scripts (see Using Workspaces in Batch Jobs)
- Track your workspaces - Regularly run ws_list -R to see which workspaces will expire soon
- Back up important data - Workspaces are temporary and not backed up, so copy results to appropriate permanent storage (check your cluster/site policies for backup locations)
- Clean up regularly - Release workspaces you no longer need to keep filesystems organized
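
The configuration file from the first item can be sketched as follows. This is a minimal example, assuming the file is YAML and that the keys `duration`, `reminder`, and `groupname` are the ones the workspace tools on your cluster read. The values and the `WS_CONF` variable are placeholders introduced here, not part of the tools. A mail entry is left out on purpose, since the address from your identity provider is used by default.

```shell
#!/bin/bash
# Sketch: write per-user workspace defaults to a small YAML file.
# Assumptions: key names (duration, reminder, groupname) and units (days)
# should be verified against your site's documentation; all values are
# placeholders.

# Write to a scratch file by default; set WS_CONF to choose another target.
WS_CONF="${WS_CONF:-$(mktemp)}"

# duration:  default lifetime of new workspaces, in days
# reminder:  number of days before expiry at which the reminder is sent
# groupname: default group for group workspaces
cat > "$WS_CONF" <<'EOF'
duration: 30
reminder: 7
groupname: my_project_group
EOF

echo "Wrote workspace defaults to $WS_CONF"
```

Once the values look right, the file can be copied to ~/.ws_user.conf.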
For Short-term Jobs (hours to days)
- Use default or shorter durations
- Consider using a single workspace for a series of related jobs
- Use ws_find in job scripts to locate the workspace (see Using Workspaces in Batch Jobs)
- Copy results to permanent storage when jobs complete
- Release the workspace when it is no longer needed (see Release Workspace)
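
The pattern above (locate the workspace with ws_find, copy results, then release) could be wrapped in two small shell functions. This is only a sketch: `workspace_dir` and `archive_and_release` are hypothetical helper names, and the only real commands assumed are `ws_find NAME` and `ws_release NAME` as used on this page.

```shell
#!/bin/bash
# Hypothetical helpers; ws_find and ws_release are the only workspace
# commands used, and both are assumed to take the workspace name.

# Print the path of an existing workspace, or fail if it cannot be found.
workspace_dir() {
    local name="$1" dir
    dir="$(ws_find "$name")" || return 1
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    printf '%s\n' "$dir"
}

# Copy the workspace contents to permanent storage, then release the
# workspace. The release is skipped if the copy fails.
archive_and_release() {
    local name="$1" dest="$2" dir
    dir="$(workspace_dir "$name")" || {
        echo "workspace '$name' not found" >&2
        return 1
    }
    mkdir -p "$dest" || return 1
    cp -a "$dir/." "$dest/" || {
        echo "copy failed, keeping workspace '$name'" >&2
        return 1
    }
    ws_release "$name"
}
```

In a job script this might be used as `cd "$(workspace_dir my_run)" || exit 1` at the start, and `archive_and_release my_run "$HOME/results/my_run"` after the last job of a series. Both `my_run` and the destination path are placeholders. When several related jobs share one workspace, call the second helper only once the final job has finished.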
For Long-term Campaigns (weeks to months)
- Request the maximum allowed duration (see Cluster-Specific Workspace Limits)
- Email reminders are sent automatically; optionally customize the reminder timing with the -r option
- Use ws_list -R regularly to monitor remaining time (see List Your Workspaces)
- Plan data archival to appropriate permanent storage before expiration (check cluster/site policies)
For Collaborative Work
- Use ws_allocate -G groupname for shared write access (see Create Group Workspace)
- Set groupname in ~/.ws_user.conf if you always work with the same group (see Reminders & Configuration)
- Use ws_allocate -g for read-only sharing within the group
- Use ws_list -g to see all group workspaces (see List Group Workspaces)
- Team members can extend group workspaces (see Extend Group Workspace)
- Take over reminder responsibility when a colleague is unavailable (see Manage Reminders)
- Document the workspace location for your team members
- For advanced sharing scenarios (ACL-based, ws_share), see the Sharing guide
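
As an illustration of the first and seventh items, the sketch below creates a group-writable workspace and prints its path so it can be passed on to the team. `create_group_workspace` is a hypothetical helper name. The sketch assumes the call form `ws_allocate -G GROUP NAME DAYS` and that `ws_find NAME` prints the workspace path; check both against the linked pages for your cluster.

```shell
#!/bin/bash
# Hypothetical helper: create a group-writable workspace, then print its path.
# Assumes "ws_allocate -G GROUP NAME DAYS" and "ws_find NAME"; verify locally.
create_group_workspace() {
    local group="$1" name="$2" days="$3"
    # Discard the path that ws_allocate itself prints on stdout, so the
    # function emits the path exactly once (from ws_find below).
    ws_allocate -G "$group" "$name" "$days" >/dev/null || return 1
    ws_find "$name"
}
```

A call such as `create_group_workspace my_project_group shared_data 30` (all three arguments are placeholders) prints a path that can be recorded in the team's documentation.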
For Managing Multiple Filesystems
- Note: Most clusters have only one default filesystem - the -F option is rarely needed
- Use ws_list -l first to check if multiple filesystems are available on your cluster
- Use the -F option only if you need a specific filesystem for performance or capacity reasons (see Filesystems guide)
- bwUniCluster 3.0 filesystems:
  - Default Lustre filesystem: Standard workspace location, best for large files and sequential I/O
  - Flash filesystem (ffuc): SSD-based storage for KIT/HoreKa users, shared between bwUniCluster 3.0 and HoreKa
  - Use the flash filesystem for workloads with many small files, random I/O, AI/ML training, or compilation
  - Balance load: use -F ffuc when appropriate to reduce load on the default filesystem
- General guidelines:
  - Flash-based filesystems (SSD/NVMe): Use for many small files, low-latency requirements, random I/O
  - Standard Lustre/parallel filesystems: Best for large files and sequential I/O patterns
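
The check-then-select advice above could look like the sketch below. `allocate_on` is a hypothetical helper name. The sketch assumes that `ws_list -l` prints one filesystem name per line and that `ws_allocate` accepts `-F FILESYSTEM NAME DAYS`. The output format of `ws_list -l` may differ between versions, so treat the `grep` match as an assumption.

```shell
#!/bin/bash
# Hypothetical helper: allocate on a named filesystem if it is listed,
# otherwise fall back to the cluster default.
allocate_on() {
    local fs="$1" name="$2" days="$3"
    # Assumption: ws_list -l lists each filesystem name as a whole word.
    if ws_list -l 2>&1 | grep -qw -- "$fs"; then
        ws_allocate -F "$fs" "$name" "$days"
    else
        echo "filesystem '$fs' not listed, using the default" >&2
        ws_allocate "$name" "$days"
    fi
}
```

For example, `allocate_on ffuc build_tree 10` would request the flash filesystem on bwUniCluster 3.0 where it is available; `build_tree` and the duration are placeholders.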
For Different Data Types
- Large sequential I/O: Use standard workspace filesystem (Lustre best for very large files, Weka excellent for both large and small)
- Many small files or random access: Use flash-based workspace filesystem like Weka (NEMO2) or bwUniCluster ffuc, or stage to $TMPDIR
- Data read multiple times on single node: Copy to $TMPDIR at job start for best performance
- Temporary data for single node: Always use $TMPDIR, not workspaces
- Multi-node temporary data: Use workspaces ($TMPDIR is node-local and therefore not suitable)
- AI/ML training data: Use Weka (NEMO2) or flash filesystems for best performance, or stage to $TMPDIR for repeated access
- Compilation/build directories: Use flash-based filesystems (Weka, ffuc) or $TMPDIR for better performance
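
Staging to $TMPDIR, as recommended for data that is read repeatedly on a single node, can be sketched as a small function. `stage_to_tmpdir` is a hypothetical helper name. The sketch assumes the batch system sets `TMPDIR` to node-local storage with enough free space for the copy.

```shell
#!/bin/bash
# Hypothetical helper: copy a directory from a workspace to node-local
# $TMPDIR and print the local path. Assumes TMPDIR is set by the batch
# system and has enough free space.
stage_to_tmpdir() {
    local src="$1" local_dir
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ -n "${TMPDIR:-}" ] || { echo "TMPDIR is not set" >&2; return 1; }
    local_dir="$TMPDIR/$(basename "$src")"
    mkdir -p "$local_dir" || return 1
    # "src/." copies the directory contents, including hidden files.
    cp -a "$src/." "$local_dir/" || return 1
    printf '%s\n' "$local_dir"
}
```

In a job script this might be combined with ws_find, for example `data="$(stage_to_tmpdir "$(ws_find my_run)/dataset")"`, where `my_run` and `dataset` are placeholders. Results written to $TMPDIR still need to be copied back to the workspace before the job ends.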
For Quota Management
- Use ws_release --delete-data for immediate deletion (see Immediate Deletion)
- On clusters without the --delete-data option, use the manual deletion method
- Remember: released workspaces may still count toward your quota during the grace period (~1 hour)