Name: slurm-seff | Distribution: SUSE Linux Enterprise 15 SP3 |
Version: 20.11.5 | Vendor: openSUSE |
Release: bp153.2.1 | Build date: Thu May 6 09:34:22 2021 |
Group: Productivity/Clustering/Computing | Build host: obs-power8-03 |
Size: 9042 | Source RPM: slurm-20.11.5-bp153.2.1.src.rpm |
Packager: https://bugs.opensuse.org | |
Url: https://www.schedmd.com | |
Summary: Mail tool that includes job statistics in user notification email |
Mail program used directly by the SLURM daemons. On completion of a job, it waits for accounting information to be available and includes that information in the email body.
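The wait-then-report behaviour described above can be sketched in Python. This is only an illustration, not the packaged script: `build_mail_body`, `fetch_stats`, the retry count and the 5-second delay are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Optional


def build_mail_body(
    job_id: str,
    fetch_stats: Callable[[str], Optional[str]],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for accounting data for job_id, then return an email body.

    fetch_stats is a hypothetical callback that returns the accounting
    summary as text, or None while the data is not yet available.
    """
    header = f"Slurm job {job_id} has completed."
    for attempt in range(attempts):
        stats = fetch_stats(job_id)
        if stats:
            # Accounting information is available: include it in the body.
            return f"{header}\n\n{stats}"
        if attempt < attempts - 1:
            # Give the accounting database time to catch up before retrying.
            sleep(delay_seconds)
    return f"{header}\n\n(Accounting information was not available.)"
```

On a real cluster, `fetch_stats` could, for example, run `seff <jobid>` and return its output. That wiring is left out here so the sketch stays runnable without Slurm installed.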
SUSE-GPL-2.0-with-openssl-exception
* Mon May 03 2021 Egbert Eich <eich@suse.com>
- Ship REST API version and auth plugins with slurmrestd.
- Add YAML support for REST API to build (bsc#1185603).
* Wed Mar 17 2021 Christian Goll <cgoll@suse.com>
- Update to 20.11.5:
- New features:
  * New job_container/tmpfs plugin developed by NERSC that can be used to create per-job filesystem namespaces. Documentation and configuration can be found in the respective man page.
- Bug fixes:
  * Fix main scheduler bug where bf_hetjob_prio truncates SchedulerParameters.
  * Fix sacct not displaying UserCPU, SystemCPU and TotalCPU for large times.
  * scrontab - fix to return the correct index for a bad #SCRON option.
  * scrontab - fix memory leak when invalid option found in #SCRON line.
  * Add errno for when a user requests multiple partitions and they are using partition based associations.
  * Fix issue where a job could run in a wrong partition when using EnforcePartLimits=any and partition based associations.
  * Remove possible deadlock when adding associations/wckeys in multiple threads.
  * When using PrologFlags=alloc make sure the correct Slurm version is set in the credential.
  * When sending a job a warning signal make sure we always send SIGCONT beforehand.
  * Fix issue where a batch job would continue running if a prolog failed on a node that wasn't the batch host and requeuing was disabled.
  * Fix issue where sometimes salloc/srun wouldn't get a message about a prolog failure in the job's stdout.
  * Requeue or kill job on a prolog failure when PrologFlags is not set.
  * Fix race condition causing node reboots to get requeued before ResumeTimeout expires.
  * Preserve node boot_req_time on reconfigure.
  * Preserve node power_save_req_time on reconfigure.
  * Fix node reboots being queued and issued multiple times and preventing the reboot to time out.
  * Fix run_command to exit correctly if track_script kills the calling thread.
  * Only requeue a job when the PrologSlurmctld returns nonzero.
  * When a job is signaled with SIGKILL make sure we flush all prologs/setup scripts.
  * Handle burst buffer scripts if the job is canceled while stage_in is happening.
  * When shutting down the slurmctld make note to ignore error message when we have to kill a prolog/setup script we are tracking.
  * scrontab - add support for the --open-mode option.
  * acct_gather_profile/influxdb - avoid segfault on plugin shutdown if setup has not completed successfully.
  * Reduce delay in starting salloc allocations when running with prologs.
  * Alter AllocNodes check to work if the allocating node's domain doesn't match the slurmctld's. This restores the pre-20.11 behavior.
  * Fix slurmctld segfault if jobs from a prior version had the now-removed INVALID_DEPEND state flag set and were allowed to run in 20.11.
  * Add job_container/tmpfs plugin to give a method to provide a private /tmp per job.
  * Set the correct core affinity when using AutoDetect.
  * slurmrestd - mark "environment" as required for job submissions in schema.
* Tue Feb 23 2021 Christian Goll <cgoll@suse.com>
- Update to 20.11.04
  * Fix node selection for advanced reservations with features.
  * mpi/pmix: Handle pipe failure better when using ucx.
  * mpi/pmix: include PMIX_NODEID for each process entry.
  * Fix job getting rejected after being requeued on same node that died.
  * job_submit/lua - add "network" field.
  * Fix situations when a reoccurring reservation could erroneously skip a period.
  * Ensure that a reservation's [pro|epi]log are run on reoccurring reservations.
  * Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY.
  * Fix scheduling issue with --gpus.
  * Fix gpu allocations that request --cpus-per-task.
  * mpi/pmix: fixed print messages for all PMIXP_* macros
  * Add mapping for XCPU to --signal option.
  * Fix regression in 20.11 that prevented a full pass of the main scheduler from ever executing.
  * Work around a glibc bug in which "0" is incorrectly printed as "nan" which will result in corrupted association state on restart.
  * Fix regression in 20.11 which made slurmd incorrectly attempt to find the parent slurmd address when not applicable and send incorrect reverse-tree info to the slurmstepd.
  * Fix cgroup ns detection when using containers (e.g. LXC or Docker).
  * scrontab - change temporary file handling to work with emacs.
- Removed check-for-lipmix.so.MAJOR.patch
- Added: load-pmix-major-version.patch
* Wed Jan 20 2021 Ana Guerrero Lopez <aguerrero@suse.com>
- Update to 20.11.03
- This release includes a major functional change to how job step launch is handled compared to the previous 20.11 releases. This affects srun as well as MPI stacks - such as Open MPI - which may use srun internally as part of the process launch.
  One of the changes made in the Slurm 20.11 release was to the semantics for job steps launched through the 'srun' command. This also inadvertently impacts many MPI releases that use srun underneath their own mpiexec/mpirun command.
  For 20.11.{0,1,2} releases, the default behavior for srun was changed such that each step was allocated exactly what was requested by the options given to srun, and did not have access to all resources assigned to the job on the node by default. This change was equivalent to Slurm setting the --exclusive option by default on all job steps. Job steps desiring all resources on the node needed to explicitly request them through the new '--whole' option.
  In the 20.11.3 release, we have reverted to the 20.02 and older behavior of assigning all resources on a node to the job step by default.
  This reversion is a major behavioral change which we would not generally do on a maintenance release, but is being done in the interest of restoring compatibility with the large number of existing Open MPI (and other MPI flavors) and job scripts that exist in production, and to remove what has proven to be a significant hurdle in moving to the new release.
  Please note that one change to step launch remains - by default, in 20.11 steps are no longer permitted to overlap on the resources they have been assigned. If that behavior is desired, all steps must explicitly opt-in through the newly added '--overlap' option.
  Further details and a full explanation of the issue can be found at: https://bugs.schedmd.com/show_bug.cgi?id=10383#c63
- Other changes from 20.11.03
  * Fix segfault when parsing bad "#SBATCH hetjob" directive.
  * Allow countless gpu:<type> node GRES specifications in slurm.conf.
  * PMIx - Don't set UCX_MEM_MMAP_RELOC for older version of UCX (pre 1.5).
  * Don't green-light any GPU validation when core conversion fails.
  * Allow updates to a reservation in the database that starts in the future.
  * Better check/handling of primary key collision in reservation table.
  * Improve reported error and logging in _build_node_list().
  * Fix uninitialized variable in _rpc_file_bcast() which could lead to an incorrect error return from sbcast / srun --bcast.
  * mpi/cray_shasta - fix use-after-free on error in _multi_prog_parse().
  * Cray - Handle setting correct prefix for cpuset cgroup with respect to expected_usage_in_bytes. This fixes Cray's OOM killer.
  * mpi/pmix: Fix PMIx_Abort support.
  * Don't reject jobs allocating more cores than tasks with MaxMemPerCPU.
  * Fix false error message complaining about oversubscribe in cons_tres.
  * scrontab - fix parsing of empty lines.
  * Fix regression causing spank_process_option errors to be ignored.
  * Avoid making multiple interactive steps.
  * Fix corner case issues where step creation should fail.
  * Fix job rejection when --gres is less than --gpus.
  * Fix regression causing spank prolog/epilog not to be called unless the spank plugin was loaded in slurmd context.
  * Fix regression preventing SLURM_HINT=nomultithread from being used to set defaults for salloc->srun, sbatch->srun sequence.
  * Reject job credential if non-superuser sets the LAUNCH_NO_ALLOC flag.
  * Make it so srun --no-allocate works again.
  * jobacct_gather/linux - Don't count memory on tasks that have already finished.
  * Fix 19.05/20.02 batch steps talking with a 20.11 slurmctld.
  * jobacct_gather/common - Do not process jobacct's with same taskid when calling prec_extra.
  * Cleanup all tracked jobacct tasks when extern step child process finishes.
  * slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
  * Fix regression causing task/affinity and task/cgroup to be out of sync when configured ThreadsPerCore is different than the physical threads per core.
  * Fix situation when --gpus is given but not max nodes (-N1-1) in a job allocation.
  * Interactive step - ignore cpu bind and mem bind options, and do not set the associated environment variables which lead to unexpected behavior from srun commands launched within the interactive step.
  * Handle exit code from pipe when using UCX with PMIx.
* Fri Jan 08 2021 Egbert Eich <eich@suse.com>
- Fix fallout introduced by: "Replace '%service_del_postun -n' with '%service_del_postun_without_restart'" for older Leap/SLE versions.
* Fri Jan 08 2021 Egbert Eich <eich@suse.com>
- Fix Provides:/Conflicts: for libnss_slurm.
* Tue Jan 05 2021 Ana Guerrero Lopez <aguerrero@suse.com>
- Add support for configuration files from external plugins.
  While built-in plugins have their configuration added in slurm.conf, external SPANK plugins add their configuration to plugstack.conf. To allow easy packaging of SPANK plugins, their configuration files should be added independently at /etc/spack/plugstack.conf.d and plugstack.conf should be left with a one-liner including all the files under /etc/spack/plugstack.conf.d
* Mon Dec 28 2020 Ana Guerrero Lopez <aguerrero@suse.com>
- Update to 20.11.02
  * Fix older versions of sacct not working with 20.11.
  * Fix slurmctld crash when using a pre-20.11 srun in a job allocation.
  * Correct logic problem in _validate_user_access.
  * Fix libpmi to initialize Slurm configuration correctly.
- Update to 20.11.01
  * Fix spelling of "overcomited" to "overcomitted" in sreport's cluster utilization report.
  * Silence debug message about shutting down backup controllers if none are configured.
  * Don't create interactive srun until PrologSlurmctld is done.
  * Fix fd symlink path resolution.
  * Fix slurmctld segfault on subnode reservation restore after node configuration change.
  * Fix resource allocation response message environment allocation size.
  * Ensure that details->env_sup is NULL terminated.
  * select/cray_aries - Correctly remove jobs/steps from blades using NPC.
  * cons_tres - Avoid max_node_gres when entire node is allocated with --ntasks-per-gpu.
  * Allow NULL arg to data_get_type().
  * In sreport have usage for a reservation contain all jobs that ran in the reservation instead of just the ones that ran in the time specified. This matches how the report for the reservation is not truncated for a time period.
  * Fix issue with sending wrong batch step id to a < 20.11 slurmd.
  * Add a job's alloc_node to lua for job modification and completion.
  * Fix regression getting a slurmdbd connection through the perl API.
  * Stop the extern step terminate monitor right after proctrack_g_wait().
  * Fix removing the normalized priority of assocs.
  * slurmrestd/v0.0.36 - Use correct name for partition field: "min nodes per job" -> "min_nodes_per_job".
  * slurmrestd/v0.0.36 - Add node comment field.
  * Fix regression marking cloud nodes as "unexpectedly rebooted" after multiple boots.
  * Fix slurmctld segfault in _slurm_rpc_job_step_create().
  * slurmrestd/v0.0.36 - Filter node states against NODE_STATE_BASE to avoid the extended states all being reported as "invalid".
  * Fix race that can prevent the prolog for a requeued job from running.
  * cli_filter - add "type" to readily distinguish between the CLI command in use.
  * smail - reduce sleep before seff to 5 seconds.
  * Ensure SPANK prolog and epilog run without an explicit PlugStackConfig.
  * Disable MySQL automatic reconnection.
  * Fix allowing "b" after memory unit suffixes.
  * Fix slurmctld segfault with reservations without licenses.
  * Due to internal restructuring ahead of the 20.11 release, applications calling libslurm MUST call slurm_init(NULL) before any API calls. Otherwise the API call is likely to fail due to libslurm's internal configuration not being available.
  * slurm.spec - allow custom paths for PMIx and UCX install locations.
  * Use rpath if enabled when testing for Mellanox's UCX libraries.
  * slurmrestd/dbv0.0.36 - Change user query for associations to optional.
  * slurmrestd/dbv0.0.36 - Change account query for associations to optional.
  * mpi/pmix - change the error handler error message to be more useful.
  * Add missing connection in acct_storage_p_{clear_stats, reconfig, shutdown}.
  * Perl API - fix issue when running in configless mode.
  * nss_slurm - avoid deadlock when stray sockets are found.
  * Display correct value for ScronParameters in 'scontrol show config'.
* Mon Nov 30 2020 Egbert Eich <eich@suse.com>
- Update to version 20.11.0
  Slurm 20.11 includes a number of new features including:
  * Overhaul of the job step management and launch code, alongside improved GPU task placement support.
  * A new "Interactive Step" mode of operation for salloc.
  * A new "scrontab" command that can be used to submit and manage periodically repeating jobs.
  * IPv6 support.
  * Changes to the reservation logic, with new options allowing users to delete reservations, allowing admins to skip the next occurrence of a repeated reservation, and allowing for a job to be submitted and eligible to run within multiple reservations.
  * Dynamic Future Nodes - automatically associate a dynamically provisioned (or "cloud") node against a NodeName definition with matching hardware.
  * An experimental new RPC queuing mode for slurmctld to reduce thread contention on heavily loaded clusters.
  * SlurmDBD integration with the Slurm REST API.
  Also check https://github.com/SchedMD/slurm/blob/slurm-20-11-0-1/RELEASE_NOTES
* Wed Nov 18 2020 Ana Guerrero Lopez <aguerrero@suse.com>
- Updated to 20.02.6, addresses two security fixes:
  * PMIx - fix potential buffer overflows from use of unpackmem(). CVE-2020-27745 (bsc#1178890)
  * X11 forwarding - fix potential leak of the magic cookie when sent as an argument to the xauth command. CVE-2020-27746 (bsc#1178891)
- And many other bugfixes, full log and details available at:
  * https://lists.schedmd.com/pipermail/slurm-announce/2020/000045.html
* Tue Nov 03 2020 Franck Bui <fbui@suse.com>
- Replace '%service_del_postun -n' with '%service_del_postun_without_restart'
  '-n' is deprecated and will be removed in the future.
* Thu Oct 29 2020 Ana Guerrero Lopez <aguerrero@suse.com>
- Updated to 20.02.5, changes:
  * Fix leak of TRESRunMins when job time is changed with --time-min
  * pam_slurm - explicitly initialize slurm config to support configless mode.
  * scontrol - Fix exit code when creating/updating reservations with wrong Flags.
  * When a GRES has a no_consume flag, report 0 for allocated.
  * Fix cgroup cleanup by jobacct_gather/cgroup.
  * When creating reservations/jobs don't allow counts on a feature unless using an XOR.
  * Improve number of boards discovery.
  * Fix updating a reservation NodeCnt on a zero-count reservation.
  * slurmrestd - provide explicit error messages when PSK auth fails.
  * cons_tres - fix job requesting single gres per-node getting two or more nodes with less CPUs than requested per-task.
  * cons_tres - fix calculation of cores when using gres and cpus-per-task.
  * cons_tres - fix job not getting access to socket without GPU or with less than --gpus-per-socket when not enough cpus available on required socket and not using --gres-flags=enforce-binding.
  * Fix HDF5 type version build error.
  * Fix creation of CoreCnt only reservations when the first node isn't available.
  * Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
  * Improve job constraints XOR option logic.
  * Fix preemption of hetjobs when needed nodes not in leader component.
  * Fix wrong bit_or() messing potential preemptor jobs node bitmap, causing bad node deallocations and even allocation of nodes from other partitions.
  * Fix double-deallocation of preempted non-leader hetjob components.
  * slurmdbd - prevent truncation of the step nodelists over 4095.
  * Fix nodes remaining in drain state after rebooting with ASAP option.
- changes from 20.02.4:
  * srun - suppress job step creation warning message when waiting on PrologSlurmctld.
  * slurmrestd - fix incorrect return values in data_list_for_each() functions.
  * mpi/pmix - fix issue where HetJobs could fail to launch.
  * slurmrestd - set content-type header in responses.
  * Fix cons_res GRES overallocation for --gres-flags=disable-binding.
  * Fix cons_res incorrectly filtering cores with respect to GRES locality for --gres-flags=disable-binding requests.
  * Fix regression where a dependency on multiple jobs in a single array using underscores would only add the first job.
  * slurmrestd - fix corrupted output due to incorrect use of memcpy().
  * slurmrestd - address a number of minor Coverity warnings.
  * Handle retry failure when slurmstepd is communicating with srun correctly.
  * Fix jobacct_gather possibly duplicate stats when _is_a_lwp error shows up.
  * Fix tasks binding to GRES which are closest to the allocated CPUs.
  * Fix AMD GPU ROCM 3.5 support.
  * Fix handling of job arrays in sacct when querying specific steps.
  * slurmrestd - avoid fallback to local socket authentication if JWT authentication is ill-formed.
  * slurmrestd - restrict ability of requests to use different authentication plugins.
  * slurmrestd - unlink named unix sockets before closing.
  * slurmrestd - fix invalid formatting in openapi.json.
  * Fix batch jobs stuck in CF state on FrontEnd mode.
  * Add a separate explicit error message when rejecting changes to active node features.
  * cons_common/job_test - fix slurmctld SIGABRT due to double-free.
  * Fix updating reservations to set the duration correctly if updating the start time.
  * Fix update reservation to promiscuous mode.
  * Fix override of job tasks count to max when ntasks-per-node present.
  * Fix min CPUs per node not being at least CPUs per task requested.
  * Fix CPUs allocated to match CPUs requested when requesting GRES and threads per core equal to one.
  * Fix NodeName config parsing with Boards and without CPUs.
  * Ensure SLURM_JOB_USER and SLURM_JOB_UID are set in SrunProlog/Epilog.
  * Fix error messages for certain invalid salloc/sbatch/srun options.
  * pmi2 - clean up sockets at step termination.
  * Fix 'scontrol hold' to work with 'JobName'.
  * sbatch - handle --uid/--gid in #SBATCH directives properly.
  * Fix race condition in job termination on slurmd.
  * Print specific error messages if trying to use certain priority/multifactor factors that cannot work without SlurmDBD.
  * Avoid partial GRES allocation when --gpus-per-job is not satisfied.
  * Cray - Avoid referencing a variable outside of its correct scope when dealing with creating steps within a het job.
  * slurmrestd - correctly handle larger addresses from accept().
  * Avoid freeing wrong pointer with SlurmctldParameters=max_dbd_msg_action with another option after that.
  * Restore MCS label when suspended job is resumed.
  * Fix insufficient lock levels.
  * slurmrestd - use errno from job submission.
  * Fix "user" filter for sacctmgr show transactions.
  * Fix preemption logic.
  * Fix no_consume GRES for exclusive (whole node) requests.
  * Fix regression in 20.02 that caused an infinite loop in slurmctld when requesting --distribution=plane for the job.
  * Fix parsing of the --distribution option.
  * Add CONF READ_LOCK to _handle_fed_send_job_sync.
  * prep/script - always call slurmctld PrEp callback in _run_script().
  * Fix node estimation for jobs that use GPUs or --cpus-per-task.
  * Fix jobcomp, job_submit and cli_filter Lua implementation plugins causing slurmctld and/or job submission CLI tools segfaults due to bad return handling when the respective Lua script failed to load.
  * Fix propagation of gpu options through hetjob components.
  * Add SLURM_CLUSTERS environment variable to scancel.
  * Fix packing/unpacking of "unlinked" jobs.
  * Connect slurmstepd's stderr to srun for steps launched with --pty.
  * Handle MPS correctly when doing exclusive allocations.
  * slurmrestd - fix compiling against libhttpparser in a non-default path.
  * slurmrestd - avoid compilation issues with libhttpparser < 2.6.
  * Fix compile issues when compiling slurmrestd without --enable-debug.
  * Reset idle time on a reservation that is getting purged.
  * Fix reoccurring reservations that have Purge_comp= to keep correct duration if they are purged.
  * scontrol - changed the "PROMISCUOUS" flag to "MAGNETIC"
  * Early return from epilog_set_env in case of no_consume.
  * Fix cons_common/job_test start time discovery logic to prevent skewed results between "will run test" executions.
  * Ensure TRESRunMins limits are maintained during "scontrol reconfigure".
  * Improve error message when host lookup fails.
- Refresh patch: pam_slurm-Initialize-arrays-and-pass-sizes.patch
* Tue Jul 07 2020 Egbert Eich <eich@suse.com>
- Add support for openPMIx also for Leap/SLE 15.0/1 (bsc#1173805).
- Do not run %check on SLE-12-SP2: Some incompatibility in tcl makes this fail.
- Remove unneeded build dependency to postgresql-devel.
- Disable build on s390 (requires 64bit).
* Wed Jun 03 2020 Egbert Eich <eich@suse.com>
- Bring QA to the package build: add %check stage.
- Remove cruft that isn't needed any longer.
- Add 'ghosted' run-file.
- Add rpmlint filter to handle issues with library packages for Leap and enterprise upgrade versions.
* Fri May 22 2020 Christian Goll <cgoll@suse.com>
- Updated to 20.02.3 which fixes CVE-2020-12693 (bsc#1172004).
- Other changes are:
  * Factor in ntasks-per-core=1 with cons_tres.
  * Fix formatting in error message in cons_tres.
  * Fix calling stat on a NULL variable.
  * Fix minor memory leak when using reservations with flags=first_cores.
  * Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
  * Fix --mem-per-gpu for heterogeneous --gres requests.
  * Fix slurmctld load order in load_all_part_state().
  * Fix race condition not finding jobacct gather task cgroup entry.
  * Suppress error message when selecting nodes on disjoint topologies.
  * Improve performance of _pack_default_job_details() with large number of job arguments.
  * Fix archive loading previous to 17.11 jobs per-node req_mem.
  * Fix regression validating that --gpus-per-socket requires --sockets-per-node for steps. Should only validate allocation requests.
  * error() instead of fatal() when parsing an invalid hostlist.
  * nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
  * cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
  * cons_tres - Allocate lowest numbered cores when filtering cores with gres.
  * Fix getting system counts for named GRES/TRES.
  * MySQL - Fix for handling typed GRES for association rollups.
  * Fix step allocations when tasks_per_core > 1.
  * Fix allocating more GRES than requested when asking for multiple GRES types.
* Wed May 06 2020 Egbert Eich <eich@suse.com>
- Treat libnss_slurm like any other package: add version string to upgrade package.
* Fri Mar 27 2020 Christian Goll <cgoll@suse.com>
- Updated to 20.02.1 with the following changes:
  * Improve job state reason for jobs hitting partition_job_depth.
  * Speed up testing of singleton dependencies.
  * Fix negative loop bound in cons_tres.
  * srun - capture the MPI plugin return code from mpi_hook_client_fini() and use as final return code for step failure.
  * Fix segfault in cli_filter/lua.
  * Fix --gpu-bind=map_gpu reusability if tasks > elements.
  * Make sure config_flags on a gres are sent to the slurmctld on node registration.
  * Prolog/Epilog - Fix missing GPU information.
  * Fix segfault when using config parser for expanded lines.
  * Fix bit overlap test function.
  * Don't accrue time if job begin time is in the future.
  * Remove accrue time when updating a job start/eligible time to the future.
  * Fix regression in 20.02.0 that broke --depend=expand.
  * Reset begin time on job release if it's not in the future.
  * Fix for recovering burst buffers when using high-availability.
  * Fix invalid read due to freeing an incorrectly allocated env array.
  * Update slurmctld -i message to warn about losing data.
  * Fix scontrol cancel_reboot so it clears the DRAIN flag and node reason for a pending ASAP reboot.
* Sun Mar 08 2020 Egbert Eich <eich@suse.com>
- Remove legacy_cray: with 20.02 the special treatment for cray-specific plugins on SLE version prior to 15SP2 is no longer required.
* Wed Mar 04 2020 Christian Goll <cgoll@suse.com>
- slurm-plugins will now also require pmix not only libpmix (bsc#1164326)
* Fri Feb 28 2020 Egbert Eich <eich@suse.com>
- Removed autopatch as it doesn't work for the SLE-11-SP4 build.
* Thu Feb 27 2020 Kasimir _ <kasimir_@outlook.de>
- Disable %arm builds as this is no longer supported.
* Thu Feb 27 2020 Christian Goll <cgoll@suse.com>
- pmix searches now also for libpmix.so.2 so that there is no dependency for devel package (bsc#1164386)
  * added patch file check-for-lipmix.so.MAJOR.patch
  * reworded patch file Remove-rpath-from-build.patch to use %autopatch
* Wed Feb 26 2020 Egbert Eich <eich@suse.com>
- Update to version 20.02.0 (jsc#SLE-8491)
  * Fix minor memory leak in slurmd on reconfig.
  * Fix invalid ptr reference when rolling up data in the database.
  * Change shtml2html.py to require python3 for RHEL8 support, and match man2html.py.
  * slurm.spec - override "hardening" linker flags to ensure RHEL8 builds in a usable manner.
  * Fix type mismatches in the perl API.
  * Prevent use of uninitialized slurmctld_diag_stats.
  * Fixed various Coverity issues.
  * Only show warning about root-less topology in daemons.
  * Fix accounting of jobs in IGNORE_JOBS reservations.
  * Fix issue with batch steps state not loading correctly when upgrading from 19.05.
  * Deprecate max_depend_depth in SchedulerParameters and move it to DependencyParameters.
  * Silence erroneous error on slurmctld upgrade when loading federation state.
  * Break infinite loop in cons_tres dealing with incorrect tasks per tres request resulting in slurmctld hang.
  * Improve handling of --gpus-per-task to make sure appropriate number of GPUs is assigned to job.
  * Fix seg fault on cons_res when requesting --spread-job.
- Move to python3 for everything but SLE-11-SP4
  * For SLE-11-SP4 add a workaround to handle a python3 script (python2.7 compliant).
* Wed Feb 19 2020 Egbert Eich <eich@suse.com>
- Add explicit version dependency to libpmix as well. 'slurm-devel' has a tight version dependency on libpmix - allowing multiple libpmix versions in one package repository is therefore essential.
* Thu Feb 13 2020 Egbert Eich <eich@suse.com>
- Update to version 20.02.0-rc1
  * sbatch - fix segfault when no newline at the end of a burst buffer file.
  * Change scancel to only check job's base state when matching -t options.
  * Save job dependency list in state files.
  * cons_tres - allow jobs to be run on systems with root-less topologies.
  * Restore pre-20.02pre1 PrologSlurmctld synchronization behavior to avoid various race conditions, and ensure proper batch job launch.
  * Add new slurmrestd command/daemon which implements the Slurm REST API.
* Tue Feb 11 2020 Christian Goll <cgoll@suse.com>
- Update to version 20.02.0-0pre1, highlights are:
  * Exclusive behavior of a node includes all GRES on a node as well as the cpus.
  * Use python3 instead of python for internal build/test scripts. The slurm.spec file has been updated to depend on python3 as well.
  * Added new NodeSet configuration option to help simplify partition configuration sections for heterogeneous / condo-style clusters.
  * Added slurm.conf option MaxDBDMsgs to control how many messages will be stored in the slurmctld before throwing them away when the slurmdbd is down.
  * The checkpoint plugin interface and all associated API calls have been removed.
  * slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows mail_type to be set to NONE with scontrol.
  * Add new slurm_spank_log() function to print messages back to the user from within a SPANK plugin without prepending "error: " from slurm_error().
  * Enforce having partition name and nodelist=ALL when creating reservations with flags=PART_NODES.
  * SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This hook has always been accessible through slurm_spank_init() in the S_CTX_SLURMD context instead.
  * sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic to flow over an alternate network path.
  * Added auth/jwt plugin, and 'scontrol token' subcommand.
  * PMIx - improve performance of proc map generation.
  * Deprecate kill_invalid_depend in SchedulerParameters and move it to a new option called DependencyParameters.
  * Enable job dependencies for any job on any cluster in the same federation.
  * Allow clusters to be added automatically to db at startup of ctld.
  * Add AccountingStorageExternalHost slurm.conf parameter.
  * The "ConditionPathExists" condition in slurmd.service has been disabled by default to permit simpler installation of a "configless" Slurm cluster.
  * In SchedulerParameters remove deprecated max_job_bf and replace with bf_max_job_test.
  * Disable sbatch, salloc, srun --reboot for non-admins.
  * SPANK - added support for S_JOB_GID in the job script context with spank_get_item().
  * Prolog/Epilog - add SLURM_JOB_GID environment variable.
  Configuration file changes:
  * The mpi/openmpi plugin has been removed as it does nothing. MpiDefault=openmpi will be translated to the functionally-equivalent MpiDefault=none.
  Command changes (see man pages for details):
  * Display StepId=<jobid>.batch instead of StepId=<jobid>.4294967294 in output of "scontrol show step". (slurm_sprint_job_step_info())
  * MPMD in srun will now defer PATH resolution for the commands to launch to slurmstepd. Previously it would handle resolution client-side, but with a non-standard approach that walked PATH in reverse.
  * squeue - added "--me" option, equivalent to --user=$USER.
  * The LicensesUsed line has been removed from 'scontrol show config'. Please see the 'scontrol show licenses' command as an alternative.
  * sbatch - adjusted backoff times for "--wait" option to reduce load on slurmctld. This results in a steady-state delay of 32s between queries, instead of the prior 10s delay.
- Removed the following deprecated patches:
  * removed patch slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
  * removed patch split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch
  * removed patch slurmctld-uses-xdaemon_-for-systemd.patch
  * removed patch slurmd-uses-xdaemon_-for-systemd.patch
  * removed patch slurmdbd-uses-xdaemon_-for-systemd.patch
  * removed patch slurmsmwd-uses-xdaemon_-for-systemd.patch
  * removed patch removed-deprecated-xdaemon.patch
* Wed Feb 05 2020 Christian Goll <cgoll@suse.com>
- standard slurm.conf now also uses SlurmctldHost on all build targets (bsc#1162377)
* Mon Jan 27 2020 Egbert Eich <eich@suse.com>
- Fix a missed systemd_requires -> systemd_ordering conversion.
* Fri Jan 24 2020 Egbert Eich <eich@suse.com>
- Remove special OHPC compatibility macro: these settings should be applied universally.
- Add a Recommends for mariadb to slurm-slurmdbd: it is recommended to run the database on the same machine as the daemon.
* Fri Jan 24 2020 Dominique Leuenberger <dimstar@opensuse.org>
- BuildRequire pkgconfig(systemd) instead of systemd: allow OBS to shortcut through the -mini flavors.
- Use systemd_ordering instead of systemd_requires: systemd is never a strict requirement; but in case the system is scheduled for installation together with systemd, we want systemd to be installed prior to slurm.
* Thu Jan 23 2020 Christian Goll <cgoll@suse.com>
- start slurmdbd after mariadb (bsc#1161716)
* Mon Jan 13 2020 Egbert Eich <eich@suse.com>
- Fix base_ver for SLE 15 SP2.
* Wed Jan 08 2020 Egbert Eich <eich@suse.com>
- Update to version 19.05.5 (jsc#SLE-8491)
  * Check %docdir/NEWS for details.
  * Includes security fixes CVE-2019-19727, CVE-2019-19728, CVE-2019-12838.
  * Disable i586 builds as this is no longer supported.
  * Create libnss_slurm package to support user and group resolution through slurmstepd.
* slurm-2.4.4-rpath.patch -> Remove-rpath-from-build.patch Obsoleted: - pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch - pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch - pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch * Thu Jan 02 2020 Egbert Eich <eich@suse.com> - Deprecate "ControlMachine" only for SLURM version upgrades and products newer than 1501. This ensures that the original setting is retained for the SLURM version shipped originally with SLE-15-SP1 or Leap 15.1. * Sat Dec 21 2019 Egbert Eich <eich@suse.com> - Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692). * Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block. * Make sview work with glib2 v2.62. * Make Slurm compile on linux after sys/sysctl.h was deprecated. * Install slurmdbd.conf.example with 0600 permissions to encourage secure use. CVE-2019-19727. * srun - do not continue with job launch if --uid fails. CVE-2019-19728. * Wed Dec 11 2019 Christian Goll <cgoll@suse.com> - added pmix support jsc#SLE-10800 * Sun Dec 08 2019 Egbert Eich <eich@suse.com> - Use --with-shared-libslurm to build slurm binaries using libslurm. - Make libslurm depend on slurm-config. * Fri Dec 06 2019 Egbert Eich <eich@suse.com> - Fix ownership of /var/spool/slurm on new installations and upgrade (boo#1158696). * Thu Oct 31 2019 Egbert Eich <eich@suse.com> - Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727). - Fix %posttrans macro _res_update to cope with added newline (bsc#1153259). * Mon Oct 21 2019 Egbert Eich <eich@suse.com> - Add package slurm-webdoc which sets up a web server to provide the documentation for the version shipped. * Mon Oct 07 2019 Egbert Eich <eich@suse.com> - Move srun from 'slurm' to 'slurm-node': srun is required on the nodes as well so sbatch will work. 'slurm-node' is a requirement when 'slurm' is installed (bsc#1153095). * Wed Oct 02 2019 Egbert Eich <eich@suse.com> - Set %base_ver for SLE-15-SP2 to 18.08 (for now). 
* Wed Sep 11 2019 Egbert Eich <eich@suse.com> - Edit sample configuration to deprecate "ControlMachine", "ControlAddr", "BackupController" and "BackupAddr" in favor of "SlurmctldHost". * Sat Aug 17 2019 Egbert Eich <eich@suse.com> - Fix logic of slurm-munge recommends: slurm-munge requires munge already, so if we have munge installed we recommend slurm-munge as the authentication when installing slurm or slurm-node. * Sun Jul 14 2019 Egbert Eich <eich@suse.com> - Fix build for SLE-11-SP4 and older. * Fri Jul 12 2019 Christian Goll <cgoll@suse.com> - added cray depend libraries to separate package, as they are now built, since json is enabled * Thu Jul 11 2019 Christian Goll <cgoll@suse.com> - Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341, jsc#SLE-7342) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. 
* Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. * Fix unnecessary reloading of job submit plugins. * Allow job submit plugins to be turned on/off with a reconfigure. * Fix segfault when loading/unloading Lua job submit plugin multiple times. * Fix printing duplicate error messages of jobs rejected by job submit plugin. * Fix printing of job submit plugin messages of het jobs without pack id. * Fix memory leak in group_cache.c * Fix jobs stuck from FedJobLock when requeueing in a federation * Fix requeueing job in a federation of clusters with differing associations * sacctmgr - free memory before exiting in 'sacctmgr show runaway'. * Fix seff showing memory overflow when steps tres mem usage is 0. * Upon archive file name collision, create new archive file instead of overwriting the old one to prevent lost records. * Limit archive files to 50000 records per file so that archiving large databases will succeed. * Remove stray newlines in SPANK plugin error messages. * Fix archive loading events. * In select/cons_res: Only allocate 1 CPU per node with the --overcommit and --nodelist options. * Fix main scheduler from potentially not running through whole queue. * cons_res/job_test - prevent a job from overallocating a node memory. * cons_res/job_test - fix to consider a node's current allocated memory when testing a job's memory request. * Fix issue where multi-node job steps on cloud nodes wouldn't finish cleaning up until the end of the job (rather than the end of the step). * Fix issue with a 17.11 sbcast call to a 18.08 daemon. * Add new job bit_flags of JOB_DEPENDENT. * Make it so dependent jobs reset the AccrueTime and do not count against any AccrueTime limits. 
* Fix sacctmgr --parsable2 output for reservations and tres. * Prevent slurmctld from potential segfault after job_start_data() called for completing job. * Fix jobs getting on nodes with "scontrol reboot asap". * Record node reboot events to database. * Fix node reboot failure message getting to event table. * Don't write "(null)" to event table when no event reason exists. * Fix minor memory leak when clearing runaway jobs. * Avoid flooding slurmctld and logging when prolog complete RPC errors occur. * Fix GCC 9 compiler warnings. * Fix seff human readable memory string for values below a megabyte. * Fix dump/load of rejected heterogeneous jobs. * For heterogeneous jobs, do not count each component against the QOS or association job limit multiple times. * slurmdbd - avoid reservation flag column corruption with the use of newer flags, instead preserve the older flag fields that we can still fit in the smallint field, and discard the rest. * Fix security issue in accounting_storage/mysql plugin on archive file loads by always escaping strings within the slurmdbd. CVE-2019-12838. * Mon Jul 08 2019 Egbert Eich <eich@suse.com> - Fix build dependency issue around libibmad-devel introduced in SLE-12-SP4. * Mon Jul 08 2019 Egbert Eich <eich@suse.com> - Add BuildRequires to address warnings during build: * for libcurl-devel, libssh2-devel and rrdtool-devel * for libjson-c-devel and liblz4-devel where available, disable these with --without-json and --without-lz4 where not. * disable DataWarp (--without-datawarp). * Sat Jul 06 2019 Egbert Eich <eich@suse.com> - Update SLURM to 18.08.7: * Set debug statement to debug2 to avoid benign error messages. * Add SchedulerParameters option of bf_hetjob_immediate to attempt to start a heterogeneous job as soon as all of its components are determined able to do so. * Fix underflow causing decay thread to exit. * Fix main scheduler not considering hetjobs when building the job queue. 
* Fix regression for sacct to display old jobs without a start time. * Fix setting correct number of gres topology bits. * Update hetjobs pending state reason when appropriate. * Fix accounting_storage/filetxt's understanding of TRES. * Set Accrue time when not enforcing limits. * Fix srun segfault when requesting a hetjob with test_exec or bcast options. * Hide multipart priorities log message behind Priority debug flag. * sched/backfill - Make hetjobs sensitive to bf_max_job_start. * Fix slurmctld segfault due to job's partition pointer NULL dereference. * Fix issue with OR'ed job dependencies. * Add new job's bit_flags of INVALID_DEPEND to prevent rebuilding a job's dependency string when it has at least one invalid and purged dependency. * Promote federation unsynced siblings log message from debug to info. * burst_buffer/cray - fix slurmctld SIGABRT due to illegal read/writes. * burst_buffer/cray - fix memory leak due to unfreed job script content. * node_features/knl_cray - fix script_argv use-after-free. * burst_buffer/cray - fix script_argv use-after-free. * Fix invalid reads of size 1 due to non null-terminated string reads. * Add extra debug2 logs to identify why BadConstraints reason is set. * Sat Jul 06 2019 Egbert Eich <eich@suse.com> - Do not build hdf5 support where not available. * Sat Jul 06 2019 Egbert Eich <eich@suse.com> - Add support for version updates on SLE: Update packages to a later version than the version supported originally on SLE will receive a version string in their package name. * Wed Feb 27 2019 Christian Goll <cgoll@suse.com> - added the hdf5 job data gathering plugin * Fri Feb 01 2019 eich@suse.com - Add backward compatibility with SLE-11 SP4 * Thu Jan 31 2019 eich@suse.com - Update to version 18.08.05-2: This version obsoletes: Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch - Fix spec file for older SUSE versions. 
* Thu Jan 31 2019 eich@suse.com - Update to version 18.08.05: * Add mitigation for a potential heap overflow on 32-bit systems in xmalloc. (CVE-2019-6438, bsc#1123304). * Other fixes: + Backfill - If a job has a time_limit guess the end time of a job better if OverTimeLimit is Unlimited. + Fix "sacctmgr show events event=cluster" + Fix sacctmgr show runawayjobs from sibling cluster + Avoid bit offset of -1 in call to bit_nclear(). + Insure that "hbm" is a configured GresType on knl systems. + Fix NodeFeaturesPlugins=node_features/knl_generic to allow other gres other than knl. + cons_res: Prevent overflow on multiply. + Better debug for bad values in gres.conf. + Fix double accounting of energy at end of job. + Read gres.conf for cloud nodes on slurmctld. + Don't assume the first node of a job is the batch host when purging jobs from a node. + Better debugging when a job doesn't have a job_resrcs ptr. + Store ave watts in energy plugins. + Add XCC plugin for reading Lenovo Power. + Fix minor memory leak when scheduling rebootable nodes. + Fix debug2 prefix for sched log. + Fix printing correct SLURM_JOB_ACCOUNT_PACK_GROUP_* in env for a Het Job. + sbatch - search current working directory first for job script. + Make it so held jobs reset the AccrueTime and do not count against any AccrueTime limits. + Add SchedulerParameters option of bf_hetjob_prio=[min|avg|max] to alter the job sorting algorithm for scheduling heterogeneous jobs. + Fix initialization of assoc_mgr_locks and slurmctld_locks lock structures. + Fix segfault with job arrays using X11 forwarding. + Revert regression caused by e0ee1c7054 which caused negative values and values starting with a decimal to be invalid for PriorityWeightTRES and TRESBillingWeight. + Fix possibility to update a job's reservation to none. + Suppress connection errors to primary slurmdbd when backup dbd is active. 
+ Suppress connection errors to primary db when backup db kicks in + Add missing fields for sacct --completion when using jobcomp/filetxt. + Fix incorrect values set for UserCPU, SystemCPU, and TotalCPU sacct fields when JobAcctGatherType=jobacct_gather/cgroup. + Fixed srun from double printing invalid option msg twice. + Remove unused -b flag from getopt call in sbatch. + Disable reporting of node TRES in sreport. + Re-enabling features combined by OR within parenthesis for non-knl setups. + Prevent sending duplicate requests to reboot a node before ResumeTimeout. + Down nodes that don't reboot by ResumeTimeout. + Update seff to reflect API change from rss_max to tres_usage_in_max. + Add missing TRES constants from perl API. + Fix issue where sacct would return incorrect array tasks when querying specific tasks. + Add missing variables to slurmdb_stats_t in the perlapi. + Fix nodes not getting reboot RPC when job requires reboot of nodes. + Fix failing update the partition list of a job. + Use slurm.conf gres ids instead of gres.conf names to get a gres type name. * Disable slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch: Believed to be fixed by commit c1a537dbbe6 See: https://bugs.schedmd.com/show_bug.cgi?id=5511 * Add Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch: Fix fallout from 750cc23ed for CVE-2019-6438. * Thu Dec 13 2018 cgoll@suse.com - Update to 18.08.04, with following highlights * Fix message sent to user to display preempted instead of time limit when a job is preempted. * Fix memory leak when a failure happens processing a nodes gres config. * Improve error message when failures happen processing a nodes gres config. * Don't skip jobs in scontrol hold. * Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable. * Enhanced handling for runaway jobs * cons_res: Delay exiting cr_job_test until after cores/cpus are calculated and distributed. 
* Don't check existence of srun --prolog or --epilog executables when set to "none" and SLURM_TEST_EXEC is used. * Add "P" suffix support to job and step tres specifications. * Fix jobacct_gather/cgroup to work correctly when more than one task is started on a node. * salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the environment if the corresponding command line options are used. * slurmd - fix handling of the -f flag to specify alternate config file locations. * Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid scheduling lower priority jobs on resources that become available during the backfill scheduling cycle when bf_continue is enabled. * job_submit/lua: Add several slurmctld return codes and add user/group info * salloc/sbatch/srun - print warning if mutually exclusive options of --mem and --mem-per-cpu are both set. - Refreshed: * pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch * Mon Dec 10 2018 cgoll@suse.com - restarting services on update only when activated - added rotation of logs - Added backported patches which harden the pam module pam_slurm_adopt (BOO#1116758) which will be in slurm 19.05.x * added pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM * added pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data * added pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is logging on - package slurm-pam_slurm now depends on slurm-node and not on slurm * Wed Dec 05 2018 Christian Goll <cgoll@suse.com> - fixed code in %pretrans section to be compatible with lua 5.1 * Tue Nov 20 2018 eich@suse.com - Added missing perl-base dependency. * Tue Nov 20 2018 eich@suse.com - Moved HTML docs to doc package. 
* Tue Nov 20 2018 eich@suse.com - Moved config man pages to a separate package: This way, they won't get installed on compute nodes. * Tue Nov 20 2018 eich@suse.com - Update to 18.08.3 * Add new burst buffer state of "teardown-fail" to indicate the burst buffer teardown operation is failing on specific buffers. * Multiple backup slurmctld daemons can be configured * Enable jobs with zero node count for creation and/or deletion of persistent burst buffers. * Add "scontrol show dwstat" command to display Cray burst buffer status. * Add "GetSysStatus" option to burst_buffer.conf file. * Add node and partition configuration options of "CpuBind" to control default task binding. * Add "NumaCpuBind" option to knl.conf * Add sbatch "--batch" option to identify features required on batch node. * Add "BatchFeatures" field to output of "scontrol show job". * Add support for "--bb" option to sbatch command. * Add new SystemComment field to job data structure and database. * Expand reservation "flags" field from 32 to 64 bits. * Add job state flag of "SIGNALING" to avoid race condition. * Properly handle srun --will-run option when there are jobs in COMPLETING state. * Properly report who is signaling a step. * Don't combine updated reservation records in sreport's reservation report. * node_features plugin - Add support for XOR & XAND of job constraints (node feature specifications). * Improvements to how srun searches for the executable when using cwd. * Now programs can be checked before execution if test_exec is set. * Report NodeFeatures plugin configuration with scontrol and sview commands. * Add acct_gather_profile/influxdb plugin. * Add new job state of SO/STAGE_OUT * Correct SLURM_NTASKS and SLURM_NPROCS environment variable for heterogeneous job step. * Expand advanced reservation feature specification to support parenthesis and counts of nodes with specified features. 
* Defer job signaling until prolog is completed * Have the primary slurmctld wait until the backup has completely shutdown before taking control. * Fix issue where unpacking job state after TRES count changed could lead to invalid reads. * Heterogeneous job steps allocations supported with Open MPI. * Remove redundant function arguments from task plugins. * Add Slurm configuration file check logic using "slurmctld -t" command. * Add the use of a xml file to help performance when using hwloc. * Remove support for "ChosLoc" configuration parameter. * Configuration parameters "ControlMachine", "ControlAddr", "BackupController" and "BackupAddr" replaced by an ordered list of "SlurmctldHost" records. * Remove --immediate option from sbatch. * Add infrastructure for per-job and per-step TRES parameters. * Add DefCpuPerGpu and DefMemPerGpu to global and per-partition configuration parameters. * Add ValidateMode configuration parameter to knl_cray.conf. * Disable local PTY output processing when using 'srun --unbuffered'. * Change the column name for the %U (User ID) field in squeue to 'UID'. * CRAY - Add CheckGhalQuiesce to the CommunicationParameters. * When a process is core dumping, avoid terminating other processes in that task group. * CPU frequency management enhancements: If scaling_available_frequencies file is not available, then derive values from scaling_min_freq and scaling_max_freq values. * Add pending jobs count to sdiag output. * Add configuration parameters SlurmctldPrimaryOnProg and SlurmctldPrimaryOffProg, which define programs to execute when a slurmctld daemon changes state. * Add configuration parameter SlurmctldAddr for use with virtual IP to manage backup slurmctld daemons. * Explicitly shutdown the slurmd process when instructed to reboot. * Add ability to create/update partition with TRESBillingWeights through scontrol. * Calculate TRES billing values at submission. * Add node_features plugin function "node_features_p_reboot_weight()". 
* Add NodeRebootWeight parameter to knl.conf configuration file. * Completely remove "gres" field from step record. Use "tres_per_node", "tres_per_socket", etc. * Add "Links" parameter to gres.conf configuration file. * Force slurm_mktime() to set tm_isdst to -1. * burst_buffer.conf - Add SetExecHost flag to enable burst buffer access from the login node for interactive jobs. * Append ", with requeued tasks" to job array "end" emails if any tasks in the array were requeued. * Add ResumeFailProgram slurm.conf option to specify a program that is called when a node fails to respond by ResumeTimeout. * Add new job pending reason of "ReqNodeNotAvail, reserved for maintenance". * Remove AdminComment += syntax from 'scontrol update job'. * sched/backfill: Reset job time limit if needed for deadline scheduling. * For heterogeneous job component with required nodes, explicitly exclude those nodes from all other job components. * Add name of partition used to srun --test-only output. * sdiag output now reports outgoing slurmctld message queue contents. * Improve escaping special characters on user commands when specifying paths. * Add salloc/sbatch/srun option of --gres-flags=disable-binding to disable filtering of CPUs with respect to generic resource locality. * SlurmDBD - Print warning if MySQL/MariaDB internal tuning is not at least half of the recommended values. * Add ability to specify a node reason when rebooting nodes with "scontrol reboot". * Add nextstate option to "scontrol reboot". * Consider "resuming" (nextstate=resume) nodes as available in backfill future scheduling. * Add TimelimitRaw sacct output field to display timelimit numbers. * Add support for sacct --whole-hetjob=[yes|no] option. * Make salloc handle node requests the same as sbatch. * Add shutdown_on_reboot SlurmdParameter to control whether the Slurmd will shut itself down or not when a reboot request is received. * Add cancel_reboot scontrol option to cancel pending reboot of nodes. 
* Make Users case insensitive in the database based on Parameters=PreserveCaseUser in the slurmdbd.conf. * Improve scheduling when dealing with node_features that could have a boot delay. * Changed the default AuthType for slurmdbd to auth/munge. * Added 'remote-fs.target' to After directive of slurmd.service file. * Remove drain on node when reboot nextstate used. * Speed up pack of job's qos. * Add sacctmgr options to prevent/manage job queue stuffing: - GrpJobsAccrue=<max_jobs> - MaxJobsAccrue=<max_jobs> * MinPrioThreshold Minimum priority required to reserve resources when scheduling. * Add control_inx value to trigger_info_msg_t to permit future work in the trigger management code to distinguish which of multiple backup controllers has changed state. * NOTES: PreemptType=preempt/job_prio has been removed - use PreemptType=preempt/qos instead. * Bluegene support was deprecated and has now been removed. * cgroup_allowed_devices_file.conf was removed. It was never used by default, as ConstrainDevices was not set. If needed, refer to the cgroups.conf man page on how to create one. * slurm.epilog.clean: Removed. User should use pam_slurm_adopt instead. - Refreshed: * removed-deprecated-xdaemon.patch * slurmctld-uses-xdaemon_-for-systemd.patch * slurmd-uses-xdaemon_-for-systemd.patch * slurmdbd-uses-xdaemon_-for-systemd.patch * slurmsmwd-uses-xdaemon_-for-systemd.patch * slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch * Sun Sep 30 2018 eich@suse.com - Move config man-pages to config package. * Mon Sep 24 2018 cgoll@suse.com - added correct link flags for perl bindings (bsc#1108671) * added correct linker search path in slurm-2.4.4-rpath.patch * perl:Switch is required by slurm torque wrappers * Sat Sep 22 2018 eich@suse.com - Fix Requires(pre) and Requires(post) for slurm-config and slurm-node. This fixes issues with failing slurm user creation when installed during initial system installation (bsc#1109373). 
* Tue Aug 14 2018 eich@suse.com - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completion messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. * Thu Aug 02 2018 eich@suse.com - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). * Tue Jul 31 2018 eich@suse.com - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. * MYSQL: Fix issue not handling all fields when loading an archive dump. * Allow a job_submit plugin to change the admin_comment field during job_submit_plugin_modify(). * job_submit/lua - fix access into reservation table. * MySQL - Prevent deadlock caused by archive logic locking reads. * Don't enforce MaxQueryTimeRange when requesting specific jobs. 
* Modify --test-only logic to properly support jobs submitted to more than one partition. * Prevent slurmctld from abort when attempting to set non-existing qos as def_qos_id. * Add new job dependency type of "afterburstbuffer". The pending job will be delayed until the first job completes execution and its burst buffer stage-out is completed. * Reorder proctrack/task plugin load in the slurmstepd to match that of slurmd and avoid race condition calling task before proctrack can introduce. * Prevent reboot of a busy KNL node when requesting inactive features. * Revert to previous behavior when requesting memory per cpu/node introduced in 17.11.7. * Fix to reinitialize previously adjusted job members to their original value when validating the job memory in multi-partition requests. * Fix _step_signal() from always returning SLURM_SUCCESS. * Combine active and available node feature change logs on one line rather than one line per node for performance reasons. * Prevent occasionally leaking freezer cgroups. * Fix potential segfault when closing the mpi/pmi2 plugin. * Fix issues with --exclusive=[user|mcs] to work correctly with preemption or when job requests a specific list of hosts. * Make code compile with hdf5 1.10.2+ * mpi/pmix: Fixed the collectives canceling. * SlurmDBD: improve error message handling on archive load failure. * Fix incorrect locking when deleting reservations. * Fix incorrect locking when setting up the power save module. * Fix setting format output length for squeue when showing array jobs. * Add xstrstr function. * Fix printing out of --hint options in sbatch, salloc --help. * Prevent possible divide by zero in _validate_time_limit(). * Add Delegate=yes to the slurmd.service file to prevent systemd from interfering with the jobs' cgroup hierarchies. * Change the backlog argument to the listen() syscall within srun to 4096 to match elsewhere in the code, and avoid communication problems at scale. 
* Tue Jul 31 2018 eich@suse.com - slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch: Fix race in the slurmctld backup controller which prevents it from cleaning up allocations on nodes properly after failing over (bsc#1084917). - Handled %license in a backward compatible manner. * Sat Jul 28 2018 eich@suse.com - Add a 'Recommends: slurm-munge' to slurm-slurmdbd. * Wed Jul 11 2018 eich@suse.com - Shield comments between script snippets with a %{!?nil:...} to avoid them being interpreted as scripts - in which case the update level is passed as argument (see chapter 'Shared libraries' in: https://en.opensuse.org/openSUSE:Packaging_scriptlet_snippets) (bsc#1100850). * Tue Jun 05 2018 cgoll@suse.com - Update from 17.11.5 to 17.11.7 - Fix security issue in handling of username and gid fields (CVE-2018-10995, bsc#1095508), which required the update from 17.11.5 to 17.11.7 Highlights of 17.11.6: * CRAY - Add slurmsmwd to the contribs/cray dir * PMIX - Added the direct connect authentication. * Prevent the backup slurmctld from losing the active/available node features list on takeover. * Be able to force power_down of cloud node even if in power_save state. * Allow cloud nodes to be recognized in Slurm when booted out of band. * Numerous fixes - check 'NEWS' file. Highlights of 17.11.7: * Notify srun and ctld when unkillable stepd exits. * Numerous fixes - check 'NEWS' file. - Add: slurmsmwd-uses-xdaemon_-for-systemd.patch * Fixes daemonization in newly introduced slurmsmwd daemon. - Rename: split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch to split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch * remain in sync with commit messages which introduced that file * Thu Apr 19 2018 eich@suse.com - Avoid running pretrans scripts when running in an instsys: there may be not much installed, yet. 
pretrans code should be done in lua, this way, it will be executed by the rpm-internal lua interpreter and not be passed to a shell which may not be around at the time this scriptlet is run (bsc#1090292). * Fri Apr 13 2018 eich@suse.com - Add requires for slurm-sql to the slurmdbd package. * Thu Apr 12 2018 eich@suse.com - Package READMEs for pam and pam_slurm_adopt. - Use the new %%license directive for COPYING file. * Thu Apr 12 2018 eich@suse.com - Add: * split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch * slurmctld-uses-xdaemon_-for-systemd.patch * slurmd-uses-xdaemon_-for-systemd.patch * slurmdbd-uses-xdaemon_-for-systemd.patch * removed-deprecated-xdaemon.patch Fix interaction with systemd: systemd expects that a daemonizing process doesn't go away until the PID file with the PID of the daemon has been written (bsc#1084125). * Wed Apr 11 2018 eich@suse.com - Make sure systemd services get restarted only when all packages are in a consistent state, not in the middle of an 'update' transaction (bsc#1088693). Since the %postun scripts that run on update are from the old package they cannot be changed - thus we work around the restart breakage. * Fri Mar 23 2018 cgoll@suse.com - fixed wrong log file location in slurmdbd.conf and fixed pid location for slurmdbd and made slurm-slurmdbd depend on slurm config which provides the dir /var/run/slurm (bsc#1086859). * Fri Mar 16 2018 cgoll@suse.com - added comment for (bsc#1085606) * Wed Mar 14 2018 eich@suse.com - Fix security issue in accounting_storage/mysql plugin by always escaping strings within the slurmdbd. CVE-2018-7033 http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-7033 (bsc#1085240). - Update slurm to v17.11.5 (FATE#325451) Highlights of 17.11: * Support for federated clusters to manage a single work-flow across a set of clusters. * Support for heterogeneous job allocations (various processor types, memory sizes, etc. by job component). 
Support for heterogeneous job steps within a single MPI_COMM_WORLD is not yet supported for most configurations. * X11 support is now fully integrated with the main Slurm code. Remove any X11 plugin configured in your plugstack.conf file to avoid errors being logged about conflicting options. * Added new advanced reservation flag of "flex", which permits jobs requesting the reservation to begin prior to the reservation's start time and use resources inside or outside of the reservation. A typical use case is to prevent jobs not explicitly requesting the reservation from using those reserved resources rather than forcing jobs requesting the reservation to use those resources in the time frame reserved. * The sprio command has been modified to report a job's priority information for every partition the job has been submitted to. * Group ID lookup performed at job submit time to avoid lookup on all compute nodes. Enable with PrologFlags=SendGIDs configuration parameter. * Slurm commands and daemons dynamically link to libslurmfull.so instead of statically linking. This dramatically reduces the footprint of Slurm. * In switch plugin, added plugin_id symbol to plugins and wrapped switch_jobinfo_t with dynamic_plugin_data_t in interface calls in order to pass switch information between clusters with different switch types. * Changed default ProctrackType to cgroup. * Changed default sched_min_interval from 0 to 2 microseconds. * Added new 'scontrol write batch_script ' command to fetch a job's batch script. Removed the ability to see the script as part of the 'scontrol -dd show job' command. * Add new "billing" TRES which allows jobs to be limited based on the job's billable TRES calculated by the job's partition's TRESBillingWeights. * Regular user use of "scontrol top" command is now disabled. Use the configuration parameter "SchedulerParameters=enable_user_top" to enable that functionality. 
    The configuration parameter "SchedulerParameters=disable_user_top" will be silently ignored.
  * Change default from letting pending jobs run outside of a reservation after the reservation is gone to putting those jobs in held state. Added NO_HOLD_JOBS_AFTER_END reservation flag to use old default.
  * Support for PMIx v2.0 as well as UCX support.
  * Remove plugins for obsolete MPI stacks:
    - lam
    - mpich1_p4
    - mpich1_shmem
    - mvapich
  * Numerous fixes - check 'NEWS' file.
- slurmd-Fix-slurmd-for-new-API-in-hwloc-2.0.patch
  plugins-cgroup-Fix-slurmd-for-new-API-in-hwloc-2.0.patch:
  Removed. Code upstream.
- slurmctld-service-var-run-path.patch: Replaced by sed script.
- Fix some rpmlint warnings.
* Mon Jan 29 2018 cgoll@suse.com
- moved config files to slurm-config package (FATE#324574).
* Mon Jan 29 2018 jjolly@suse.com
- Moved slurmstepd and man page into slurm-node due to slurmd dependency
- Moved config files into slurm-node
- Moved slurmd rc scripts into slurm-node
- Made slurm-munge require slurm-plugins instead of slurm itself
  - slurm-node suggested slurm-munge, causing the whole slurm to be installed. The slurm-plugins seems to be a more base class (FATE#324574).
* Wed Jan 17 2018 cgoll@suse.com
- split up light weight slurm-node package for deployment on nodes (FATE#324574).
* Fri Dec 01 2017 cgoll@suse.com
- added /var/spool/ directory and removed duplicated entries from slurm.conf
* Fri Nov 10 2017 eich@suse.com
- Package so-versioned libs separately. libslurm is expected to change more frequently and thus is packaged separately from libpmi.
* Wed Nov 01 2017 eich@suse.com
- Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697).
  Changes in 17.02.9:
  * When resuming powered down nodes, mark DOWN nodes right after ResumeTimeout has been reached (previous logic would wait about one minute longer).
  * Fix sreport not showing full column name for TRES Count.
  * Fix slurmdb_reservations_get() giving wrong usage data when job's spanned reservation that was modified.
  * Fix sreport reservation utilization report showing bad data.
  * Show all TRES' on a reservation in sreport reservation utilization report by default.
  * Fix sacctmgr show reservation handling "end" parameter.
  * Work around issue with sysmacros.h and gcc7 / glibc 2.25.
  * Fix layouts code to only allow setting a boolean.
  * Fix sbatch --wait to keep waiting even if a message timeout occurs.
  * CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL nodes which include no features the slurmctld will abort without this patch when attempting strtok_r(NULL).
  * Fix regression in 17.02.7 which would run the spank_task_privileged as part of the slurmstepd instead of its child process.
  * Fix security issue in Prolog and Epilog by always prepending SPANK_ to all user-set environment variables. CVE-2017-15566.
  Changes in 17.02.8:
  * Add 'slurmdbd:' to the accounting plugin to notify message is from dbd instead of local.
  * mpi/mvapich - Buffer being only partially cleared. No failures observed.
  * Fix for job --switch option on dragonfly network.
  * In salloc with --uid option, drop supplementary groups before changing UID.
  * jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc.
  * jobcomp/elasticsearch - fix memory leak when transferring generated buffer.
  * Prevent slurmstepd ABRT when parsing gres.conf CPUs.
  * Fix sbatch --signal to signal all MPI ranks in a step instead of just those on node 0.
  * Check multiple partition limits when scheduling a job that were previously only checked on submit.
  * Cray: Avoid running application/step Node Health Check on the external job step.
  * Optimization enhancements for partition based job preemption.
  * Address some build warnings from GCC 7.1, and one possible memory leak if /proc is inaccessible.
  * If creating/altering a core based reservation with scontrol/sview on a remote cluster correctly determine the select type.
  * Fix autoconf test for libcurl when clang is used.
  * Fix default location for cgroup_allowed_devices_file.conf to use correct default path.
  * Document NewName option to sacctmgr.
  * Reject a second PMI2_Init call within a single step to prevent slurmstepd from hanging.
  * Handle old 32bit values stored in the database for requested memory correctly in sacct.
  * Fix memory leaks in the task/cgroup plugin when constraining devices.
  * Make extremely verbose info messages debug2 messages in the task/cgroup plugin when constraining devices.
  * Fix issue that would deny the stepd access to /dev/null where GRES has a 'type' but no file defined.
  * Fix issue where the slurmstepd would fatal on job launch if you have no gres listed in your slurm.conf but some in gres.conf.
  * Fix validating time spec to correctly validate various time formats.
  * Make scontrol work correctly with job update timelimit [+|-]=.
  * Reduce the visibility of a number of warnings in _part_access_check.
  * Prevent segfault in sacctmgr if no association name is specified for an update command.
  * burst_buffer/cray plugin modified to work with changes in Cray UP05 software release.
  * Fix job reasons for jobs that are violating assoc MaxTRESPerNode limits.
  * Fix segfault when unpacking a 16.05 slurm_cred in a 17.02 daemon.
  * Fix setting TRES limits with case insensitive TRES names.
  * Add alias for xstrncmp() -- slurm_xstrncmp().
  * Fix sorting of case insensitive strings when using xstrcasecmp().
  * Gracefully handle race condition when reading /proc as process exits.
  * Avoid error on Cray duplicate setup of core specialization.
  * Skip over undefined (hidden in Slurm) nodes in pbsnodes.
  * Add empty hashes in perl api's slurm_load_node() for hidden nodes.
  * CRAY - Add rpath logic to work for the alpscomm libs.
  * Fixes for administrator extended TimeLimit (job reason & time limit reset).
  * Fix gres selection on systems running select/linear.
  * sview: Added window decorator for maximize,minimize,close buttons for all systems.
  * squeue: interpret negative length format specifiers as a request to delimit values with spaces.
  * Fix the torque pbsnodes wrapper script to parse a gres field with a type set correctly.
- Fixed ABI version of libslurm.
* Fri Oct 06 2017 jengelh@inai.de
- Trim redundant wording in descriptions.
* Wed Sep 27 2017 jjolly@suse.com
- Updated to slurm 17-02-7-1
  * Added python as BuildRequires
  * Removed sched-wiki package
  * Removed slurmdb-direct package
  * Obsoleted sched-wiki and slurmdb-direct packages
  * Removing Cray-specific files
  * Added /etc/slurm/layout.d files (new for this version)
  * Remove /etc/slurm/cgroup files from package
  * Added lib/slurm/mcs_account.so
  * Removed lib/slurm/jobacct_gather_aix.so
  * Removed lib/slurm/job_submit_cnode.so
- Created slurm-sql package
- Moved files from slurm-plugins to slurm-torque package
- Moved creation of /usr/lib/tmpfiles.d/slurm.conf into slurm.spec
  * Removed tmpfiles.d-slurm.conf
- Changed /var/run path for slurm daemons to /var/run/slurm
  * Added slurmctld-service-var-run-path.patch (FATE#324026).
* Tue Sep 12 2017 jjolly@suse.com
- Made tmpfiles_create post-install macro SLE12 SP2 or greater
- Directly calling systemd-tmpfiles --create for before SLE12 SP2
* Mon Jul 10 2017 jjolly@suse.com
- Allows OpenSUSE Factory build as well
- Removes unused .service files from project
- Adds /var/run/slurm to /usr/lib/tmpfiles.d for boottime creation
  * Patches upstream .service files to allow for /var/run/slurm path
  * Modifies slurm.conf to allow for /var/run/slurm path
* Tue May 30 2017 eich@suse.com
- Move wrapper script mpiexec provided by slurm-torque to mpiexec.slurm to avoid conflicts. This file is normally provided by the MPI implementation (boo#1041706).
* Mon May 08 2017 eich@suse.com
- Replace remaining ${RPM_BUILD_ROOT}s.
- Improve description.
- Fix up changelog.
* Fri Mar 31 2017 eich@suse.com
- Spec file: Replace "Requires : slurm-perlapi" by "Requires: perl-slurm = %{version}" (boo#1031872).
* Thu Feb 16 2017 jengelh@inai.de
- Trim redundant parts of description. Fixup RPM groups.
- Replace unnecessary %__ macro indirections; replace historic $RPM_* variables by macros.
* Wed Feb 15 2017 eich@suse.com
- slurmd-Fix-for-newer-API-versions.patch: Stale patch removed.
* Tue Feb 07 2017 eich@suse.com
- Use %slurm_u and %slurm_g macros defined at the beginning of the spec file when adding the slurm user/group for consistency.
- Define these macros to daemon,root for non-systemd.
- For anything newer than Leap 42.1 or SLE-12-SP1 build OpenHPC compatible.
* Wed Feb 01 2017 eich@suse.com
- Updated to 16.05.8.1
  * Remove StoragePass from being printed out in the slurmdbd log at debug2 level.
  * Defer PATH search for task program until launch in slurmstepd.
  * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelihood of Expect buffer overflow.
  * Do not PATH search for multi-prog launches if LaunchParameters=test_exec is enabled.
  * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification.
  * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld.
  * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start.
  * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min.
  * Remove redundant lookup for part_ptr when updating a reservation's nodes.
  * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
  * Do not allocate specialized cores to jobs using the --exclusive option.
  * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed.
  * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
  * Fix check for PluginDir within slurmctld to work with multiple directories.
  * Cancel interactive jobs automatically on communication error to launching srun/salloc process.
  * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371).
- Replace group/user add macros with function calls.
- Fix array initialization and ensure strings are always NULL terminated in pam_slurm.c (bsc#1007053).
- Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being.
- Conditionalized architecture specific pieces to support non-x86 architectures better.
* Tue Jan 03 2017 eich@suse.com
- Remove: unneeded 'BuildRequires: python'
- Add:
  BuildRequires: freeipmi-devel
  BuildRequires: libibmad-devel
  BuildRequires: libibumad-devel
  so they are picked up by the slurm build.
- Enable modifications from openHPC Project.
- Enable lua API package build.
- Add a recommends for slurm-munge to the slurm package: This way, the munge auth method is available and slurm works out of the box.
- Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous.
* Fri Dec 02 2016 eich@suse.com
- Create slurm user/group in preinstall script.
* Wed Nov 30 2016 eich@suse.com
- Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE.
* Tue Nov 22 2016 eich@suse.com
- Fix build with and without OHCP_BUILD define.
- Fix build for systemd and non-systemd.
* Fri Nov 04 2016 eich@suse.com
- Updated to 16-05-5 - equivalent to OpenHPC 1.2.
  * Fix issue with resizing jobs and limits not being kept track of correctly.
  * BGQ - Remove redeclaration of job_read_lock.
  * BGQ - Tighter locks around structures when nodes/cables change state.
  * Make it possible to change CPUsPerTask with scontrol.
  * Make it so scontrol update part qos= will take away a partition QOS from a partition.
  * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed.
  * Make it so daemons also support TopologyParam=NoInAddrAny.
  * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld.
  * Remove redundant logic when updating a job's task count.
  * MySQL - Fix querying jobs with reservations when the id's have rolled.
  * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
  * Launch batch job requesting --reboot after the boot completes.
  * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state.
  * Fix for possible slurmstepd segfault on invalid user ID.
  * MySQL - Fix for possible race condition when archiving multiple clusters at the same time.
  * Add logic so that slurmstepd can be launched under valgrind.
  * Increase buffer size to read /proc/*/stat files.
  * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utilization, but avoid potentially starving larger jobs by preventing them from launching indefinitely.
  * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm).
  * Rename partition configuration from "Shared" to "OverSubscribe".
    Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview.
  * Add SLURM_UMASK environment variable to user job.
  * knl_conf: Added new configuration parameter of CapmcPollFreq.
  * Cleanup two minor Coverity warnings.
  * Make it so the tres units in a job's formatted string are converted like they are in a step.
  * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions.
  * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references.
  * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat.
  * Log IP address of bad incoming message to slurmctld.
  * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node.
  * Release CPU "owner" file locks.
  * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file.
  * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any.
  * Make it so qsub will do a "basename" on a wrapped command for the output and error files.
  * Add logic so that slurmstepd can be launched under valgrind.
  * Increase buffer size to read /proc/*/stat files.
  * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed.
  * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic.
  * Document which srun options apply to only job, only step, or job and step allocations.
  * Use more compatible function to get thread name (>= 2.6.11).
  * Make it so the extern step uses a reverse tree when cleaning up.
  * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed.
  * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF).
  * Add job_submit plugin for the "reboot" field.
  * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins.
  * Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
  * MYSQL - Slightly better logic if a job completion comes in with an end time of 0.
  * If the task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set).
  * Streamline when schedule() is called when running with message aggregation on batch script completes.
  * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
  * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options.
  * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVATION environment variables for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages.
  * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them.
  * Don't return the extern step from sstat by default.
  * In sstat print 'extern' instead of 4294967295 for the extern step.
  * Make advanced reservations work properly with core specialization.
  * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd.
  * Export functions from parse_time.c in libslurm.so.
  * Export unit convert functions from slurm_protocol_api.c in libslurm.so.
  * Fix scancel to allow multiple steps from a job to be cancelled at once.
  * Update and expand upgrade guide (in Quick Start Administrator web page).
  * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation.
  * Ensure reported expected job start time is not in the past for pending jobs.
  * Add support for PMIx v2. Required for FATE#316379.
* Mon Oct 17 2016 eich@suse.com
- Setting 'download_files' service to mode='localonly' and adding source tarball. (Required for Factory).
* Sat Oct 15 2016 eich@suse.com
- version 15.08.7.1
  * Remove the 1024-character limit on lines in batch scripts.
  * task/affinity: Disable core-level task binding if more CPUs required than available cores.
  * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend.
  * Don't use lower weight nodes for job allocation when topology/tree used.
  * Don't allow user specified reservation names to disrupt the normal reservation sequence numbering scheme.
  * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm.
  * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure.
  * If a job is requeued while in the process of being launched, remove its job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node).
  * Cleanup messages when handling job script and environment variables in older directory structure formats.
  * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang.
  * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs.
  * If all ephemeral ports are in use, try using other port numbers.
  * Prevent "scontrol update job" from updating jobs that have already finished.
  * Show requested TRES in "squeue -O tres" when job is pending.
  * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job.
  * Many bug fixes.
- Use source services to download package.
- Fix code for new API of hwloc-2.0.
- package netloc_to_topology where available.
- Package documentation.
* Sun Nov 01 2015 scorot@free.fr
- version 15.08.3
  * Many new features and bug fixes. See NEWS file
- update files list accordingly
- fix wrong end of line in some files
* Thu Aug 06 2015 scorot@free.fr
- version 14.11.8
  * Many bug fixes. See NEWS file
- update files list accordingly
* Sun Nov 02 2014 scorot@free.fr
- add missing systemd requirements
- add missing rclink
* Sun Nov 02 2014 scorot@free.fr
- version 14.03.9
  * Many bug fixes. See NEWS file
- add systemd support
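
The 17.02.9 entry above describes the CVE-2017-15566 fix as "always prepending SPANK_ to all user-set environment variables" before Prolog and Epilog run. The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Slurm's actual C implementation: the function name spank_prefixed_env and the sample variables are hypothetical.

```python
def spank_prefixed_env(user_env):
    """Return a copy of user_env with SPANK_ prepended to every name.

    Hypothetical helper for illustration only. The prefix is applied
    unconditionally, even to names that already start with SPANK_, so a
    user-chosen name such as LD_PRELOAD can never reach a privileged
    script under its original name.
    """
    prefix = "SPANK_"
    return {prefix + name: value for name, value in user_env.items()}


if __name__ == "__main__":
    # Sample user-set variables (made up for this sketch).
    user_env = {"LD_PRELOAD": "/tmp/lib.so", "MY_SETTING": "1"}
    for name, value in sorted(spank_prefixed_env(user_env).items()):
        print(f"{name}={value}")
```

Prefixing unconditionally, rather than only when the prefix is missing, keeps the rule simple: nothing a user sets can collide with a name the privileged script trusts.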
/usr/bin/seff
/usr/bin/smail
Generated by rpm2html 1.8.1
Fabrice Bellet, Tue Jul 9 13:44:15 2024