Download HTCondor 8.7.6 / 8.6.9


The HTCondor Team at the University of Wisconsin-Madison has released two new versions of its workload management system, HTCondor: version 8.6.9 in the stable branch and version 8.7.6 in the development branch. HTCondor focuses on managing compute-intensive tasks and can distribute them across multiple connected nodes. Users submit a task to HTCondor, which runs it according to the configured policies and the availability of connected resources, and then returns the results to the user. HTCondor can, for example, manage a dedicated Beowulf cluster, but it can also use regular desktops that would otherwise sit idle. At SC16, Google, Fermilab and the HTCondor Team demonstrated a 160k-core cloud-based elastic compute cluster. The changes in these releases are as follows:

Version 8.7.6

New Features:

  • Changed the default value of configuration parameter IS_OWNER to False. The previous default value is now set as part of the use POLICY : Desktop configuration template. (Ticket #6463).
  • You may now use SCHEDD and JOB instead of MY and TARGET in SUBMIT_REQUIREMENTS expressions. (Ticket #4818).
  • Added cmake build option WANT_PYTHON_WHEELS and make target pypi_staging to build the framework for Python wheels. This option and target are not enabled by default and are not likely to work outside of Linux environments with a single Python installation. (Ticket #6486).
  • Added new job attributes BatchProject and BatchRuntime for grid-type batch jobs. They specify the project/allocation name and maximum runtime in seconds for the job that’s submitted to the underlying batch system. (Ticket #6451).
  • HTCondor now respects ATTR_JOB_SUCCESS_EXIT_CODE when sending job notifications. (Ticket #6432).
  • Added some graph metrics (height, width, etc.) to DAGMan’s metrics file output. (Ticket #6470).
  • Removed Quill from HTCondor codebase. (Ticket #6496).
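
To give a feel for the graph metrics mentioned above, here is a minimal Python sketch that computes a height and a width for a small DAG. The release notes do not say how DAGMan defines these metrics, so the definitions used here are assumptions: height is the number of nodes on the longest parent-to-child path, and width is the largest number of nodes that share the same level. The function name `dag_height_and_width` and its arguments are hypothetical and are not part of HTCondor.

```python
from collections import defaultdict, deque


def dag_height_and_width(edges, nodes):
    """Return (height, width) of a DAG under this sketch's assumed definitions.

    edges is a list of (parent, child) pairs; every name used in edges
    must also appear in nodes.
    """
    children = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm: visit nodes in topological order while tracking
    # each node's level (longest path from any root, counted in nodes).
    level = {n: 1 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for child in children[node]:
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")

    # Count how many nodes sit on each level.
    per_level = defaultdict(int)
    for n in nodes:
        per_level[level[n]] += 1
    height = max(per_level) if per_level else 0
    width = max(per_level.values()) if per_level else 0
    return height, width


# Diamond-shaped DAG: A -> B, A -> C, B -> D, C -> D
diamond = dag_height_and_width(
    edges=[("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
    nodes=["A", "B", "C", "D"],
)
```

For the diamond-shaped example, the longest path is A, B, D (three nodes) and the middle level holds two nodes, so under these assumed definitions the result is a height of 3 and a width of 2.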

Bugs Fixed:

  • HTCondor now reports all submit warnings, not just the first one. (Ticket #6446).
  • The job log will no longer contain empty submit warnings. (Ticket #6465).
  • DAGMan previously connected to condor_schedd every time it detected an update in its internal state. This is too aggressive for rapidly changing DAGs, so we’ve changed the connection to happen in time intervals defined by DAGMAN_QUEUE_UPDATE_INTERVAL, by default once every five minutes. (Ticket #6464).
  • DAGMan now enforces the DAGMAN_MAX_JOB_HOLDS limit by the number of held jobs in a cluster at the same time. Previously it counted all holds over the lifetime of a cluster, even if only a small number of them are active at the same time. (Ticket #6492).
  • Fixed a bug where on rare occasions the ShadowLog would become owned by root. (Ticket #6485).
  • Fixed a bug where using condor_qedit to change any of the concurrency limits of a job would have no effect. (Ticket #6448).
  • When copy_to_spool is set to True, condor_submit now attempts to transfer the job executable only once per job cluster, instead of once per job. (Ticket #6459).
  • Fixed a bug that could result in an incorrect total reported by condor_rm when the -totals option is used. (Ticket #6450).
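
The change to DAGMAN_MAX_JOB_HOLDS described above comes down to two different ways of counting holds. The following Python sketch contrasts them on a made-up event sequence. It is an illustration only, not DAGMan's code: the event format, the function names, the limit value and the `>=` comparison are all assumptions.

```python
def lifetime_holds(events):
    """Old behaviour (as described): count every hold ever seen in the cluster."""
    return sum(1 for kind, _proc in events if kind == "hold")


def peak_concurrent_holds(events):
    """New behaviour (as described): track how many jobs are held at the same time."""
    held = set()
    peak = 0
    for kind, proc in events:
        if kind == "hold":
            held.add(proc)
        elif kind == "release":
            held.discard(proc)
        peak = max(peak, len(held))
    return peak


# Hypothetical history of one cluster: (event kind, proc id) in time order.
events = [
    ("hold", 0), ("release", 0),
    ("hold", 1), ("release", 1),
    ("hold", 0), ("release", 0),
    ("hold", 2),
]

MAX_HOLDS = 2  # hypothetical limit, standing in for DAGMAN_MAX_JOB_HOLDS

# Assumed comparison; the release notes do not say whether DAGMan uses > or >=.
old_rule_trips = lifetime_holds(events) >= MAX_HOLDS
new_rule_trips = peak_concurrent_holds(events) >= MAX_HOLDS
```

In this example four holds occur over the lifetime of the cluster, but never more than one job is held at once, so the lifetime count reaches the limit while the concurrent count does not.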

Version 8.6.9

New Features:

  • When a daemon crashes, more information about the cause is now written to its log file. (Ticket #6483).

Bugs Fixed:

  • Fixed a bug in the group quotas that would give too much surplus quota to some groups when ACCEPT_SURPLUS is on and NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION is true (the default) (Ticket #6514).
  • Fixed a bug in the Python bindings when doing queries that specify a projection with the “attr_list” argument. The bug could potentially result in memory corruption of the python interpreter process. (Ticket #6468).
  • Reduced the amount of time that condor_preen will block the condor_schedd. condor_preen now connects only when specifically needed, and automatically disconnects after PREEN_MAX_SCHEDD_CONNECTION_TIME seconds. (Ticket #6490).
  • Fixed a bug on Windows that would often result in the job sandbox on the execute node not being deleted when the condor_schedd relinquished its claim on the slot before the condor_starter had exited. (Ticket #6497).
  • Fixed a bug where the condor_master stopped sending watchdog notifications to systemd after restarting itself. This resulted in systemd killing the condor_master shortly after the restart. (Ticket #6476).
  • Updated the systemd configuration to only restart HTCondor upon failure. Otherwise, systemd would restart HTCondor if condor_off requested the condor_master to exit. (Ticket #6503).
  • Fixed a bug with the use of the scheduler parameter MAX_JOBS_SUBMITTED. If this limit was ever reached by a submit with more than one proc in the cluster, the limit would be reduced by the difference until the condor_schedd was restarted. (Ticket #6460).
  • Fixed a bug that caused very large RequestDisk requests to fail, and cause the Disk attribute in the machine ad to go negative. (Ticket #6467).
  • Fixed a bug with the RESERVED_DISK parameter that would not accept an argument larger than 2 Gigabytes. (Ticket #6472).
  • Improved validation of the lengths of messages in PASSWORD and SSL authentication methods. (Ticket #6493).
  • Fixed a problem where the VM universe would be taken offline on the execute node, if the qcow2 disk image was corrupt. The offending job is now put on hold with an appropriate hold message. (Ticket #6505).
  • Fixed a problem which would prevent Java universe jobs from working when using a relative path name to a jar file and submitting from Linux to Windows or vice versa. (Ticket #6474).
  • Fixed a bug on 32 bit Linux systems that caused the starter to crash on startup if cgroup limits were enabled. (Ticket #6501).
  • Fixed a bug in Startd Cron (see 4.4.3) where, in effect, SlotMergeConstraint was ignored. (Ticket #6488).
  • Fixed a bug when IPv6 is enabled which could cause the condor_startd to crash when spawning a starter. (Ticket #6462).
  • Fixed a bug in condor_q which could cause the DONE amount to be incorrect when multiple clusters shared a batch name. (Ticket #6469).
  • Fixed an issue on newer versions of Linux where core files generated by a daemon were not usable by gdb. A side effect of this fix is that the configuration parameter CORE_FILE_NAME no longer has any effect on Linux. (Ticket #6482).
  • condor_chirp will now no longer abort when given a command that it cannot successfully execute, such as fetching a file that does not exist. (Ticket #6402).
  • Removed unneeded copy_to_spool statement from default interactive submit file. (Ticket #6315).
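
The condor_preen change above (connect only when needed, disconnect after PREEN_MAX_SCHEDD_CONNECTION_TIME seconds) follows a general pattern that can be sketched in a few lines of Python. This is not HTCondor's implementation: the class `TimeBoxedConnection`, the demo connection object and the 20-second limit are hypothetical, and the clock is injected so the example runs without waiting.

```python
import time


class TimeBoxedConnection:
    """Hypothetical wrapper: connect lazily, drop the connection after max_seconds."""

    def __init__(self, connect, max_seconds, clock=time.monotonic):
        self._connect = connect          # callable returning an object with close()
        self._max_seconds = max_seconds
        self._clock = clock
        self._conn = None
        self._opened_at = None

    def get(self):
        """Return a live connection, opening one only when a caller needs it."""
        self.expire_if_due()
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = self._clock()
        return self._conn

    def expire_if_due(self):
        """Close the connection if it has been open for max_seconds or longer."""
        if self._conn is not None and self._clock() - self._opened_at >= self._max_seconds:
            self._conn.close()
            self._conn = None
            self._opened_at = None


class _DemoConnection:
    """Stand-in for a real connection; it only records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


fake_now = [0.0]  # mutable fake clock, in seconds
box = TimeBoxedConnection(_DemoConnection, max_seconds=20, clock=lambda: fake_now[0])

first = box.get()      # opened lazily at t = 0
fake_now[0] = 5.0
same = box.get()       # still inside the time box, so the connection is reused
fake_now[0] = 25.0
box.expire_if_due()    # 25 s have elapsed, which exceeds the 20 s limit
second = box.get()     # a fresh connection is opened on demand
```

The point of the pattern is that the other side is occupied only while work is actually pending, and a slow or stuck client cannot hold the connection open indefinitely.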

Version number: 8.7.6 / 8.6.9
Release status: Final
Operating systems: Windows 7, Linux, BSD, macOS, Solaris, UNIX, Windows Server 2008, Windows Server 2012, Windows 8, Windows 10
Website: HTCondor
License type: Conditions (GNU/BSD/etc.)