Monthly Archives: March 2015

Hexagon: MDS crash

Lóránd Szentannai • March 30, 2015

The metadata server (MDS) for the /work filesystem crashed on Friday evening. Some users may have encountered filesystem errors at that time.

OOM on login3 and login4

Alexander Oltu • March 25, 2015

Someone managed to bring down login3 and login4 by oversubscribing memory. As a result, the jobs started from these login nodes were killed. We have brought both login nodes back up and will investigate the cause tomorrow.

Hexagon: rebooted login1

Lóránd Szentannai • March 19, 2015

Login node 1 hung and we had to reboot it. The affected jobs are: 1689188, 1691986, 1693190, 1688272, 1688273, 1688264, 1688265, 1691903, 1693054, 1693083, 1693214, 1693209, 1693084, 1693203, 1693204, 1693056, 1692989, 1693499.

Hexagon: NFS timeouts on login nodes

Alexander Oltu • March 2, 2015

We occasionally see NFS timeouts on various login nodes, which logged-in users experience as short hangs. This has been going on for some time, but not very often. Last week it became much more frequent and affected almost all nodes. We have applied a patch that is supposed to fix this issue. For the changes to take effect, we need to restart Hexagon.

Update 15:30: Hexagon is up again.

HPC Syslog

Log of changes and events on UiB's HPC systems
