Weekly on-call Summary July 21 - July 26, 2021

On-call: @giv July 21 - July 26, 2021

Summary

Jul 21, 2021

  • Giv: Removed testnet from the #status bot because of noisy incidents, until @sophoah implements a maintenance window.
  • Giv: Added Explorer backend (35.167.126.78:8888) to status monitoring.
  • Giv: Incident 8:27 pm PT: @sophoah to adjust Watchdog to reduce noise.
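
For reference, a minimal sketch of the kind of reachability check a status monitor might run against the Explorer backend. This assumes a plain TCP connect probe. How the #status bot actually checks endpoints is not recorded here, and the function name `tcp_reachable` is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: "up" means "accepts TCP connections". A real status check may
    also want to issue an application-level request.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would look like `tcp_reachable("35.167.126.78", 8888)`, using the address from the entry above.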

Jul 22, 2021

Jul 23, 2021

Jul 24, 2021

  • Soph: Grafana memory alert updated to fire only after memory usage has stayed above 85% for 15 min.
  • Soph: The S2 Explorer node's beacon shard chain is being cloned to fix it being 300k blocks behind.
  • Soph: Multiple high-memory Grafana alerts, which required restarting the node's harmony process (x5).
  • Giv: Incident 12:22 pm PT: getlastcrosslinks method is taking 3.5 mins to respond. TODO: Add timeouts to all methods.
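
One possible shape for the timeout TODO, as a sketch only: a decorator that stops waiting on a handler after a deadline. The names `with_timeout` and `MethodTimeout` are hypothetical and this is not the node's actual RPC code. Note that it only bounds how long the caller waits. The slow work keeps running in its worker thread, so a real fix would also need the handler itself to be cancellable.

```python
import concurrent.futures
import functools
import time

# Shared worker pool for running handlers off the calling thread.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


class MethodTimeout(Exception):
    """Raised when a handler does not finish before its deadline."""


def with_timeout(seconds: float):
    """Decorator: give up waiting on the wrapped function after `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            future = _POOL.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                # cancel() only helps if the task has not started yet;
                # a running task continues in the background.
                future.cancel()
                raise MethodTimeout(
                    f"{func.__name__} exceeded {seconds}s"
                ) from None
        return wrapper
    return decorator
```

A handler would then be declared as `@with_timeout(30)` above its definition, so a call like the 3.5 min one above returns an error to the client instead of hanging.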

Jul 25, 2021

Jul 26, 2021