Weekly OnCall Summary Sept 27th - Oct 4th, 2021

Weekly OnCall Rotation

OnCall US Pacific: Leo (leo@harmony.one)

OnCall Asia/EMEA: Yuriy (yuriy@harmony.one)

Duration: 09/28 8:30am - 10/05 8:30am PST

Summary

  • Quiet week with two mainnet upgrade/downgrade events

Details

09/27:

  • Temporarily upgraded s0 nodes to block some addresses, then downgraded back to 4.2.1 to unblock them

09/30:

  • One explorer node ran out of space. There was no warning/paging for this node. PagerDuty
  • Found 3 m5d.2xlarge nodes OOS and took them out of service. They caused an api.harmony.one service interruption
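
Since the explorer node above filled up without any warning or page, here is a minimal sketch of a free-space check that could run on each node. It is not the team's actual monitoring; the function names, the 10% threshold, and the idea of forwarding the message to a pager are all assumptions for illustration.

```python
import shutil


def is_low_on_space(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when the free fraction is below the threshold.

    The 10% default is an illustrative assumption, not a value from this report.
    """
    if total_bytes <= 0:
        # Treat an unreadable/empty filesystem as a problem worth alerting on.
        return True
    return (free_bytes / total_bytes) < min_free_fraction


def check_path(path="/", min_free_fraction=0.10):
    """Return a warning message for `path` if it is low on space, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if is_low_on_space(usage.free, usage.total, min_free_fraction):
        free_gib = usage.free / 2**30
        return f"{path}: low disk space, {free_gib:.1f} GiB free"
    return None


if __name__ == "__main__":
    message = check_path("/")
    if message:
        # Hypothetical hook: forward this to whatever alerting is in place.
        print(message)
```

Run from cron or a systemd timer, a check like this would at least surface a node approaching the limit before it stops serving.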

10/04:

  • Mainnet upgrade to v4.3.0 went smoothly
  • Two snapshot nodes were OOS; upgraded them to the next level of nodes in Lightsail
  • Node Disk Space Alert - the free space of the mainnet shard0 node (34.216.159.65) is abnormal
  • Soph upgraded the two node instances above from 8G to 20G

Takeaways:

  1. Take nodes out of the ELB as soon as they are put into maintenance mode, to avoid service interruption on RPC nodes
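
The takeaway above could be scripted roughly as follows. This is a sketch only, assuming a Classic ELB and a boto3 `elb` client; the post does not say which load balancer type is in use, and the load balancer name and instance ID shown are hypothetical placeholders.

```python
def remove_from_elb(elb_client, load_balancer_name, instance_id):
    """Deregister one instance from a Classic ELB.

    `elb_client` is expected to behave like boto3.client("elb").
    Returns True if the instance is no longer listed on the load balancer.
    """
    response = elb_client.deregister_instances_from_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": instance_id}],
    )
    # The response lists the instances that remain registered.
    remaining = [item["InstanceId"] for item in response.get("Instances", [])]
    return instance_id not in remaining


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   elb = boto3.client("elb")
#   remove_from_elb(elb, "rpc-elb-example", "i-0123456789abcdef0")
```

Running this as the first step of putting a node into maintenance mode keeps the load balancer from routing RPC traffic to a node that cannot serve it.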

love our practice of openly sharing team ops on this forum.