Weekly On-call Summary Nov 2nd - Nov 9th

Weekly On-call Rotation

On-call: Jack & Jacky (US Pacific daytime), Haodi (US Pacific nighttime)

Duration: Nov 2nd, 2021 to Nov 9th, 2021

Summary

  1. Beacon out of sync on Shard 1 Mainnet, due to high memory usage
  2. Most beacon out-of-sync issues resolved themselves automatically (memory leak)
  3. Bridge endpoint issues (nodes exposed to websocket traffic)
  4. Minor Explorer issues

Details

Nov 2, 2021

Fairly quiet; one beacon out-of-sync alert.

Nov 3, 2021

Pocket Network bridge issues; switched back to Harmony.

Root cause:

  • User-generated transaction errors no longer count against the node and are no longer relayed to our backups

  • Two new nodes were upgraded to replace smaller nodes

Nov 4, 2021

High memory usage on Shard 1 Mainnet; a restart resolved it.

After the restart, memory was freed up.
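A check like the following could flag this kind of memory pressure before a restart is needed. It is a minimal sketch, assuming a Linux node that exposes /proc/meminfo; the function names and the 90% threshold are hypothetical and are not part of any existing Harmony tooling.

```python
def memory_used_fraction(meminfo_text: str) -> float:
    """Return the fraction of memory in use, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


def needs_attention(meminfo_text: str, threshold: float = 0.9) -> bool:
    """True when used memory is at or above the (hypothetical) threshold."""
    return memory_used_fraction(meminfo_text) >= threshold


if __name__ == "__main__":
    # On a real node the text would come from /proc/meminfo.
    try:
        with open("/proc/meminfo") as handle:
            print("needs attention:", needs_attention(handle.read()))
    except FileNotFoundError:
        print("/proc/meminfo not available on this platform")
```

Using MemAvailable rather than MemFree avoids counting reclaimable page cache as used memory, which matters on small (4GB) nodes.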

Nov 5, 2021

Indexer down

Monitor is DOWN: explorer-indexer-s123 ( http://indexer123.explorer.t.hmny.io:3002 ).

The site was up; this was a blip in Uptime Robot.
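One way to tell a monitoring blip from a real outage is to require several consecutive failed probes before treating the service as down. The sketch below illustrates that idea only; it is not how Uptime Robot works internally, and the function names, attempt count, and delay are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that returns True when the URL answers with a 2xx/3xx status."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return 200 <= response.status < 400
        except (urllib.error.URLError, OSError):
            return False
    return probe


def confirmed_down(probe: Callable[[], bool], attempts: int = 3,
                   delay_seconds: float = 10.0) -> bool:
    """Report 'down' only if every attempt fails; one success counts as a blip."""
    for attempt in range(attempts):
        if probe():
            return False
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return True
```

For example, `confirmed_down(http_probe("http://indexer123.explorer.t.hmny.io:3002"))` would page only after three failed probes spaced ten seconds apart.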

Average UnHealthyHostCount GreaterThanOrEqualToThreshold 1.0

Bridge-RPC-at-least-1-unhealthy

Root cause – a mix of old and new boxes, some of which also served the websocket endpoint, which comes under heavier load.

Solution – cleaned up the bridge endpoint to use only API-based nodes, similar to the RPC endpoints (api.harmony, rpc/api.s0.t.hmny.io)

Nov 6, 2021

From Soph: crypto.com reports that for 48h they have not been able to make any transfers. Their transactions always fail with response error: {"code":-32000,"message":"replacement transaction underpriced"}. Their gas fee is already at 20 gwei, and they also tried higher values without success.

They prepared another txn with nonce 52327.

Here is the signed result

0xf87182cc678504a817c8008261a8808094d39f985af48a2befd8b9bda5f70b44ff07efccb9891fec9e3e473fb800008025a0fdcc40eafd6efec5e92027ae41a735721f124019c939eeaf3a62abc60a260934a02293a48d876db053dbc070fefcb45aa160fc729eff1616a0ca2ebc1776ba6bc8

Response error: {"code":-32000,"message":"replacement transaction underpriced"}
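The signed result above can be inspected offline to confirm the nonce and gas price that were actually submitted. The following is a minimal sketch using a small hand-written RLP decoder, not Harmony tooling. The field order (nonce, gas price, gas limit, shard ID, to-shard ID, recipient, value, data, v, r, s) is an assumption based on Harmony's transaction layout, and all function and variable names are illustrative.

```python
def rlp_decode(data: bytes, pos: int = 0):
    """Decode one RLP item starting at pos; return (item, next_pos)."""
    prefix = data[pos]
    if prefix < 0x80:                       # single byte encodes itself
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:                       # short string (0-55 bytes)
        start = pos + 1
        length = prefix - 0x80
        return data[start:start + length], start + length
    if prefix < 0xC0:                       # long string
        len_of_len = prefix - 0xB7
        start = pos + 1 + len_of_len
        length = int.from_bytes(data[pos + 1:start], "big")
        return data[start:start + length], start + length
    if prefix < 0xF8:                       # short list
        start = pos + 1
        length = prefix - 0xC0
    else:                                   # long list
        len_of_len = prefix - 0xF7
        start = pos + 1 + len_of_len
        length = int.from_bytes(data[pos + 1:start], "big")
    end = start + length
    items, cursor = [], start
    while cursor < end:
        item, cursor = rlp_decode(data, cursor)
        items.append(item)
    return items, end


# The signed transaction from the report, split at RLP item boundaries.
RAW_TX = (
    "f871"
    "82cc67"
    "8504a817c800"
    "8261a8"
    "80"
    "80"
    "94d39f985af48a2befd8b9bda5f70b44ff07efccb9"
    "891fec9e3e473fb80000"
    "80"
    "25"
    "a0fdcc40eafd6efec5e92027ae41a735721f124019c939eeaf3a62abc60a260934"
    "a02293a48d876db053dbc070fefcb45aa160fc729eff1616a0ca2ebc1776ba6bc8"
)

fields, _ = rlp_decode(bytes.fromhex(RAW_TX))
# Assumed field order: nonce, gasPrice, gasLimit, shardID, toShardID, to, ...
nonce = int.from_bytes(fields[0], "big")
gas_price_wei = int.from_bytes(fields[1], "big")
gas_limit = int.from_bytes(fields[2], "big")

print("nonce:", nonce)
print("gas price (gwei):", gas_price_wei / 10**9)
print("gas limit:", gas_limit)
```

Under the assumed field order, the decoded nonce and gas price agree with the values quoted in the report (nonce 52327, 20 gwei).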

Update: Cleared the mempool by restarting the RPC nodes. The error above occurs when a transaction is submitted with the same nonce as one already pending in the pool (from the same node or a different node) without a sufficiently higher gas price.
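As a rough illustration of why resubmitting at the same or a slightly higher gas price is rejected, the sketch below computes the lowest gas price that would replace a pending transaction under a go-ethereum-style price-bump rule. The 10% bump is go-ethereum's default; whether the Harmony RPC nodes use the same value is an assumption, and the function name is hypothetical.

```python
GWEI = 10**9


def min_replacement_gas_price(pending_gas_price_wei: int,
                              price_bump_percent: int = 10) -> int:
    """Lowest gas price (wei) that replaces a pending tx with the same nonce.

    Assumes the replacement must be strictly higher than the pending price
    and at least pending * (100 + bump) / 100, as in go-ethereum's tx pool.
    """
    threshold = pending_gas_price_wei * (100 + price_bump_percent) // 100
    return max(threshold, pending_gas_price_wei + 1)


if __name__ == "__main__":
    pending = 20 * GWEI  # the 20 gwei quoted in the report
    print(min_replacement_gas_price(pending) / GWEI, "gwei")
```

If a stuck transaction with the same nonce was already pending at a higher price than the sender expected, any resubmission below that price plus the bump would keep failing with the same error.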

Nov 7, 2021

Several BTC index nodes went out of sync.

  • The indexer software is open source; the issue auto-resolved after 10 mins
  • Soph resolved a monitoring bug: when an alert auto-resolves, the Uptime Robot report to PD now auto-closes

Beacon out of sync issue.

  • Related to the memory issue on shard nodes?
  • Most likely caused by beacon sync

Nov 8, 2021

  1. Beacon out of sync issue.

  2. Nodes unhealthy due to a traffic surge from VenomDAO's release of Euphoria

Harmony nodes became unhealthy; Pocket looked fine.

Nov 9, 2021

From Soph:

http://api2.explorer.t.hmny.io:3000/v0/shard/0/block/number/0 not working

Both api1 and api2 were fixed by restarting them.

Action Items

  1. Open GH issue – Memory leak on all shard chains (running on 4GB nodes) – may come from the beacon sync service (@Jacky)
  2. Open GH issue – Error when transactions are sent with the same nonce – occurred after a failure caused by a lower gas price – transaction pool logic should not leave all subsequent transactions stuck (@Jacky)
  3. Open GH issue – Display fees burned per block in Explorer (@Jacksteroo)
  4. Datadog – currently in trial period; could replace Grafana