Weekly OnCall Rotation
On-call: Jack & Jacky (US Pacific daytime), Haodi (US Pacific nighttime)
Duration: Nov 2nd, 2021 to Nov 9th, 2021
Summary
- Beacon out of sync on Shard 1 Mainnet, due to high memory usage
- Most beacon out-of-sync issues resolved on their own (suspected memory leak)
- Bridge endpoint issues (nodes exposed to websockets)
- Minor Explorer issues
Details
Nov 2, 2021
Fairly quiet, beacon out of sync
Nov 3, 2021
Pocket Network bridge issues, switched back to Harmony.
Root cause:
- User-generated transaction errors no longer count against the node and are no longer relayed to our backups
- Two new nodes upgraded to replace smaller nodes
Nov 4, 2021
High memory usage on Shard 1 Mainnet; a restart resolved it.
Memory was freed up after the restart.
Nov 5, 2021
Indexer down
Monitor is DOWN: explorer-indexer-s123 ( http://indexer123.explorer.t.hmny.io:3002 ).
Site was up. Blip in Uptime Robot
Average UnHealthyHostCount GreaterThanOrEqualToThreshold 1.0
Bridge-RPC-at-least-1-unhealthy
Root cause – a mix of old and new boxes, some used for websockets endpoint, which comes under heavier load.
Solution – Cleaned that up to only use API-based nodes, similar to RPC endpoints (api.harmony, rpc/api.s0.t.hmny.io)
Nov 6, 2021
From Soph: crypto.com reports that they have been unable to make any transfers for 48h. Their transactions always fail with response error: {"code":-32000,"message":"replacement transaction underpriced"}. Their gas fee is already at 20 gwei, and they also tried higher without success.
From crypto.com: "We prepared another txn with nonce 52327."
Here is the signed result:
0xf87182cc678504a817c8008261a8808094d39f985af48a2befd8b9bda5f70b44ff07efccb9891fec9e3e473fb800008025a0fdcc40eafd6efec5e92027ae41a735721f124019c939eeaf3a62abc60a260934a02293a48d876db053dbc070fefcb45aa160fc729eff1616a0ca2ebc1776ba6bc8
Response error: {"code":-32000,"message":"replacement transaction underpriced"}
Update: Clearing the mempool by restarting the RPC nodes. The error message above occurs when a transaction is submitted twice with the same nonce (from the same node or from a different node).
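To cross-check the reported nonce and gas price, the signed payload above can be decoded by hand. The following is a minimal sketch of an RLP decoder, not a tool the team used. The field order (nonce, gas price, gas limit, two shard ID fields, recipient, value, data, v, r, s) is an assumption inferred from the 11 items the payload decodes to.

```python
# Signed transaction copied from the report above.
RAW_TX = (
    "0xf87182cc678504a817c8008261a8808094d39f985af48a2befd8b9bda5f70b44ff07efccb9"
    "891fec9e3e473fb800008025a0fdcc40eafd6efec5e92027ae41a735721f124019c939eeaf3a"
    "62abc60a260934a02293a48d876db053dbc070fefcb45aa160fc729eff1616a0ca2ebc1776ba6bc8"
)


def rlp_decode(data: bytes):
    """Decode one RLP item and return (value, remaining_bytes)."""
    prefix = data[0]
    if prefix < 0x80:  # a single byte below 0x80 encodes itself
        return data[:1], data[1:]
    if prefix < 0xB8:  # short string: length is prefix - 0x80
        end = 1 + (prefix - 0x80)
        return data[1:end], data[end:]
    if prefix < 0xC0:  # long string: the length itself is encoded after the prefix
        start = 1 + (prefix - 0xB7)
        end = start + int.from_bytes(data[1:start], "big")
        return data[start:end], data[end:]
    if prefix < 0xF8:  # short list: payload length is prefix - 0xC0
        start = 1
        end = 1 + (prefix - 0xC0)
    else:  # long list: the payload length is encoded after the prefix
        start = 1 + (prefix - 0xF7)
        end = start + int.from_bytes(data[1:start], "big")
    items, payload = [], data[start:end]
    while payload:
        item, payload = rlp_decode(payload)
        items.append(item)
    return items, data[end:]


fields, rest = rlp_decode(bytes.fromhex(RAW_TX[2:]))
nonce = int.from_bytes(fields[0], "big")
gas_price_wei = int.from_bytes(fields[1], "big")
gas_limit = int.from_bytes(fields[2], "big")

print(f"nonce={nonce} gas_price={gas_price_wei / 10**9} gwei gas_limit={gas_limit}")
```

The decoded nonce (52327) and gas price (20 gwei) match what crypto.com reported, so the payload itself is consistent with the description in the ticket.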
Nov 7, 2021
Several BTC index nodes out of sync
- Indexer software is open source; the issue auto-resolved after 10 mins
- Soph fixed a monitoring bug: when an issue auto-resolves, the Uptime Robot report to PD now auto-closes
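The auto-close behaviour can be pictured as pairing each "down" alert with a later "up" alert under the same deduplication key. This is only a sketch and not how Soph's fix is implemented. It assumes that PD refers to PagerDuty and its Events API v2 event shape, and that Uptime Robot reports alert type 1 for down and 2 for up. The function name, routing key, and key format are hypothetical.

```python
from typing import Optional


def pagerduty_event(monitor_id: str, monitor_name: str,
                    alert_type: int, routing_key: str) -> Optional[dict]:
    """Map an Uptime Robot style alert to a PagerDuty Events API v2 style event.

    alert_type: 1 = down, 2 = up (assumed Uptime Robot convention).
    Returns None for alert types this sketch does not handle.
    """
    # Using the same dedup_key for trigger and resolve is what lets the
    # "up" alert close the incident that the "down" alert opened.
    event = {
        "routing_key": routing_key,
        "dedup_key": f"uptimerobot-{monitor_id}",  # hypothetical key format
    }
    if alert_type == 1:
        event["event_action"] = "trigger"
        event["payload"] = {
            "summary": f"Monitor is DOWN: {monitor_name}",
            "source": "uptime-robot",
            "severity": "critical",
        }
        return event
    if alert_type == 2:
        event["event_action"] = "resolve"
        return event
    return None


down = pagerduty_event("123", "explorer-indexer-s123", 1, "ROUTING_KEY_PLACEHOLDER")
up = pagerduty_event("123", "explorer-indexer-s123", 2, "ROUTING_KEY_PLACEHOLDER")
print(down["event_action"], up["event_action"], down["dedup_key"] == up["dedup_key"])
```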
Beacon out of sync issue.
- Related to memory issue on shard nodes?
- Most likely caused by beacon sync
Nov 8, 2021
- Beacon out of sync issue.
- Nodes unhealthy due to surge from VenomDAO release of Euphoria
Harmony nodes became unhealthy, Pocket looked fine
Nov 9, 2021
From Soph
http://api2.explorer.t.hmny.io:3000/v0/shard/0/block/number/0 not working
Both api1 and api2 were fixed by restarting them.
Action Items
- Open GH issue – Memory leak on all shard chains (running on 4GB nodes) – may be from beacon sync service (@Jacky)
- Open GH issue – Error when transactions are sent with the same nonce – issue happened after a lower gas price failure – transaction pool logic shouldn’t leave all subsequent transactions stuck (@Jacky)
- Open GH issue – displaying fees burned per block in explorer (@Jacksteroo)
- Datadog – currently in trial period, could replace Grafana
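For the same-nonce action item above, it may help to have the usual replacement rule written down when filing the GH issue. The sketch below is modeled on go-ethereum's legacy transaction pool rule, where a transaction reusing a pending nonce is accepted only if its gas price exceeds the old one by a configured percentage (10% by default in go-ethereum). Whether Harmony's pool uses the same threshold is an assumption, and the function name is hypothetical.

```python
GWEI = 10**9
PRICE_BUMP_PERCENT = 10  # assumption: go-ethereum's default, not confirmed for Harmony


def can_replace(old_gas_price: int, new_gas_price: int,
                bump_percent: int = PRICE_BUMP_PERCENT) -> bool:
    """Return True if a same-nonce transaction may replace the pending one.

    The new price must be strictly higher than the old price and at least
    old * (100 + bump_percent) / 100; otherwise the pool would answer
    "replacement transaction underpriced".
    """
    threshold = old_gas_price * (100 + bump_percent) // 100
    return new_gas_price > old_gas_price and new_gas_price >= threshold


# A pending transaction at 20 gwei, resubmitted with the same nonce:
for new_price in (20 * GWEI, 21 * GWEI, 22 * GWEI):
    print(new_price // GWEI, "gwei ->", can_replace(20 * GWEI, new_price))
```

Under these assumptions, resubmitting at the same 20 gwei is rejected, while 22 gwei or more would be accepted. The report notes that crypto.com also tried higher prices without success, so this rule alone may not explain the incident.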