Nodes lose devices after updating Appium to the latest version

Hello, guys. This issue is quite complicated, so I don’t really expect it to be solved, but I would appreciate any help.

We’ve been running our tests on a CI server for a couple of years now, and we recently decided to update Appium. Note that we had been using the 1.6.4 beta all this time. After moving to 1.13 we started getting occasional NullPointers in some tests, but that was solved by upgrading selenium-server from 2.2.3 to 3.14, and it seemed like our CI was stable again.

But not quite. Now we get a single successful suite run, and then the issues begin. On the second run our devices usually become heavily desynchronized (we run tests in parallel, and many of our tests require two devices to work simultaneously). On the third run everything usually breaks completely: test execution gets stuck, or tests are skipped with NullPointers, as if the devices were not connected. All this time the Appium nodes keep pinging the hub, and there are no error traces in the logs, but there is no other activity on the nodes either. It’s as if the nodes suddenly lose the devices without any trace of what happened or why. After we restart the hub, we get another successful run, and then the issues start again.

What do you think it could be? Is this an Appium issue or a Selenium issue?
The tests are executed on real devices.
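
In case it helps narrow things down, here is a rough sketch of a check that could be run between suite runs to see whether any node is still holding sessions from the previous run. It is only an illustration under assumptions: the node URLs are hypothetical placeholders, and it assumes each Appium 1.x node answers GET /wd/hub/sessions with a JSON body whose "value" field is a list of session objects with an "id".

```python
import json
from urllib.request import urlopen

# Hypothetical node addresses; replace with the real Appium node URLs.
NODE_URLS = ["http://127.0.0.1:4723", "http://127.0.0.1:4725"]


def parse_sessions(body):
    """Return the session ids found in a /wd/hub/sessions response body.

    Assumes the body is JSON with a "value" list of objects carrying "id".
    A missing or null "value" is treated as "no sessions".
    """
    payload = json.loads(body)
    return [session.get("id") for session in payload.get("value") or []]


def lingering_sessions(node_url, timeout=5):
    """Ask one node which sessions it currently holds."""
    with urlopen(node_url + "/wd/hub/sessions", timeout=timeout) as resp:
        return parse_sessions(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for url in NODE_URLS:
        try:
            print(url, lingering_sessions(url))
        except OSError as exc:
            # URLError and socket timeouts are both subclasses of OSError.
            print(url, "unreachable:", exc)
```

If a node still reports session ids after a run has finished, that would point at sessions not being closed rather than at the devices themselves, but this is a guess and not something we have confirmed.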