|Reported by:||gregwilkins||Owned by:||gregwilkins|
From mailing list: I've come across several issues with load balancers and cometd:
1) A load balancer (or some other network failure) may hide the failure of a connect request. The result is that the connection to the browser stays open, but no response is ever sent. The full browser timeout (300s) must elapse before the connect fails.
I think we need to implement the timeout method in the binds we do for connect (open tunnel).
But in order to set that timeout to a reasonable value, we need to know what that value is! Currently this is private config on the server.
I'd like to update the spec to allow the server to include a timeout value in its advice field, so the client can use that (plus an expected network delay) to set a timeout.
Thus the advice from the server could potentially look like:
and the client could set a timeout a little bigger than 180000.
But I'd also like individual clients to be able to specify shorter timeouts. Thus I'd like a client to be able to say:
and then it will set the bind timeoutSeconds to 30s
However, the client will need to advise the server of this, so I'd like to modify the spec so that if a client has a connectTimeout shorter than the advised timeout from the server, it can advise the server of that by sending advice in a connect request:
The expected network delay is controlled by dojox.cometd.expectedNetworkDelay
I've got this implemented, but wanted to check that everybody thinks this makes sense before I updated the spec and checked the code in.
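The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions: the advice field name `timeout`, the helper names `effectiveTimeout`, `buildConnect` and `bindTimeout`, and the choice to add the expected network delay in both cases are hypothetical and are not taken from the spec or from the dojox.cometd code. The values are assumed to be in milliseconds.

```javascript
// All helper names here are illustrative assumptions, not dojox.cometd API.

// Pick the long-poll timeout (ms) the client wants honoured: the shorter of
// the server's advised timeout and the client's own connectTimeout.
// Returns null when neither side has supplied a usable value.
function effectiveTimeout(serverAdvice, clientConnectTimeout) {
  var advised = (serverAdvice && typeof serverAdvice.timeout === "number")
    ? serverAdvice.timeout
    : null;
  var own = clientConnectTimeout > 0 ? clientConnectTimeout : null;
  if (advised === null) { return own; }
  if (own !== null && own < advised) { return own; }
  return advised;
}

// Build a connect message. Advice is attached only when the client wants a
// shorter timeout than the one the server advised.
function buildConnect(clientId, serverAdvice, clientConnectTimeout) {
  var message = {
    channel: "/meta/connect",
    clientId: clientId,
    connectionType: "long-polling"
  };
  var timeout = effectiveTimeout(serverAdvice, clientConnectTimeout);
  var advised = serverAdvice ? serverAdvice.timeout : undefined;
  if (typeof advised === "number" && timeout !== null && timeout < advised) {
    message.advice = { timeout: timeout }; // assumed field name
  }
  return message;
}

// Timeout (ms) to apply to the underlying request: the agreed long-poll
// timeout plus an allowance for network delay.
function bindTimeout(serverAdvice, clientConnectTimeout, expectedNetworkDelay) {
  var timeout = effectiveTimeout(serverAdvice, clientConnectTimeout);
  return timeout === null ? null : timeout + expectedNetworkDelay;
}
```

With server advice of 180000 and no client limit, the request timeout would be a little over 180000 and no advice is sent back. With a client limit of 30000, the connect message would carry that shorter value as advice.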
2) With a stupid load balancer, it is possible for a connect to go to one host and a subscribe/publish to go to another. The problem is that the subscribe/publish will receive an error like 402::unknown client, but we do nothing with it, so everything just hangs because the connect is never woken up.
We need to decide how to handle bind errors, bind timeouts and error returns when sending messages.
At the very least, we should cancel the current connection and try to reconnect with backoff.
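One way the cancel-and-reconnect step might look is sketched below. The names `createBackoff` and `handleConnectionFailure`, the linear backoff policy, and the shape of the `state` object are all assumptions made for illustration. The actual policy is one of the things still to be decided.

```javascript
// Hypothetical backoff helper: the delay grows by `increment` ms per
// consecutive failure and is capped at `maxDelay` ms.
function createBackoff(increment, maxDelay) {
  var failures = 0;
  return {
    next: function () {
      failures += 1;
      return Math.min(failures * increment, maxDelay);
    },
    // Call after a successful connect so the next failure starts small again.
    reset: function () { failures = 0; }
  };
}

// Cancel the outstanding connect (if any) and schedule a reconnect attempt.
// `schedule` defaults to setTimeout; it is a parameter so the logic can be
// exercised without real timers.
function handleConnectionFailure(state, backoff, reconnect, schedule) {
  if (state.pendingConnect && typeof state.pendingConnect.cancel === "function") {
    state.pendingConnect.cancel();
  }
  state.pendingConnect = null;
  var delay = backoff.next();
  (schedule || setTimeout)(reconnect, delay);
  return delay;
}
```

The same handler could be called from a bind error, a bind timeout, or an error reply such as 402::unknown client, so that all three paths end in a single reconnect attempt.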
We could also provide the failed messages in a local event so that a client can decide if they should be resent after a renegotiated connection.
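A minimal sketch of such a local event follows, assuming a simple listener list. The names `createFailureNotifier`, `onFailure` and `notify`, and the event shape `{ messages, reason }`, are hypothetical and do not describe an existing dojox.cometd API.

```javascript
// Hypothetical local event source for messages that could not be delivered.
function createFailureNotifier() {
  var listeners = [];
  return {
    // Register a callback that receives { messages: [...], reason: "..." }.
    onFailure: function (listener) {
      listeners.push(listener);
    },
    // Hand every listener a copy of the failed messages plus the reason.
    notify: function (failedMessages, reason) {
      var copy = failedMessages.slice();
      for (var i = 0; i < listeners.length; i++) {
        listeners[i]({ messages: copy, reason: reason });
      }
    }
  };
}

// Example: a client that queues failed publishes for resending once the
// connection has been renegotiated.
var notifier = createFailureNotifier();
var resendQueue = [];
notifier.onFailure(function (event) {
  for (var i = 0; i < event.messages.length; i++) {
    resendQueue.push(event.messages[i]);
  }
});
notifier.notify([{ channel: "/some/channel", data: "hello" }], "402::unknown client");
```

Leaving the resend decision to the listener keeps the policy in application code, since only the application knows whether a given message is still worth sending after a reconnect.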