Connecting sockets to non-existing endpoints does not work #73

Open
TheButlah opened this issue Oct 4, 2020 · 4 comments

Comments

@TheButlah
Contributor

In ZMQ, you can connect to an endpoint before any socket has actually bound to that endpoint. That lets you avoid all sorts of ordering issues with sockets. Currently, at least the SubSocket fails when connecting to a non-existent endpoint. We should instead make the sockets comply with ZMTP:

From https://rfc.zeromq.org/spec/23/ on general socket semantics:

All sockets SHALL establish connections opportunistically, that is: they connect to an endpoint asynchronously, and if the connection is broken, SHOULD reconnect after a suitable delay.

From https://rfc.zeromq.org/spec/29/ on Subscriber semantics:

SHALL create a queue when initiating an outgoing connection to a publisher, and SHALL maintain the queue whether or not the connection is established.
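
As a rough illustration of the two quoted requirements, here is a minimal std-only sketch. It is not the zmq.rs implementation: `connect_lazy` and `dial` are hypothetical names, a thread stands in for the async task, and an mpsc channel stands in for the per-connection queue. The call returns immediately, the queue exists before any connection does, and a background worker keeps retrying until the endpoint appears.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Returns at once with a sender that feeds the outgoing queue.
/// `dial` is a hypothetical stand-in for one transport-level connect
/// attempt; on success it yields a handle used to deliver messages.
fn connect_lazy<D>(mut dial: D, retry_delay: Duration) -> (Sender<String>, JoinHandle<()>)
where
    D: FnMut() -> Result<Sender<String>, String> + Send + 'static,
{
    // The queue is created up front, whether or not the peer exists yet.
    let (tx, rx) = channel::<String>();
    let worker = thread::spawn(move || {
        // Keep retrying until the endpoint accepts the connection.
        let peer = loop {
            match dial() {
                Ok(peer) => break peer,
                Err(_) => thread::sleep(retry_delay),
            }
        };
        // Flush whatever was queued while disconnected, then keep
        // forwarding until the caller drops its sender.
        for msg in rx {
            if peer.send(msg).is_err() {
                break; // Peer went away; a fuller sketch would redial here.
            }
        }
    });
    (tx, worker)
}
```

Messages sent through the returned sender before the peer exists are simply held in the channel and delivered once `dial` succeeds. Reconnection after a broken connection is left out to keep the sketch short.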

@Alexei-Kornienko
Collaborator

This is a tough one.

Should the connect call block until a connection is finally established? Taking the ZMQ specs into account, it seems that it should just prepare all the internal queues and return immediately. However, that makes it really hard to return appropriate errors to the user in case of any issues. For example, a user tries to connect to a wrong address with no sockets ever listening. How are they supposed to get an error/timeout from the socket? Should there be some kind of backoff between errors? IMHO all of this needs to be carefully investigated before fixing this issue.
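
On the backoff question, one common option is a capped exponential delay between attempts. The sketch below only illustrates that idea: `backoff_delay` and its parameters are hypothetical names, not part of zmq.rs. libzmq exposes comparable knobs as the ZMQ_RECONNECT_IVL and ZMQ_RECONNECT_IVL_MAX socket options.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based).
/// The delay doubles on every attempt and saturates at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a base of 100 ms and a cap of 5 s, the delays run 100 ms, 200 ms, 400 ms and so on until they reach 5 s and stay there.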

@TheButlah
Contributor Author

I think modeling it after how libzmq does things, while making the API and especially the error handling idiomatic for Rust, is the right approach. I'm unsure of exactly what that would entail, and that's easier said than done, but I'll start by enumerating some relevant details I can dig up from libzmq, which may help us figure out what our approach should be.

From http://api.zeromq.org/4-3:zmq-connect

for most transports and socket types the connection is not performed immediately but as needed by ØMQ. Thus a successful call to zmq_connect() does not mean that the connection was or could actually be established. Because of this, for most transports and socket types the order in which a server socket is bound and a client socket is connected to it does not matter. The ZMQ_PAIR sockets are an exception, as they do not automatically reconnect to endpoints.

From that same link:

following a zmq_connect(), for socket types except for ZMQ_ROUTER, the socket enters its normal ready state. By contrast, following a zmq_bind() alone, the socket enters a mute state in which the socket blocks or drops messages according to the socket type, as defined in zmq_socket(3). A ZMQ_ROUTER socket enters its normal ready state for a specific peer only when handshaking is complete for that peer, which may take an arbitrary time.

zmq_socket_monitor sets up a ZMQ_PAIR socket with the inproc transport that lets you monitor the asynchronous socket events.

@TheButlah
Contributor Author

TheButlah commented Oct 4, 2020

For example, a user tries to connect to a wrong address with no sockets ever listening. How are they supposed to get an error/timeout from the socket?

I'd say that's a feature, not a bug. The user can stay connected to a non-existent endpoint and their messages will continue to be queued. Connection, reconnection, etc. are handled asynchronously without any effort from the user, which means there aren't really any errors to send back to the caller of connect(). The only time something is really an "error" per se is when the queue fills up, which is controlled by the high water mark and is specific to the particular socket type's semantics. Most commonly, at this point the socket goes into a mute state and, depending on the socket type and settings, newly queued messages are either dropped or the call blocks.

In this way of viewing things, any asynchronous errors are really just internal to our codebase, and shouldn't impact the user of the library. Other than figuring out how the user interacts with the mute state of the socket, I'm not sure that any error handling for the asynchronous connection, disconnection, or reconnection would ever need to be exposed in the public API, with the exception of a Rust equivalent of zmq_socket_monitor (either as an inproc PAIR socket like in the C API, or as a Rust channel of ZmqSocketEvents). That monitor would be unnecessary for the broad majority of cases, however, and we probably don't need to worry about implementing it in a user-facing way for a while (although perhaps we will find that we need it internally, in order to properly address this current issue).
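
To make the mute state concrete, here is a minimal sketch of a queue bounded by a high water mark. All names (`OutQueue`, `MutePolicy`, `Enqueue`) are hypothetical and not taken from zmq.rs or libzmq. Which policy applies to which socket type is defined by the ZMQ RFCs and is simply passed in as a parameter here.

```rust
use std::collections::VecDeque;

/// What happens to new messages while the queue is full (mute state).
#[derive(Clone, Copy, Debug, PartialEq)]
enum MutePolicy {
    Drop,
    Block,
}

/// Outcome of trying to queue one message.
#[derive(Debug, PartialEq)]
enum Enqueue {
    Queued,
    Dropped,
    /// The caller would have to wait; a real socket would block or yield.
    WouldBlock,
}

struct OutQueue {
    hwm: usize,
    policy: MutePolicy,
    messages: VecDeque<Vec<u8>>,
}

impl OutQueue {
    fn new(hwm: usize, policy: MutePolicy) -> Self {
        OutQueue { hwm, policy, messages: VecDeque::new() }
    }

    /// The queue is mute once it holds `hwm` messages.
    fn is_mute(&self) -> bool {
        self.messages.len() >= self.hwm
    }

    fn push(&mut self, msg: Vec<u8>) -> Enqueue {
        if self.is_mute() {
            return match self.policy {
                MutePolicy::Drop => Enqueue::Dropped,
                MutePolicy::Block => Enqueue::WouldBlock,
            };
        }
        self.messages.push_back(msg);
        Enqueue::Queued
    }

    /// Called by the connection task when it can transmit a message.
    fn pop(&mut self) -> Option<Vec<u8>> {
        self.messages.pop_front()
    }
}
```

Note that nothing here depends on whether a peer is connected: the queue fills up the same way against a non-existent endpoint, and the only signal the caller sees is the mute-state outcome.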

@TheButlah
Contributor Author

TheButlah commented Oct 4, 2020

To make my thoughts a bit more explicit, here's what I think we should plan for:

  • Errors that don't rely on the existence of the remote socket can be returned immediately to the caller via Result (for example, attempting to connect a SubSocket multiple times to the same endpoint, invalid endpoint syntax, an incompatible transport for the socket type, etc.).
  • The connection, disconnection, and reconnection to the remote socket will be managed by an asynchronous task, and the user won't generally get any control over or information about it. ZMQ should handle all of that for them.
  • The user will always have their messages queued, unless the socket is in the mute state. The ZMQ RFCs define when a socket goes into the mute state and what should happen to newly queued messages while it is in that state.
  • We may find that we need a channel we can listen to for the socket events happening in the asynchronous task (connection, disconnection, message received, etc.). Initially this will probably be for internal testing purposes only, as we will want our tests to be able to see the events in the async task. Later we may wish to expose a receiver for this channel in the public API, either as a ZMQ PAIR socket or as a regular Rust channel.
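
The first and last points of the plan above could look roughly like the following std-only sketch. Every name in it (`Socket`, `ConnectError`, `SocketEvent`, the accepted transports) is an assumption made for illustration, not the zmq.rs API. The `connect` call only reports errors it can detect locally, and anything that happens later would be published on an event channel.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Errors that can be detected without contacting the remote socket.
#[derive(Debug, PartialEq)]
enum ConnectError {
    InvalidEndpoint,
    UnsupportedTransport,
    AlreadyConnected,
}

/// Events a background connection task could publish (hypothetical).
#[derive(Debug, PartialEq)]
enum SocketEvent {
    ConnectQueued(String),
}

struct Socket {
    endpoints: HashSet<String>,
    events: Sender<SocketEvent>,
}

impl Socket {
    /// Returns the socket plus a receiver for monitoring its events.
    fn new() -> (Self, Receiver<SocketEvent>) {
        let (events, monitor) = channel();
        (Socket { endpoints: HashSet::new(), events }, monitor)
    }

    fn connect(&mut self, endpoint: &str) -> Result<(), ConnectError> {
        // Syntax check: expect "transport://address".
        let (transport, address) = endpoint
            .split_once("://")
            .ok_or(ConnectError::InvalidEndpoint)?;
        if address.is_empty() {
            return Err(ConnectError::InvalidEndpoint);
        }
        // Assumption: this sketch accepts only tcp and ipc.
        if !matches!(transport, "tcp" | "ipc") {
            return Err(ConnectError::UnsupportedTransport);
        }
        // Duplicate check: `insert` returns false if already present.
        if !self.endpoints.insert(endpoint.to_string()) {
            return Err(ConnectError::AlreadyConnected);
        }
        // A real implementation would hand the endpoint to the async
        // task here; the sketch only records that it was accepted.
        let _ = self.events.send(SocketEvent::ConnectQueued(endpoint.to_string()));
        Ok(())
    }
}
```

Connecting to an address where nothing is listening returns `Ok(())` under this design. Whether the peer ever shows up is visible only through the event receiver, which tests could use first and the public API could expose later.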

Alexei-Kornienko added a commit that referenced this issue Jan 7, 2021
This enables us to connect socket to server that doesn't exist yet.
Alexei-Kornienko added a commit that referenced this issue Jan 7, 2021
This enables us to connect socket to server that doesn't exist yet.
Alexei-Kornienko added a commit that referenced this issue Jan 8, 2021
This enables us to connect socket to server that doesn't exist yet.
Alexei-Kornienko added a commit that referenced this issue Jan 8, 2021
Implement connect_forever in util. Related to #73