Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode without using the entire input #61

Open
fasfsfgs opened this issue Dec 29, 2016 · 6 comments
Open

decode without using the entire input #61

fasfsfgs opened this issue Dec 29, 2016 · 6 comments

Comments

@fasfsfgs
Copy link

Not sure if this is a bug or not but I'd like to discuss how decode should behave when we feed multiple parts.

Example:

bencode.decode('i123ei123e', 'utf8'); // 123

The input has more data than that. When we use the decode function, it silently returns just the first decoded part. Is this intended?

I saw some cases where a single buffer they wanted to decode had multiple bencoded parts (it was some data received from the network they had no control over). And when they used this lib, it was silently missing parts. So they had to implement some logic to check if the decoded result was everything the input had or not. They managed to get everything working, but I'm wondering if the lib should give a heads up (exception or whatever) when this happens.

@themasch
Copy link
Sponsor Contributor

themasch commented Jan 2, 2017

At first: sorry for the delay. The holidays...

I see your point. Albeit that "i123ei123e" (AFAIK) is not valid bencode (since its actually two values), there are (at least) two ways we can handle that (except the current, 'ignore it' way that ist):

a) throw an Error when there are unconsumed bytes left in the buffer after "finishing" the decode.
b) provide a decode.resume() method that takes off where decode() stopped and just starts a new decode task starting at lastPosition + 1.

Are there any examples on how other parsers (JSON, YML, XML, other bencoders) handle this?
As far as I am aware most of them throw an error like this:
image

@fasfsfgs
Copy link
Author

fasfsfgs commented Jan 2, 2017

Sorry I don't have much experience with parsers and how they usually deal with this. I ended up using another lib that behaves pretty much like json parser - option a).

Since I'm dealing with chunks of data coming from a network connection, I had to expect chunks with incomplete bencode parts as well as chunks with multiple bencode parts and I had to try to decode them as I received them because the message saying that I could stop listening to the connection is actually in those bencoded parts.

So I did a method that received a buffer and returned two objects:

  • an array of successfully decoded parts
  • a buffer with the rest of the buffer received that didn't get decoded

Could you elaborate on how your potential resume method would be used?

Anyway, if it's not a common case, I think it's safe to just go with option a).
Only if you saw that a lot of your clients had a case like mine that I'd put more logic into the lib.
That's my 2c. Let me know if I can help you out with anything.

@themasch
Copy link
Sponsor Contributor

themasch commented Jan 13, 2017

in case we do (b), the parser would work as it currently does. There would be a bencode.decode.resume() function that checks if the internal cursor is at the end of the internal buffer (decode uses global state all over the place anyway) and if there are bytes left to be read it would just start another decode() on the remaining buffer.
repeat until everything is read.

const input = buffer_from_source()

let results = [ bencode.decode(input) ]
while(let more = bencode.decode.resume()) {
  results.push(more)
}

Maybe a "canResume" method could be of good use so you can actually check if theres more data before calling resume.

@fasfsfgs
Copy link
Author

Oh I understand better your idea now.
Well. If there is any way of getting at least other contributor's opinion, that would be great.
I'm really not sure what's the best decision here.

  1. ignore this issue
  2. stateful resume
  3. exception
  4. other

@themasch
Copy link
Sponsor Contributor

while strict parser should choose 3., you should currently be able to call decode.next() and get the next result. Thats pretty close to what a resume() would be.

Throwing an exception would also require a new major release since it would change the current behavior in these cases.

@jhermsmeier what do you think about a 2.0 branch that throws an exception and offers a decodeGreedy (name to be fixed) method that returns something clever that contains (or returns) all results. Like an iterator?

@jimmywarting
Copy link
Contributor

Like iterators!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants