decode without using the entire input #61

fasfsfgs · 2016-12-29T13:23:29Z

Not sure if this is a bug or not but I'd like to discuss how decode should behave when we feed multiple parts.

Example:

bencode.decode('i123ei123e', 'utf8'); // 123

The input has more data than that. When we use the decode function, it silently returns just the first decoded part. Is this intended?

I saw some cases where a single buffer that people wanted to decode contained multiple bencoded parts (it was data received from the network that they had no control over). When they used this lib, it silently dropped the extra parts, so they had to implement their own logic to check whether the decoded result covered the whole input. They managed to get everything working, but I'm wondering if the lib should give a heads up (an exception or whatever) when this happens.

themasch · 2017-01-02T07:29:25Z

First of all: sorry for the delay. The holidays...

I see your point. Although "i123ei123e" is (AFAIK) not valid bencode (since it's actually two values), there are at least two ways we could handle it, other than the current "ignore it" approach:

a) throw an Error when there are unconsumed bytes left in the buffer after "finishing" the decode.
b) provide a decode.resume() method that picks up where decode() stopped and starts a new decode task at lastPosition + 1.
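
To make option a) concrete, here is a minimal sketch. It is not this library's code: `decodeOne` and `decodeStrict` are hypothetical names, the toy decoder only handles integers and byte strings, and it works on ASCII strings rather than Buffers to stay short.

```javascript
// Toy decoder (assumption: ASCII text, integers and byte strings only).
// Returns the decoded value and the index just past it.
function decodeOne(text, pos = 0) {
  if (text[pos] === 'i') {
    const end = text.indexOf('e', pos);
    if (end === -1) throw new SyntaxError('unterminated integer');
    const digits = text.slice(pos + 1, end);
    if (!/^-?\d+$/.test(digits)) throw new SyntaxError('invalid integer');
    return { value: Number(digits), next: end + 1 };
  }
  const colon = text.indexOf(':', pos);
  const lenText = colon === -1 ? '' : text.slice(pos, colon);
  if (!/^\d+$/.test(lenText)) {
    throw new SyntaxError(`invalid value at offset ${pos}`);
  }
  const start = colon + 1;
  const length = Number(lenText);
  if (start + length > text.length) throw new SyntaxError('truncated string');
  return { value: text.slice(start, start + length), next: start + length };
}

// Option a): refuse input that still has data after the first value.
function decodeStrict(text) {
  const { value, next } = decodeOne(text);
  if (next !== text.length) {
    throw new SyntaxError(`unexpected trailing data at offset ${next}`);
  }
  return value;
}
```

With this sketch, `decodeStrict('i123e')` returns 123, while `decodeStrict('i123ei123e')` throws instead of silently dropping the second value.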

Are there any examples on how other parsers (JSON, YML, XML, other bencoders) handle this?
As far as I am aware most of them throw an error like this:

fasfsfgs · 2017-01-02T13:34:53Z

Sorry, I don't have much experience with parsers and how they usually deal with this. I ended up using another lib that behaves pretty much like a JSON parser, i.e. option a).

I'm dealing with chunks of data coming from a network connection, so I had to expect chunks with incomplete bencode parts as well as chunks with multiple bencode parts. I also had to try to decode them as I received them, because the message telling me that I can stop listening to the connection is itself inside those bencoded parts.

So I wrote a method that takes a buffer and returns two things:

- an array of successfully decoded parts
- a buffer with the rest of the received data that didn't get decoded
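
A rough sketch of that kind of method, under simplifying assumptions: it is not the code I actually used, `tryDecodeOne` and `decodeAvailable` are made-up names, only integers and byte strings are handled, and the input is an ASCII string instead of a Buffer.

```javascript
// Returns { value, next } for a complete value at `pos`,
// or null when the value is still incomplete (more data needed).
function tryDecodeOne(text, pos) {
  if (pos >= text.length) return null;
  if (text[pos] === 'i') {
    const end = text.indexOf('e', pos);
    if (end === -1) return null; // integer not terminated yet
    const digits = text.slice(pos + 1, end);
    if (!/^-?\d+$/.test(digits)) throw new SyntaxError('invalid integer');
    return { value: Number(digits), next: end + 1 };
  }
  if (!/\d/.test(text[pos])) {
    throw new SyntaxError(`unsupported value at offset ${pos}`);
  }
  const colon = text.indexOf(':', pos);
  if (colon === -1) return null; // length prefix still arriving
  const lenText = text.slice(pos, colon);
  if (!/^\d+$/.test(lenText)) throw new SyntaxError('invalid string length');
  const start = colon + 1;
  const length = Number(lenText);
  if (start + length > text.length) return null; // payload incomplete
  return { value: text.slice(start, start + length), next: start + length };
}

// Decodes every complete value and hands back whatever is left over.
function decodeAvailable(text) {
  const parts = [];
  let pos = 0;
  for (;;) {
    const result = tryDecodeOne(text, pos);
    if (result === null) break;
    parts.push(result.value);
    pos = result.next;
  }
  return { parts, rest: text.slice(pos) };
}
```

The caller keeps `rest` and prepends it to the next chunk, so a value split across two chunks is decoded once its second half arrives.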

Could you elaborate on how your potential resume method would be used?

Anyway, if it's not a common case, I think it's safe to just go with option a).
I'd only put more logic into the lib if you saw that a lot of your users had a case like mine.
That's my 2c. Let me know if I can help you out with anything.

themasch · 2017-01-13T08:45:13Z

In case we do (b), the parser would work as it currently does. There would be a bencode.decode.resume() function that checks whether the internal cursor is at the end of the internal buffer (decode uses global state all over the place anyway). If there are bytes left to be read, it would just start another decode() on the remaining buffer.
Repeat until everything is read.

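const input = buffer_from_source()

const results = [ bencode.decode(input) ]
// decode.resume() is the proposed method, it does not exist yet.
// Assumes it returns undefined once the whole buffer has been consumed.
let more
while ((more = bencode.decode.resume()) !== undefined) {
  results.push(more)
}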

Maybe a "canResume" method would be useful, so you can actually check whether there's more data before calling resume.

fasfsfgs · 2017-01-13T14:10:31Z

Oh, I understand your idea better now.
Well, if there is any way of getting at least one other contributor's opinion, that would be great.
I'm really not sure what the best decision is here.

1. ignore this issue
2. stateful resume
3. exception
4. other

themasch · 2017-01-13T15:50:35Z

While a strict parser should choose 3., you should currently be able to call decode.next() and get the next result. That's pretty close to what a resume() would be.

Throwing an exception would also require a new major release since it would change the current behavior in these cases.

@jhermsmeier what do you think about a 2.0 branch that throws an exception and offers a decodeGreedy (name to be fixed) method that returns something clever that contains (or returns) all results, like an iterator?
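
For illustration only, an iterator-style API could look roughly like the generator below. Nothing here exists in the library: `decodeEach` and `decodeValueAt` are hypothetical names, and the toy decoder only covers integers, byte strings and lists on ASCII string input.

```javascript
// Toy decoder (assumption: ASCII text; integers, byte strings, lists).
function decodeValueAt(text, pos) {
  const tag = text[pos];
  if (tag === 'i') {
    const end = text.indexOf('e', pos);
    const digits = end === -1 ? '' : text.slice(pos + 1, end);
    if (!/^-?\d+$/.test(digits)) {
      throw new SyntaxError(`invalid integer at offset ${pos}`);
    }
    return { value: Number(digits), next: end + 1 };
  }
  if (tag === 'l') {
    const items = [];
    let cursor = pos + 1;
    while (text[cursor] !== 'e') {
      const item = decodeValueAt(text, cursor); // throws if the list never closes
      items.push(item.value);
      cursor = item.next;
    }
    return { value: items, next: cursor + 1 };
  }
  const colon = text.indexOf(':', pos);
  const lenText = colon === -1 ? '' : text.slice(pos, colon);
  if (!/^\d+$/.test(lenText)) {
    throw new SyntaxError(`invalid value at offset ${pos}`);
  }
  const start = colon + 1;
  const length = Number(lenText);
  if (start + length > text.length) {
    throw new SyntaxError(`truncated string at offset ${pos}`);
  }
  return { value: text.slice(start, start + length), next: start + length };
}

// Lazily yields every top-level value found in the input.
function* decodeEach(text) {
  let pos = 0;
  while (pos < text.length) {
    const { value, next } = decodeValueAt(text, pos);
    yield value;
    pos = next;
  }
}
```

Callers could then write `for (const value of decodeEach(input))` or collect everything with `[...decodeEach(input)]`. Because the generator is lazy, a malformed trailing value only throws when iteration reaches it.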

jimmywarting · 2018-10-10T09:32:14Z

Like iterators!

themasch added the enhancement label Jan 2, 2017

ckcr4lyf mentioned this issue Apr 1, 2024

Correctly set the value of decode.bytes to bytes used. #179

Open
