PHP library for iteratively encoding large JSON documents piece by piece

violet-php/streaming-json-encoder

Streaming JSON Encoder

Streaming JSON Encoder is a PHP library that provides a set of classes for encoding JSON in a streaming manner, i.e. encoding the JSON document bit by bit rather than all at once. Compared to the built-in json_encode function, this has two main advantages:

  • You will not need to load the entire data set into memory, as the encoder supports iterating over both arrays and any kind of iterators, like generators, for example.
  • You will not need to load the entire resulting JSON document into memory, since the document is encoded value by value and can be output piece by piece.

In other words, the Streaming JSON Encoder can provide the greatest benefit when you need to handle large data sets that may otherwise take up too much memory to process.

In order to increase interoperability, the library also provides a PSR-7 compatible stream to use with frameworks and HTTP requests.

The API documentation is available at: http://violet.riimu.net/api/streaming-json-encoder/

Requirements

  • The minimum supported PHP version is 5.6
  • The library depends on the following external PHP libraries:

Installation

Installation with Composer

The easiest way to install this library is to use Composer to handle your dependencies. In order to install this library via Composer, simply follow these two steps:

  1. Acquire the composer.phar by running the Composer Command-line installation in your project root.

  2. Once you have run the installation script, you should have the composer.phar file in your project root and you can run the following command:

    php composer.phar require "violet/streaming-json-encoder:^1.1"
    

After installing this library via Composer, you can load the library by including the vendor/autoload.php file that was generated by Composer during the installation.

Adding the library as a dependency

If you are already familiar with how to use Composer, you may alternatively add the library as a dependency by adding the following composer.json file to your project and running the composer install command:

{
    "require": {
        "violet/streaming-json-encoder": "^1.1"
    }
}

Manual installation

If you do not wish to use Composer to load the library, you may also download the library manually by downloading the latest release and extracting the src folder to your project. You may then include the provided src/autoload.php file to load the library classes.

Please note that using Composer will also automatically download the other required PHP libraries. If you install this library manually, you will also need to make those other required libraries available.

Usage

The library offers three main ways to use it: the classes BufferJsonEncoder and StreamJsonEncoder, and the PSR-7 compatible stream JsonStream.

Using BufferJsonEncoder

The buffer encoder is most useful when you need to generate the JSON document in a way that does not involve passing callbacks to handle the generated JSON.

The easiest way to use the BufferJsonEncoder is to instantiate it with the JSON value to encode and call the encode() method to return the entire output as a string:

<?php

require 'vendor/autoload.php';

$encoder = new \Violet\StreamingJsonEncoder\BufferJsonEncoder(['array_value']);
echo $encoder->encode();

The most useful way to use this encoder, however, is to use it as an iterator. As the encoder implements the Iterator interface, you can simply loop over the generated JSON with a foreach loop:

<?php

require 'vendor/autoload.php';

$encoder = new \Violet\StreamingJsonEncoder\BufferJsonEncoder(range(0, 10));

foreach ($encoder as $string) {
    echo $string;
}

It's also worth noting that the encoder supports iterators for values. What's more, any closure passed to the encoder will also be called and the return value used as the value instead. The previous example could also be written as:

<?php

require 'vendor/autoload.php';

$encoder = new \Violet\StreamingJsonEncoder\BufferJsonEncoder(function () {
    for ($i = 0; $i <= 10; $i++) {
        yield $i;
    }
});

foreach ($encoder as $string) {
    echo $string;
}

As a side note, the encoder also respects the JsonSerializable interface and calls the jsonSerialize() method on objects that implement it.
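The JsonSerializable behavior described above can be illustrated with a minimal sketch. The User class is hypothetical and defined only for this example; the rest uses the BufferJsonEncoder calls shown earlier.

```php
<?php

require 'vendor/autoload.php';

// Hypothetical value object, defined only for this illustration.
class User implements JsonSerializable
{
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    // The attribute below is a plain comment before PHP 8 and silences the
    // return type deprecation notice on PHP 8.1 and later.
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        return ['name' => $this->name];
    }
}

$encoder = new \Violet\StreamingJsonEncoder\BufferJsonEncoder([
    new User('Alice'),
    new User('Bob'),
]);

// Each object is replaced by the return value of jsonSerialize() before encoding.
$json = $encoder->encode();
echo $json;
```

Because the objects are resolved lazily, the same approach should also work when the User objects are yielded from a generator instead of being held in an array.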

Using StreamJsonEncoder

The stream encoder works very similarly to the BufferJsonEncoder as they extend the same abstract class. However, the key difference is in how they handle passing the JSON output.

The StreamJsonEncoder accepts a callable as the second constructor argument. Whenever JSON needs to be output, this callable is called with two arguments: the string to output and the type of the token being output (one of the JsonToken constants).

If no callable is passed, the StreamJsonEncoder will simply output the JSON using an echo statement. For example:

<?php

require 'vendor/autoload.php';

$encoder = new \Violet\StreamingJsonEncoder\StreamJsonEncoder(['array_value']);
$encoder->encode();

The encode() method in StreamJsonEncoder returns the total number of bytes it passed to the output. This makes the encoder convenient, for instance, for writing the JSON to a file in a streaming manner:

<?php

require 'vendor/autoload.php';

$fp = fopen('test.json', 'wb');
$encoder = new \Violet\StreamingJsonEncoder\StreamJsonEncoder(
    range(1, 100),
    function ($json) use ($fp) {
        fwrite($fp, $json);
    }
);

$encoder->encode();
fclose($fp);
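The callable also receives the token type as its second argument, which the example above ignores. The following sketch collects the output into a string and counts how often each token type is reported. The variable names are illustrative, and the token values are treated as opaque array keys rather than compared against specific JsonToken constants.

```php
<?php

require 'vendor/autoload.php';

$json = '';
$tokenCounts = [];

$encoder = new \Violet\StreamingJsonEncoder\StreamJsonEncoder(
    ['id' => 1, 'tags' => ['a', 'b']],
    function ($string, $token) use (&$json, &$tokenCounts) {
        // Collect the output instead of echoing it.
        $json .= $string;

        // Count how many times each token type was reported.
        $tokenCounts[$token] = isset($tokenCounts[$token]) ? $tokenCounts[$token] + 1 : 1;
    }
);

// encode() returns the number of bytes passed to the callable.
$bytes = $encoder->encode();

echo $json, PHP_EOL;
printf("%d bytes written using %d distinct token types\n", $bytes, count($tokenCounts));
```

A callback like this could, for example, skip or post-process certain token types before writing them, although whether that is useful depends on the application.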

Using JsonStream

The stream class provides a PSR-7 compatible StreamInterface for streaming JSON content. It uses the BufferJsonEncoder to do the hard work and simply wraps the calls in a stream-like fashion.

The constructor of JsonStream either accepts a value to encode as JSON or an instance of BufferJsonEncoder (which allows you to set the encoding options). You can then operate on the stream using the methods provided by the PSR-7 interface. For example:

<?php

require 'vendor/autoload.php';

$iterator = function () {
    foreach (new DirectoryIterator(__DIR__) as $file) {
        yield $file->getFilename();
    }
};

$encoder = (new \Violet\StreamingJsonEncoder\BufferJsonEncoder($iterator))
    ->setOptions(JSON_PRETTY_PRINT);

$stream = new \Violet\StreamingJsonEncoder\JsonStream($encoder);

while (!$stream->eof()) {
    echo $stream->read(1024 * 8);
}

For more information about PSR-7 streams, please refer to the PSR-7 documentation.

How the encoder resolves values

In many ways, the Streaming JSON Encoder is intended to work as a drop-in replacement for json_encode(). However, since the encoder is intended to deal with large data sets, there are some notable differences in how it handles objects and arrays.

First, to determine how to encode an object, the encoder attempts to resolve the object value in the following ways:

  • For any object that implements JsonSerializable, the implemented method jsonSerialize() is called and the return value is used instead.
  • Any Closure will be invoked and the return value will be used instead. However, no other invokables are called in this manner.

This resolution is repeated until the value cannot be resolved further. After that, a decision is made on whether the resulting array or object is encoded as a JSON array or as a JSON object. The following logic is used:

  • Any empty array, or any array that has the keys 0 to n-1 in that order, is encoded as a JSON array. All other arrays are encoded as JSON objects.
  • If an object implements Traversable and it either returns an integer 0 as the first key or returns no values at all, it will be encoded as a JSON array (regardless of other keys). All other objects implementing Traversable are encoded as JSON objects.
  • Any other object, whether it's empty or whatever keys it may have, is encoded as a JSON object.

Note, however, that if the JSON encoding option JSON_FORCE_OBJECT is used, all objects and arrays are encoded as JSON objects.

Note that all objects are traversed via a foreach statement. This means that all Traversable objects are encoded using the values returned by the iterator. For other objects, this means that the public properties are used (as per default iteration behavior).

All other values (i.e. nulls, booleans, numbers and strings) are treated exactly the same way as json_encode() does (and in fact, it's used to encode those values).
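The array-versus-object rules above can be restated as a small plain-PHP sketch. The two helper functions are hypothetical and only mirror the documented rules; they are not the library's own implementation and they ignore the JSON_FORCE_OBJECT option.

```php
<?php

// Hypothetical helper: would this PHP array be encoded as a JSON array?
function looksLikeJsonArray(array $array)
{
    // Empty arrays and arrays keyed 0..n-1 in that order count as JSON arrays.
    return $array === [] || array_keys($array) === range(0, count($array) - 1);
}

// Hypothetical helper: would this Traversable be encoded as a JSON array?
function traversableLooksLikeJsonArray(Traversable $values)
{
    foreach ($values as $key => $value) {
        // Only the first key matters: an integer 0 means "JSON array".
        return $key === 0;
    }

    // No values at all also means "JSON array".
    return true;
}

var_dump(looksLikeJsonArray([]));                   // bool(true)
var_dump(looksLikeJsonArray(['a', 'b']));           // bool(true)
var_dump(looksLikeJsonArray([1 => 'a', 0 => 'b'])); // bool(false)
var_dump(looksLikeJsonArray(['x' => 1]));           // bool(false)

// The first key is 0, so later string keys do not change the decision.
var_dump(traversableLooksLikeJsonArray(new ArrayIterator([0 => 'a', 'x' => 'b']))); // bool(true)
var_dump(traversableLooksLikeJsonArray(new ArrayIterator(['x' => 1])));             // bool(false)
```

Checking only the first key of a Traversable fits the streaming design: the encoder has to decide between an opening bracket and an opening brace before it has seen the remaining values.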

JSON encoding options

Both BufferJsonEncoder and StreamJsonEncoder have a method setOptions() to change the JSON encoding options. The accepted options are the same as those accepted by the json_encode() function, which the encoder still uses internally to encode values other than arrays and objects. A few options also have additional effects on the encoders:

  • Using JSON_FORCE_OBJECT will force all arrays and objects to be encoded as JSON objects similar to json_encode().
  • Using JSON_PRETTY_PRINT causes the encoder to output whitespace in order to make the output more readable. The indentation can be changed using the method setIndent(), which accepts either a string to use as the indent or an integer indicating the number of spaces.
  • Using JSON_PARTIAL_OUTPUT_ON_ERROR will cause the encoder to continue the output despite encoding errors. Otherwise the encoding will halt and the encoder will throw an EncodingException.
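As a rough sketch of these options in use, the example below pretty-prints a document with a two-space indent and then triggers an encoding error with a string that is not valid UTF-8. The fully qualified name of EncodingException is assumed to share the namespace of the encoder classes, and the sample data is made up for illustration.

```php
<?php

require 'vendor/autoload.php';

$encoder = new \Violet\StreamingJsonEncoder\BufferJsonEncoder([
    'name' => 'example',
    'values' => [1, 2],
]);
$encoder->setOptions(JSON_PRETTY_PRINT);
$encoder->setIndent(2); // use two spaces per indentation level

$pretty = $encoder->encode();
echo $pretty, PHP_EOL;

// "\xB1" is not valid UTF-8, so json_encode() cannot encode this string.
$broken = new \Violet\StreamingJsonEncoder\BufferJsonEncoder(["\xB1"]);

try {
    $broken->encode();
    $failed = false;
} catch (\Violet\StreamingJsonEncoder\EncodingException $exception) {
    // Assumed namespace; without JSON_PARTIAL_OUTPUT_ON_ERROR the encoding halts here.
    $failed = true;
    echo 'Encoding failed: ', $exception->getMessage(), PHP_EOL;
}
```

Setting JSON_PARTIAL_OUTPUT_ON_ERROR on the second encoder should let the encoding continue instead of throwing, as described in the list above.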

Credits

This library is Copyright (c) 2017-2022 Riikka Kalliomäki.

See LICENSE for license and copying information.