KxSystems/protobufkdb

protobufkdb

This interface allows kdb+ users to parse data which has been encoded using Google Protocol Buffers (protobuf) into kdb+ according to the proto schema and serialise it back to the encoded wire format. The interface utilises the libprotobuf descriptor and reflection C++ APIs.

This is part of the Fusion for kdb+ interface collection.

New to kdb+ ?

Kdb+ is the world’s fastest timeseries database, optimized for ingesting, analyzing and storing massive amounts of structured data. To get started with kdb+, visit https://code.kx.com/q/learn/ for downloads and developer information. For general information, visit https://kx.com/

New to Protocol Buffers ?

Protocol Buffers (Protobuf) is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is used both by programs that communicate over the wire and for data storage. Developed originally for internal use by Google, it is released under an open source BSD license. The core principle behind this data format is to allow a user to define the expected structure of their data and incorporate this within specially generated source code. This allows a user to easily read and write structured data to and from a variety of languages.

🌐 Protobuf documentation

Importing protobuf schema files

Protobuf messages are defined in a .proto schema file and these message definitions must be imported into the interface in order for it to be able to create messages of those types. The interface supports two ways to do this (or a combination of both) but the method used will impact how protobufkdb should be installed.

1. Compiling in the generated message definitions

Normally the Protocol Buffers compiler (protoc) is used to generate source code from the .proto schema files, which is then compiled into the binary:

protoc compiles your message definitions based on a file defined as:

<schema>.proto

producing both a C++ source and header file defined as:

<schema>.pb.cc
<schema>.pb.h

These files contain the classes and metadata which describe the schema and the functionality required to serialize to and parse from this schema.
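As a rough illustration of the naming convention above, the sketch below derives the expected output file names for a schema and, if protoc happens to be installed and the schema file exists, generates the C++ sources with the standard `--cpp_out` option. The schema name `examples` is only an assumption for the sake of the example, and when building protobufkdb this generation step is driven by the CMake build rather than run by hand.

```shell
# Hypothetical schema name (without extension); substitute your own.
SCHEMA=examples

# File names protoc generates for the C++ backend.
PB_SOURCE="${SCHEMA}.pb.cc"
PB_HEADER="${SCHEMA}.pb.h"

# Only invoke protoc if it is on the PATH and the schema file is present.
if command -v protoc >/dev/null 2>&1 && [ -f "${SCHEMA}.proto" ]; then
    protoc --cpp_out=. "${SCHEMA}.proto"
fi

echo "expected outputs: ${PB_SOURCE} ${PB_HEADER}"
```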

This mechanism is more performant but does require that protobufkdb be built from source since the binary needs to be rebuilt to change the statically available messages.

2. Dynamically importing the message definitions at runtime

To provide greater flexibility and usability it is also possible to dynamically import a .proto schema file at runtime from within the q session. Imported message definitions can then be used subsequently by the interface and behave similarly to compiled in ones (the import procedure leverages the same functionality as used by the protobuf compiler).

If only dynamically imported message definitions are required then the packaged installation of protobufkdb can be used. However, importing message definitions is less performant - in addition to the one-off import cost, there is also an overhead from the subsequent use of these dynamically created messages (approx. 10% for parsing, 20% for serializing). Alternatively a hybrid approach can be employed where dynamic messages are used during development until the schemas are finalized, at which point they are compiled into the interface.

Installation

Requirements

  • kdb+ ≥ 3.5 64-bit (Linux/macOS/Windows)
  • protobuf ≥ 3.0 (recommended) [1]
  • C++11 or later [2]
  • CMake ≥ 3.1 [2]

Installing a release

The protobufkdb releases are linked statically against libprotobuf to avoid potential C++ ABI compatibility issues with different versions of libprotobuf. It is therefore unnecessary to install protobuf separately when using a packaged release.

  1. Download a release from here

  2. Install the required q script q/protobufkdb.q to $QHOME and the binary file lib/protobufkdb.(so|dll) to $QHOME/[mlw](64) by executing the following from the Release directory:

    ## Linux/MacOS
    chmod +x install.sh && ./install.sh
    
    ## Windows
    install.bat
  3. To use the KdbTypeSpecifier field option (described below) with dynamic messages, the directory containing kdb_type_specifier.proto must be specified to the interface as an import search location. In the release package, kdb_type_specifier.proto (and its dependencies) are found in the proto subdirectory. Import paths can be relative or absolute. For example, if the q session is started from the root of the release package, run:

    .protobufkdb.addProtoImportPath["proto"]
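
To confirm that step 2 copied the files to the expected locations, a check along the following lines may help. It is a minimal sketch only: the function names are hypothetical, it assumes a 64-bit kdb+ install, and it falls back to `$HOME/q` when `QHOME` is not set.

```shell
# Map an operating system name (as printed by `uname -s`) to the kdb+
# platform directory under $QHOME: l64 = Linux, m64 = macOS, w64 = Windows.
kdb_platform_dir() {
    case "$1" in
        Linux)  echo "l64" ;;
        Darwin) echo "m64" ;;
        *)      echo "w64" ;;
    esac
}

# Succeed only if both the q script and the shared library are present
# under the given QHOME directory.
check_protobufkdb_install() {
    qhome="$1"
    platform="$(kdb_platform_dir "$(uname -s)")"
    if [ "$platform" = "w64" ]; then ext="dll"; else ext="so"; fi
    [ -f "$qhome/protobufkdb.q" ] && [ -f "$qhome/$platform/protobufkdb.$ext" ]
}

if check_protobufkdb_install "${QHOME:-$HOME/q}"; then
    echo "protobufkdb appears to be installed"
else
    echo "protobufkdb files not found"
fi
```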
    

Building and installing from source

Third-party library installation

Protobufkdb requires the full Protocol Buffers runtime (protoc compiler, libprotobuf and its header files) to be installed on your system. Many packaged installations only contain a subset of the required functionality or use an incompatible build. Furthermore, version mismatches can occur between protoc and libprotobuf if a new installation is applied on top of an existing one.

It is therefore recommended that the Protocol Buffers runtime is built from source and installed to a non-system directory. This directory can then be specified to the protobufkdb build so it will use that Protocol Buffers installation in preference to any existing system installs.

Building Protocol Buffers - Linux/macOS

The tools required to build Protocol Buffers from source on Linux/macOS are described here.

However, do not build Protocol Buffers using Google's configure script, since that will create a debug version of libprotobuf.a which protobufkdb links against. Rather, follow the instructions below to build Protocol Buffers using CMake with the correct compiler flags and install it to a non-system directory.

Clone the Protocol Buffers source from GitHub:

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf

Create an install directory and set an environment variable to this directory (this is used again later when building protobufkdb):

mkdir install
export PROTOBUF_INSTALL=$(pwd)/install

Create the CMake build directory and generate the build files, specifying position independent code (otherwise symbol relocation errors will occur during linking of protobufkdb):

mkdir cmake/build
cd cmake/build
cmake -DCMAKE_BUILD_TYPE=Release -Dprotobuf_BUILD_TESTS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_INSTALL_PREFIX=$PROTOBUF_INSTALL ..

Finally build and install Protocol Buffers:

cmake --build . --config Release
cmake --build . --config Release --target install
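
Once the install step has finished, it can be worth checking that the install directory really contains all three parts of the runtime that protobufkdb needs (protoc, the headers and the static library). The following is a sketch under a few assumptions: the function name is hypothetical, `google/protobuf/message.h` is used as a representative header, and the library is looked for under both `lib` and `lib64` since the directory varies by platform.

```shell
# Verify that a Protocol Buffers install prefix contains the compiler,
# the headers and the static library.
check_protobuf_install() {
    prefix="$1"
    [ -x "$prefix/bin/protoc" ] || { echo "missing protoc"; return 1; }
    [ -f "$prefix/include/google/protobuf/message.h" ] || { echo "missing headers"; return 1; }
    if [ ! -f "$prefix/lib/libprotobuf.a" ] && [ ! -f "$prefix/lib64/libprotobuf.a" ]; then
        echo "missing libprotobuf.a"
        return 1
    fi
    echo "ok"
}

# Check the directory set earlier, defaulting to ./install if unset.
check_protobuf_install "${PROTOBUF_INSTALL:-./install}" || true
```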

Building Protocol Buffers - Windows

The tools required to build Protocol Buffers from source on Windows are described here and details on how to set up your environment to build with VS2019 are here. Then follow the instructions below to build a Release version of Protocol Buffers and install it to a non-system directory.

From a Visual Studio command prompt, clone the Protocol Buffers source from GitHub:

C:\Git> git clone https://github.com/protocolbuffers/protobuf.git
C:\Git> cd protobuf

Create an install directory and set an environment variable to this directory (substituting the correct absolute path as appropriate). This environment variable is used again later when building protobufkdb:

C:\Git\protobuf> mkdir install
C:\Git\protobuf> set PROTOBUF_INSTALL=C:\Git\protobuf\install

Create the CMake build directory (note that if you also wish to build a Debug version of Protocol Buffers then a second CMake build directory is required):

C:\Git\protobuf> mkdir cmake\release_build
C:\Git\protobuf> cd cmake\release_build

Generate the build files (this will default to using the Visual Studio CMake generator when run from a VS command prompt):

C:\Git\protobuf\cmake\release_build> cmake -Dprotobuf_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX=%PROTOBUF_INSTALL% ..

Finally build and install Protocol Buffers:

C:\Git\protobuf\cmake\release_build> cmake --build . --config Release
C:\Git\protobuf\cmake\release_build> cmake --build . --config Release --target install

Add the protobuf schema files to the build procedure

Protobufkdb uses a factory to create a message class object of the correct type from the message type string passed from kdb. The lookup requires that the message type string passed from kdb is the same as the message name in its .proto definition.

In order to populate the factory, the .proto files for all messages to be serialised/parsed must be incorporated into the build as follows:

  1. Place the new <schema>.proto file into the src/ subdirectory

  2. Edit the src/CMakeLists.txt file, adding the new .proto file to the line below the following comment:

    # ### GENERATE PROTO FILES ###

    For example, to add examples.proto (which is already present in the src/ subdirectory), in addition to the existing tests.proto, change:

    set(MY_PROTO_FILES tests.proto)

    to:

    set(MY_PROTO_FILES tests.proto examples.proto)

Note: MY_PROTO_FILES is a space-separated CMake list; do not wrap the list of .proto files in a string, since CMake would then treat the whole quoted string as a single file name.
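
If the edit to src/CMakeLists.txt needs to be scripted rather than made by hand, something like the sketch below could be used. The `add_proto_file` helper is hypothetical and assumes the `set(MY_PROTO_FILES ...)` call sits on a single line, as in the example above; the demonstration at the end runs against a temporary file rather than the real CMakeLists.txt.

```shell
# Append a .proto file to the MY_PROTO_FILES list in a CMakeLists.txt file.
# Usage: add_proto_file <path/to/CMakeLists.txt> <file.proto>
add_proto_file() {
    cmake_file="$1"
    proto="$2"
    # Do nothing if the file is already in the list.
    if grep -q "set(MY_PROTO_FILES.*[ (]$proto[ )]" "$cmake_file"; then
        return 0
    fi
    # Insert the new file name just before the closing parenthesis.
    sed "s|^\([[:space:]]*set(MY_PROTO_FILES[^)]*\))|\1 $proto)|" "$cmake_file" \
        > "$cmake_file.tmp" && mv "$cmake_file.tmp" "$cmake_file"
}

# Demonstration on a temporary file containing the original line.
demo="$(mktemp)"
echo 'set(MY_PROTO_FILES tests.proto)' > "$demo"
add_proto_file "$demo" examples.proto
cat "$demo"   # prints: set(MY_PROTO_FILES tests.proto examples.proto)
```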

Building protobufkdb

A CMake script is provided to build protobufkdb. This uses the CMake functionality to locate the protobuf installation on your system. Setting the CMake variable CMAKE_PREFIX_PATH to the Protocol Buffers installation directory created above when building protobuf from source makes CMake use this installation in preference to any existing system installs. This avoids issues with existing incompatible or mismatched protobuf installs.

From the root of this repository create and move into a directory in which to perform the build:

mkdir build && cd build

Generate the build scripts, specifying the Protocol Buffers installation created above when building protobuf from source (referenced by the environment variable PROTOBUF_INSTALL, which should have been set during that procedure):

## Linux/MacOS
cmake -DCMAKE_PREFIX_PATH=$PROTOBUF_INSTALL ..

## Windows
cmake -DCMAKE_PREFIX_PATH=%PROTOBUF_INSTALL% ..

Start the build:

cmake --build . --config Release

Create the install package and deploy:

cmake --build . --config Release --target install

Note: By default src/CMakeLists.txt is configured to link statically against libprotobuf to avoid potential C++ ABI compatibility issues with different versions of libprotobuf. This is a particular issue on Windows.

Build issues

Because the protobufkdb interface uses both the protoc compiler and the Protocol Buffers runtime, the versions of protoc, libprotobuf and its header files must be consistent and installed from the same build. Otherwise build errors can occur when compiling any of the proto-generated .pb.h or .pb.cc files. To help identify these problems, the protobufkdb CMake scripts log the locations of the Protocol Buffers installation they have found. For example:

[build]$ cmake ..
 -- The CXX compiler identification is GNU 4.8.5
 -- Check for working CXX compiler: /usr/bin/c++
 -- Check for working CXX compiler: /usr/bin/c++ - works
 -- Detecting CXX compiler ABI info
 -- Detecting CXX compiler ABI info - done
 -- Detecting CXX compile features
 -- Detecting CXX compile features - done
 -- Generator : Unix Makefiles
 -- Build Tool : /usr/bin/gmake
 -- Proto files: tests.proto;examples.proto
 -- [ /usr/share/cmake3/Modules/FindProtobuf.cmake:321 ] Protobuf_USE_STATIC_LIBS = ON
 -- [ /usr/share/cmake3/Modules/FindProtobuf.cmake:455 ] requested version of Google Protobuf is
 -- [ /usr/share/cmake3/Modules/FindProtobuf.cmake:463 ] location of common.h: /usr/local/include/google/protobuf/stubs/common.h
 -- [ /usr/share/cmake3/Modules/FindProtobuf.cmake:481 ] /usr/local/include/google/protobuf/stubs/common.h reveals protobuf 3.7.1
 -- [ /usr/share/cmake3/Modules/FindProtobuf.cmake:495 ] /home/protobuf/install/bin/protoc reveals version 3.11.4
 -- Found Protobuf: /usr/local/lib/libprotobuf.a;-lpthread (found version "3.7.1")
 -- Configuring done
 -- Generating done
 -- Build files have been written to: /home/protobufkdb/build

indicates it found protoc version 3.11.4 at /home/protobuf/install/bin/protoc but version 3.7.1 of libprotobuf.a (and the headers) installed on the system under /usr/local/. This can occur if there was a conflicting packaged version of protobuf already on the system and will likely cause the protobufkdb build to fail.

The solution, as described above, is to build the Protocol Buffers runtime from source, install it to a non-system directory, then specify that directory when building protobufkdb.
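
The mismatch shown in the log can also be checked by hand. In protobuf 3.x the header google/protobuf/stubs/common.h defines GOOGLE_PROTOBUF_VERSION as an integer (major × 1000000 + minor × 1000 + patch), while `protoc --version` prints a line of the form `libprotoc 3.11.4`. The sketch below compares the two; the function names are hypothetical, and the location of the macro may differ in newer protobuf releases.

```shell
# Decode the integer GOOGLE_PROTOBUF_VERSION macro (e.g. 3007001) into a
# dotted version string (3.7.1).
decode_protobuf_version() {
    v="$1"
    echo "$((v / 1000000)).$((v / 1000 % 1000)).$((v % 1000))"
}

# Extract the dotted version from `protoc --version` output,
# which has the form "libprotoc 3.11.4".
protoc_version_from_output() {
    echo "$1" | awk '{print $2}'
}

# Compare header and compiler versions; return non-zero on a mismatch.
versions_match() {
    header="$(decode_protobuf_version "$1")"
    compiler="$(protoc_version_from_output "$2")"
    if [ "$header" = "$compiler" ]; then
        echo "consistent: $header"
    else
        echo "mismatch: headers $header, protoc $compiler"
        return 1
    fi
}

# The values from the log above: headers 3.7.1, protoc 3.11.4.
versions_match 3007001 "libprotoc 3.11.4" || true
```

On a real system the two arguments would come from the `#define GOOGLE_PROTOBUF_VERSION` line in the header that CMake reports and from running the protoc binary it reports.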

Docker – Linux

A sample Dockerfile is provided in the docker_linux directory to create an Ubuntu 18.04 LTS environment (including downloading and building the Protocol Buffers runtime from source) before building and installing the kdb+ protobufkdb interface.

For Docker Windows, the PROTOBUFKDB_SOURCE and QHOME_LINUX directories are specified at the top of protobufkdb_build.bat, which sets up the environment specified in Dockerfile.build and invokes protobufkdb_build.sh to build the interface.

Status

The protobufkdb interface is provided here as a beta release under an Apache 2.0 license.

If you find issues with the interface or have feature requests, please raise an issue.

To contribute to this project, please follow the contribution guide.

Protocol Buffers is used under the terms of Google’s license:

Copyright 2008 Google Inc.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
    * Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Code generated by the Protocol Buffer compiler is owned by the owner
of the input file used when generating it.  This code is not
standalone and requires a support library to be linked with it.  This
support library is itself covered by the above license.

Footnotes

  1. Protocol Buffers language version 3 (proto3) simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages. However, schemas defined in proto2 should also be supported.

  2. Required when building from source