ValuStor

sensaphone (149) Sensaphone/ValuStor (47)
Languages: C++, CQL, JSON
Templates: Cassandra
License: MIT


Summary

ValuStor is a key-value pair database solution originally designed as an alternative to memcached. It resolves a number of memcached's out-of-the-box limitations, including lack of persistent storage, type inflexibility, no direct redundancy or failover capabilities, poor scalability, and lack of TLS support. It can also be used for JSON document storage and asynchronous distributed messaging applications. It is an easy-to-use, single-file, header-only, C++11-compatible project.

This project wraps abstracted client-side key-value-pair database operations around the Cassandra client driver using a simple API. It is designed for a ScyllaDB database backend, although Cassandra can also be used.

See the usage guide for example applications.

See the language binding guide for instructions on how to use ValuStor with other languages. Examples include PHP, Python, and Perl.

Key Features

  • Single header-only implementation makes it easy to drop into C++ projects.
  • An optional backlog queues data in the event that the database is temporarily inaccessible.
  • Adaptive fault tolerance, consistency, and availability.
  • TLS support, including client authentication
  • Supports a variety of native C++ data types in the keys and values.

    • 8-, 16-, 32-, and 64-bit signed integers
    • single- and double-precision floating point numbers
    • booleans
    • strings
    • binary data (blobs)
    • UUID
    • JSON
  • Simple API: Only a single store() and a single retrieve() function are needed. There is no need to write database queries.
  • RAM-like database performance for most applications.
  • There is no need to batch read or write requests for performance.
  • There is no special client configuration required for redundancy, scalability, or multi-thread performance.

Configuration

Dependencies

This project requires a C++11 compatible compiler. This project has been tested with g++ 5.4.0.

The Cassandra C/C++ driver is required. See https://github.com/datastax/cpp-driver/releases. This project has only been tested with versions 2.7.1 and 2.8.1, but in principle it should work with other versions. Example installation:

# Prerequisites: e.g. apt-get install build-essential cmake automake libtool libssl-dev

wget https://github.com/libuv/libuv/archive/v1.20.0.tar.gz
tar xvfz v1.20.0.tar.gz
cd libuv-1.20.0/
./autogen.sh
./configure
make
make install

wget https://github.com/datastax/cpp-driver/archive/2.8.1.tar.gz
tar xvfz 2.8.1.tar.gz
cd cpp-driver-2.8.1
mkdir build
cd build
cmake ..
make
make install

If using g++, cassandra.h must be in the include path and the application must be linked with -L/path/to/libcassandra.so/ -lcassandra -lpthread.

An installation of either Cassandra or ScyllaDB is required. The latter is strongly recommended for this application due to its advantageous design decisions. ScyllaDB is incredibly easy to set up. This project has been tested with ScyllaDB v.2.x.

# Prerequisite: Install ScyllaDB

vi /etc/scylla/scylla.yaml
scylla_io_setup
service scylla-server start

TLS

Using TLS for encryption and authentication is highly recommended. It is not difficult to set up. See the instructions.

Database Setup

Configuration can be loaded from a configuration file or supplied with the same fields at runtime. See the API documentation. Only the following fields are required:

  table = <database>.<table>
  key_field = <key field>
  value_field = <value field>
  username = <username>
  password = <password>
  hosts = <ip_address_1>,<ip_address_2>,<ip_address_3>
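
The fields above can also be handed to the constructor as a std::map<std::string, std::string>. The following is a minimal sketch of turning "key = value" lines into such a map. It assumes one pair per line, as shown above. The helpers trim() and parse_config() are hypothetical and are not part of ValuStor; the project's own file parser may handle comments and whitespace differently.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ValuStor): strip surrounding whitespace.
static std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const std::size_t begin = s.find_first_not_of(ws);
  if (begin == std::string::npos) return "";
  const std::size_t end = s.find_last_not_of(ws);
  return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: read "key = value" lines into a map. Lines without
// an '=' or with an empty key are skipped.
std::map<std::string, std::string> parse_config(std::istream& in) {
  std::map<std::string, std::string> config;
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) continue;
    const std::string key = trim(line.substr(0, eq));
    if (key.empty()) continue;
    config[key] = trim(line.substr(eq + 1));
  }
  return config;
}
```

The resulting map has the same shape as the one passed to the map-based constructor in the Usage section.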

The schema of a ScyllaDB table should be set up as follows:

  CREATE TABLE <database>.<table> (
    <key_field> bigint PRIMARY KEY,
    <value_field> text
  ) WITH compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'};

The following Cassandra data types (along with their C++ equivalent) are supported in the CREATE TABLE:

  • tinyint (int8_t)
  • smallint (int16_t)
  • int (int32_t)
  • bigint (int64_t)
  • float (float)
  • double (double)
  • boolean (bool)
  • varchar, text, and ascii (std::string and nlohmann::json)
  • blob (std::vector<uint8_t>)
  • uuid (CassUuid)
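
The mapping above can be pictured as a compile-time lookup from a C++ type to a CQL column type. The sketch below is only an illustration: CqlType and create_table_cql() are hypothetical names, not part of ValuStor. CassUuid and nlohmann::json are left out so that the example needs no external headers, and std::string is mapped to text even though varchar and ascii are equally valid.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical trait (not part of ValuStor). The primary template is left
// undefined so that an unsupported type is rejected at compile time.
template <typename T> struct CqlType;

template <> struct CqlType<std::int8_t>  { static const char* name() { return "tinyint"; } };
template <> struct CqlType<std::int16_t> { static const char* name() { return "smallint"; } };
template <> struct CqlType<std::int32_t> { static const char* name() { return "int"; } };
template <> struct CqlType<std::int64_t> { static const char* name() { return "bigint"; } };
template <> struct CqlType<float>        { static const char* name() { return "float"; } };
template <> struct CqlType<double>       { static const char* name() { return "double"; } };
template <> struct CqlType<bool>         { static const char* name() { return "boolean"; } };
template <> struct CqlType<std::string>  { static const char* name() { return "text"; } };
template <> struct CqlType<std::vector<std::uint8_t>> {
  static const char* name() { return "blob"; }
};

// Hypothetical helper: build a single-key CREATE TABLE statement whose column
// types match the C++ key and value types.
template <typename Key_T, typename Val_T>
std::string create_table_cql(const std::string& table,
                             const std::string& key_field,
                             const std::string& value_field) {
  return "CREATE TABLE " + table + " (" +
         key_field + " " + CqlType<Key_T>::name() + " PRIMARY KEY, " +
         value_field + " " + CqlType<Val_T>::name() + ")";
}
```

For example, create_table_cql<std::int64_t, std::string>("cache.values", "k", "v") yields a statement with a bigint key and a text value, matching the schema shown earlier without the compaction and compression options.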

API

ValuStor is implemented as a template class with two constructors. See the usage documentation.

  template<typename Val_T, typename... Key_T> class ValuStor

  ValuStor::ValuStor(std::string config_file)
  ValuStor::ValuStor(std::map<std::string, std::string> configuration_kvp)

Both single and compound keys are supported.

The public API is very simple:

  ValuStor::Result store(Key_T... keys,
                         Val_T value,
                         uint32_t seconds_ttl = 0,
                         InsertMode_t insert_mode = ValuStor::DEFAULT_BACKLOG_MODE,
                         int64_t microseconds_since_epoch = 0)

  ValuStor::Result retrieve(Key_T... keys,
                            size_t key_count = 0)

The optional seconds TTL is the number of seconds before the stored value expires in the database. Setting a value of 0 means the record will not expire. Setting a value of 1 is effectively a delete operation (after 1 second elapses).

The optional insert modes are ValuStor::DISALLOW_BACKLOG, ValuStor::ALLOW_BACKLOG, and ValuStor::USE_ONLY_BACKLOG. If the backlog is disabled, any failures will be permanent and there will be no further retries. If the backlog is enabled, failures will retry automatically until they are successful or the ValuStor object is deleted.
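
The three insert modes can be pictured with a small, single-threaded model. This is a sketch under stated assumptions, not ValuStor's implementation: BacklogSketch, Mode, and Outcome are hypothetical names, the database write is replaced by a caller-supplied function, and retries happen only when flush() is called rather than automatically in the background.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for the three insert modes described above.
enum class Mode { DisallowBacklog, AllowBacklog, UseOnlyBacklog };
enum class Outcome { Written, Queued, Dropped };

class BacklogSketch {
 public:
  // The writer returns true when the (simulated) database accepted the write.
  using Writer = std::function<bool(std::int64_t, const std::string&)>;

  explicit BacklogSketch(Writer writer) : writer_(std::move(writer)) {}

  Outcome store(std::int64_t key, const std::string& value, Mode mode) {
    // UseOnlyBacklog never attempts a direct write.
    if (mode != Mode::UseOnlyBacklog && writer_(key, value)) {
      return Outcome::Written;
    }
    // Without a backlog, a failed write is dropped for good.
    if (mode == Mode::DisallowBacklog) return Outcome::Dropped;
    pending_.emplace_back(key, value);
    return Outcome::Queued;
  }

  // Retry queued writes in order; stop at the first failure so that the
  // remaining entries keep their original order.
  std::size_t flush() {
    std::size_t written = 0;
    while (!pending_.empty()) {
      const std::pair<std::int64_t, std::string>& front = pending_.front();
      if (!writer_(front.first, front.second)) break;
      pending_.pop_front();
      ++written;
    }
    return written;
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  Writer writer_;
  std::deque<std::pair<std::int64_t, std::string>> pending_;
};
```

In this model a write made while the database is unreachable is either dropped (DisallowBacklog) or queued (AllowBacklog, UseOnlyBacklog), and queued writes go through once flush() is called after the writer starts succeeding again.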

The optional microseconds since epoch can be specified to explicitly control which inserted records are considered to be current in the database. It is possible for rapidly inserted stores with the same key to get applied out-of-order. Specifying this explicitly removes all ambiguity, but makes it especially important that store() calls from multiple clients use the same synchronized time source. The default value of 0 lets the database apply the timestamp automatically. If no timestamp is explicitly given, stores added to the backlog will use the timestamp of when they were added to the queue.
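
In CQL terms, the TTL and timestamp options correspond to the USING TTL and USING TIMESTAMP clauses of an INSERT. The sketch below shows how the two optional arguments could map onto such a statement. The function build_insert() is hypothetical, and how ValuStor actually assembles or binds its prepared statement is an assumption here.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ValuStor): build a single-key INSERT.
// A value of 0 for either option omits the corresponding clause, mirroring
// the "0 means default" convention described above.
std::string build_insert(const std::string& table,
                         const std::string& key_field,
                         const std::string& value_field,
                         std::uint32_t seconds_ttl = 0,
                         std::int64_t microseconds_since_epoch = 0) {
  std::string cql = "INSERT INTO " + table + " (" + key_field + ", " +
                    value_field + ") VALUES (?, ?)";
  std::string sep = " USING ";
  if (seconds_ttl != 0) {
    cql += sep + "TTL " + std::to_string(seconds_ttl);
    sep = " AND ";
  }
  if (microseconds_since_epoch != 0) {
    cql += sep + "TIMESTAMP " + std::to_string(microseconds_since_epoch);
  }
  return cql;
}
```

When two writes to the same key carry explicit timestamps, the database keeps the one with the larger timestamp regardless of arrival order, which is why a synchronized time source matters.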

The optional key count is the number of keys to include in the WHERE clause of a value SELECT. A key count of 0 (the default) means all keys are used and at most only one record can be returned. If fewer than all the keys are used, the retrieval may return multiple records. While all keys must be specified as function parameters, if you are only using a subset of keys, the values of the unused keys are "don't care". NOTE: You must always specify a partition key completely, but you can leave out all or part of the clustering key.
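
The effect of the key count on the generated query can be sketched as follows. The function build_select() is hypothetical and the exact statement text ValuStor prepares is an assumption. The point is only that the first key_count keys end up in the WHERE clause, while the key columns are still selected so that each returned record can be paired with its keys.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ValuStor): build a SELECT that filters on
// the first key_count key fields. A key_count of 0, or one larger than the
// number of keys, is treated as "use every key".
std::string build_select(const std::string& table,
                         const std::vector<std::string>& key_fields,
                         const std::string& value_field,
                         std::size_t key_count = 0) {
  const std::size_t used =
      (key_count == 0 || key_count > key_fields.size()) ? key_fields.size()
                                                        : key_count;
  std::string cql = "SELECT " + value_field;
  for (const std::string& key : key_fields) {
    cql += ", " + key;
  }
  cql += " FROM " + table;
  for (std::size_t i = 0; i < used; ++i) {
    cql += (i == 0 ? " WHERE " : " AND ") + key_fields[i] + "=?";
  }
  return cql;
}
```

With keys k1 and k2 and a key count of 1, only k1 is constrained, so every row in that partition can be returned.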

The ValuStor::Result has the following data members:

  ErrorCode_t error_code
  std::string result_message
  Val_T data
  std::vector<std::pair<Val_T, std::tuple<Key_T...>>> results

Data for a single record (the default for retrieve()) is returned in Result::data. Data for multiple records is returned in Result::results along with the keys associated with each record.

Requests that fail to commit changes to the database store will return an unsuccessful error code, unless the backlog mode is set to USE_ONLY_BACKLOG. If the backlog mode is set to ALLOW_BACKLOG, then the change will eventually be committed.

The ValuStor::ErrorCode_t is one of the following:

  ValuStor::VALUE_ERROR
  ValuStor::UNKNOWN_ERROR
  ValuStor::BIND_ERROR
  ValuStor::QUERY_ERROR
  ValuStor::CONSISTENCY_ERROR
  ValuStor::PREPARED_SELECT_FAILED
  ValuStor::PREPARED_INSERT_FAILED
  ValuStor::SESSION_FAILED
  ValuStor::SUCCESS
  ValuStor::NOT_FOUND
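
A caller typically branches on the result in three ways: success, not found, and everything else. The sketch below models this with a stand-in type. ResultSketch and describe() are hypothetical names, the enumerator values are simply in the order listed above and say nothing about the real numeric codes, and treating only SUCCESS as "true" is an assumption about how the boolean test used in the Usage section behaves.

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Stand-in for ValuStor::ErrorCode_t; numeric values are not meaningful here.
enum ErrorCode_t {
  VALUE_ERROR, UNKNOWN_ERROR, BIND_ERROR, QUERY_ERROR, CONSISTENCY_ERROR,
  PREPARED_SELECT_FAILED, PREPARED_INSERT_FAILED, SESSION_FAILED,
  SUCCESS, NOT_FOUND
};

// Hypothetical model of the Result data members listed above.
template <typename Val_T, typename... Key_T>
struct ResultSketch {
  ErrorCode_t error_code;
  std::string result_message;
  Val_T data;
  std::vector<std::pair<Val_T, std::tuple<Key_T...>>> results;

  // Assumption: only SUCCESS counts as true.
  explicit operator bool() const { return error_code == SUCCESS; }
};

// Hypothetical helper: summarize a result for logging.
template <typename Val_T, typename... Key_T>
std::string describe(const ResultSketch<Val_T, Key_T...>& result) {
  if (result) return "ok";
  if (result.error_code == NOT_FOUND) return "not found";
  return "error: " + result.result_message;
}
```

Separating NOT_FOUND from the other codes lets a cache-style caller treat a miss as a normal outcome while still surfacing connection or query failures.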

Usage

Writing code to use ValuStor is very easy. You only need to construct a ValuStor object and then call the store() and retrieve() functions in any combination. Connection management is automatic.

The following example shows the most basic key-value pair usage:

Code:

  #include "ValuStor.hpp"
  ...  

  // e.g. CREATE TABLE cache.values (key_field bigint, value_field text, PRIMARY KEY (key_field))
  //
  //                    <value>    <key>
  ValuStor::ValuStor<std::string, int64_t> store("example.conf");
  auto store_result = store.store(1234, "value");
  if(store_result){
    auto retrieve_result = store.retrieve(1234);
    if(retrieve_result){
      std::cout << 1234 << " => " << retrieve_result.data << std::endl;
    }
  }

Output:

  1234 => value

You can use a file to load the configuration (as above) or specify the configuration in your code (as below). See the example config for more information.

The following example uses a compound key and a multi-select retrieval.

Code:

  #include "ValuStor.hpp"
  ...  
  
  // e.g. CREATE TABLE cache.values (k1 bigint, k2 bigint, v text, PRIMARY KEY (k1, k2))
  ValuStor::ValuStor<std::string, int64_t, int64_t> store({
        {"table", "cache.values"},
        {"key_field", "k1, k2"},
        {"value_field", "v"},
        {"username", "username"},
        {"password", "password"},
        {"hosts", "127.0.0.1"}
  });

  store.store(1234, 10, "first");
  store.store(1234, 20, "last");
  auto retrieve_result = store.retrieve(1234, -1/*don't care*/, 1);
  for(auto& pair : retrieve_result.results){
    std::cout << "{\"k1\":" << std::get<0>(pair.second)
              << ", \"k2\":" << std::get<1>(pair.second)
              << ", \"v\":\"" << pair.first << "\"}"
              << std::endl;
  }

Output:

  {"k1":1234, "k2":10, "v":"first"}
  {"k1":1234, "k2":20, "v":"last"}

UUID

This package has support for UUID using the CassUuid that comes with the Cassandra driver. You can create a UUID in a number of different ways:

ValuStor::ValuStor<std::string, CassUuid> store("example.conf");

// Load from a standard UUID string.
CassUuid uuid;
cass_uuid_from_string("550e8400-e29b-41d4-a716-446655440000", &uuid);
store.store(uuid, "<record>");

// Generate your own UUID
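CassUuidGen* uuid_gen = cass_uuid_gen_new();
CassUuid record_uuid;
cass_uuid_gen_time(uuid_gen, &record_uuid); // the output parameter is passed by pointer
cass_uuid_gen_free(uuid_gen);               // release the generator when done
store.store(record_uuid, "<record>");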

// Automatically generated by ValuStor
store.store(CassUuid{}, "<record>");

You can extract a retrieved CassUuid like this:

ValuStor::ValuStor<CassUuid, int64_t> store("example.conf");
auto result = store.retrieve(123456);
char uuid_str[CASS_UUID_STRING_LENGTH];
cass_uuid_string(result.data, uuid_str);
std::string uuid(uuid_str);

JSON

This package integrates with JSON for Modern C++ for easy document storage. The values are serialized and deserialized automatically. The serialization uses strings and thus requires a Cassandra text or varchar field. To use it, include the JSON header before the ValuStor header:

#include "nlohmann/json.hpp"
#include "ValuStor.hpp"
...
ValuStor::ValuStor<nlohmann::json, int64_t> valuestore("example.conf");

See the JSON document storage system example in the usage guide.
