Experimental support for BSON (de)serialization #1254

Closed
wants to merge 29 commits into from
Changes from 27 commits
29 commits
f06c8fd
BSON: serialization of non-objects is not supported
julian-becker Sep 14, 2018
5f5836c
BSON: Support empty objects
julian-becker Sep 14, 2018
9a0dddc
BSON: Object with single boolean
julian-becker Sep 15, 2018
0c0f2e4
BSON: support doubles
julian-becker Sep 15, 2018
6c447de
BSON: Support objects with string members
julian-becker Sep 15, 2018
c5ef023
BSON: support objects with null members
julian-becker Sep 15, 2018
7ee361f
BSON: support objects with int32 members
julian-becker Sep 15, 2018
c0d8921
BSON: support objects with int64 members
julian-becker Sep 15, 2018
83b427a
BSON: unsigned integers
julian-becker Sep 15, 2018
5ce7d6b
BSON: support objects with objects as members
julian-becker Sep 15, 2018
120d1d7
BSON: test case for a more complex document
julian-becker Sep 15, 2018
cf485c2
BSON: Support for arrays
julian-becker Sep 15, 2018
df33a90
BSON: Bugfix for non-empty arrays
julian-becker Sep 15, 2018
763705c
Fix: Add missing `begin()` and `end()` member functions to `alt_string`
julian-becker Sep 24, 2018
bce4816
BSON: Added test case for the different input/output_adapters
julian-becker Sep 24, 2018
ef358ae
BSON: Fixed hangup in case of incomplete bson input and improved test…
julian-becker Sep 25, 2018
0a09db9
BSON: Extend `binary_reader::get_number` to be able to handle little …
julian-becker Sep 29, 2018
e8730e5
BSON: Reworked `binary_reader::get_bson_cstr()`
julian-becker Sep 29, 2018
81f4b34
BSON: Improved documentation and error handling/reporting
julian-becker Oct 7, 2018
062aeaf
BSON: Reworked the `binary_writer` such that it precomputes the size …
julian-becker Oct 7, 2018
df0f612
BSON: allow and discard values and object entries of type `value_t::d…
julian-becker Oct 7, 2018
5bccacd
BSON: throw json.exception.out_of_range.407 in case a value of type `…
julian-becker Oct 16, 2018
daa3ca8
BSON: Adjusted documentation of `binary_writer::to_bson()`
julian-becker Oct 16, 2018
978c3c4
BSON: throw `json.exception.out_of_range.409` in case a key to be ser…
julian-becker Oct 16, 2018
2a63869
Merge branch 'develop' of https://github.com/nlohmann/json into featu…
julian-becker Oct 17, 2018
8de10c5
BSON: Hopefully fixing ambiguity (on some compilers) to call to strin…
julian-becker Oct 17, 2018
5ba812d
BSON: fixed incorrect casting in unit-bson.cpp
julian-becker Oct 18, 2018
ad11b6c
BSON: Improved exception-related tests and report location of U+0000 …
julian-becker Oct 18, 2018
24a4142
BSON: Fixed array serialization by adding increasing integral names t…
julian-becker Oct 25, 2018
5 changes: 4 additions & 1 deletion include/nlohmann/detail/exceptions.hpp
@@ -93,6 +93,7 @@ json.exception.parse_error.109 | parse error: array index 'one' is not a number
json.exception.parse_error.110 | parse error at 1: cannot read 2 bytes from vector | When parsing CBOR or MessagePack, the byte vector ends before the complete value has been read.
json.exception.parse_error.112 | parse error at 1: error reading CBOR; last byte: 0xF8 | Not all types of CBOR or MessagePack are supported. This exception occurs if an unsupported byte was read.
json.exception.parse_error.113 | parse error at 2: expected a CBOR string; last byte: 0x98 | While parsing a map key, a value that is not a string has been read.
json.exception.parse_error.114 | parse error: Unsupported BSON record type 0x0F | The parsing of the corresponding BSON record type is not implemented (yet).

@note For an input with n bytes, 1 is the index of the first character and n+1
is the index of the terminating null byte or the end of file. This also
@@ -236,6 +237,7 @@ json.exception.type_error.313 | invalid value to unflatten | The @ref unflatten
json.exception.type_error.314 | only objects can be unflattened | The @ref unflatten function only works for an object whose keys are JSON Pointers.
json.exception.type_error.315 | values in object must be primitive | The @ref unflatten function only works for an object whose keys are JSON Pointers and whose values are primitive.
json.exception.type_error.316 | invalid UTF-8 byte at index 10: 0x7E | The @ref dump function only works with UTF-8 encoded strings; that is, if you assign a `std::string` to a JSON value, make sure it is UTF-8 encoded. |
json.exception.type_error.317 | JSON value cannot be serialized to requested format | The dynamic type of the object cannot be represented in the requested serialization format (e.g. a raw `true` or `null` JSON object cannot be serialized to BSON) |

@liveexample{The following code shows how a `type_error` exception can be
caught.,type_error}
@@ -278,8 +280,9 @@ json.exception.out_of_range.403 | key 'foo' not found | The provided key was not
json.exception.out_of_range.404 | unresolved reference token 'foo' | A reference token in a JSON Pointer could not be resolved.
json.exception.out_of_range.405 | JSON pointer has no parent | The JSON Patch operations 'remove' and 'add' can not be applied to the root element of the JSON value.
json.exception.out_of_range.406 | number overflow parsing '10E1000' | A parsed number could not be stored as without changing it to NaN or INF.
- json.exception.out_of_range.407 | number overflow serializing '9223372036854775808' | UBJSON only supports integers numbers up to 9223372036854775807. |
+ json.exception.out_of_range.407 | number overflow serializing '9223372036854775808' | UBJSON and BSON only support integer numbers up to 9223372036854775807. |
json.exception.out_of_range.408 | excessive array size: 8658170730974374167 | The size (following `#`) of an UBJSON array or object exceeds the maximal capacity. |
json.exception.out_of_range.409 | BSON key cannot contain code point U+0000 | Key identifiers to be serialized to BSON cannot contain code point U+0000, since the key is stored as zero-terminated c-string |

@liveexample{The following code shows how an `out_of_range` exception can be
caught.,out_of_range}
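The new `out_of_range.409` entry follows from how BSON stores keys: as zero-terminated C strings, so an embedded U+0000 would silently truncate the key. A minimal stand-alone sketch of such a check is shown below. The helper names `find_embedded_nul` and `check_bson_key` are hypothetical and do not appear in the PR, and `std::out_of_range` stands in for the library's own exception type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the PR): returns the byte position of the
// first U+0000 in `key`, or std::string::npos if there is none.
inline std::size_t find_embedded_nul(const std::string& key)
{
    return key.find('\0');
}

// Hypothetical check mirroring the intent of out_of_range.409: reject keys
// that cannot be stored as a zero-terminated BSON cstring.
inline void check_bson_key(const std::string& key)
{
    const std::size_t pos = find_embedded_nul(key);
    if (pos != std::string::npos)
    {
        throw std::out_of_range(
            "BSON key cannot contain code point U+0000 (at byte "
            + std::to_string(pos) + ")");
    }
}
```

Reporting the position in the message is in the spirit of commit ad11b6c, which mentions reporting the location of U+0000; the exact wording used by the library may differ.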
214 changes: 212 additions & 2 deletions include/nlohmann/detail/input/binary_reader.hpp
@@ -80,6 +80,10 @@ class binary_reader
result = parse_ubjson_internal();
break;

case input_format_t::bson:
result = parse_bson_internal();
break;

// LCOV_EXCL_START
default:
assert(false);
@@ -121,6 +125,207 @@ class binary_reader
}

private:

/*!
@brief Parses a C-style string from the BSON input.
@param [out] result A reference to the string variable where the read string
is to be stored.
@return `true` if the \x00-byte indicating the end of the
string was encountered before the EOF.
`false` indicates an unexpected EOF.
*/
bool get_bson_cstr(string_t& result)
{
auto out = std::back_inserter(result);
while (true)
{
get();
if (JSON_UNLIKELY(not unexpect_eof(input_format_t::bson, "cstring")))
{
return false;
}
if (current == 0x00)
{
return true;
}
*out++ = static_cast<char>(current);
}

return true;
}
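For readers who want to try the loop in `get_bson_cstr()` outside the reader class, here is a rough stand-alone equivalent that works on a byte vector instead of the input adapter. The function name `read_cstr` and the `pos` cursor are assumptions made for this sketch; the PR's version pulls bytes through `get()` and reports EOF through `unexpect_eof()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-alone reader (not from the PR).
// Appends bytes from `input`, starting at `pos`, to `result` until a 0x00
// terminator is found. On success `pos` is one past the terminator.
// Returns false if the input ends before a terminator is seen.
inline bool read_cstr(const std::vector<std::uint8_t>& input,
                      std::size_t& pos,
                      std::string& result)
{
    while (pos < input.size())
    {
        const std::uint8_t byte = input[pos++];
        if (byte == 0x00)
        {
            return true;
        }
        result.push_back(static_cast<char>(byte));
    }
    return false; // unexpected end of input
}
```

Because every exit from the loop is an explicit `return`, no trailing `return true;` is needed after it; the one following the `while (true)` loop above looks unreachable for the same reason.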

/*!
@brief Parses a zero-terminated string of length @a len from the BSON input.
@param [in] len The length (including the zero-byte at the end) of the string to be read.
@param [out] result A reference to the string variable where the read string
is to be stored.
@tparam NumberType The type of the length @a len
@pre len > 0
@return `true` if the string was successfully parsed
*/
template <typename NumberType>
bool get_bson_string(const NumberType len, string_t& result)
{
return get_string(input_format_t::bson, len - static_cast<NumberType>(1), result)
&& get() != std::char_traits<char>::eof();
}

/*!
@return A hexadecimal string representation of the given @a byte
@param byte The byte to convert to a string
*/
static std::string byte_hexstring(unsigned char byte)
{
char cr[3];
snprintf(cr, sizeof(cr), "%02hhX", byte);
return std::string{cr};
}
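`byte_hexstring()` calls `snprintf` unqualified, which depends on `<cstdio>` being included and on the global-namespace name being visible. One possible alternative, sketched here under the hypothetical name `byte_to_hex`, is a small lookup table that needs no formatting function at all. Whether this is preferable is a judgment call.

```cpp
#include <string>

// Hypothetical alternative to byte_hexstring() (not from the PR):
// converts one byte to two uppercase hexadecimal digits via a lookup table.
inline std::string byte_to_hex(unsigned char byte)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out(2, '0');
    out[0] = digits[byte >> 4];   // high nibble
    out[1] = digits[byte & 0x0F]; // low nibble
    return out;
}
```

For example, `byte_to_hex(0x0F)` yields `"0F"`, matching the `0x0F` shown in the `parse_error.114` message.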

/*!
@brief Read a BSON document element of the given @a element_type.
@param element_type The BSON element type, cf. http://bsonspec.org/spec.html
@param element_type_parse_position The position in the input stream, where the `element_type` was read.
@warning Not all BSON element types are supported yet. An unsupported @a element_type will
give rise to a parse_error.114: Unsupported BSON record type 0x...
@return whether a valid BSON-object/array was passed to the SAX parser
*/
bool parse_bson_element_internal(int element_type, std::size_t element_type_parse_position)
{
switch (element_type)
{
case 0x01: // double
{
double number;
return get_number<double, true>(input_format_t::bson, number)
&& sax->number_float(static_cast<number_float_t>(number), "");
}
case 0x02: // string
{
std::int32_t len;
string_t value;
return get_number<std::int32_t, true>(input_format_t::bson, len)
&& get_bson_string(len, value)
&& sax->string(value);
}
case 0x08: // boolean
{
return sax->boolean(static_cast<bool>(get()));
}
case 0x10: // int32
{
std::int32_t value;
return get_number<std::int32_t, true>(input_format_t::bson, value)
&& sax->number_integer(static_cast<std::int32_t>(value));
}
case 0x12: // int64
{
std::int64_t value;
return get_number<std::int64_t, true>(input_format_t::bson, value)
&& sax->number_integer(static_cast<std::int64_t>(value));
}
case 0x0A: // null
{
return sax->null();
}
case 0x03: // object
{
return parse_bson_internal();
}
case 0x04: // array
{
return parse_bson_array();
}
default: // anything else not supported (yet)
{
auto element_type_str = byte_hexstring(element_type);
return sax->parse_error(element_type_parse_position, element_type_str, parse_error::create(114, element_type_parse_position, "Unsupported BSON record type 0x" + element_type_str));
}
}
}

/*!
@brief Reads a BSON element list (as specified in the BSON-spec) from the input
and passes it to the SAX-parser.
The same binary layout is used for objects and arrays, hence it must
be indicated with the argument @a is_array which one is expected
(true --> array, false --> object).
@param is_array Determines if the element list being read is to be treated as
an object (@a is_array == false), or as an array (@a is_array == true).
@return whether a valid BSON-object/array was passed to the SAX parser
*/
bool parse_bson_element_list(bool is_array)
{
while (auto element_type = get())
{
if (JSON_UNLIKELY(not unexpect_eof(input_format_t::bson, "element list")))
{
return false;
}

const std::size_t element_type_parse_position = chars_read;
string_t key;
if (JSON_UNLIKELY(not get_bson_cstr(key)))
{
return false;
}

if (!is_array)
{
sax->key(key);
}

if (JSON_UNLIKELY(not parse_bson_element_internal(element_type, element_type_parse_position)))
{
return false;
}
}
return true;
}

/*!
@brief Reads an array from the BSON input and passes it to the SAX-parser.
@return whether a valid BSON-array was passed to the SAX parser
*/
bool parse_bson_array()
{
std::int32_t documentSize;
get_number<std::int32_t, true>(input_format_t::bson, documentSize);

if (JSON_UNLIKELY(not sax->start_array(-1)))
{
return false;
}

if (JSON_UNLIKELY(not parse_bson_element_list(/*is_array*/true)))
{
return false;
}

return sax->end_array();
}

/*!
@brief Reads a BSON-object and passes it to the SAX-parser.
@return whether a valid BSON-value was passed to the SAX parser
*/
bool parse_bson_internal()
{
std::int32_t documentSize;
get_number<std::int32_t, true>(input_format_t::bson, documentSize);

if (JSON_UNLIKELY(not sax->start_object(-1)))
{
return false;
}

if (JSON_UNLIKELY(not parse_bson_element_list(/*is_array*/false)))
{
return false;
}

return sax->end_object();
}
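Both `parse_bson_array()` and `parse_bson_internal()` read the leading int32 `documentSize` but, as shown in this revision, neither uses the value nor checks the result of `get_number`. As a hedged illustration of what the size field could be used for, the sketch below validates the outer frame of a BSON document held in a byte vector. The function name `document_frame_is_valid` is hypothetical, and the checks are taken from the layout described at bsonspec.org (little-endian int32 total size, element list, trailing 0x00), not from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame check (not from the PR). A BSON document is
//   int32 size (little endian, counts itself) | elements... | 0x00
// so the smallest valid document is 5 bytes long.
inline bool document_frame_is_valid(const std::vector<std::uint8_t>& in,
                                    std::size_t pos)
{
    if (pos > in.size() || in.size() - pos < 5)
    {
        return false; // not enough bytes for header plus terminator
    }

    // Decode the little-endian size byte by byte (host-endianness agnostic).
    const std::uint32_t size =
        static_cast<std::uint32_t>(in[pos])
        | (static_cast<std::uint32_t>(in[pos + 1]) << 8)
        | (static_cast<std::uint32_t>(in[pos + 2]) << 16)
        | (static_cast<std::uint32_t>(in[pos + 3]) << 24);

    if (size < 5 || size > 0x7FFFFFFFu)
    {
        return false; // too small, or negative when read as int32
    }

    const std::size_t len = static_cast<std::size_t>(size);
    if (len > in.size() - pos)
    {
        return false; // declared size runs past the end of the input
    }

    return in[pos + len - 1] == 0x00; // document must end with a zero byte
}
```

As a usage example, the nine bytes `09 00 00 00 08 61 00 01 00` encode `{"a": true}` and pass the check, while the same bytes with the final `00` removed do not.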

/*!
@param[in] get_char whether a new character should be retrieved from the
input (true, default) or whether the last read
@@ -875,7 +1080,7 @@ class binary_reader
bytes in CBOR, MessagePack, and UBJSON are stored in network order
(big endian) and therefore need reordering on little endian systems.
*/
- template<typename NumberType>
+ template<typename NumberType, bool InputIsLittleEndian = false>
bool get_number(const input_format_t format, NumberType& result)
{
// step 1: read input into array with system's byte order
@@ -889,7 +1094,7 @@ class binary_reader
}

// reverse byte order prior to conversion if necessary
- if (is_little_endian)
+ if (is_little_endian && !InputIsLittleEndian)
{
vec[sizeof(NumberType) - i - 1] = static_cast<uint8_t>(current);
}
@@ -904,6 +1109,7 @@ class binary_reader
return true;
}
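The `InputIsLittleEndian` template parameter exists because BSON stores numbers in little-endian order, whereas CBOR, MessagePack, and UBJSON use network order. The sketch below shows the same idea outside the reader class. The names `host_is_little_endian` and `decode_number` are hypothetical, and the swap condition is deliberately written as "host order differs from input order", which is not the same as the PR's `is_little_endian && !InputIsLittleEndian`. That condition appears to leave little-endian input unswapped on a big-endian host, though this has not been tested here.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical run-time probe of the host byte order (not from the PR).
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical decoder (not from the PR): reads sizeof(NumberType) bytes
// from `bytes` and reverses them only when host and input order differ.
template <typename NumberType, bool InputIsLittleEndian = false>
NumberType decode_number(const std::uint8_t* bytes)
{
    std::array<std::uint8_t, sizeof(NumberType)> buf;
    std::copy(bytes, bytes + sizeof(NumberType), buf.begin());

    if (host_is_little_endian() != InputIsLittleEndian)
    {
        std::reverse(buf.begin(), buf.end());
    }

    NumberType result;
    std::memcpy(&result, buf.data(), sizeof(NumberType));
    return result;
}
```

With this sketch, `decode_number<std::int32_t, true>` applied to the bytes `09 00 00 00` and `decode_number<std::int32_t>` applied to `00 00 00 09` both give 9, regardless of the host's byte order.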


/*!
@brief create a string by reading characters from the input

@@ -1715,6 +1921,10 @@ class binary_reader
error_msg += "UBJSON";
break;

case input_format_t::bson:
error_msg += "BSON";
break;

// LCOV_EXCL_START
default:
assert(false);
2 changes: 1 addition & 1 deletion include/nlohmann/detail/input/input_adapters.hpp
@@ -18,7 +18,7 @@ namespace nlohmann
namespace detail
{
/// the supported input formats
- enum class input_format_t { json, cbor, msgpack, ubjson };
+ enum class input_format_t { json, cbor, msgpack, ubjson, bson };

////////////////////
// input adapters //