Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ipv4 and ipv6 domains #3669

Merged
merged 1 commit into from
Feb 6, 2019
Merged

Conversation

Enmk
Copy link
Contributor

@Enmk Enmk commented Nov 26, 2018

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Added:

  • IDataTypeDomain interface;
  • method DataTypeFactory::registerDataTypeDomain for registering domains;
  • DataTypeDomainWithSimpleSerialization domain base class with simple serialization/deserialization;
  • Concrete IPv4 and IPv6 domain implementations: DataTypeDomanIPv6 and DataTypeDomanIPv4;

Updated:

  • IDataType text serialization/deserialization methods;
  • IDataType implementation to use domain for text serialization/deserialization;
  • Refactored implementation of the IPv4/IPv6 functions to use formatIPv4/v6 and parseIPv4/v6 from Common/formatIPv6.h;

Tests:

  • Added test cases for IPv4 and IPv6 domains.
  • Updated IPv4/v6 functions tests to validate more cases;
  • Added performance tests for IPv4 and IPv6 related functions;


} // namespace

void registerDataTypeDomainIPv4AndIPv5(DataTypeFactory & factory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPv5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

/** Text serialization for the CSV format.
*/
void serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
/** delimiter - the delimiter we expect when reading a string value that is not double-quoted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

void deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings &) const override;

/** Text serialization intended for using in JSON format.
* force_quoting_64bit_integers parameter forces to brace UInt64 and Int64 types into quotes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -128,6 +115,48 @@ void DataTypeFactory::registerSimpleDataType(const String & name, SimpleCreator
}, case_sensitiveness);
}

void DataTypeFactory::registerDataTypeDomain(const String & domain_name, const String & family_name, DataTypeDomainUniquePtr domain, CaseSensitiveness case_sensitiveness)
{
const auto& data_type_creator = findCreatorByName(family_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed too

const auto& data_type_creator = findCreatorByName(family_name);
all_domains.reserve(all_domains.size() + 1);

// We can't move the pointer to the Creator since creator is copied later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared_ptr is Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified that piece of code, shared_ptr is not needed anymore.

struct FormatSettings;
class IColumn;

class IDataTypeDomain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

virtual void deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings &) const = 0;

/** Text serialization intended for using in JSON format.
* force_quoting_64bit_integers parameter forces to brace UInt64 and Int64 types into quotes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

*/
virtual void serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const = 0;

/** delimiter - the delimiter we expect when reading a string value that is not double-quoted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@champtar
Copy link
Contributor

Hi @Enmk @alexey-milovidov is there any plans to have an IP datatype where you can have both IPv4 and IPv6 ? Right now I'm storing all IPs as a FixedString(16) so it's way simpler when querying

@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch from ec3298b to eed1527 Compare December 7, 2018 08:55
@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch from eed1527 to f19e284 Compare December 13, 2018 13:47
@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch 2 times, most recently from 31b74e8 to 21abf95 Compare December 28, 2018 11:43
@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch from 21abf95 to 4060285 Compare January 14, 2019 14:37
@Enmk
Copy link
Contributor Author

Enmk commented Jan 14, 2019

Performance comparison result.

Performance was measured on my machine against base version (without these modifications) using performance tests included in the PR. Negative numbers represent degraded performance, positive - improved, see compare.py for details.

Please note that I've observed +/-1% variance of the results from run to run, so I believe there numbers here are within +/-2%.

ipv4 (ALL CASES)

Queries: 444
Stats:
avg: -12.1179753049
min: -33.0634032212
max: 12.7247256601
stdev: 6.32970567083

ipv4 IPv4StringToNum

Queries: 100
Stats:
avg: -16.9865447285
min: -23.5455008443
max: -12.0256227406
stdev: 2.44602672946

ipv4 IPv4NumToString+IPv4StringToNum

Queries: 100
Stats:
avg: -9.63764803906
min: -21.3878474019
max: -5.97267144694
stdev: 1.66649394699

ipv4 IPv4NumToStringClassC+IPv4StringToNum

Queries: 100
Stats:
avg: -8.24220705267
min: -13.3396067461
max: -3.68402493364
stdev: 1.42471396986

ipv4 error

Queries: 44
Stats:
avg: -1.78364095218
min: -12.4710148976
max: 12.7247256601
stdev: 7.59229876574

ipv6 (ALL CASES)

Queries: 426
Stats:
avg: -5.88968274932
min: -22.8925432598
max: 10.2782631191
stdev: 5.88792824151

ipv6 IPv6StringToNum

Queries: 100
Stats:
avg: -15.0270332058
min: -22.8925432598
max: -10.0665889641
stdev: 2.67577871908

ipv6 IPv6NumToString+IPv6StringToNum

Queries: 100
Stats:
avg: -5.09307059004
min: -10.7523961077
max: -2.51185092654
stdev: 1.6369094322

ipv6 error

Queries: 26
Stats:
avg: -4.67112698211
min: -13.9460662489
max: 4.58204584396
stdev: 5.82377752186

ipv6 mapped

Queries: 200
Stats:
avg: -1.87772585045
min: -18.7335566474
max: 10.2782631191
stdev: 2.49766966381

@@ -0,0 +1,59 @@
DROP TABLE IF EXISTS ipv4_test;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In stateless tests tables should be created in test database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

{
const auto limit = IPV4_BINARY_LENGTH - zeroed_tail_bytes_count;

for (const auto i : ext::range(0, IPV4_BINARY_LENGTH))
for (const auto i : ext::range(0, limit))
{
UInt8 byte = (i < limit) ? src[i] : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless ternary operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless ternary operator

fixed

@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch 4 times, most recently from aaf70e8 to 9b7818a Compare January 25, 2019 09:26
@@ -136,4 +145,134 @@ void formatIPv6(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_
*dst++ = '\0';
}

//UInt32 parseIPv4_fast(const char * src)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this function?

Copy link
Contributor Author

@Enmk Enmk Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, forgot to remove refactoring leftovers from commit. It is gone now.

}
}
}

std::vector<String> formatQueries(const String & query, StringToVector substitutions_to_generate)
Queries formatQueries(const Query & query, const SubstitutionsMap& substitutions_to_generate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const SubstitutionsMap &

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted all changes that I've maid to PerformaceTest.cpp to reduce the scope of this PR.

@@ -46,18 +48,24 @@ static void printInteger(char *& out, T value)
}

/// print IPv4 address as %u.%u.%u.%u
static void formatIPv4(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_count)
static void doFormatIPv4(const unsigned char * src, char *& dst, UInt8 zeroed_tail_bytes_count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need UInt8 zeroed_tail_bytes_count ?
There is only only one invocation, and there zeroed_tail_bytes_count equals IPV4_BINARY_LENGTH.

Copy link
Contributor Author

@Enmk Enmk Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That particular function was used by formatIPv6, used by both CutIPv6 and IPv6NumToString. I've removed that implementation and unified it with the rest of the code, so formatIPv4 it is used by formatIPv6, IPv4NumToString and IPv4NumToStringClassC.

@@ -6,6 +6,8 @@ select IPv4StringToNum('127.0.0.1' as p) == (0x7f000001 as n), IPv4NumToString(n
select IPv4StringToNum(materialize('127.0.0.1') as p) == (materialize(0x7f000001) as n), IPv4NumToString(n) == p;
select IPv4NumToString(toUInt32(0)) == '0.0.0.0';
select IPv4NumToString(materialize(toUInt32(0))) == materialize('0.0.0.0');
select IPv4NumToString(toUInt32(0x7f000001)) == '127.0.0.1';
select IPv4NumToString(materialize(toUInt32(0x7f000001))) == materialize('127.0.0.1');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this test fails on my pc. Here is the diff https://gist.github.com/s-mx/83ed11ed435525470869cf2edc33da3d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that

@Enmk Enmk force-pushed the ipv4_and_ipv6_domains branch from 9b7818a to 59910eb Compare February 4, 2019 09:32
Added:
 * IDataTypeDomain interface;
 * method DataTypeFactory::registerDataTypeDomain for registering domains;
 * DataTypeDomainWithSimpleSerialization domain base class with simple serialization/deserialization;
 * Concrete IPv4 and IPv6 domain implementations: DataTypeDomanIPv6 and DataTypeDomanIPv4;

Updated:
 * IDataType text serialization/deserialization methods;
 * IDataType implementation to use domain for text serialization/deserialization;
 * Refactored implementation of the IPv4/IPv6 functions to use formatIPv4/v6 and parseIPv4/v6 from Common/formatIPv6.h;

Tests:
 * Added test cases for IPv4 and IPv6 domains.
 * Updated IPv4/v6 functions tests to validate more cases;
 * Added performance tests for IPv4 and IPv6 related functions;
@The-Alchemist
Copy link
Contributor

The-Alchemist commented Mar 1, 2019

@Enmk : Out of curiosity, why is IPv6 represented using FixedString(16) instead of UInt128?

@KochetovNicolai : Would a PR changing it to UInt128 be considered?

@Enmk
Copy link
Contributor Author

Enmk commented Mar 2, 2019

@Enmk : Out of curiosity, why is IPv6 represented using FixedString(16) instead of UInt128?

Well, the major reason is compatibility. Existing IPv6-functions work with FixedString(16) and hence work with IPv6 domain too, also that allows for seamless comparing of old FixedString(16)-IPv6 column values with IPv6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants