Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make KafkaError more catch friendly and efficient #127

Open
blindspotbounty opened this issue Aug 29, 2023 · 6 comments
Open

Make KafkaError more catch friendly and efficient #127

blindspotbounty opened this issue Aug 29, 2023 · 6 comments

Comments

@blindspotbounty
Copy link
Collaborator

Currently, to catch rdkafka specific errors it is required to compare a string, for example:

do {
    try self.kafkaProducer.send(record)
    return
} catch let error as KafkaError where error.description.contains("one possible rdkafka error") {
// handle exception
} catch let error as KafkaError where error.description.contains("second possible rdkafka error") {
// handle exception
} catch let error as KafkaError where error.description.contains("third possible rdkafka error") {
// handle exception
} catch {
// handle unknown exception
}

It would be nice to have errors listed as public enum somewhere:

do {
    try self.kafkaProducer.send(record)
    return
} catch let error as KafkaError where error.code == .onePossibleRdKafkaError {
// handle exception
} catch let error as KafkaError where error.code == .secondPossibleRdKafkaError {
// handle exception
} catch let error as KafkaError where error.code == .thirdPossibleRdKafkaError {
// handle exception
} catch {
// handle unknown exception
}

Since it might be an often error: e.g. RD_KAFKA_RESP_ERR__QUEUE_FULL/RD_KAFKA_RESP_ERR_MSG_SIZE_TOO_LARGE/RD_KAFKA_RESP_ERR__UNKNOWN_PARTITION/RD_KAFKA_RESP_ERR__UNKNOWN_TOPIC/RD_KAFKA_RESP_ERR__FATAL/RD_KAFKA_RESP_ERR__STATE

@FranzBusch
Copy link
Contributor

The second part should already be possible. Why can't you write catch let error as KafkaError where error.code == .onePossibleRdKafkaError? The errors codes are defined as static lets and the ErrorCode is Equatable.

@blindspotbounty
Copy link
Collaborator Author

Hi Franz!

Thank you for looking into it. Yes, for sure we can, but there are some problems with it:

  1. All librdkafka errors are converted to the only one error, in particular .rdKafkaError (aka .underlying error)
  2. No original code propagated to KafkaError, only string description (which might not always required)

Other thoughts regarding errors for librdkafka that might be related to the original issue.
librdkafka has several attributes for errors that would be nice to attach to KafkaError:

  1. rd_kafka_error_is_fatal - important to restart the whole client (or whole application) due to broker/communication error
  2. rd_kafka_error_is_retriable - it could be due timeout, full queue etc and up to downstream application to decide what to do with it, I believe that it is important information
  3. (for future) rd_kafka_error_txn_requires_abort - this error means that transaction should be aborted. I don't think that this particular error should be exposed - it can be handled automatically within transactional api but report final state - transaction is aborted
  4. Would be nice to see original error code of librdkafka as a part of KafkaError (at least in some debug information)

Thanks in advance!

@FranzBusch
Copy link
Contributor

I think we should definitely take those properties of the rd kafka errors into consideration but I don't think we should expose the rd kafka error in a nicer way. At best you don't have to catch the error at all.

It would be great if you can open issues where you currently have to catch rd kafka errors and do logic on your side and we can investigate if we can do this better in our library.

@blindspotbounty
Copy link
Collaborator Author

Sure, let me bring an example here so far, if it make sense I can move it to a different issue.
Currently, if queue is full (which may happen time to time on bursts, especially with transactions), that error is being converted to string. We receive some error that contains rdKafkaErorr...Queue is full...
For sure we can have something like the following:

extension KafkaError {
    var queueFull: Bool {
        description.contains("rdKafkaError") && description.contains("Queue full")
    }
}

do {
    self.kafkaProducer.send(...)
} catch let error as KafkaError where error.queueFull {
 /// retry or whatever
}

That is fragile in case description will change at some point and not efficient in case we want to process this specific error.

Other thing that we handle all errors and wish to commit them to our event log that has limited number of characters (250 bytes to be precise). Kafka error codes or any swift enum that is raw representable suit wonderful, so we can pack them compactly.

That are two main reasons why we ask to make them a little bit more efficient, in other words, we want to:

  1. catch some errors without additional overhead or fragile string parsing
  2. log any errors without writing their big string representation

Therefore, if that make sense for the library, I can open one or two cases for both problems.

@FranzBusch
Copy link
Contributor

So in general anytime you have to access the underlying rdkafka error we have to think about wether we either need to handle that error inside our library here and do something with it or if we should create a higher level error from it. I don't want users to ever reach for the rdkafka error and do logic based on it. That would be leaking to much implementation details.

@blindspotbounty
Copy link
Collaborator Author

Hi @FranzBusch. Sure, I agree that it might not be good to provide rdkafka error code as is.
However, swift-kafka-client may mimic some error codes and provide description based on that.

Regarding my previous example. While preparing code for other issue (#132), I had to make the following code to be able to run the sample (https://github.com/ordo-one/swift-kafka-client/blob/bulk-variants/Sources/ForTesting/Snapshot/main.swift#L13-L23):

        while true { // Note: this is an example of queue full
            do {
                try producer.send(message)
                break
            } catch let error as KafkaError where error.description.contains("Queue full") {
                continue
            } catch {
                print("Caught some error: \(error)")
                throw error
            }
        }

In case of example for issue #132 there just 15_000_000 messages and that error can appear up to 175k times. Which is okay but on every such exception we have to compare this string.

I am not sure what are the best practice for wrapping errors in swift. But my thoughts are that it would be nice to have an enumeration that would allow to compact such error and make lazy string description evaluation.
This is a live example of error, though other errors such as different timeouts, purges, retries could appear for this and other operations.
Do you have something in mind how it could be handled without comparing strings in the application that uses the library?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants