
Sqs batch failure error stack #762

Merged
merged 3 commits into from
Jan 4, 2022

Conversation

shivang007

What does this implement/fix? Explain your changes.

It passes the error stack along with the message.

Does this close any currently open issues?

No

Any other comments?

Currently, if a worker is shared by multiple methods, it is very difficult (sometimes impossible without running it locally) to pinpoint the exact location where an error occurs. Hence this adds the original stack of the error.
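For illustration, a minimal sketch (not the middleware's actual code) of the problem being fixed: when an error is rethrown with only its message, the original stack is discarded, so the trace points at the rethrow site rather than the worker method that failed.

```javascript
// Minimal sketch: rethrowing with only the message discards the original
// stack, so the new trace no longer mentions the failing worker function.
const workerMethod = () => { throw new Error('boom') }

let rethrown
try {
  workerMethod()
} catch (err) {
  rethrown = new Error(err.message) // err.stack is dropped here
}

// The original frame is gone from the new stack:
console.log(rethrown.stack.includes('workerMethod')) // → false
```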

Where has this been tested?

Node.js Versions: 14.17.6
Middy Versions: 2.5.3
AWS SDK Versions: ^2.1014.0

@shivang007
Author

[Screenshot: log output showing the truncated error stack]

The error stack is normally shown like the logs above (error @middy/sqs-partial-batch-failure/index.js:91:1), which makes it impossible to trace in a large-scale application.

Hence I added the actual error stack along with the message.

It has been tested on a batch and produces output like:

[Screenshot: log output including the original error stack]

@willfarrell
Member

willfarrell commented Dec 19, 2021

What if, instead of attaching the stack traces to the message, we attach them to the error, similar to what we do in @middy/core? What do you think of this approach?

const sqsPartialBatchFailureMiddlewareAfter = async (request) => {
  ...
  const rejectedReasons = getRejectedReasons(response)

  // If all messages were processed successfully, continue and let the
  // messages be deleted by Lambda's native functionality
  if (!rejectedReasons.length) return

  ...

  const errorMessage = getErrorMessage(rejectedReasons)
  const error = new Error(errorMessage)
  error.originalErrors = rejectedReasons
  throw error
}

const getRejectedReasons = (response) => {
  return response
    .filter((r) => r.status === 'rejected')
    .map((r) => r.reason)
}

const getErrorMessage = (rejectedReasons) => {
  return rejectedReasons.map(error => error.message).join('\n')
}

@shivang007
Author

@willfarrell I tried out your code changes. It does add the originalErrors array to the error.
The problem is that, by default, only the error message and stack get logged (hiding originalErrors)!
[Screenshot: default log output without originalErrors]
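The issue can be reproduced without Lambda (a minimal sketch): a custom property attached to an Error is not part of its stack string, so any sink that logs only `error.stack` never surfaces it.

```javascript
// Custom properties on an Error are not included in its stack string,
// so a logger that prints only `error.stack` hides them.
const error = new Error('Test')
error.originalErrors = [new Error('worker A failed')]

// The nested error's message is absent from the outer stack:
console.log(error.stack.includes('worker A failed')) // → false
```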

So instead of this, how about we modify the .stack property of the new Error, as in the first part of the accepted answer here:
https://stackoverflow.com/questions/42754270/re-throwing-exception-in-nodejs-and-not-losing-stack-trace
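A sketch of that technique (the helper name is hypothetical; the idea is simply to append the original stack onto the new error's `.stack`):

```javascript
// Hypothetical helper: rethrow with a combined stack so both the rethrow
// site and the original throw site appear in a single trace.
const rethrowWithStack = (message, originalError) => {
  const error = new Error(message)
  error.stack += '\nCaused by: ' + originalError.stack
  return error
}

const original = new Error('record failed in worker')
const combined = rethrowWithStack('Batch failed', original)
// combined.stack now ends with the original error's trace
```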

@willfarrell
Member

willfarrell commented Dec 22, 2021

Are you using @middy/error-logger? That should print out all nested errors. It might have issues if the nesting is too deep.

@shivang007
Author

Thanks for that!
We were not using @middy/error-logger, but after adding it, I can see the originalErrors getting logged.

I have modified the PR according to your suggestions, and after testing it I receive the following logs (when failing a batch):

Error: Test
Test
    at after (/var/task/webpack:/javascript/@middy/sqs-partial-batch-failure/index.js:93:19)
    at c (/var/task/webpack:/javascript/@middy/core/index.js:120:17)
    at h (/var/task/webpack:/javascript/@middy/core/index.js:88:7) {
  originalErrors: [
    Error: Test
        at calculateGDD (/var/task/webpack:/workers/actualgdd/handler.js:18:11)
        at map (/var/task/webpack:/workers/actualgdd/handler.js:194:11)
        at Array.map (<anonymous>)
        at /var/task/webpack:/workers/actualgdd/handler.js:192:40
        at h (/var/task/webpack:/javascript/@middy/core/index.js:86:32),
    Error: Test
        at calculateGDD (/var/task/webpack:/workers/actualgdd/handler.js:18:11)
        at map (/var/task/webpack:/workers/actualgdd/handler.js:194:11)
        at Array.map (<anonymous>)
        at /var/task/webpack:/workers/actualgdd/handler.js:192:40
        at h (/var/task/webpack:/javascript/@middy/core/index.js:86:32)
  ]
}

@shivang007
Author

@willfarrell, on a similar note,
can we extend the logging in the middleware?
That is, we want our own logger to log the stacks after attaching the metadata we need,
so we can index it in Elasticsearch and visualise it in Kibana.

@willfarrell
Member

There is an error-logging middleware that lets you transform the errors and ship them wherever you like. You have to write that code yourself. Open a new issue if you'd like to discuss further.
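As a sketch of the "transform and ship" idea (a hypothetical helper, not @middy/error-logger's actual API): flatten the error and its attached originalErrors into a plain JSON-serialisable document that a custom logger could index in Elasticsearch.

```javascript
// Hypothetical transform: turn an error (plus nested originalErrors) into
// a flat document suitable for a log shipper; field names are illustrative.
const toLogDocument = (error, metadata = {}) => ({
  ...metadata,
  message: error.message,
  stack: error.stack,
  originalErrors: (error.originalErrors || []).map((e) => ({
    message: e.message,
    stack: e.stack
  }))
})

const batchError = new Error('Batch failed')
batchError.originalErrors = [new Error('record 2 failed')]
const doc = toLogDocument(batchError, { service: 'actualgdd' })
```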

@shivang007
Author

@willfarrell
Sure!
Also, will this PR for sqs-partial-batch-failure be merged and released on npm anytime soon?

@willfarrell
Member

I'm holding off on merging this for a bit. It relates to another issue around handling errors from getInternal and the error-handling improvements planned for v3 (which I'm working on right now). I want to make sure all instances of arrayed errors are handled consistently, so it might be a week or two before I can put it in a release. Apologies for the delay.

@willfarrell willfarrell mentioned this pull request Dec 28, 2021
@shivang007
Author

No problem!
Sounds like a good plan 👍

@willfarrell willfarrell merged commit b9b8ee1 into middyjs:main Jan 4, 2022
@willfarrell
Member

@shivang007 Please take a look at #770, new AWS feature removes the need for throwing an error, so will need a logger instead. Would like your feedback on how errors should be handled for reporting purposes.

@shivang007
Author

@willfarrell
I saw the thread and the updates made by Serverless (using the changes published by AWS). But since they successfully process a batch after catching any errors, a batch will always show as a success, making it impossible for an external monitoring tool (e.g. Lumigo) to detect a failure within the batch. In our opinion, marking a partially failed batch as successful is not the right approach, so we will likely stick with this version and avoid that functionality until it is handled properly.
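For context, a sketch of the AWS feature under discussion (ReportBatchItemFailures; this is illustrative code, not middy's): the handler returns normally and lists only the failed message IDs, which is why the invocation itself reports as a success.

```javascript
// Sketch: build the partial-batch response shape from Promise.allSettled
// results, pairing each settled result with its SQS record.
const buildPartialBatchResponse = (records, settled) => ({
  batchItemFailures: settled
    .map((result, i) =>
      result.status === 'rejected' ? { itemIdentifier: records[i].messageId } : null
    )
    .filter(Boolean)
})

const records = [{ messageId: 'a' }, { messageId: 'b' }]
const settled = [
  { status: 'fulfilled', value: 'ok' },
  { status: 'rejected', reason: new Error('fail') }
]
const response = buildPartialBatchResponse(records, settled)
// → { batchItemFailures: [{ itemIdentifier: 'b' }] }
```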

@willfarrell
Member

Thanks for sharing. I was thinking about that as well. I can see where both ways are valuable depending on the use case:

a. An error happened while processing a message (push to DLQ).
b. The message is still being processed by an external service and should be retried later.

The v3 approach is much better for (b), while the v2 approach is much better for (a).

Looking at other use cases, like API Gateway + Lambda, it is expected that the error is handled and a "success" is returned with a statusCode indicating an error. I imagine monitoring tools handle that use case and display it properly. I would expect that, given time and a little nudging, monitoring tools will handle this use case as well.

The v2 middlewares will work with v3.

@shivang007
Author

Of course, eventually they will adapt!

I also saw your v3 code at https://github.com/middyjs/middy/blob/release/3.x/packages/sqs-partial-batch-failure/index.js
Looks good to me. It accepts a logger function, giving devs the freedom to handle their logs any way they want.

@shivang007
Author

@willfarrell is this released on npm?

@willfarrell
Member

I hope to have an alpha out for the end of the week.
