Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readStream with highWaterMark causes 'data' event to miss data #33940

Closed
markwylde opened this issue Jun 18, 2020 · 1 comment
Closed

readStream with highWaterMark causes 'data' event to miss data #33940

markwylde opened this issue Jun 18, 2020 · 1 comment
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. stream Issues and PRs related to the stream subsystem.

Comments

@markwylde
Copy link

markwylde commented Jun 18, 2020

  • Version: v14.4.0 (also tried with v12.17.0)
  • Platform: Darwin Marks-MacBook-Pro.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
  • Subsystem:

Explanation?

The code below will constantly append to a file (/tmp/test.txt) and constantly read the file, pushing the data into an in-memory buffer. This is essentially a "tail".

After a few iterations, we eventually dirty the buffer with incorrect data, as the 'data' event does not contain the correct data.

What steps will reproduce the bug?

const fs = require('fs');

// Options
const highWaterMark = 10

// Storage
let stream;
let lastBytePosition = 0;
let buffer = Buffer.alloc(0);
let counter = 0;

// Initialise a test file
fs.writeFileSync('/tmp/test.txt', '');

// Constantly append data to the file
setInterval(() => {
  counter = counter + 1
  const line = `hello at ${counter}\n`;
  fs.writeFileSync('/tmp/test.txt', line, { flag: 'a' });
}, 1)

// Constantly read the file
setInterval(() => {
  if (stream) {
    return
  }

  stream = fs.createReadStream('/tmp/test.txt', { highWaterMark, start: lastBytePosition });

  stream.on('data', chunk => {
    lastBytePosition = lastBytePosition + chunk.length;
    buffer = Buffer.concat([buffer, chunk]);
  });

  stream.on('end', () => {
    stream = null
  })
}, 10)

// Let's keep checking the buffer until the inevitable break
setInterval(() => {
  const hasBrokenLine = buffer.toString().split('\n').slice(0, -1).filter(line => {
    return !line.startsWith('hello')
  })

  if (hasBrokenLine.length > 0) {
    console.log(buffer.toString('utf8'));
    process.exit();
  }
}, 10)

How often does it reproduce? Is there a required condition?

Every time

What is the expected behaviour?

The file would never end, and every line in the buffer would start with 'hello'

What do you see instead?

The output reads:

node-bug % node test.js
hello at 1
hello at 2
hello at 3
hello at 4
hello at 5
hello at 6
hello at 7
hello at 8
hello at 9
hello at 10
hello at 11
hello at 12
hello at 13
hello at 14
hello at 15
hello at 16
hello at 17
hello at 18
hello at 19
hello at 20
hello at 21
hello at 22
hello at 23
hello at 24
hello at 25
 26

Note, the last line should be hello at 26

Additional information

  • The dirty line is not always the same, or the same format. The missing data varys.
  • This doesn't happen when you choose a highWaterMark greater than the size of the file change.
  • I have checked the /tmp/test.txt file and the data is as expected.
  • I have played with increasing the timers, and while the success rate it higher, it inevitably breaks after only a few more seconds.
  • I have tried on Linux (via Docker for Mac) too and the same issue occurs.
@markwylde markwylde changed the title readStream with highWaterMark causes data to go missing readStream with highWaterMark causes 'data' event to miss data Jun 18, 2020
@targos targos added fs Issues and PRs related to the fs subsystem / file system. stream Issues and PRs related to the stream subsystem. labels Dec 27, 2020
helloyou2012 added a commit to helloyou2012/node that referenced this issue Apr 19, 2021
@helloyou2012
Copy link
Contributor

helloyou2012 commented Apr 19, 2021

See #38292, bytesRead is not necessarily equal to highWaterMark, so pos should use bytesRead instead of highWaterMark. The following hack as well:

...
  stream.on('data', chunk => {
    lastBytePosition = lastBytePosition + chunk.length;
    stream.pos = lastBytePosition;
    buffer = Buffer.concat([buffer, chunk]);
  });
...

@Ayase-252 Ayase-252 added the confirmed-bug Issues with confirmed bugs. label Apr 19, 2021
helloyou2012 added a commit to helloyou2012/node that referenced this issue Apr 19, 2021
helloyou2012 added a commit to helloyou2012/node that referenced this issue Apr 20, 2021
helloyou2012 added a commit to helloyou2012/node that referenced this issue Apr 20, 2021
helloyou2012 added a commit to helloyou2012/node that referenced this issue Apr 28, 2021
targos pushed a commit that referenced this issue May 3, 2021
Fixes: #33940

PR-URL: #38292
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
targos pushed a commit that referenced this issue May 30, 2021
Fixes: #33940

PR-URL: #38292
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
targos pushed a commit that referenced this issue Jun 5, 2021
Fixes: #33940

PR-URL: #38292
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
targos pushed a commit that referenced this issue Jun 5, 2021
Fixes: #33940

PR-URL: #38292
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
targos pushed a commit that referenced this issue Jun 11, 2021
Fixes: #33940

PR-URL: #38292
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants