Correct highlighting for asymmetric matchers #7893

5 changes: 5 additions & 0 deletions packages/jest-deep-diff/.npmignore
@@ -0,0 +1,5 @@
**/__mocks__/**
**/__tests__/**
src
tsconfig.json
tsconfig.tsbuildinfo
97 changes: 97 additions & 0 deletions packages/jest-deep-diff/README.md
@@ -0,0 +1,97 @@
# Jest New Diff

I have not thought about the name at all. If there are good suggestions, you are more than welcome.
jeysal marked this conversation as resolved.

## Motivation

The `jest-diff` package works by serializing the values and then diffing the serializations. This approach works for most scenarios but has a few edge cases. New Diff tries to address these limitations by first diffing the values and then serializing the differences.
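
To make the difference between the two orderings concrete, here is a minimal sketch. It is not code from `jest-diff` or from this package: `anyNumber`, `serialize`, `changedLines` and `changedKeys` are hypothetical helpers, and the asymmetric matcher is only one example of an edge case, chosen for illustration.

```ts
// Hypothetical stand-in for an asymmetric matcher such as expect.any(Number).
interface Matcher {
  asymmetricMatch(other: unknown): boolean;
  toString(): string;
}

const anyNumber: Matcher = {
  asymmetricMatch: (other: unknown): boolean => typeof other === 'number',
  toString: (): string => 'Any<Number>',
};

const isMatcher = (v: unknown): v is Matcher =>
  typeof v === 'object' &&
  v !== null &&
  typeof (v as {asymmetricMatch?: unknown}).asymmetricMatch === 'function';

type Flat = Record<string, unknown>;

// Approach 1: serialize both values first, then compare the serialized lines.
const serialize = (obj: Flat): string[] =>
  Object.keys(obj)
    .sort()
    .map(k => {
      const v = obj[k];
      return `"${k}": ${isMatcher(v) ? v.toString() : JSON.stringify(v)}`;
    });

const changedLines = (a: Flat, b: Flat): string[] => {
  const bLines = serialize(b);
  return serialize(a).filter(line => bLines.indexOf(line) === -1);
};

// Approach 2: compare the values first, so only real differences get serialized.
const changedKeys = (a: Flat, b: Flat): string[] =>
  Object.keys(a).filter(k => {
    const av = a[k];
    return isMatcher(av) ? !av.asymmetricMatch(b[k]) : av !== b[k];
  });

const expected: Flat = {id: anyNumber, name: 'a'};
const received: Flat = {id: 1, name: 'b'};

const bySerialization = changedLines(expected, received); // flags "id" and "name"
const byValue = changedKeys(expected, received); // flags only "name"
```

In this toy, the serialize-first approach reports `id` as changed because `Any<Number>` and `1` are different strings, while the value-first approach sees that the matcher accepts `1`.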

## Understanding design and codebase

Note: Consider this as my attempt to write something that works to improve my understanding of the problem space.

The API is almost identical to `jest-diff`. There are two extra fields added to the options, but the rest is the same. For now, the options are only partially implemented.

There are two major components in the current implementation.

1. `diff` function, which returns a `DiffObject` (a representation of the difference between two values as a JS object)
2. `format` function, which returns a formatted string based on a `DiffObject`

There is support for plugins. I have made ReactElement and AsymmetricMatcher plugins, but they are not fully functional.

## `diff`

### supported types:

- [x] primitives (String is the only primitive with childDiffs)
- [x] Date
- [x] RegExp
- [x] Function
- [x] PrimitiveWrapper
- [x] Array
- [x] Object
- [x] Circular
- [x] Map (Does not check equality of complex value keys)
- [ ] Set
- [ ] DOM Node
- [ ] React
- [x] Asymmetric Any
- [x] Asymmetric Object

I am quite happy with this module. It's clear to me what it does. It recursively marks values as Inserted, Updated, Deleted, Equal or TypeUnequal and returns an object which represents the differences between 2 values.

Note: Diff will also traverse and mark all the children of complex values
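
The marking described above can be sketched for plain objects and primitives as follows. This is an illustration under stated assumptions, not the package's implementation: the kind names follow this README, but the `DiffNode` shape and the helpers `isObj`, `has` and `markAll` are hypothetical, and arrays, Maps, circular references and plugins are left out.

```ts
// Kind names follow the README; the DiffNode shape is an assumption for illustration.
type Kind = 'EQUAL' | 'UPDATED' | 'INSERTED' | 'DELETED' | 'TYPE_UNEQUAL';

interface DiffNode {
  kind: Kind;
  path: string;
  a?: unknown;
  b?: unknown;
  childDiffs?: DiffNode[];
}

type Obj = Record<string, unknown>;

const isObj = (v: unknown): v is Obj =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

const has = (obj: Obj, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Marks a value that exists on one side only, including all of its children.
function markAll(val: unknown, kind: 'INSERTED' | 'DELETED', path: string): DiffNode {
  const node: DiffNode =
    kind === 'INSERTED' ? {kind, path, b: val} : {kind, path, a: val};
  if (isObj(val)) {
    const obj = val;
    node.childDiffs = Object.keys(obj).map(k => markAll(obj[k], kind, k));
  }
  return node;
}

function diff(a: unknown, b: unknown, path = ''): DiffNode {
  if (isObj(a) && isObj(b)) {
    const objA = a;
    const objB = b;
    // Keys of a first, then keys that exist only in b.
    const keys = Object.keys(objA).concat(
      Object.keys(objB).filter(k => !has(objA, k)),
    );
    const childDiffs = keys.map(k => {
      if (!has(objB, k)) return markAll(objA[k], 'DELETED', k);
      if (!has(objA, k)) return markAll(objB[k], 'INSERTED', k);
      return diff(objA[k], objB[k], k);
    });
    const kind: Kind = childDiffs.every(c => c.kind === 'EQUAL')
      ? 'EQUAL'
      : 'UPDATED';
    return {kind, path, a, b, childDiffs};
  }
  if (typeof a !== typeof b || isObj(a) !== isObj(b)) {
    return {kind: 'TYPE_UNEQUAL', path, a, b};
  }
  return {kind: a === b ? 'EQUAL' : 'UPDATED', path, a, b};
}

// UPDATED root, with an INSERTED child "a" that has an INSERTED child "b".
const tree = diff({}, {a: {b: 1}});
```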

## `format`

- [x] primitives
- [x] Function
- [x] Array
- [x] Object
- [x] Circular
- [ ] Date
- [ ] RegExp
- [x] Map
- [ ] Set
- [ ] DOM Node
- [ ] React
- [x] Asymmetric Any
- [x] Asymmetric Object

`format` has two parts, `diffFormat`(desperately needs a name) and `print`. The first one returns an array of `Line` objects and the second serializes this array based on options.
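
As a rough sketch of the first part, the function below walks a diff tree and emits one `Line` per row, increasing the indent when it descends into child diffs. The `Line` fields are the ones described in the next section; the `DiffNode` shape, the `lineType` helper and the `Object {` / `}` wrapper rows are assumptions made for this illustration, and `TypeUnequal` is left out to keep it short.

```ts
type Kind = 'EQUAL' | 'UPDATED' | 'INSERTED' | 'DELETED';
type LineType = 'COMMON' | 'INSERTED' | 'DELETED';

// Assumed shape of a node in the diff tree.
interface DiffNode {
  kind: Kind;
  path?: string;
  a?: unknown;
  b?: unknown;
  childDiffs?: DiffNode[];
}

interface Line {
  type: LineType;
  indent: number;
  prefix: string;
  suffix: string;
  val: unknown;
  skipSerialize?: boolean;
}

const lineType = (kind: Kind): LineType =>
  kind === 'INSERTED' || kind === 'DELETED' ? kind : 'COMMON';

function diffFormat(node: DiffNode, indent = 0): Line[] {
  const type = lineType(node.kind);
  const prefix = node.path === undefined ? '' : `"${node.path}": `;
  const suffix = indent === 0 ? '' : ',';

  if (node.childDiffs === undefined) {
    if (node.kind === 'UPDATED') {
      // A changed leaf becomes the old value followed by the new value.
      return [
        {type: 'DELETED', indent, prefix, suffix, val: node.a},
        {type: 'INSERTED', indent, prefix, suffix, val: node.b},
      ];
    }
    const val = node.kind === 'DELETED' ? node.a : node.b;
    return [{type, indent, prefix, suffix, val}];
  }

  // Complex value: an opening row, the children one level deeper, a closing row.
  const open: Line = {type, indent, prefix, suffix: '', val: 'Object {', skipSerialize: true};
  const close: Line = {type, indent, prefix: '', suffix, val: '}', skipSerialize: true};
  const children = node.childDiffs.reduce<Line[]>(
    (acc, child) => acc.concat(diffFormat(child, indent + 1)),
    [],
  );
  return [open, ...children, close];
}

const sampleTree: DiffNode = {
  kind: 'UPDATED',
  childDiffs: [
    {kind: 'EQUAL', path: 'a', a: 1, b: 1},
    {kind: 'INSERTED', path: 'b', b: 2},
  ],
};
const sampleLines = diffFormat(sampleTree);
```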

### `Line` type

`Line` is a representation of a terminal line that will be printed on the screen. It has `type` which can be Inserted, Common or Deleted. The type determines the color of the content and diff indicator('-', '+'). `val` is a value that will be serialized. `prefix` and `suffix` are added to the serialized value, as well as `indent`.

`Line` also has a `skipSerialize` field, which can be set to `true` if you want full control of `val` serialization.

Note: `Line` can be printed as multiple lines too, but that is up to print function.

Example:

```ts
const line: Line = {type: Inserted, prefix: '"a": ', suffix: ',', val: {b: 1}};
Contributor

Yeah, when reading this example (in particular the {b: 1} part at the end) I already kind of saw the question down there coming 😅

To express what I understand in my words and make sure we're on the same page:
The original DiffObject is a tree structure, while this Line does not have further Lines in its val - it is flat. So these Lines can only represent the leaves of the DiffObject tree, meaning in your example the outer {a: ...} (which is UPDATED, not a leaf) is handled differently from the inner {a: ...} (which is UNEQUAL_TYPE, a leaf).

At the risk of suggesting something you had already thought about and know is problematic:
Flattening the tree seems to be the problem - I think we want to traverse a full DiffObject tree that goes all the way down into all nested structures.
Does diffing {} vs {a: {b: 1}} result in tree 1 (from what I understand it currently does):

UPDATED Object
  INSERTED a: Object...

or tree 2:

UPDATED Object
  INSERTED a: Object...
    INSERTED b: 1

If we have tree 2 (we still go all the way into an INSERTED property, it's just that every child will necessarily also be INSERTED), then the printer can do a recursive traversal pass through the tree, increasing indentation whenever it descends into child diffs, and really all that the INSERTED/DELETED kinds do is modify the sign that the printer prints at the start of the line.
EQUAL vs UPDATED would be almost the same for the printer really, only difference I can imagine is that the printer must never abbreviate an UPDATED node, it always has to print it out to get to the actual differences inside.
UNEQUAL_TYPE is a bit awkward, I think it would have to go and instead be a pair of DELETED and INSERTED, because what kind would the child diffs of an UNEQUAL_TYPE be?
Let me know what you think of this approach. I think so far the assumption of "when a new property appears (or a property has a different type), mark the whole thing different in some form and move on" has not really been challenged, but it seems implementing a printer has shown that it would be beneficial to have the WHOLE structure already analyzed so that the printer can just traverse through it - otherwise this problem occurs where UPDATED objects are treated as transparent by the printer and recursed into, while INSERTED objects are treated as opaque and just printed out as one big bag of characters, necessarily using different code for the two.

Contributor Author (@grosto, Feb 19, 2021)

It's an interesting approach. I am investigating these days whenever I have time.

From a high-level view, this approach would change the responsibilities of DiffObject, but it will probably help out the formatter. I liked that DiffObject had no idea about the formatting part. It's a bit unclear to me what the responsibility of DiffObject is now (I get it from an implementation point of view).

From the implementation view, all nested structure diff functions require two values: diffObjects(a, b, opts, diff), and the same in the formatter. I am working on changing things up. It's not that simple; a lot of things and types need to be changed around. I will update here if I have new insights. Currently, I am thinking that the plugin system could be a bit harder to implement.

Contributor

I think the responsibility is still fairly clear, it marks the parts that were different with kinds. It's true that there would now be lots of subtrees that are entirely equal. I view it as:
Before, the maximum tree depth we would store for equal parts of the objects is 1.
Now, the maximum tree depth we would store for equal parts of the objects is POSITIVE_INFINITY.
The equal parts were always there, it's just we didn't bother to store deeper levels of them. We were going through them anyway when figuring out that they were actually equal at the start. But now we store them entirely, because it helps consumers of the diff object, like the printer.

Contributor Author

Some updates:

  1. Flattening the Inserted/Deleted values (I will refer to it as flatten) is a completely different thing from diffing two objects. Some code can be reused between them, but not a lot. For example, in the case of objects, only getObjectKeys gets reused. I was thinking of using empty structures ({} for objects and [] for arrays) and diffing against them in the a or b position to get flattened inserted/deleted diffs. But I am not sure there can be an empty structure for everything; asymmetric matchers worry me most. I might get back to that idea at some point.
  2. Plugins will need to provide flatten functions just like diff functions. For example, diffWithCustomDiffs is created in the main diff function with plugins and then passed down. The same would need to happen with the flatten function. With those two points combined, I think that maybe it's best to add flatten as a separate step? It's an extra traversal, but it's a bit awkward to carry both flatten and diff functions around.
export function diffObjects(
  a: Record<Key, unknown>,
  b: Record<Key, unknown>,
  path: Path,
  diff: Function,
  flatten: Function,
): DiffObject<Record<Key, unknown>> {
    // omitted
    if (otherIndex >= 0) {
      // key exists on both sides: recurse with the diff function passed in
      childDiffs.push(diff(a[key], b[key], key));
      bKeys[otherIndex] = null;
    } else {
      // key exists only in a: flatten the whole subtree as DELETED
      childDiffs.push(
        flatten({val: a[key], path: key, kind: Kind.DELETED, flatten}),
      );
    }
    // omitted
}

So with the extra step it would be diff -> flatten -> format -> print. Not sure if it's worth it, but it is something I would be willing to try.

  3. Switching UNEQUAL_TYPE with INSERTED and DELETED messes up the models a bit. Instead of one DiffObject, the diff function will now have to return an array of DiffObjects, because inserted and deleted cannot be represented as one DiffObject. So far I am keeping UNEQUAL_TYPE. When the formatter gets it, it just breaks it up into InsertedDiff and DeletedDiff and passes them back to itself. Since the formatter returns an array of lines, it composes well. Something like this:
const insertedDiff: DiffObject = {
  b: diffObj.b,
  childDiffs: diffObj.bChildDiffs,
  kind: Kind.INSERTED,
  path: diffObj.path,
};
const deletedDiff: DiffObject = {
  a: diffObj.a,
  childDiffs: diffObj.aChildDiffs,
  kind: Kind.DELETED,
  path: diffObj.path,
};
return [
  ...originalFormat(deletedDiff, context, options),
  ...originalFormat(insertedDiff, context, options),
];

In the future, formatter might just show the name of the constructor only when types are unequal or be smart about it in other ways.

-  "a": 3
+  "a": Object { ... }
  4. Now asymmetric matchers: what does the diff object of expect.any(Object) and { a: 1 } look like? They are equal, but we won't have a flattened object for them. So far I am doing this, but the assumption that EQUAL means two objects have the same structure and will get flattened during the diffing process does not always hold.
if (test(a)) {
  return a.asymmetricMatch(b)
    ? diff(b, b, path, memos) // Hacky way to get flattened equal object
    : flatten({a, b, path, kind: TYPE_UNEQUAL});
}
  5. Plugins might still have to provide a serializer function, but I am not sure yet. I have to play around with the React element plugin more.

Meanwhile, I will work to clean up the code(mostly types) and push it. But hopefully, this is enough to get the idea of some problems.

Contributor

Hmmm, I'm not sure I follow. You are raising problems that occur when trying to flatten a diff object tree, but what I was trying to say is I would want to avoid even doing a flatten at all costs.
In my suggestion, the diff object tree would always have as much depth as the values we're comparing; it would include all parts of a and b, where a and b are equal, where a has a property that b doesn't have at all, and in any other case.
That tree would never be flattened, instead the printer would operate by traversing into the full-depth structure.

Contributor Author (@grosto, Mar 2, 2021)

There are a couple of problems with that:
First, we cannot use undefined, because it would treat a missing key and a key whose value is undefined the same way.
Second, I tried the same approach with a different special type, but it just passed the problem (and functions) down to the next guy and made the code very hard to follow. In general, diffing algorithms and traversing algorithms are quite different in our case, and there is very little elegant reuse that can be achieved. I feel like I have tried several approaches already and always hit a dead end.

For now it's not a problem; it might become one, but we can worry about it later. None of these traversal changes is catastrophic, and they can always be reverted.

I will try to push code ASAP now so you also can get clearer idea

Contributor

Hmm, couldn't it still be represented differently to make the strict equality undefined vs missing property that you mention work?
Consider the following pseudocode (of course simplified without things like path, also ignoring the other way around with keysOnlyInA and DELETED, ...) - what am I missing?

const childDiffs = [];
let commonKeys /* = ... */;
let keysOnlyInB /* = ... */;
for (const commonKey of commonKeys) {
  childDiffs.push(diff(a[commonKey], b[commonKey]));
}
for (const keyOnlyInB of keysOnlyInB) {
  childDiffs.push({
    ...diff(undefined, b[keyOnlyInB]),
    kind: 'INSERTED',
  });
}

This is a really interesting conversation by the way 👌 in general assertions and diffing are such an interesting topic 😅

Contributor Author

I agree it's very interesting and it's great to hear your thought process about it too.

I have been working on this whenever I had time. Pushing the newest changes so you can see them, but it's not final work.
This version is recursively traversing all complex objects and marks children as inserted/deleted.
Instead of undefined I went with Symbol which marks an empty value. You can see the code in complex/object.ts in diff function.

Additionally improved typing and rebased, which was some work considering how much changed.

I am planning to work on it more actively this week.
First I will try to make CI happy and then go through skipped tests(which are mostly for missing features) and try to make them pass.

Do you have any remarks/questions regarding overall architecture?

Contributor

The union type does look cool indeed 👍
I would say overall architecture is mostly comprised of how diff traverses through data to build the diff object tree, and how format traverses through the diff object tree to print it, so I think we're aligned at this point 😅
Did you, while writing the code, feel like the approach I pushed for (that relies more heavily on tree-traversal) was helping against code duplication and feeling more natural? Or were there still awkward bits?
That would be interesting to me because of course my suggestions were all coming from quite a high level and I can't know for sure if they still work well when getting into the details implementing it

Contributor Author

I have tried and it has some quirks. Not saying no to it, but I wanted to implement some other types and see some diffs first. Since diff function API is mostly agreed on, how we implement it can be improved down the line.

```

can be rendered as

```
...more diff

- "a": Object {
"b": 1,
},

...more diff
```

or

```
...more diff

- "a": Object {...},

...more diff
```
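
A print step that can produce either rendering might look like the sketch below. It is an illustration only: `printLine`, its `expand` flag, the tiny `serialize` stand-in and the `INDICATOR` table are hypothetical names, real serialization would be delegated to something like `pretty-format`, and colors are omitted. The sample line uses a Deleted type so that the indicator matches the `-` shown in the renderings.

```ts
type LineType = 'COMMON' | 'INSERTED' | 'DELETED';

interface Line {
  type: LineType;
  prefix?: string;
  suffix?: string;
  indent?: number;
  val: unknown;
  skipSerialize?: boolean;
}

const INDICATOR: Record<LineType, string> = {
  COMMON: ' ',
  DELETED: '-',
  INSERTED: '+',
};

// Tiny stand-in serializer: one row when collapsed, several rows when expanded.
function serialize(val: unknown, expand: boolean): string[] {
  if (typeof val === 'string') return [`"${val}"`];
  if (typeof val !== 'object' || val === null) return [String(val)];
  if (!expand) return ['Object {...}'];
  const obj = val as Record<string, unknown>;
  const body = Object.keys(obj).map(
    k => `  "${k}": ${serialize(obj[k], false).join('')},`,
  );
  return ['Object {', ...body, '}'];
}

function printLine(line: Line, expand: boolean): string {
  const pad = new Array((line.indent ?? 0) + 1).join(' ');
  const rows = line.skipSerialize
    ? [String(line.val)]
    : serialize(line.val, expand);
  const last = rows.length - 1;
  return rows
    .map((row, i) => {
      // The prefix goes on the first row only, the suffix on the last row only.
      const head = i === 0 ? (line.prefix ?? '') : '';
      const tail = i === last ? (line.suffix ?? '') : '';
      return `${INDICATOR[line.type]} ${pad}${head}${row}${tail}`;
    })
    .join('\n');
}

const sampleLine: Line = {type: 'DELETED', prefix: '"a": ', suffix: ',', val: {b: 1}};
const collapsed = printLine(sampleLine, false);
const expanded = printLine(sampleLine, true);
```

Here a single `Line` prints as one row when collapsed and as three rows when expanded, which is the sense in which one `Line` can span several terminal lines.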
37 changes: 37 additions & 0 deletions packages/jest-deep-diff/package.json
@@ -0,0 +1,37 @@
{
  "name": "jest-deep-diff",
  "version": "29.0.2",
  "main": "./build/index.js",
  "types": "./build/index.d.ts",
  "exports": {
    ".": {
      "types": "./build/index.d.ts",
      "default": "./build/index.js"
    },
    "./package.json": "./package.json"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/facebook/jest",
    "directory": "packages/jest-deep-diff"
  },
  "engines": {
    "node": "^14.15.0 || ^16.10.0 || >=18.0.0"
  },
  "bugs": {
    "url": "https://github.com/facebook/jest/issues"
  },
  "dependencies": {
    "chalk": "^4.0.0",
    "diff-sequences": "workspace:^",
    "expect": "workspace:^",
    "pretty-format": "workspace:^"
  },
  "devDependencies": {
    "fast-check": "^1.23.0",
    "jest-diff": "workspace:^",
    "strip-ansi": "^6.0.0"
  },
  "homepage": "https://jestjs.io/",
  "license": "MIT"
}
208 changes: 208 additions & 0 deletions packages/jest-deep-diff/perf/jestDiff.js
@@ -0,0 +1,208 @@
/**
 * Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

// adapted from pretty-format/perf/test.js
const chalk = require('chalk');
const oldDiff = require('jest-diff').default;
const diff = require('../build').default;
const bigJSON = require('./world.geo.json');

const NANOSECONDS = 1000000000;
let TIMES_TO_RUN = 10000;
const LIMIT_EXECUTION_TIME = 40 * NANOSECONDS;

const deepClone = obj => JSON.parse(JSON.stringify(obj));

function testCase(name, fn) {
  let error, time, total, timeout;

  // Run once up front so that a throwing case is reported instead of timed.
  try {
    fn();
  } catch (err) {
    error = err;
  }

  if (!error) {
    const start = process.hrtime();

    let i = 0;
    let currentTotal;
    for (; i < TIMES_TO_RUN; i++) {
      // Named `elapsed` to avoid shadowing the imported `diff` function.
      const elapsed = process.hrtime(start);
      currentTotal = elapsed[0] * 1e9 + elapsed[1];
      if (currentTotal > LIMIT_EXECUTION_TIME) {
        timeout = true;
        break;
      }
      fn();
    }

    total = currentTotal;

    // Divide by the iterations actually run; `i` is below TIMES_TO_RUN on timeout.
    time = Math.round(total / i);
  }

  return {
    error,
    name,
    time,
    timeout,
    total,
  };
}

function test(name, a, b) {
  const oldDiffResult = testCase('Old diff', () => oldDiff(a, b));

  const diffResult = testCase('Deep diff', () => diff(a, b));

  const results = [oldDiffResult, diffResult].sort((a, b) => a.time - b.time);

  const winner = results[0];

  results.forEach((item, index) => {
    item.isWinner = index === 0;
    item.isLoser = index === results.length - 1;
  });

  function log(current) {
    let message = current.name;

    if (current.timeout) {
      message += ` Could not complete ${TIMES_TO_RUN} iterations under ${
        LIMIT_EXECUTION_TIME / NANOSECONDS
      }s`;
      // eslint-disable-next-line no-console
      console.log(' ' + chalk.bgRed.black(message));
      return;
    }

    if (current.time) {
      message += ' - ' + String(current.time).padStart(6) + 'ns';
    }
    if (current.total) {
      message +=
        ' - ' +
        current.total / NANOSECONDS +
        's total (' +
        TIMES_TO_RUN +
        ' runs)';
    }
    if (current.error) {
      message += ' - Error: ' + current.error.message;
    }

    message = ' ' + message + ' ';

    if (current.error) {
      message = chalk.dim(message);
    }

    // Named `delta` to avoid shadowing the imported `diff` function.
    const delta = current.time - winner.time;

    if (delta > winner.time * 0.85) {
      message = chalk.bgRed.black(message);
    } else if (delta > winner.time * 0.65) {
      message = chalk.bgYellow.black(message);
    } else if (!current.error) {
      message = chalk.bgGreen.black(message);
    } else {
      message = chalk.dim(message);
    }

    // eslint-disable-next-line no-console
    console.log(' ' + message);
  }

  // eslint-disable-next-line no-console
  console.log(name + ': ');
  results.forEach(log);
  // eslint-disable-next-line no-console
  console.log();
}

const equalPrimitives = [
  ['boolean', true, true],
  ['string', 'a', 'a'],
  ['number', 24, 24],
  ['null', null, null],
  // Both sides must be undefined for this to be an "equal" case.
  ['undefined', undefined, undefined],
];

for (const [type, a, b] of equalPrimitives) {
  test(`equal ${type}`, a, b);
}

const unequalPrimitives = [
  ['boolean', true, false],
  ['string', 'a', 'A'],
  ['number', 24, 42],
  ['null and undefined', null, undefined],
];

for (const [type, a, b] of unequalPrimitives) {
  test(`unequal ${type}`, a, b);
}

const smallJSON = {
  features: {
    a: 1,
    b: 3,
    c: {
      key: 'string',
    },
  },
  topLevel: 3,
};

const smallJSONDeepEqual = deepClone(smallJSON);

test('deep equal small objects', smallJSON, smallJSONDeepEqual);

const smallJSONUpdated = {
  features: {
    a: 1,
    b: 4,
    c: {
      key2: 'string',
    },
  },
  topLevel: 4,
};

test('updated small objects', smallJSON, smallJSONUpdated);

// The larger inputs below are slower, so run fewer iterations.
TIMES_TO_RUN = 100;

const mediumJSON = {
  ...bigJSON,
  features: bigJSON.features.slice(10),
};

const changedMediumJSON = {
  ...bigJSON,
  features: deepClone(bigJSON.features.slice(4, 14)),
};
test('Medium object with diff', mediumJSON, changedMediumJSON);

const mediumJSONDeepEqual = deepClone(mediumJSON);

test('Medium object with deep equality', mediumJSON, mediumJSONDeepEqual);

const objectWithXKeys1 = {};
const objectWithXKeys2 = {};

const keyNumber = 20;

for (let i = 0; i < keyNumber; i++) {
  // Shared keys with random 0/1 values, plus random keys unique to each object.
  objectWithXKeys1['key' + i] = Math.round(Math.random());
  objectWithXKeys2['key' + i] = Math.round(Math.random());
  objectWithXKeys1[Math.random().toString(36)] = i;
  objectWithXKeys2[Math.random().toString(36)] = i;
}

test('Object with a lot of keys', objectWithXKeys1, objectWithXKeys2);