[Core] Port 12078 reduce port system load #1318

Open
wants to merge 10 commits into
base: main

Conversation

yaelibarg
Contributor

@yaelibarg yaelibarg commented Jan 12, 2025

Description

What - Reduce the number of upserts we send to the Port API.

Why - Many of the upserts do not contain an actual change, so skipping them reduces load on the Port API.

How - Compare each entity from the third party with the matching entity in Port, and upsert the entity only if there is an actual change.

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Support Multi account

@yaelibarg yaelibarg requested a review from a team as a code owner January 12, 2025 14:44
@yaelibarg yaelibarg changed the title Port 12078 reduce port system load [Core] Port 12078 reduce port system load Jan 12, 2025
@github-actions github-actions bot added size/L and removed size/M labels Jan 12, 2025
Comment on lines +160 to +165
entities_at_port_with_properties = await ocean.port_client.search_entities(
    user_agent_type,
    include_params=["blueprint", "identifier"] + [
        f"properties.{prop}" for prop in resource.port.entity.mappings.properties
    ],
    query=query
Contributor

missing relations

Contributor

can you move it to a separate function for constructing query?
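
A rough sketch of what that extraction might look like, also folding in the "missing relations" remark above. The function names `build_identifier_query` and `build_include_params` are hypothetical, and the `relations.` prefix is an assumption that mirrors the `properties.` prefix in the diff; the PR does not confirm either.

```python
from typing import Any, Iterable


def build_identifier_query(identifiers: Iterable[str]) -> dict[str, Any]:
    """Build a search query matching any of the given entity identifiers.

    Mirrors the inline dict from the diff; the function name is hypothetical.
    """
    return {
        "combinator": "and",
        "rules": [
            {
                "combinator": "or",
                "rules": [
                    {"property": "$identifier", "operator": "=", "value": identifier}
                    for identifier in identifiers
                ],
            }
        ],
    }


def build_include_params(
    properties: Iterable[str], relations: Iterable[str]
) -> list[str]:
    """List the fields to fetch: base fields, mapped properties, mapped relations.

    The "relations." prefix is an assumption, not confirmed by the PR.
    """
    return (
        ["blueprint", "identifier"]
        + [f"properties.{prop}" for prop in properties]
        + [f"relations.{rel}" for rel in relations]
    )
```

The call site would then pass `build_identifier_query(entity.identifier for entity in passed_entities)` as `query`, which keeps the search call itself short.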

Comment on lines 167 to 168
unique_entities = get_unique_entities(objects_diff[0].entity_selector_diff.passed, entities_at_port_with_properties)
modified_objects = []
Contributor

Suggested change
- unique_entities = get_unique_entities(objects_diff[0].entity_selector_diff.passed, entities_at_port_with_properties)
- modified_objects = []
+ changed_entities = get_unique_entities(objects_diff[0].entity_selector_diff.passed, entities_at_port_with_properties)
+ modified_objects = []

Comment on lines +144 to +165
query = {
    "combinator": "and",
    "rules": [
        {
            "combinator": "or",
            "rules": [
                {
                    "property": "$identifier",
                    "operator": "=",
                    "value": entity.identifier,
                }
                for entity in objects_diff[0].entity_selector_diff.passed
            ]
        }
    ]
}
entities_at_port_with_properties = await ocean.port_client.search_entities(
    user_agent_type,
    include_params=["blueprint", "identifier"] + [
        f"properties.{prop}" for prop in resource.port.entity.mappings.properties
    ],
    query=query
Contributor

is there a need to search for every batch size? I think the minimum should be 20

Comment on lines 148 to 155
return [
    third_party_entity
    for third_party_entity in third_party_entities
    if not any(
        are_entities_equal(third_party_entity, port_entity)
        for port_entity in port_entities
    )
]
Contributor

Suggested change
- return [
-     third_party_entity
-     for third_party_entity in third_party_entities
-     if not any(
-         are_entities_equal(third_party_entity, port_entity)
-         for port_entity in port_entities
-     )
- ]
+ port_entity_ids = {port_entity.identifier for port_entity in port_entities}
+ return [
+     third_party_entity
+     for third_party_entity in third_party_entities
+     if third_party_entity.identifier not in port_entity_ids
+     or not any(
+         are_entities_equal(third_party_entity, port_entity)
+         for port_entity in port_entities
+         if port_entity.identifier == third_party_entity.identifier
+     )
+ ]

Contributor

I think this will be more efficient
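
For comparison, the suggestion above still scans `port_entities` once per third-party entity. One possible variant indexes the Port entities in a dictionary so each lookup is constant time. This is a minimal sketch, not the PR's code: the `Entity` dataclass and the simplified `are_entities_equal` are stand-ins, `get_changed_entities` is a hypothetical name, and it assumes identifiers are unique within a blueprint.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    """Stand-in for the integration's entity model (hypothetical)."""

    identifier: str
    blueprint: str
    properties: dict[str, Any] = field(default_factory=dict)
    relations: dict[str, Any] = field(default_factory=dict)


def are_entities_equal(first: Entity, second: Entity) -> bool:
    """Simplified equality check; the PR diffs properties with DeepDiff."""
    return (
        first.properties == second.properties
        and first.relations == second.relations
    )


def get_changed_entities(
    third_party_entities: list[Entity], port_entities: list[Entity]
) -> list[Entity]:
    """Return entities that are new or differ from their counterpart in Port."""
    # Index once so each third-party entity needs a single dict lookup.
    port_by_key = {(e.blueprint, e.identifier): e for e in port_entities}
    changed = []
    for entity in third_party_entities:
        port_entity = port_by_key.get((entity.blueprint, entity.identifier))
        if port_entity is None or not are_entities_equal(entity, port_entity):
            changed.append(entity)
    return changed
```

With n third-party entities and m Port entities, this does one pass over each list instead of up to n × m comparisons.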

Comment on lines +129 to +131
diff = DeepDiff(
    first_entity.properties, second_entity.properties, ignore_order=True
)
Contributor

what about relations?

Contributor

what about dates?

Contributor

how are you handling decimal numbers?
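
To make the three questions above concrete, here is a standard-library sketch of a comparison that treats equivalent timestamps and near-equal numbers as unchanged. It swaps in a hand-written comparison in place of the PR's DeepDiff call, so it is illustrative only: `normalize` and `values_equal` are hypothetical helpers, the tolerance is an arbitrary choice, and unlike `ignore_order=True` it compares lists in order. Calling it on both `properties` and `relations` would cover the relations question.

```python
import math
from datetime import datetime, timezone
from typing import Any


def normalize(value: Any) -> Any:
    """Recursively convert timezone-aware ISO 8601 strings to UTC datetimes."""
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, str):
        try:
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return value
        if parsed.tzinfo is None:
            return value  # naive timestamps are left alone; their zone is unknown
        return parsed.astimezone(timezone.utc)
    return value


def _equal(first: Any, second: Any, rel_tol: float) -> bool:
    # Booleans are ints in Python, so check them before the numeric branch.
    if isinstance(first, bool) or isinstance(second, bool):
        return type(first) is type(second) and first == second
    if isinstance(first, (int, float)) and isinstance(second, (int, float)):
        return math.isclose(first, second, rel_tol=rel_tol)
    if isinstance(first, dict) and isinstance(second, dict):
        return first.keys() == second.keys() and all(
            _equal(first[key], second[key], rel_tol) for key in first
        )
    if isinstance(first, list) and isinstance(second, list):
        return len(first) == len(second) and all(
            _equal(a, b, rel_tol) for a, b in zip(first, second)
        )
    return first == second


def values_equal(first: Any, second: Any, rel_tol: float = 1e-9) -> bool:
    """Compare two property or relation mappings after normalization."""
    return _equal(normalize(first), normalize(second), rel_tol)
```

The design choice here is to normalize first and compare second, so that a change of representation alone (`Z` versus `+00:00`, `1` versus `1.0`) does not trigger an upsert.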

Contributor

do we want to set a size limit beyond which we stop comparing and just sync the entity to Port? The comparison can be quite CPU-intensive for big objects
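
One way such a cutoff could work is to measure the serialized size of the properties and skip the diff above a threshold. This is only a sketch: `should_compare` and the 64 KiB value of `MAX_COMPARE_BYTES` are hypothetical, and serializing is itself linear in the object size, so it only helps if the diff is the more expensive step.

```python
import json
from typing import Any

# Hypothetical threshold; the right value would need measuring.
MAX_COMPARE_BYTES = 64 * 1024


def should_compare(
    properties: dict[str, Any], max_bytes: int = MAX_COMPARE_BYTES
) -> bool:
    """Return True when the serialized properties are small enough to diff."""
    # default=str keeps non-JSON values (e.g. datetimes) from raising.
    size = len(json.dumps(properties, default=str).encode("utf-8"))
    return size <= max_bytes
```

Entities for which this returns False would be upserted unconditionally, trading an occasional redundant upsert for bounded CPU use.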

Comment on lines 171 to 174
logger.bind(
    changed_entities=len(unique_entities),
    total_entities=len(objects_diff[0].entity_selector_diff.passed),
).info("Upserting changed entities")
Contributor

Suggested change
- logger.bind(
-     changed_entities=len(unique_entities),
-     total_entities=len(objects_diff[0].entity_selector_diff.passed),
- ).info("Upserting changed entities")
+ logger.info("Upserting changed entities", changed_entities=len(unique_entities),
+             total_entities=len(objects_diff[0].entity_selector_diff.passed))

Contributor

same for below

2 participants