diff --git a/docs/source/basics/building_a_pipeline.md b/docs/source/basics/building_a_pipeline.md
index e667b34573..65fadb0cf6 100644
--- a/docs/source/basics/building_a_pipeline.md
+++ b/docs/source/basics/building_a_pipeline.md
@@ -64,6 +64,8 @@ Configuring Pipeline via CLI
 Starting pipeline via CLI... Ctrl+C to Quit
 Config:
 {
+  "_model_max_batch_size": 8,
+  "_pipeline_batch_size": 256,
   "ae": null,
   "class_labels": [],
   "debug": false,
@@ -75,31 +77,23 @@ Config:
   "log_config_file": null,
   "log_level": 10,
   "mode": "OTHER",
-  "model_max_batch_size": 8,
   "num_threads": 64,
-  "pipeline_batch_size": 256,
   "plugins": []
 }
 CPP Enabled: True
-====Registering Pipeline====
-====Building Pipeline====
-====Building Pipeline Complete!====
-Starting! Time: 1689786614.4988477
-====Registering Pipeline Complete!====
 ====Starting Pipeline====
+====Pipeline Started====
 ====Building Segment: linear_segment_0====
-Added source: 
+Added source: 
 └─> morpheus.MessageMeta
-Added stage: 
- └─ morpheus.MessageMeta -> morpheus.MultiMessage
-Added stage: 
- └─ morpheus.MultiMessage -> morpheus.MessageMeta
+Added stage: 
+ └─ morpheus.MessageMeta -> morpheus.ControlMessage
+Added stage: 
+ └─ morpheus.ControlMessage -> morpheus.MessageMeta
 Added stage: 
 └─ morpheus.MessageMeta -> morpheus.MessageMeta
 ====Building Segment Complete!====
-====Pipeline Started====
 ====Pipeline Complete====
-Pipeline visualization saved to .tmp/simple_identity.png
 ```
 
 ### Pipeline Build Checks
@@ -113,10 +107,10 @@ morpheus --log_level=DEBUG run pipeline-other \
 Then the following error displays:
 ```
-RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.multi_message.MultiMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
+RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
 ```
 
-This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.messages.multi_message.MultiMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `MultiMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
+This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.messages.ControlMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `ControlMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
 
 ### Kafka Source Example
 The above example essentially just copies a file. However, it is important to note that most Morpheus pipelines are similar in structure, in that they begin with a source stage (`from-file`) followed by a `deserialize` stage, end with a `serialize` stage followed by a sink stage (`to-file`), with the actual training or inference logic occurring in between.
diff --git a/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md b/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md
index 968119b08f..e48b8c6df2 100644
--- a/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md
+++ b/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md
@@ -345,7 +345,7 @@ Source: `morpheus/modules/mlflow_model_writer.py`
 
 The `mlflow_model_writer` module is responsible for uploading trained models to the MLflow server.
-For each `MultiAEMessage` received, containing a trained model, the function uploads the model to MLflow along with associated metadata such as experiment name, run name, parameters, metrics, and the model signature. If the MLflow server is running on Databricks, the function also applies the required permissions to the registered model.
+For each `ControlMessage` received containing a trained model, the function uploads the model to MLflow along with associated metadata such as experiment name, run name, parameters, metrics, and the model signature. If the MLflow server is running on Databricks, the function also applies the required permissions to the registered model.
 
 For a complete reference, refer to: [MLflow Model Writer](../../modules/core/mlflow_model_writer.md)
@@ -460,9 +460,9 @@ For a complete reference, refer to: [DFP Post Processing](../../modules/examples
 Source: `morpheus/modules/serialize.py`
 
-The serialize module function is responsible for filtering columns from a `MultiMessage` object and emitting a `MessageMeta` object.
+The serialize module function is responsible for filtering columns from a `ControlMessage` object and emitting a `MessageMeta` object.
 
-The `convert_to_df` function converts a DataFrame to JSON lines. It takes a `MultiMessage` instance, `include_columns` (a pattern for columns to include), `exclude_columns` (a list of patterns for columns to exclude), and `columns` (a list of columns to include). The function filters the columns of the input DataFrame based on the include and exclude patterns and retrieves the metadata of the filtered columns.
+The `convert_to_df` function converts a DataFrame to JSON lines. It takes a `ControlMessage` instance, `include_columns` (a pattern for columns to include), `exclude_columns` (a list of patterns for columns to exclude), and `columns` (a list of columns to include).
The function filters the columns of the input DataFrame based on the include and exclude patterns and retrieves the metadata of the filtered columns. The module function compiles the include and exclude patterns into regular expressions. It then creates a node using the `convert_to_df` function with the compiled include and exclude patterns and the specified columns. diff --git a/docs/source/developer_guide/guides/1_simple_python_stage.md b/docs/source/developer_guide/guides/1_simple_python_stage.md index fe9c901de4..0ed1a08d59 100644 --- a/docs/source/developer_guide/guides/1_simple_python_stage.md +++ b/docs/source/developer_guide/guides/1_simple_python_stage.md @@ -108,7 +108,7 @@ There are four methods that need to be defined in our new subclass to implement return "pass-thru" ``` -The `accepted_types` method returns a tuple of message classes that this stage is able to accept as input. Morpheus uses this to validate that the parent of this stage emits a message that this stage can accept. Since our stage is a pass through, we will declare that we can accept any incoming message type. Note that production stages will often declare only a single Morpheus message class such as `MessageMeta` or `MultiMessage` (refer to the message classes defined in `morpheus.pipeline.messages` for a complete list). +The `accepted_types` method returns a tuple of message classes that this stage is able to accept as input. Morpheus uses this to validate that the parent of this stage emits a message that this stage can accept. Since our stage is a pass through, we will declare that we can accept any incoming message type. Note that production stages will often declare only a single Morpheus message class such as `MessageMeta` or `ControlMessage` (refer to the message classes defined in `morpheus.messages` for a complete list). 
 ```python
     def accepted_types(self) -> tuple:
         return (typing.Any,)
diff --git a/docs/source/developer_guide/guides/2_real_world_phishing.md b/docs/source/developer_guide/guides/2_real_world_phishing.md
index 4043e863ec..a7776c8dc4 100644
--- a/docs/source/developer_guide/guides/2_real_world_phishing.md
+++ b/docs/source/developer_guide/guides/2_real_world_phishing.md
@@ -460,7 +460,7 @@ pipeline.add_stage(AddScoresStage(config, labels=["is_phishing"]))
 Lastly, we will save our results to disk. For this purpose, we are using two stages that are often used in conjunction with each other: `SerializeStage` and `WriteToFileStage`.
 
-The `SerializeStage` is used to include and exclude columns as desired in the output. Importantly, it also handles conversion from the `MultiMessage`-derived output type to the `MessageMeta` class that is expected as input by the `WriteToFileStage`.
+The `SerializeStage` is used to include and exclude columns as desired in the output. Importantly, it also handles conversion from the `ControlMessage` output type to the `MessageMeta` class that is expected as input by the `WriteToFileStage`.
 
 The `WriteToFileStage` will append message data to the output file as messages are received. Note however that for performance reasons the `WriteToFileStage` does not flush its contents out to disk every time a message is received. Instead, it relies on the underlying [buffered output stream](https://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html) to flush as needed, and then will close the file handle on shutdown.
@@ -889,7 +889,7 @@ class RabbitMQSourceStage(PreallocatorMixin, SingleOutputSource):
 ```
 
 ### Function Based Approach
-Similar to the `stage` decorator used in previous examples Morpheus provides a `source` decorator which wraps a generator function to be used as a source stage. In the class based approach we explicitly added the `PreallocatorMixin`, when using the `source` decorator the return type annotation will be inspected and a stage will be created with the `PreallocatorMixin` if the return type is a `DataFrame` type or a message which contains a `DataFrame` (`MessageMeta` and `MultiMessage`).
+Similar to the `stage` decorator used in previous examples, Morpheus provides a `source` decorator which wraps a generator function to be used as a source stage. In the class based approach we explicitly added the `PreallocatorMixin`; when using the `source` decorator, the return type annotation will be inspected and a stage will be created with the `PreallocatorMixin` if the return type is a `DataFrame` type or a message which contains a `DataFrame` (`MessageMeta` and `ControlMessage`).
 
 The code for the function will first perform the same setup as was used in the class constructor, then entering a nearly identical loop as that in the `source_generator` method.
diff --git a/docs/source/developer_guide/guides/3_simple_cpp_stage.md b/docs/source/developer_guide/guides/3_simple_cpp_stage.md
index 206b4eb13e..f21317475d 100644
--- a/docs/source/developer_guide/guides/3_simple_cpp_stage.md
+++ b/docs/source/developer_guide/guides/3_simple_cpp_stage.md
@@ -84,16 +84,17 @@ Both the `PythonSource` and `PythonNode` classes are defined in the `pymrc/node.
 As in our Python guide, we will start with a simple pass through stage which can be used as a starting point for future development of other stages. Note that by convention, C++ classes in Morpheus have the same name as their corresponding Python classes and are located under a directory named `_lib`. We will be following that convention. To start, we will create a `_lib` directory and a new empty `__init__.py` file.
-While our Python implementation accepts messages of any type (in the form of Python objects), on the C++ side we don't have that flexibility since our node is subject to C++ static typing rules. In practice, this isn't a limitation as we usually know which specific message types we need to work with. For this example we will be working with the `MultiMessage` as our input and output type, it is also a common base type for many other Morpheus message classes. This means that at build time our Python stage implementation is able to build a C++ node when the incoming type is a subclass of `MultiMessage`, while falling back to the existing Python implementation otherwise.
+While our Python implementation accepts messages of any type (in the form of Python objects), on the C++ side we don't have that flexibility since our node is subject to C++ static typing rules. In practice, this isn't a limitation as we usually know which specific message types we need to work with. For this example we will be working with `ControlMessage` as our input and output type. This means that at build time our Python stage implementation is able to build a C++ node when the incoming type is `ControlMessage`, while falling back to the existing Python implementation otherwise.
 
 To start with, we have our Morpheus and MRC-specific includes:
 ```cpp
-#include <morpheus/export.h>
-#include <morpheus/messages/multi.hpp>    // for MultiMessage
+#include <morpheus/export.h>              // for exporting symbols
+#include <morpheus/messages/control.hpp>  // for ControlMessage
 #include <mrc/segment/builder.hpp>        // for Segment Builder
 #include <mrc/segment/object.hpp>         // for Segment Object
 #include <pymrc/node.hpp>                 // for PythonNode
+#include <rxcpp/rx.hpp>
 ```
 
 We'll want to define our stage in its own namespace. In this case, we will name it `morpheus_example`, giving us a namespace and class definition like:
@@ -104,10 +105,11 @@
 namespace morpheus_example {
 
 using namespace morpheus;
 
 // pybind11 sets visibility to hidden by default; we want to export our symbols
-class MORPHEUS_EXPORT PassThruStage : public mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>
+class MORPHEUS_EXPORT PassThruStage
+  : public mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>
 {
   public:
-    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>;
+    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>;
     using base_t::sink_type_t;
     using base_t::source_type_t;
     using base_t::subscribe_fn_t;
@@ -126,13 +128,13 @@ Then adding `MORPHEUS_EXPORT`, which is defined in `/build/autogenerated/include
 This is due to a pybind11 requirement for module implementations to default symbol visibility to hidden (`-fvisibility=hidden`). More details about this can be found in the [pybind11 documentation](https://pybind11.readthedocs.io/en/stable/faq.html#someclass-declared-with-greater-visibility-than-the-type-of-its-field-someclass-member-wattributes). Any object, struct, or function that is intended to be exported should have `MORPHEUS_EXPORT` included in the definition.
 
-For simplicity, we defined `base_t` as an alias for our base class type because the definition can be quite long. Our base class type also defines a few additional type aliases for us: `subscribe_fn_t`, `sink_type_t` and `source_type_t`. The `sink_type_t` and `source_type_t` aliases are shortcuts for the sink and source types that this stage will be reading and writing. In this case both the `sink_type_t` and `source_type_t` resolve to `std::shared_ptr<MultiMessage>`. `subscribe_fn_t` (read as "subscribe function type") is an alias for:
+For simplicity, we defined `base_t` as an alias for our base class type because the definition can be quite long. Our base class type also defines a few additional type aliases for us: `subscribe_fn_t`, `sink_type_t` and `source_type_t`. The `sink_type_t` and `source_type_t` aliases are shortcuts for the sink and source types that this stage will be reading and writing. In this case both the `sink_type_t` and `source_type_t` resolve to `std::shared_ptr<ControlMessage>`. `subscribe_fn_t` (read as "subscribe function type") is an alias for:
 
 ```cpp
 std::function<rxcpp::subscription(rxcpp::observable<InputT>, rxcpp::subscriber<OutputT>)>
 ```
 
-This means that an MRC subscribe function accepts an `rxcpp::observable` of type `InputT` and `rxcpp::subscriber` of type `OutputT` and returns a subscription. In our case, both `InputT` and `OutputT` are `std::shared_ptr<MultiMessage>`.
+This means that an MRC subscribe function accepts an `rxcpp::observable` of type `InputT` and `rxcpp::subscriber` of type `OutputT` and returns a subscription. In our case, both `InputT` and `OutputT` are `std::shared_ptr<ControlMessage>`.
 
 All Morpheus C++ stages receive an instance of an MRC Segment Builder and a name (Typically this is the Python class' `unique_name` property) when constructed from Python. Note that C++ stages don't receive an instance of the Morpheus `Config` object. Therefore, if there are any attributes in the `Config` needed by the C++ class, it is the responsibility of the Python class to extract them and pass them in as parameters to the C++ class.
@@ -152,23 +154,29 @@ struct MORPHEUS_EXPORT PassThruStageInterfaceProxy
 #pragma once
 
 #include <morpheus/export.h>              // for exporting symbols
-#include <morpheus/messages/multi.hpp>    // for MultiMessage
+#include <morpheus/messages/control.hpp>  // for ControlMessage
 #include <mrc/segment/builder.hpp>        // for Segment Builder
 #include <mrc/segment/object.hpp>         // for Segment Object
 #include <pymrc/node.hpp>                 // for PythonNode
+#include <rxcpp/rx.hpp>
 #include <memory>
 #include <string>
+#include
+
+// IWYU pragma: no_include "morpheus/objects/data_table.hpp"
+// IWYU pragma: no_include
 
 namespace morpheus_example {
 
 using namespace morpheus;
 
 // pybind11 sets visibility to hidden by default; we want to export our symbols
-class MORPHEUS_EXPORT PassThruStage : public mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>
+class MORPHEUS_EXPORT PassThruStage
+  : public mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>
 {
   public:
-    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>;
+    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>;
     using base_t::sink_type_t;
     using base_t::source_type_t;
     using base_t::subscribe_fn_t;
@@ -355,7 +363,7 @@ def supports_cpp_node(self):
         return True
 ```
 
-Next, as mentioned previously, our Python implementation can support messages of any type, however the C++ implementation can only support instances of `MultiMessage` and its subclasses. To do this we will override the `compute_schema` method to store the input type.
+Next, as mentioned previously, our Python implementation can support messages of any type; however, the C++ implementation can only support instances of `ControlMessage`. To do this we will override the `compute_schema` method to store the input type.
 ```python
 def compute_schema(self, schema: StageSchema):
     super().compute_schema(schema)  # Call PassThruTypeMixin's compute_schema method
@@ -363,11 +371,11 @@ def compute_schema(self, schema: StageSchema):
 ```
 
 > **Note**: We are still using the `PassThruTypeMixin` to handle the requirements of setting the output type.
-As mentioned in the previous section, our `_build_single` method needs to be updated to build a C++ node when the input type is `MultiMessage` and when `morpheus.config.CppConfig.get_should_use_cpp()` is `True` using the `self._build_cpp_node()` method. The `_build_cpp_node()` method compares both `morpheus.config.CppConfig.get_should_use_cpp()` and `supports_cpp_node()` and returns `True` only when both methods return `True`.
+As mentioned in the previous section, our `_build_single` method needs to be updated to build a C++ node when the input type is `ControlMessage` and when `morpheus.config.CppConfig.get_should_use_cpp()` is `True` using the `self._build_cpp_node()` method. The `_build_cpp_node()` method compares both `morpheus.config.CppConfig.get_should_use_cpp()` and `supports_cpp_node()` and returns `True` only when both methods return `True`.
 ```python
 def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject:
-    if self._build_cpp_node() and issubclass(self._input_type, MultiMessage):
+    if self._build_cpp_node() and issubclass(self._input_type, ControlMessage):
         from ._lib import pass_thru_cpp
 
         node = pass_thru_cpp.PassThruStage(builder, self.unique_name)
@@ -389,7 +397,7 @@ from mrc.core import operators as ops
 
 from morpheus.cli.register_stage import register_stage
 from morpheus.config import Config
-from morpheus.messages import MultiMessage
+from morpheus.messages import ControlMessage
 from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin
 from morpheus.pipeline.single_port_stage import SinglePortStage
 from morpheus.pipeline.stage_schema import StageSchema
@@ -421,7 +429,7 @@ class PassThruStage(PassThruTypeMixin, SinglePortStage):
         return message
 
     def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject:
-        if self._build_cpp_node() and issubclass(self._input_type, MultiMessage):
+        if self._build_cpp_node() and issubclass(self._input_type, ControlMessage):
             from ._lib import pass_thru_cpp
 
             node = pass_thru_cpp.PassThruStage(builder, self.unique_name)
@@ -430,10 +438,11 @@ class PassThruStage(PassThruTypeMixin, SinglePortStage):
             builder.make_edge(input_node, node)
 
         return node
+
 ```
 
 ## Testing the Stage
-To test the updated stage we will build a simple pipeline using the Morpheus command line tool. In order to illustrate the stage building a C++ node only when the input type is a `MultiMessage` we will insert the `pass-thru` stage in twice in the pipeline. In the first instance the input type will be `MessageMeta` and the stage will fallback to using a Python node, and in the second instance the input type will be a `MultiMessage` and the stage will build a C++ node.
+To test the updated stage we will build a simple pipeline using the Morpheus command line tool. In order to illustrate the stage building a C++ node only when the input type is a `ControlMessage` we will insert the `pass-thru` stage twice in the pipeline. In the first instance the input type will be `MessageMeta` and the stage will fall back to using a Python node, and in the second instance the input type will be a `ControlMessage` and the stage will build a C++ node.
 
 ```bash
 PYTHONPATH="examples/developer_guide/3_simple_cpp_stage/src" \
+The {py:obj}`~dfp.stages.dfp_training.DFPTraining` trains a model for each incoming `DataFrame` and emits an instance of `morpheus.messages.ControlMessage` containing the trained model.
 
 | Argument | Type | Description |
 | -------- | ---- | ----------- |
diff --git a/docs/source/getting_started.md b/docs/source/getting_started.md
index a8c55c2741..b4d2b04cab 100644
--- a/docs/source/getting_started.md
+++ b/docs/source/getting_started.md
@@ -279,9 +279,9 @@ The output should contain lines similar to:
 Added source: 
 └─> morpheus.MessageMeta
 Added stage: 
- └─ morpheus.MessageMeta -> morpheus.MultiMessage
+ └─ morpheus.MessageMeta -> morpheus.ControlMessage
 Added stage: 
- └─ morpheus.MultiMessage -> morpheus.MessageMeta
+ └─ morpheus.ControlMessage -> morpheus.MessageMeta
 Added stage: 
 └─ morpheus.MessageMeta -> morpheus.MessageMeta
 ====Building Segment Complete!====
@@ -295,10 +295,10 @@ $ morpheus run pipeline-nlp from-kafka --bootstrap_servers localhost:9092 --inpu
 Configuring Pipeline via CLI
 Starting pipeline via CLI... Ctrl+C to Quit
 E20221214 14:53:17.425515 452045 controller.cpp:62] exception caught while performing update - this is fatal - issuing kill
-E20221214 14:53:17.425714 452045 context.cpp:125] rank: 0; size: 1; tid: 140065439217216; fid: 0x7f6144041000: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.multi_message.MultiMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
+E20221214 14:53:17.425714 452045 context.cpp:125] rank: 0; size: 1; tid: 140065439217216; fid: 0x7f6144041000: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
 ```
 
-This indicates that the `to-file` stage cannot accept the input type of `morpheus.messages.multi_message.MultiMessage`. This is because the `to-file` stage has no idea how to write that class to a file; it only knows how to write messages of type `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine at the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the error message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `morpheus.messages.multi_message.MultiMessage`, to `morpheus.messages.message_meta.MessageMeta`, which is exactly what the `serialize` stage does.
+This indicates that the `to-file` stage cannot accept the input type of `morpheus.messages.ControlMessage`. This is because the `to-file` stage has no idea how to write that class to a file; it only knows how to write messages of type `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the error message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `morpheus.messages.ControlMessage`, to `morpheus.messages.message_meta.MessageMeta`, which is exactly what the `serialize` stage does.
 
 #### Pipeline Stages
diff --git a/docs/source/modules/core/serialize.md b/docs/source/modules/core/serialize.md
index ea7528a1eb..e7ba02d775 100644
--- a/docs/source/modules/core/serialize.md
+++ b/docs/source/modules/core/serialize.md
@@ -17,7 +17,7 @@ limitations under the License.
 
 ## Serialize Module
 
-This module filters columns from a `MultiMessage` object, emitting a `MessageMeta`.
+This module filters columns from a `ControlMessage` object, emitting a `MessageMeta`.
### Configurable Parameters diff --git a/examples/abp_nvsmi_detection/README.md b/examples/abp_nvsmi_detection/README.md index fd63821568..eab83358e2 100644 --- a/examples/abp_nvsmi_detection/README.md +++ b/examples/abp_nvsmi_detection/README.md @@ -203,17 +203,17 @@ CPP Enabled: True Added source: └─> morpheus.MessageMeta Added stage: - └─ morpheus.MessageMeta -> morpheus.MultiMessage + └─ morpheus.MessageMeta -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiInferenceFILMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiInferenceFILMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MessageMeta + └─ morpheus.ControlMessage -> morpheus.MessageMeta Added stage: └─ morpheus.MessageMeta -> morpheus.MessageMeta ====Building Pipeline Complete!==== diff --git a/examples/developer_guide/3_simple_cpp_stage/src/run.py b/examples/developer_guide/3_simple_cpp_stage/src/run.py index 9f00ff00a2..a6fdf220d5 100755 --- a/examples/developer_guide/3_simple_cpp_stage/src/run.py +++ b/examples/developer_guide/3_simple_cpp_stage/src/run.py @@ -47,7 +47,7 @@ def run_pipeline(): pipeline.add_stage(DeserializeStage(config)) - # Add a PassThruStage where the input type is MultiMessage + # Add a PassThruStage where the input type is ControlMessage pipeline.add_stage(PassThruStage(config)) # Add monitor to record the performance of our new stage diff --git a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.cpp b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.cpp index 
3d3c824870..adc9463465 100644
--- a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.cpp
+++ b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.cpp
@@ -17,7 +17,6 @@
 
 #include "pass_thru.hpp"
 
-#include <morpheus/messages/multi.hpp>
 #include <mrc/segment/builder.hpp>
 #include <mrc/segment/object.hpp>
 #include <pymrc/utils.hpp>  // for pymrc::import
@@ -25,6 +24,8 @@
 #include <memory>
 #include <utility>
 
+// IWYU pragma: no_include "morpheus/messages/control.hpp"
+
 namespace morpheus_example {
 
 PassThruStage::PassThruStage() : PythonNode(base_t::op_factory_from_sub_fn(build_operator())) {}
diff --git a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.hpp b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.hpp
index 94b80a761c..9aec2e30fc 100644
--- a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.hpp
+++ b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/_lib/pass_thru.hpp
@@ -18,7 +18,7 @@
 #pragma once
 
 #include <morpheus/export.h>              // for exporting symbols
-#include <morpheus/messages/multi.hpp>    // for MultiMessage
+#include <morpheus/messages/control.hpp>  // for ControlMessage
 #include <mrc/segment/builder.hpp>        // for Segment Builder
 #include <mrc/segment/object.hpp>         // for Segment Object
 #include <pymrc/node.hpp>                 // for PythonNode
@@ -37,10 +37,10 @@
 using namespace morpheus;
 
 // pybind11 sets visibility to hidden by default; we want to export our symbols
 class MORPHEUS_EXPORT PassThruStage
-    : public mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>
+    : public mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>
 {
   public:
-    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>;
+    using base_t = mrc::pymrc::PythonNode<std::shared_ptr<ControlMessage>, std::shared_ptr<ControlMessage>>;
     using base_t::sink_type_t;
     using base_t::source_type_t;
     using base_t::subscribe_fn_t;
diff --git a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/pass_thru.py b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/pass_thru.py
index 3d22d25b8a..3b71aa727f 100644
--- a/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/pass_thru.py
+++
b/examples/developer_guide/3_simple_cpp_stage/src/simple_cpp_stage/pass_thru.py
@@ -20,7 +20,7 @@
 
 from morpheus.cli.register_stage import register_stage
 from morpheus.config import Config
-from morpheus.messages import MultiMessage
+from morpheus.messages import ControlMessage
 from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin
 from morpheus.pipeline.single_port_stage import SinglePortStage
 from morpheus.pipeline.stage_schema import StageSchema
@@ -52,7 +52,7 @@ def on_data(self, message: typing.Any):
         return message
 
     def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject:
-        if self._build_cpp_node() and issubclass(self._input_type, MultiMessage):
+        if self._build_cpp_node() and issubclass(self._input_type, ControlMessage):
             from ._lib import pass_thru_cpp
 
             node = pass_thru_cpp.PassThruStage(builder, self.unique_name)
diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/messages/dfp_message_meta.py b/examples/digital_fingerprinting/production/morpheus/dfp/messages/dfp_message_meta.py
new file mode 100644
index 0000000000..49b8c98ba9
--- /dev/null
+++ b/examples/digital_fingerprinting/production/morpheus/dfp/messages/dfp_message_meta.py
@@ -0,0 +1,42 @@
+# Copyright (c) 2021-2024, NVIDIA CORPORATION.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+ +import dataclasses +import logging + +import pandas as pd + +from morpheus.messages.message_meta import MessageMeta + +logger = logging.getLogger(__name__) + + +@dataclasses.dataclass(init=False) +class DFPMessageMeta(MessageMeta, cpp_class=None): + """ + This class extends MessageMeta to also hold userid corresponding to batched metadata. + + Parameters + ---------- + df : pandas.DataFrame + Input rows in dataframe. + user_id : str + User id. + + """ + user_id: str + + def __init__(self, df: pd.DataFrame, user_id: str) -> None: + super().__init__(df) + self.user_id = user_id diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/messages/multi_dfp_message.py b/examples/digital_fingerprinting/production/morpheus/dfp/messages/multi_dfp_message.py deleted file mode 100644 index 1f5290893e..0000000000 --- a/examples/digital_fingerprinting/production/morpheus/dfp/messages/multi_dfp_message.py +++ /dev/null @@ -1,90 +0,0 @@ -# Copyright (c) 2021-2024, NVIDIA CORPORATION. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import logging -import typing - -import pandas as pd - -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_message import MultiMessage - -logger = logging.getLogger(__name__) - - -@dataclasses.dataclass(init=False) -class DFPMessageMeta(MessageMeta, cpp_class=None): - """ - This class extends MessageMeta to also hold userid corresponding to batched metadata. 
- - Parameters - ---------- - df : pandas.DataFrame - Input rows in dataframe. - user_id : str - User id. - - """ - user_id: str - - def __init__(self, df: pd.DataFrame, user_id: str) -> None: - super().__init__(df) - self.user_id = user_id - - def get_df(self): - return self.df - - def set_df(self, df): - self._df = df - - -@dataclasses.dataclass -class MultiDFPMessage(MultiMessage): - - def __init__(self, *, meta: MessageMeta, mess_offset: int = 0, mess_count: int = -1): - - if (not isinstance(meta, DFPMessageMeta)): - raise ValueError(f"`meta` must be an instance of `DFPMessageMeta` when creating {self.__class__.__name__}") - - super().__init__(meta=meta, mess_offset=mess_offset, mess_count=mess_count) - - @property - def user_id(self): - return typing.cast(DFPMessageMeta, self.meta).user_id - - def get_meta_dataframe(self): - return typing.cast(DFPMessageMeta, self.meta).get_df() - - def set_meta_dataframe(self, columns: typing.Union[None, str, typing.List[str]], value): - - df = typing.cast(DFPMessageMeta, self.meta).get_df() - - if (columns is None): - # Set all columns - df[list(value.columns)] = value - else: - # If its a single column or list of columns, this is the same - df[columns] = value - - typing.cast(DFPMessageMeta, self.meta).set_df(df) - - def copy_ranges(self, ranges: typing.List[typing.Tuple[int, int]]): - - sliced_rows = self.copy_meta_ranges(ranges) - - return self.from_message(self, - meta=DFPMessageMeta(sliced_rows, self.user_id), - mess_offset=0, - mess_count=len(sliced_rows)) diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_inference.py b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_inference.py index a527e74b1c..48e6e03568 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_inference.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_inference.py @@ -27,8 +27,7 @@ from morpheus.utils.module_ids import MORPHEUS_MODULE_NAMESPACE 
from morpheus.utils.module_utils import register_module -from ..messages.multi_dfp_message import DFPMessageMeta -from ..messages.multi_dfp_message import MultiDFPMessage +from ..messages.dfp_message_meta import DFPMessageMeta from ..utils.module_ids import DFP_INFERENCE logger = logging.getLogger(f"morpheus.{__name__}") @@ -82,7 +81,7 @@ def get_model(user: str) -> ModelCache: fallback_user_ids=[fallback_user], timeout=model_fetch_timeout) - def process_task(control_message: ControlMessage): + def process_task(control_message: ControlMessage) -> ControlMessage: start_time = time.time() user_id = control_message.get_metadata("user_id") @@ -113,11 +112,11 @@ def process_task(control_message: ControlMessage): output_df = cudf.concat([payload.df, results_df[results_cols]], axis=1) # Create an output message to allow setting meta - output_message = MultiDFPMessage(meta=DFPMessageMeta(output_df, user_id=user_id), - mess_offset=0, - mess_count=payload.count) + output_message = ControlMessage() + output_message.payload(DFPMessageMeta(output_df, user_id=user_id)) - output_message.set_meta('model_version', f"{model_cache.reg_model_name}:{model_cache.reg_model_version}") + output_message.payload().set_data('model_version', + f"{model_cache.reg_model_name}:{model_cache.reg_model_version}") if logger.isEnabledFor(logging.DEBUG): load_model_duration = (post_model_time - start_time) * 1000.0 @@ -132,7 +131,7 @@ def process_task(control_message: ControlMessage): return output_message - def on_data(control_message: ControlMessage): + def on_data(control_message: ControlMessage) -> list[ControlMessage]: if (control_message is None): return None diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_postprocessing.py b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_postprocessing.py index b0a8ce464c..908c0d61c5 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_postprocessing.py +++ 
b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_postprocessing.py @@ -19,7 +19,7 @@ import mrc from mrc.core import operators as ops -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.utils.module_ids import MORPHEUS_MODULE_NAMESPACE from morpheus.utils.module_utils import register_module @@ -48,17 +48,17 @@ def dfp_postprocessing(builder: mrc.Builder): timestamp_column_name = config.get("timestamp_column_name", "timestamp") - def process_events(message: MultiAEMessage): + def process_events(message: ControlMessage): # Assume that a filter stage precedes this stage # df = message.get_meta() # df['event_time'] = datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ') # df.replace(np.nan, 'NaN', regex=True, inplace=True) # TODO(Devin): figure out why we are not able to set meta for a whole dataframe, but works for single column. # message.set_meta(None, df) - message.set_meta("event_time", datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')) + message.payload().set_data("event_time", datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')) - def on_data(message: MultiAEMessage): - if (not message or message.mess_count == 0): + def on_data(message: ControlMessage): + if (not message or message.payload().count == 0): return None start_time = time.time() @@ -69,11 +69,11 @@ def on_data(message: MultiAEMessage): if logger.isEnabledFor(logging.DEBUG): logger.debug("Completed postprocessing for user %s in %s ms. Event count: %s.
Start: %s, End: %s", - message.meta.user_id, + message.get_metadata("user_id"), duration, - message.mess_count, - message.get_meta(timestamp_column_name).min(), - message.get_meta(timestamp_column_name).max()) + message.payload().count, + message.payload().get_data(timestamp_column_name).min(), + message.payload().get_data(timestamp_column_name).max()) return message diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_rolling_window.py b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_rolling_window.py index 54a793b253..bfdbe13e2c 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_rolling_window.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_rolling_window.py @@ -99,7 +99,6 @@ def get_user_cache(user_id: str): def try_build_window(message: MessageMeta, user_id: str) -> typing.Union[MessageMeta, None]: with get_user_cache(user_id) as user_cache: - # incoming_df = message.get_df() with message.mutable_dataframe() as dfm: incoming_df = dfm.to_pandas() diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_training.py b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_training.py index 4965a1d609..0ea7283d03 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_training.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/modules/dfp_training.py @@ -21,13 +21,11 @@ import cudf from morpheus.messages import ControlMessage -from morpheus.messages.multi_ae_message import MultiAEMessage from morpheus.models.dfencoder import AutoEncoder from morpheus.utils.module_ids import MORPHEUS_MODULE_NAMESPACE from morpheus.utils.module_utils import register_module -from ..messages.multi_dfp_message import DFPMessageMeta -from ..messages.multi_dfp_message import MultiDFPMessage +from ..messages.dfp_message_meta import DFPMessageMeta from ..utils.module_ids import DFP_TRAINING logger = 
logging.getLogger(f"morpheus.{__name__}") @@ -69,7 +67,7 @@ def dfp_training(builder: mrc.Builder): raise ValueError(f"validation_size={validation_size} should be a positive float in the " "(0, 1) range") - def on_data(control_message: ControlMessage): + def on_data(control_message: ControlMessage) -> list[ControlMessage]: if (control_message is None): return None @@ -101,13 +99,12 @@ def on_data(control_message: ControlMessage): logger.debug("Training AE model for user: '%s'... Complete.", user_id) dfp_mm = DFPMessageMeta(cudf.from_pandas(final_df), user_id=user_id) - multi_message = MultiDFPMessage(meta=dfp_mm, mess_offset=0, mess_count=len(final_df)) - output_message = MultiAEMessage(meta=multi_message.meta, - mess_offset=multi_message.mess_offset, - mess_count=multi_message.mess_count, - model=model, - train_scores_mean=0.0, - train_scores_std=1.0) + + output_message = ControlMessage() + output_message.payload(dfp_mm) + output_message.set_metadata("model", model) + output_message.set_metadata("train_scores_mean", 0.0) + output_message.set_metadata("train_scores_std", 1.0) output_messages.append(output_message) diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_inference_stage.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_inference_stage.py index c9e08a0842..7d37c9514d 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_inference_stage.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_inference_stage.py @@ -22,11 +22,10 @@ from mrc.core import operators as ops from morpheus.config import Config -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema -from ..messages.multi_dfp_message import MultiDFPMessage from ..utils.model_cache import ModelCache from ..utils.model_cache import 
ModelManager @@ -70,10 +69,10 @@ def supports_cpp_node(self): def accepted_types(self) -> typing.Tuple: """Accepted input types.""" - return (MultiDFPMessage, ) + return (ControlMessage, ) def compute_schema(self, schema: StageSchema): - schema.output_schema.set_type(MultiAEMessage) + schema.output_schema.set_type(ControlMessage) def get_model(self, user: str) -> ModelCache: """ @@ -82,15 +81,15 @@ def get_model(self, user: str) -> ModelCache: """ return self._model_manager.load_user_model(self._client, user_id=user, fallback_user_ids=[self._fallback_user]) - def on_data(self, message: MultiDFPMessage) -> MultiDFPMessage: + def on_data(self, message: ControlMessage) -> ControlMessage: """Perform inference on the input data.""" - if (not message or message.mess_count == 0): + if (not message or message.payload().count == 0): return None start_time = time.time() - df_user = message.get_meta() - user_id = message.user_id + user_df = message.payload().df.to_pandas() + user_id = message.get_metadata("user_id") try: model_cache = self.get_model(user_id) @@ -106,16 +105,17 @@ def on_data(self, message: MultiDFPMessage) -> MultiDFPMessage: post_model_time = time.time() - results_df = loaded_model.get_results(df_user, return_abs=True) + results_df = loaded_model.get_results(user_df, return_abs=True) # Create an output message to allow setting meta - output_message = MultiDFPMessage(meta=message.meta, - mess_offset=message.mess_offset, - mess_count=message.mess_count) + output_message = ControlMessage() + output_message.payload(message.payload()) - output_message.set_meta(list(results_df.columns), results_df) + for col in list(results_df.columns): + output_message.payload().set_data(col, results_df[col]) - output_message.set_meta('model_version', f"{model_cache.reg_model_name}:{model_cache.reg_model_version}") + output_message.payload().set_data('model_version', + f"{model_cache.reg_model_name}:{model_cache.reg_model_version}") if logger.isEnabledFor(logging.DEBUG): 
load_model_duration = (post_model_time - start_time) * 1000.0 @@ -125,8 +125,8 @@ def on_data(self, message: MultiDFPMessage) -> MultiDFPMessage: user_id, load_model_duration, get_anomaly_duration, - df_user[self._config.ae.timestamp_column_name].min(), - df_user[self._config.ae.timestamp_column_name].max()) + user_df[self._config.ae.timestamp_column_name].min(), + user_df[self._config.ae.timestamp_column_name].max()) return output_message diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_mlflow_model_writer.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_mlflow_model_writer.py index 7d96144164..1983c31dbe 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_mlflow_model_writer.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_mlflow_model_writer.py @@ -21,7 +21,7 @@ from morpheus.config import Config from morpheus.controllers.mlflow_model_writer_controller import MLFlowModelWriterController -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin from morpheus.pipeline.single_port_stage import SinglePortStage @@ -83,7 +83,7 @@ def supports_cpp_node(self): def accepted_types(self) -> typing.Tuple: """Types accepted by this stage""" - return (MultiAEMessage, ) + return (ControlMessage, ) def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject: node = builder.make_node(self.unique_name, ops.map(self._controller.on_data)) diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_postprocessing_stage.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_postprocessing_stage.py index f69a64c69e..37a4bb8524 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_postprocessing_stage.py +++ 
b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_postprocessing_stage.py @@ -19,12 +19,11 @@ from datetime import datetime import mrc -import numpy as np from mrc.core import operators as ops from morpheus.common import TypeId from morpheus.config import Config -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin from morpheus.pipeline.single_port_stage import SinglePortStage @@ -57,18 +56,16 @@ def supports_cpp_node(self): def accepted_types(self) -> typing.Tuple: """Accepted input types.""" - return (MultiAEMessage, ) + return (ControlMessage, ) - def _process_events(self, message: MultiAEMessage): + def _process_events(self, message: ControlMessage): # Assume that a filter stage precedes this stage - df = message.get_meta() - df['event_time'] = datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ') - df.replace(np.nan, 'NaN', regex=True, inplace=True) - message.set_meta(None, df) + with message.payload().mutable_dataframe() as df: + df['event_time'] = datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ') - def on_data(self, message: MultiAEMessage): + def on_data(self, message: ControlMessage): """Process a message.""" - if (not message or message.mess_count == 0): + if (not message or message.payload().count == 0): return None start_time = time.time() @@ -79,11 +76,11 @@ def on_data(self, message: MultiAEMessage): if logger.isEnabledFor(logging.DEBUG): logger.debug("Completed postprocessing for user %s in %s ms. Event count: %s.
Start: %s, End: %s", - message.meta.user_id, + message.get_metadata("user_id"), duration, - message.mess_count, - message.get_meta(self._config.ae.timestamp_column_name).min(), - message.get_meta(self._config.ae.timestamp_column_name).max()) + message.payload().count, + message.payload().get_data(self._config.ae.timestamp_column_name).min(), + message.payload().get_data(self._config.ae.timestamp_column_name).max()) return message diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_preprocessing_stage.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_preprocessing_stage.py index e221fe8574..795308b7b6 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_preprocessing_stage.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_preprocessing_stage.py @@ -20,13 +20,12 @@ from mrc.core import operators as ops from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.utils.column_info import DataFrameInputSchema from morpheus.utils.column_info import process_dataframe -from ..messages.multi_dfp_message import MultiDFPMessage - logger = logging.getLogger(f"morpheus.{__name__}") @@ -55,27 +54,28 @@ def supports_cpp_node(self): return False def accepted_types(self) -> typing.Tuple: - return (MultiDFPMessage, ) + return (ControlMessage, ) - def process_features(self, message: MultiDFPMessage): + def process_features(self, message: ControlMessage): if (message is None): return None start_time = time.time() # Process the columns - df_processed = process_dataframe(message.get_meta_dataframe(), self._input_schema) + df_processed = process_dataframe(message.payload().get_data(), self._input_schema) # Apply the new dataframe, only the rows in the offset - message.set_meta_dataframe(list(df_processed.columns),
df_processed) + with message.payload().mutable_dataframe() as df: + df[list(df_processed.columns)] = df_processed if logger.isEnabledFor(logging.DEBUG): duration = (time.time() - start_time) * 1000.0 logger.debug("Preprocessed %s data for logs in %s to %s in %s ms", - message.mess_count, - message.get_meta(self._config.ae.timestamp_column_name).min(), - message.get_meta(self._config.ae.timestamp_column_name).max(), + message.payload().count, + message.payload().get_data(self._config.ae.timestamp_column_name).min(), + message.payload().get_data(self._config.ae.timestamp_column_name).max(), duration) return message diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_rolling_window_stage.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_rolling_window_stage.py index 7de0e63e44..59b98a57df 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_rolling_window_stage.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_rolling_window_stage.py @@ -23,11 +23,11 @@ from mrc.core import operators as ops from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema -from ..messages.multi_dfp_message import DFPMessageMeta -from ..messages.multi_dfp_message import MultiDFPMessage +from ..messages.dfp_message_meta import DFPMessageMeta from ..utils.cached_user_window import CachedUserWindow from ..utils.logging_timer import log_time @@ -92,7 +92,7 @@ def accepted_types(self) -> typing.Tuple: return (DFPMessageMeta, ) def compute_schema(self, schema: StageSchema): - schema.output_schema.set_type(MultiDFPMessage) + schema.output_schema.set_type(ControlMessage) @contextmanager def _get_user_cache(self, user_id: str) -> typing.Generator[CachedUserWindow, None, None]: @@ -116,13 +116,13 @@ def _get_user_cache(self, user_id: str) -> 
typing.Generator[CachedUserWindow, No # # When it returns, make sure to save # user_cache.save() - def _build_window(self, message: DFPMessageMeta) -> MultiDFPMessage: + def _build_window(self, message: DFPMessageMeta) -> ControlMessage: user_id = message.user_id with self._get_user_cache(user_id) as user_cache: - incoming_df = message.get_df() + incoming_df = message.get_data() # existing_df = user_cache.df if (not user_cache.append_dataframe(incoming_df=incoming_df)): @@ -161,11 +161,13 @@ def _build_window(self, message: DFPMessageMeta) -> MultiDFPMessage: "Rolling history can only be used with non-overlapping batches")) # Otherwise return a new message - return MultiDFPMessage(meta=DFPMessageMeta(df=train_df, user_id=user_id), - mess_offset=0, - mess_count=len(train_df)) + response_msg = ControlMessage() + response_msg.payload(DFPMessageMeta(df=train_df, user_id=user_id)) + response_msg.set_metadata("user_id", user_id) - def on_data(self, message: DFPMessageMeta) -> MultiDFPMessage: + return response_msg + + def on_data(self, message: DFPMessageMeta) -> ControlMessage: """ Emits a new message containing the rolling window for the user if and only if the history requirements are met, returns `None` otherwise.
@@ -183,9 +185,9 @@ def on_data(self, message: DFPMessageMeta) -> MultiDFPMessage: len(message.df), message.df[self._config.ae.timestamp_column_name].min(), message.df[self._config.ae.timestamp_column_name].max(), - result.mess_count, - result.get_meta(self._config.ae.timestamp_column_name).min(), - result.get_meta(self._config.ae.timestamp_column_name).max(), + result.payload().count, + result.payload().get_data(self._config.ae.timestamp_column_name).min(), + result.payload().get_data(self._config.ae.timestamp_column_name).max(), ) else: # Dont print anything diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_split_users_stage.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_split_users_stage.py index af79fea01d..9a6a448bd5 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_split_users_stage.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_split_users_stage.py @@ -28,7 +28,7 @@ from morpheus.pipeline.stage_schema import StageSchema from morpheus.utils.type_aliases import DataFrameType -from ..messages.multi_dfp_message import DFPMessageMeta +from ..messages.dfp_message_meta import DFPMessageMeta from ..utils.logging_timer import log_time logger = logging.getLogger(f"morpheus.{__name__}") diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_training.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_training.py index a011b18ceb..7cd46b34ef 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_training.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_training.py @@ -12,25 +12,21 @@ # See the License for the specific language governing permissions and # limitations under the License. 
"""Training stage for the DFP pipeline.""" -import base64 import logging -import pickle import typing import mrc from mrc.core import operators as ops from sklearn.model_selection import train_test_split +import cudf + from morpheus.config import Config from morpheus.messages import ControlMessage -from morpheus.messages.multi_ae_message import MultiAEMessage from morpheus.models.dfencoder import AutoEncoder from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema -from ..messages.multi_dfp_message import DFPMessageMeta -from ..messages.multi_dfp_message import MultiDFPMessage - logger = logging.getLogger(f"morpheus.{__name__}") @@ -91,57 +87,25 @@ def supports_cpp_node(self): def accepted_types(self) -> typing.Tuple: """Indicate which input message types this stage accepts.""" - return ( - ControlMessage, - MultiDFPMessage, - ) + return (ControlMessage, ) def compute_schema(self, schema: StageSchema): output_type = schema.input_type - if (output_type == MultiDFPMessage): - output_type = MultiAEMessage schema.output_schema.set_type(output_type) - def _dfp_multimessage_from_control_message(self, - control_message: ControlMessage) -> typing.Union[MultiDFPMessage, None]: - """Create a MultiDFPMessage from a ControlMessage.""" - ctrl_msg_user_id = control_message.get_metadata("user_id") - message_meta = control_message.payload() - - if (ctrl_msg_user_id is None or message_meta is None): - return None - - with message_meta.mutable_dataframe() as dfm: - msg_meta_df = dfm.to_pandas() - - msg_meta = DFPMessageMeta(msg_meta_df, user_id=str(ctrl_msg_user_id)) - message = MultiDFPMessage(meta=msg_meta, mess_offset=0, mess_count=len(msg_meta_df)) - - return message - - @typing.overload def on_data(self, message: ControlMessage) -> ControlMessage: - ... - - @typing.overload - def on_data(self, message: MultiDFPMessage) -> MultiAEMessage: - ... 
- - def on_data(self, message): """Train the model and attach it to the output message.""" - received_control_message = False - if (isinstance(message, ControlMessage)): - message = self._dfp_multimessage_from_control_message(message) - received_control_message = True - - if (message is None or message.mess_count == 0): + if (message is None or message.payload().count == 0): return None - user_id = message.user_id + user_id = message.get_metadata("user_id") model = AutoEncoder(**self._model_kwargs) - train_df = message.get_meta_dataframe() + train_df = message.payload().copy_dataframe() + + if isinstance(train_df, cudf.DataFrame): + train_df = train_df.to_pandas() # Only train on the feature columns train_df = train_df[train_df.columns.intersection(self._config.ae.feature_columns)] @@ -157,18 +121,10 @@ def on_data(self, message): model.fit(train_df, epochs=self._epochs, validation_data=validation_df, run_validation=run_validation) logger.debug("Training AE model for user: '%s'... Complete.", user_id) - if (received_control_message): - output_message = ControlMessage(message.meta) - output_message.set_metadata("user_id", user_id) - - pickled_model_bytes = pickle.dumps(model) - pickled_model_base64_str = base64.b64encode(pickled_model_bytes).decode('utf-8') - output_message.set_metadata("model", pickled_model_base64_str) - else: - output_message = MultiAEMessage(meta=message.meta, - mess_offset=message.mess_offset, - mess_count=message.mess_count, - model=model) + output_message = ControlMessage() + output_message.payload(message.payload()) + output_message.set_metadata("user_id", user_id) + output_message.set_metadata("model", model) return output_message diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_viz_postproc.py b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_viz_postproc.py index e8d747932f..0144c6e798 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_viz_postproc.py +++ 
b/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_viz_postproc.py @@ -23,7 +23,7 @@ from morpheus.config import Config from morpheus.io import serializers -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin from morpheus.pipeline.single_port_stage import SinglePortStage @@ -71,33 +71,34 @@ def accepted_types(self) -> typing.Tuple: Returns ------- - typing.Tuple[`morpheus.pipeline.messages.MultiAEMessage`, ] + typing.Tuple[`morpheus.messages.ControlMessage`, ] Accepted input types. """ - return (MultiAEMessage, ) + return (ControlMessage, ) def supports_cpp_node(self): """Whether this stage supports a C++ node.""" return False - def _postprocess(self, x: MultiAEMessage) -> pd.DataFrame: + def _postprocess(self, msg: ControlMessage) -> pd.DataFrame: + pdf = msg.payload().copy_dataframe().to_pandas() viz_pdf = pd.DataFrame() - viz_pdf[["user", "time"]] = x.get_meta([self._user_column_name, self._timestamp_column]) + viz_pdf[["user", "time"]] = pdf[[self._user_column_name, self._timestamp_column]] datetimes = pd.to_datetime(viz_pdf["time"], errors='coerce') viz_pdf["period"] = datetimes.dt.to_period(self._period) for f in self._feature_columns: - viz_pdf[f + "_score"] = x.get_meta(f + "_z_loss") + viz_pdf[f + "_score"] = pdf[f + "_z_loss"] - viz_pdf["anomalyScore"] = x.get_meta("mean_abs_z") + viz_pdf["anomalyScore"] = pdf["mean_abs_z"] return viz_pdf - def _write_to_files(self, x: MultiAEMessage): + def _write_to_files(self, msg: ControlMessage): - df = self._postprocess(x) + df = self._postprocess(msg) unique_periods = df["period"].unique() @@ -116,7 +117,7 @@ def _write_to_files(self, x: MultiAEMessage): with open(output_file, "a", encoding='UTF-8') as out_file: out_file.writelines(lines) - return x + return msg def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject: 
dfp_viz_postproc = builder.make_node(self.unique_name, ops.map(self._write_to_files)) diff --git a/examples/digital_fingerprinting/production/morpheus/dfp/utils/config_generator.py b/examples/digital_fingerprinting/production/morpheus/dfp/utils/config_generator.py index ecd2143167..927c9bfbb7 100644 --- a/examples/digital_fingerprinting/production/morpheus/dfp/utils/config_generator.py +++ b/examples/digital_fingerprinting/production/morpheus/dfp/utils/config_generator.py @@ -25,7 +25,7 @@ from morpheus.config import Config from morpheus.config import ConfigAutoEncoder from morpheus.config import CppConfig -from morpheus.messages.multi_message import MultiMessage +from morpheus.messages import ControlMessage from morpheus.utils.module_ids import MORPHEUS_MODULE_NAMESPACE @@ -37,7 +37,7 @@ def __init__(self, config: Config, dfp_arg_parser: DFPArgParser, schema: Schema, self._encoding = encoding self._source_schema_str = pyobj2str(schema.source, encoding=encoding) self._preprocess_schema_str = pyobj2str(schema.preprocess, encoding=encoding) - self._input_message_type = pyobj2str(MultiMessage, encoding) + self._input_message_type = pyobj2str(ControlMessage, encoding) self._start_time_str = self._dfp_arg_parser.time_fields.start_time.isoformat() self._end_time_str = self._dfp_arg_parser.time_fields.end_time.isoformat() diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_inference.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_inference.ipynb index 6dc4d4ff76..8e5413f71c 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_inference.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_inference.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ -33,32 +33,10 @@ }, { "cell_type": "code", - "execution_count": 2, + 
"execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import functools\n", "import logging\n", @@ -131,7 +109,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -178,7 +156,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "5ea82337", "metadata": {}, "outputs": [], @@ -207,7 +185,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -224,7 +202,7 @@ "config.ae = ConfigAutoEncoder()\n", "\n", "config.ae.feature_columns = [\n", - " \"appDisplayName\", \"clientAppUsed\", \"deviceDetailbrowser\", \"deviceDetaildisplayName\", \"deviceDetailoperatingSystem\", \"statusfailureReason\", \"appincrement\", \"locincrement\", \"logcount\", \n", + " \"appDisplayName\", \"clientAppUsed\", \"deviceDetailbrowser\", \"deviceDetaildisplayName\", \"deviceDetailoperatingSystem\", \"statusfailureReason\", \"appincrement\", \"locincrement\", \"logcount\",\n", "]\n", "config.ae.userid_column_name = \"username\"\n", "config.ae.timestamp_column_name = \"timestamp\"" @@ -232,7 +210,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], 
@@ -264,7 +242,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "f7a0cb0a-e65a-444a-a06c-a4525d543790", "metadata": {}, "outputs": [], @@ -420,158 +398,10 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mUpdating list of available models...\u001b[0m\n", - "\u001b[2mUpdating list of available models... Done.\u001b[0m\n", - "====Registering Pipeline====\u001b[0m\n", - "====Building Pipeline====\u001b[0m\n", - "====Building Pipeline Complete!====\u001b[0m\n", - "\u001b[2mStarting! Time: 1689884489.742004\u001b[0m\n", - "====Registering Pipeline Complete!====\u001b[0m\n", - "====Starting Pipeline====\u001b[0m\n", - "====Pipeline Started====\u001b[0m\n", - "====Building Segment: linear_segment_0====\u001b[0m\n", - "Added source: \n", - " └─> fsspec.OpenFiles\u001b[0m\n", - "Added stage: , filename_regex=re.compile('(?P\\\\d{4})-(?P\\\\d{1,2})-(?P\\\\d{1,2})T(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(?P\\\\.\\\\d{1,6})?Z')), period=D, sampling_rate_s=None, start_time=None, end_time=None, sampling=None)>\n", - " └─ fsspec.OpenFiles -> Tuple[fsspec.core.OpenFiles, int]\u001b[0m\n", - "Added stage: \n", - " └─ Tuple[fsspec.core.OpenFiles, int] -> pandas.DataFrame\u001b[0m\n", - "Added stage: \n", - " └─ pandas.DataFrame -> dfp.DFPMessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ dfp.DFPMessageMeta -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ 
morpheus.MultiAEMessage -> morpheus.MessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MessageMeta -> morpheus.MessageMeta\u001b[0m\n", - "====Building Segment Complete!====\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 119, Cache: hit, Duration: 5.987644195556641 ms, Rate: 19874.260412518914 rows/s\u001b[0m\n", - "\u001b[2mPreallocating column event_time[TypeId.STRING]\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 209, Cache: hit, Duration: 26.411056518554688 ms, Rate: 7913.3524951252975 rows/s\u001b[0m\n", - "\u001b[2mPreallocating column event_time[TypeId.STRING]\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 119 rows from 2022-08-30 00:17:05.561523 to 2022-08-30 23:58:05.567378. Output: 16 users, rows/user min: 1, max: 18, avg: 7.44. Duration: 19.73 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 209 rows from 2022-08-31 00:21:46.153050 to 2022-08-31 23:54:50.435683. Output: 17 users, rows/user min: 1, max: 106, avg: 12.29. Duration: 37.80 ms\u001b[0m\n", - "\u001b[2mRolling window complete for acole@domain.com in 56.51 ms. Input: 13 rows from 2022-08-30 01:54:26.639083 to 2022-08-30 23:58:05.567378. Output: 13 rows from 2022-08-30 01:54:26.639083 to 2022-08-30 23:58:05.567378\u001b[0m\n", - "\u001b[2mRolling window complete for attacktarget@domain.com in 39.78 ms. Input: 18 rows from 2022-08-30 01:23:39.156080 to 2022-08-30 23:05:47.146155. Output: 18 rows from 2022-08-30 01:23:39.156080 to 2022-08-30 23:05:47.146155\u001b[0m\n", - "\u001b[2mRolling window complete for cfernandez@domain.com in 39.31 ms. Input: 5 rows from 2022-08-30 05:42:19.778470 to 2022-08-30 18:59:24.984779. Output: 5 rows from 2022-08-30 05:42:19.778470 to 2022-08-30 18:59:24.984779\u001b[0m\n", - "\u001b[2mRolling window complete for cperry@domain.com in 32.08 ms. Input: 13 rows from 2022-08-30 00:17:05.561523 to 2022-08-30 23:08:41.474570. 
Output: 13 rows from 2022-08-30 00:17:05.561523 to 2022-08-30 23:08:41.474570\u001b[0m\n", - "\u001b[2mRolling window complete for djohnson@domain.com in 36.43 ms. Input: 4 rows from 2022-08-30 03:48:15.868637 to 2022-08-30 23:49:02.282976. Output: 4 rows from 2022-08-30 03:48:15.868637 to 2022-08-30 23:49:02.282976\u001b[0m\n", - "\u001b[2mRolling window complete for jgonzalez@domain.com in 30.15 ms. Input: 4 rows from 2022-08-30 08:20:52.146591 to 2022-08-30 19:07:44.917975. Output: 4 rows from 2022-08-30 08:20:52.146591 to 2022-08-30 19:07:44.917975\u001b[0m\n", - "\u001b[2mRolling window complete for jmeyers@domain.com in 15.03 ms. Input: 3 rows from 2022-08-30 03:46:30.304629 to 2022-08-30 06:53:19.650392. Output: 3 rows from 2022-08-30 03:46:30.304629 to 2022-08-30 06:53:19.650392\u001b[0m\n", - "\u001b[2mRolling window complete for jtaylor@domain.com in 16.49 ms. Input: 11 rows from 2022-08-30 02:36:39.981855 to 2022-08-30 23:30:09.312791. Output: 11 rows from 2022-08-30 02:36:39.981855 to 2022-08-30 23:30:09.312791\u001b[0m\n", - "\u001b[2mRolling window complete for jwatson@domain.com in 19.03 ms. Input: 5 rows from 2022-08-30 01:33:19.402330 to 2022-08-30 18:39:54.214210. Output: 5 rows from 2022-08-30 01:33:19.402330 to 2022-08-30 18:39:54.214210\u001b[0m\n", - "\u001b[2mRolling window complete for khowell@domain.com in 16.43 ms. Input: 2 rows from 2022-08-30 05:54:21.257941 to 2022-08-30 08:11:17.157376. Output: 2 rows from 2022-08-30 05:54:21.257941 to 2022-08-30 08:11:17.157376\u001b[0m\n", - "\u001b[2mRolling window complete for ksheppard@domain.com in 19.81 ms. Input: 9 rows from 2022-08-30 07:42:51.522461 to 2022-08-30 23:03:40.411836. Output: 9 rows from 2022-08-30 07:42:51.522461 to 2022-08-30 23:03:40.411836\u001b[0m\n", - "\u001b[2mRolling window complete for mmartin@domain.com in 24.50 ms. Input: 9 rows from 2022-08-30 00:50:13.640088 to 2022-08-30 23:43:17.639540. 
Output: 9 rows from 2022-08-30 00:50:13.640088 to 2022-08-30 23:43:17.639540\u001b[0m\n", - "\u001b[2mRolling window complete for nblack@domain.com in 30.15 ms. Input: 1 rows from 2022-08-30 08:52:11.647522 to 2022-08-30 08:52:11.647522. Output: 1 rows from 2022-08-30 08:52:11.647522 to 2022-08-30 08:52:11.647522\u001b[0m\n", - "\u001b[2mRolling window complete for tprice@domain.com in 27.78 ms. Input: 18 rows from 2022-08-30 00:17:54.840684 to 2022-08-30 21:29:52.981331. Output: 18 rows from 2022-08-30 00:17:54.840684 to 2022-08-30 21:29:52.981331\u001b[0m\n", - "\u001b[2mRolling window complete for tproctor@domain.com in 21.65 ms. Input: 2 rows from 2022-08-30 04:41:31.606683 to 2022-08-30 19:11:59.114436. Output: 2 rows from 2022-08-30 04:41:31.606683 to 2022-08-30 19:11:59.114436\u001b[0m\n", - "\u001b[2mRolling window complete for vramirez@domain.com in 21.97 ms. Input: 2 rows from 2022-08-30 14:15:07.746372 to 2022-08-30 15:29:51.074456. Output: 2 rows from 2022-08-30 14:15:07.746372 to 2022-08-30 15:29:51.074456\u001b[0m\n", - "\u001b[2mRolling window complete for aanderson@domain.com in 25.80 ms. Input: 1 rows from 2022-08-31 05:52:54.160038 to 2022-08-31 05:52:54.160038. Output: 1 rows from 2022-08-31 05:52:54.160038 to 2022-08-31 05:52:54.160038\u001b[0m\n", - "\u001b[2mRolling window complete for acole@domain.com in 40.76 ms. Input: 15 rows from 2022-08-31 03:24:21.299927 to 2022-08-31 23:36:39.523067. Output: 16 rows from 2022-08-30 23:58:05.567378 to 2022-08-31 23:36:39.523067\u001b[0m\n", - "\u001b[2mRolling window complete for attacktarget@domain.com in 20.22 ms. Input: 106 rows from 2022-08-31 00:21:46.153050 to 2022-08-31 23:54:50.435683. Output: 106 rows from 2022-08-31 00:21:46.153050 to 2022-08-31 23:54:50.435683\u001b[0m\n", - "\u001b[2mRolling window complete for cfernandez@domain.com in 15.80 ms. Input: 7 rows from 2022-08-31 02:34:42.807200 to 2022-08-31 21:38:46.557841. 
Output: 7 rows from 2022-08-31 02:34:42.807200 to 2022-08-31 21:38:46.557841\u001b[0m\n", - "\u001b[2mRolling window complete for cperry@domain.com in 16.10 ms. Input: 13 rows from 2022-08-31 04:41:40.465981 to 2022-08-31 22:54:52.727400. Output: 14 rows from 2022-08-30 23:08:41.474570 to 2022-08-31 22:54:52.727400\u001b[0m\n", - "\u001b[2mRolling window complete for djohnson@domain.com in 23.19 ms. Input: 5 rows from 2022-08-31 06:20:09.178836 to 2022-08-31 22:43:42.029787. Output: 6 rows from 2022-08-30 23:49:02.282976 to 2022-08-31 22:43:42.029787\u001b[0m\n", - "\u001b[2mRolling window complete for jgonzalez@domain.com in 17.31 ms. Input: 4 rows from 2022-08-31 04:30:10.985203 to 2022-08-31 23:19:25.545084. Output: 4 rows from 2022-08-31 04:30:10.985203 to 2022-08-31 23:19:25.545084\u001b[0m\n", - "\u001b[2mRolling window complete for jmeyers@domain.com in 19.41 ms. Input: 5 rows from 2022-08-31 01:13:31.298799 to 2022-08-31 20:13:01.877714. Output: 5 rows from 2022-08-31 01:13:31.298799 to 2022-08-31 20:13:01.877714\u001b[0m\n", - "\u001b[2mRolling window complete for jtaylor@domain.com in 16.45 ms. Input: 8 rows from 2022-08-31 00:40:18.664488 to 2022-08-31 19:15:54.087216. Output: 12 rows from 2022-08-30 20:57:14.784182 to 2022-08-31 19:15:54.087216\u001b[0m\n", - "\u001b[2mRolling window complete for jwatson@domain.com in 15.58 ms. Input: 7 rows from 2022-08-31 00:45:11.336629 to 2022-08-31 22:21:08.584673. Output: 7 rows from 2022-08-31 00:45:11.336629 to 2022-08-31 22:21:08.584673\u001b[0m\n", - "\u001b[2mRolling window complete for khowell@domain.com in 18.24 ms. Input: 8 rows from 2022-08-31 03:35:26.367587 to 2022-08-31 23:17:42.874154. Output: 8 rows from 2022-08-31 03:35:26.367587 to 2022-08-31 23:17:42.874154\u001b[0m\n", - "\u001b[2mRolling window complete for ksheppard@domain.com in 15.90 ms. Input: 4 rows from 2022-08-31 13:44:00.222145 to 2022-08-31 21:34:47.485540. 
Output: 5 rows from 2022-08-30 23:03:40.411836 to 2022-08-31 21:34:47.485540\u001b[0m\n", - "\u001b[2mRolling window complete for mmartin@domain.com in 15.99 ms. Input: 5 rows from 2022-08-31 02:15:16.347789 to 2022-08-31 14:42:54.953795. Output: 8 rows from 2022-08-30 20:23:39.816191 to 2022-08-31 14:42:54.953795\u001b[0m\n", - "\u001b[2mRolling window complete for rrojas@domain.com in 15.63 ms. Input: 1 rows from 2022-08-31 20:24:29.786977 to 2022-08-31 20:24:29.786977. Output: 1 rows from 2022-08-31 20:24:29.786977 to 2022-08-31 20:24:29.786977\u001b[0m\n", - "\u001b[2mRolling window complete for tprice@domain.com in 15.75 ms. Input: 14 rows from 2022-08-31 01:19:23.476142 to 2022-08-31 23:34:38.002221. Output: 14 rows from 2022-08-31 01:19:23.476142 to 2022-08-31 23:34:38.002221\u001b[0m\n", - "\u001b[2mRolling window complete for tproctor@domain.com in 15.59 ms. Input: 3 rows from 2022-08-31 04:08:07.249641 to 2022-08-31 23:53:39.607409. Output: 3 rows from 2022-08-31 04:08:07.249641 to 2022-08-31 23:53:39.607409\u001b[0m\n", - "\u001b[2mRolling window complete for vramirez@domain.com in 18.51 ms. Input: 3 rows from 2022-08-31 19:38:39.256240 to 2022-08-31 20:40:37.586442. 
Output: 3 rows from 2022-08-31 19:38:39.256240 to 2022-08-31 20:40:37.586442\u001b[0m\n", - "\u001b[2mPreprocessed 13 data for logs in 2022-08-30 01:54:26.639083 to 2022-08-30 23:58:05.567378 in 3335.460424423218 ms\u001b[0m\n", - "\u001b[2mPreprocessed 18 data for logs in 2022-08-30 01:23:39.156080 to 2022-08-30 23:05:47.146155 in 253.5254955291748 ms\u001b[0m\n", - "\u001b[2mPreprocessed 5 data for logs in 2022-08-30 05:42:19.778470 to 2022-08-30 18:59:24.984779 in 207.64708518981934 ms\u001b[0m\n", - "\u001b[2mPreprocessed 13 data for logs in 2022-08-30 00:17:05.561523 to 2022-08-30 23:08:41.474570 in 197.0674991607666 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 03:48:15.868637 to 2022-08-30 23:49:02.282976 in 194.39959526062012 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-azure-generic_user:52' in 931.4601421356201 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 08:20:52.146591 to 2022-08-30 19:07:44.917975 in 603.0228137969971 ms\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-30 03:46:30.304629 to 2022-08-30 06:53:19.650392 in 339.5671844482422 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user acole@domain.com. Model load: 961.7404937744141 ms, Model infer: 969.8805809020996 ms. Start: 2022-08-30 01:54:26.639083, End: 2022-08-30 23:58:05.567378\u001b[0m\n", - "\u001b[2mPreprocessed 11 data for logs in 2022-08-30 02:36:39.981855 to 2022-08-30 23:30:09.312791 in 272.36056327819824 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user attacktarget@domain.com. Model load: 1.6188621520996094 ms, Model infer: 260.5094909667969 ms. Start: 2022-08-30 01:23:39.156080, End: 2022-08-30 23:05:47.146155\u001b[0m\n", - "\u001b[2mPreprocessed 5 data for logs in 2022-08-30 01:33:19.402330 to 2022-08-30 18:39:54.214210 in 255.89466094970703 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user cfernandez@domain.com. Model load: 0.980377197265625 ms, Model infer: 294.6741580963135 ms. 
Start: 2022-08-30 05:42:19.778470, End: 2022-08-30 18:59:24.984779\u001b[0m\n", - "\u001b[2mPreprocessed 2 data for logs in 2022-08-30 05:54:21.257941 to 2022-08-30 08:11:17.157376 in 284.97791290283203 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user cperry@domain.com. Model load: 1.4879703521728516 ms, Model infer: 325.8018493652344 ms. Start: 2022-08-30 00:17:05.561523, End: 2022-08-30 23:08:41.474570\u001b[0m\n", - "\u001b[2mPreprocessed 9 data for logs in 2022-08-30 07:42:51.522461 to 2022-08-30 23:03:40.411836 in 252.02345848083496 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user djohnson@domain.com. Model load: 2.850055694580078 ms, Model infer: 268.95689964294434 ms. Start: 2022-08-30 03:48:15.868637, End: 2022-08-30 23:49:02.282976\u001b[0m\n", - "\u001b[2mPreprocessed 9 data for logs in 2022-08-30 00:50:13.640088 to 2022-08-30 23:43:17.639540 in 250.1983642578125 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jgonzalez@domain.com. Model load: 1.3751983642578125 ms, Model infer: 222.09739685058594 ms. Start: 2022-08-30 08:20:52.146591, End: 2022-08-30 19:07:44.917975\u001b[0m\n", - "\u001b[2mPreprocessed 1 data for logs in 2022-08-30 08:52:11.647522 to 2022-08-30 08:52:11.647522 in 283.4293842315674 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jmeyers@domain.com. Model load: 2.2835731506347656 ms, Model infer: 235.66031455993652 ms. Start: 2022-08-30 03:46:30.304629, End: 2022-08-30 06:53:19.650392\u001b[0m\n", - "\u001b[2mPreprocessed 18 data for logs in 2022-08-30 00:17:54.840684 to 2022-08-30 21:29:52.981331 in 267.27938652038574 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jtaylor@domain.com. Model load: 17.877817153930664 ms, Model infer: 245.54777145385742 ms. 
Start: 2022-08-30 02:36:39.981855, End: 2022-08-30 23:30:09.312791\u001b[0m\n", - "\u001b[2mPreprocessed 2 data for logs in 2022-08-30 04:41:31.606683 to 2022-08-30 19:11:59.114436 in 251.60717964172363 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jwatson@domain.com. Model load: 1.8315315246582031 ms, Model infer: 291.37492179870605 ms. Start: 2022-08-30 01:33:19.402330, End: 2022-08-30 18:39:54.214210\u001b[0m\n", - "\u001b[2mPreprocessed 2 data for logs in 2022-08-30 14:15:07.746372 to 2022-08-30 15:29:51.074456 in 342.7729606628418 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user khowell@domain.com. Model load: 8.133172988891602 ms, Model infer: 290.1947498321533 ms. Start: 2022-08-30 05:54:21.257941, End: 2022-08-30 08:11:17.157376\u001b[0m\n", - "\u001b[2mPreprocessed 1 data for logs in 2022-08-31 05:52:54.160038 to 2022-08-31 05:52:54.160038 in 272.71080017089844 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user ksheppard@domain.com. Model load: 6.1511993408203125 ms, Model infer: 277.3275375366211 ms. Start: 2022-08-30 07:42:51.522461, End: 2022-08-30 23:03:40.411836\u001b[0m\n", - "\u001b[2mPreprocessed 16 data for logs in 2022-08-30 23:58:05.567378 to 2022-08-31 23:36:39.523067 in 256.84309005737305 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user mmartin@domain.com. Model load: 2.0291805267333984 ms, Model infer: 317.22211837768555 ms. Start: 2022-08-30 00:50:13.640088, End: 2022-08-30 23:43:17.639540\u001b[0m\n", - "\u001b[2mPreprocessed 106 data for logs in 2022-08-31 00:21:46.153050 to 2022-08-31 23:54:50.435683 in 280.6124687194824 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user nblack@domain.com. Model load: 5.743741989135742 ms, Model infer: 243.32642555236816 ms. 
Start: 2022-08-30 08:52:11.647522, End: 2022-08-30 08:52:11.647522\u001b[0m\n", - "\u001b[2mPreprocessed 7 data for logs in 2022-08-31 02:34:42.807200 to 2022-08-31 21:38:46.557841 in 246.90985679626465 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user tprice@domain.com. Model load: 1.8532276153564453 ms, Model infer: 387.65978813171387 ms. Start: 2022-08-30 00:17:54.840684, End: 2022-08-30 21:29:52.981331\u001b[0m\n", - "\u001b[2mPreprocessed 14 data for logs in 2022-08-30 23:08:41.474570 to 2022-08-31 22:54:52.727400 in 394.8497772216797 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user tproctor@domain.com. Model load: 1.3818740844726562 ms, Model infer: 235.9616756439209 ms. Start: 2022-08-30 04:41:31.606683, End: 2022-08-30 19:11:59.114436\u001b[0m\n", - "\u001b[2mPreprocessed 6 data for logs in 2022-08-30 23:49:02.282976 to 2022-08-31 22:43:42.029787 in 254.03809547424316 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user vramirez@domain.com. Model load: 1.1930465698242188 ms, Model infer: 219.1905975341797 ms. Start: 2022-08-30 14:15:07.746372, End: 2022-08-30 15:29:51.074456\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-31 04:30:10.985203 to 2022-08-31 23:19:25.545084 in 238.00039291381836 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user aanderson@domain.com. Model load: 1.260519027709961 ms, Model infer: 229.4771671295166 ms. Start: 2022-08-31 05:52:54.160038, End: 2022-08-31 05:52:54.160038\u001b[0m\n", - "\u001b[2mPreprocessed 5 data for logs in 2022-08-31 01:13:31.298799 to 2022-08-31 20:13:01.877714 in 256.80041313171387 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user acole@domain.com. Model load: 0.9512901306152344 ms, Model infer: 303.81274223327637 ms. 
Start: 2022-08-30 23:58:05.567378, End: 2022-08-31 23:36:39.523067\u001b[0m\n", - "\u001b[2mPreprocessed 12 data for logs in 2022-08-30 20:57:14.784182 to 2022-08-31 19:15:54.087216 in 239.56894874572754 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user attacktarget@domain.com. Model load: 1.0290145874023438 ms, Model infer: 307.4831962585449 ms. Start: 2022-08-31 00:21:46.153050, End: 2022-08-31 23:54:50.435683\u001b[0m\n", - "\u001b[2mCompleted postprocessing for user attacktarget@domain.com in 17.421483993530273 ms. Event count: 93. Start: 2022-08-31 03:51:46.250270, End: 2022-08-31 23:54:50.435683\u001b[0m\n", - "\u001b[2mPreprocessed 7 data for logs in 2022-08-31 00:45:11.336629 to 2022-08-31 22:21:08.584673 in 295.06611824035645 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user cfernandez@domain.com. Model load: 3.189563751220703 ms, Model infer: 296.0059642791748 ms. Start: 2022-08-31 02:34:42.807200, End: 2022-08-31 21:38:46.557841\u001b[0m\n", - "\u001b[2mPreprocessed 8 data for logs in 2022-08-31 03:35:26.367587 to 2022-08-31 23:17:42.874154 in 265.52557945251465 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user cperry@domain.com. Model load: 3.6423206329345703 ms, Model infer: 281.1152935028076 ms. Start: 2022-08-30 23:08:41.474570, End: 2022-08-31 22:54:52.727400\u001b[0m\n", - "\u001b[2mPreprocessed 5 data for logs in 2022-08-30 23:03:40.411836 to 2022-08-31 21:34:47.485540 in 308.09760093688965 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user djohnson@domain.com. Model load: 4.4879913330078125 ms, Model infer: 256.29544258117676 ms. Start: 2022-08-30 23:49:02.282976, End: 2022-08-31 22:43:42.029787\u001b[0m\n", - "\u001b[2mPreprocessed 8 data for logs in 2022-08-30 20:23:39.816191 to 2022-08-31 14:42:54.953795 in 268.75758171081543 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jgonzalez@domain.com. Model load: 2.095937728881836 ms, Model infer: 248.9147186279297 ms. 
Start: 2022-08-31 04:30:10.985203, End: 2022-08-31 23:19:25.545084\u001b[0m\n", - "\u001b[2mPreprocessed 1 data for logs in 2022-08-31 20:24:29.786977 to 2022-08-31 20:24:29.786977 in 243.15190315246582 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jmeyers@domain.com. Model load: 0.9484291076660156 ms, Model infer: 275.9404182434082 ms. Start: 2022-08-31 01:13:31.298799, End: 2022-08-31 20:13:01.877714\u001b[0m\n", - "\u001b[2mPreprocessed 14 data for logs in 2022-08-31 01:19:23.476142 to 2022-08-31 23:34:38.002221 in 253.52907180786133 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jtaylor@domain.com. Model load: 1.3632774353027344 ms, Model infer: 281.60905838012695 ms. Start: 2022-08-30 20:57:14.784182, End: 2022-08-31 19:15:54.087216\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-31 04:08:07.249641 to 2022-08-31 23:53:39.607409 in 278.83005142211914 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user jwatson@domain.com. Model load: 1.7066001892089844 ms, Model infer: 236.19461059570312 ms. Start: 2022-08-31 00:45:11.336629, End: 2022-08-31 22:21:08.584673\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-31 19:38:39.256240 to 2022-08-31 20:40:37.586442 in 239.87770080566406 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user khowell@domain.com. Model load: 0.9975433349609375 ms, Model infer: 72.16072082519531 ms. Start: 2022-08-31 03:35:26.367587, End: 2022-08-31 23:17:42.874154\u001b[0m\n", - "\u001b[2mCompleted inference for user ksheppard@domain.com. Model load: 0.8676052093505859 ms, Model infer: 66.96581840515137 ms. Start: 2022-08-30 23:03:40.411836, End: 2022-08-31 21:34:47.485540\u001b[0m\n", - "\u001b[2mCompleted inference for user mmartin@domain.com. Model load: 0.9112358093261719 ms, Model infer: 66.51043891906738 ms. Start: 2022-08-30 20:23:39.816191, End: 2022-08-31 14:42:54.953795\u001b[0m\n", - "\u001b[2mCompleted inference for user rrojas@domain.com. 
Model load: 0.9071826934814453 ms, Model infer: 64.89896774291992 ms. Start: 2022-08-31 20:24:29.786977, End: 2022-08-31 20:24:29.786977\u001b[0m\n", - "\u001b[2mCompleted inference for user tprice@domain.com. Model load: 0.9150505065917969 ms, Model infer: 67.11506843566895 ms. Start: 2022-08-31 01:19:23.476142, End: 2022-08-31 23:34:38.002221\u001b[0m\n", - "\u001b[2mCompleted inference for user tproctor@domain.com. Model load: 0.9348392486572266 ms, Model infer: 64.44740295410156 ms. Start: 2022-08-31 04:08:07.249641, End: 2022-08-31 23:53:39.607409\u001b[0m\n", - "\u001b[2mCompleted inference for user vramirez@domain.com. Model load: 1.6913414001464844 ms, Model infer: 62.154293060302734 ms. Start: 2022-08-31 19:38:39.256240, End: 2022-08-31 20:40:37.586442\u001b[0m\n", - "====Pipeline Complete====\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ "# Create a linear pipeline object\n", "pipeline = LinearPipeline(config)\n", @@ -645,9 +475,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python [conda env:morpheus] *", + "display_name": "morpheus", "language": "python", - "name": "conda-env-morpheus-py" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -659,12 +489,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" - } + "version": "3.10.14" } }, "nbformat": 4, diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_integrated_training.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_integrated_training.ipynb index 1d77c364d9..75c14999a6 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_integrated_training.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_integrated_training.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - 
"execution_count": 1, + "execution_count": null, "id": "b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ -33,39 +33,17 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import logging\n", "import typing\n", "import cudf\n", "from datetime import datetime\n", "\n", - "# When segment modules are imported, they're added to the module registry. 
\n", + "# When segment modules are imported, they're added to the module registry.\n", "# To avoid flake8 warnings about unused code, the noqa flag is used during import.\n", "import dfp.modules # noqa: F401\n", "from dfp.utils.config_generator import ConfigGenerator\n", @@ -116,7 +94,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -175,7 +153,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "6f5b67b8", "metadata": {}, "outputs": [], @@ -200,7 +178,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -357,7 +335,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], @@ -398,7 +376,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, "outputs": [], @@ -478,439 +456,12 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "id": "960f14ef", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Unnamed: 0timestampusernameappDisplayNameclientAppUseddeviceDetailbrowserdeviceDetaildisplayNamedeviceDetailoperatingSystemstatusfailureReasonlogcount...locincrement_losslogcount_z_lossclientAppUsed_predclientAppUsed_lossappDisplayName_lossstatusfailureReason_lossappincrement_lossappDisplayName_z_lossmodel_versionevent_time
0102022-08-30T08:31:49.739107000Ztprice@domain.comSD ECDNBrowserRich Client 3.19.8.16603THOMASPRICE-LTWindows 10<NA>3...0.6106700.027359Mobile Apps and Desktop clients1.1748572.7803671.6055600.7359841.979742DFP-azure-tprice@domain.com:12023-07-20T20:24:22Z
1182022-08-31T00:21:46.153050000Zattacktarget@domain.comCitrix ShareFileMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>0...5.9148630.633756Mobile Apps and Desktop clients0.5913061.6978180.6075442.2778892.968441DFP-azure-attacktarget@domain.com:22023-07-20T20:25:09Z
2202022-08-31T00:27:44.169328000Zattacktarget@domain.comAltouraMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>2...5.9361030.109216Mobile Apps and Desktop clients0.6039261.7131310.6213003.9882652.393491DFP-azure-attacktarget@domain.com:22023-07-20T20:25:09Z
3212022-08-31T00:44:58.235390000Zattacktarget@domain.comLumAppsMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>3...5.9422590.376158Mobile Apps and Desktop clients0.5996711.6970260.61422514.0265832.998179DFP-azure-attacktarget@domain.com:22023-07-20T20:25:09Z
4222022-08-31T00:47:59.808993000Zattacktarget@domain.comLinux Foundation TrainingMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>4...5.9365290.562854Mobile Apps and Desktop clients0.5894491.6834570.59991630.2737123.507625DFP-azure-attacktarget@domain.com:22023-07-20T20:25:09Z
..................................................................
1041232022-08-31T23:54:50.435683000Zattacktarget@domain.comBoxMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>16...6.0325032.419621Mobile Apps and Desktop clients0.4181211.6081950.420876619.9979256.333486DFP-azure-attacktarget@domain.com:22023-07-20T20:26:07Z
1053122022-08-31T22:23:06.806213000Zattacktarget@domain.comStormboardMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>9...0.5030850.252396Mobile Apps and Desktop clients0.8073814.4789061.35857931.5628991.308199DFP-azure-generic_user:532023-07-20T20:26:10Z
1063132022-08-31T22:43:42.029787000Zdjohnson@domain.comBoxMobile Apps and Desktop clientsEdge 87.11424DAVIDJOHNSON-LTWindows 10<NA>0...0.5146290.155598Mobile Apps and Desktop clients0.7997284.3847701.43004436.9379810.934225DFP-azure-generic_user:532023-07-20T20:26:10Z
1073142022-08-31T22:43:47.300043000Zattacktarget@domain.comCallPleaseMobile Apps and Desktop clientsChrome 100.0.4896ATTACKTARGET-LTWindows 10<NA>10...0.5352880.019980Mobile Apps and Desktop clients0.7839104.3783471.34596243.1526680.908708DFP-azure-generic_user:532023-07-20T20:26:10Z
1083152022-08-31T22:45:37.459815000Zacole@domain.comFoko RetailBrowserRich Client 4.39.0.0<NA>Windows10<NA>0...0.5078790.192614Mobile Apps and Desktop clients1.2921414.2077781.42407042.7514690.231083DFP-azure-generic_user:532023-07-20T20:26:10Z
\n", - "

109 rows × 44 columns

\n", - "
" - ], - "text/plain": [ - " Unnamed: 0 timestamp username \\\n", - "0 10 2022-08-30T08:31:49.739107000Z tprice@domain.com \n", - "1 18 2022-08-31T00:21:46.153050000Z attacktarget@domain.com \n", - "2 20 2022-08-31T00:27:44.169328000Z attacktarget@domain.com \n", - "3 21 2022-08-31T00:44:58.235390000Z attacktarget@domain.com \n", - "4 22 2022-08-31T00:47:59.808993000Z attacktarget@domain.com \n", - ".. ... ... ... \n", - "104 123 2022-08-31T23:54:50.435683000Z attacktarget@domain.com \n", - "105 312 2022-08-31T22:23:06.806213000Z attacktarget@domain.com \n", - "106 313 2022-08-31T22:43:42.029787000Z djohnson@domain.com \n", - "107 314 2022-08-31T22:43:47.300043000Z attacktarget@domain.com \n", - "108 315 2022-08-31T22:45:37.459815000Z acole@domain.com \n", - "\n", - " appDisplayName clientAppUsed \\\n", - "0 SD ECDN Browser \n", - "1 Citrix ShareFile Mobile Apps and Desktop clients \n", - "2 Altoura Mobile Apps and Desktop clients \n", - "3 LumApps Mobile Apps and Desktop clients \n", - "4 Linux Foundation Training Mobile Apps and Desktop clients \n", - ".. ... ... \n", - "104 Box Mobile Apps and Desktop clients \n", - "105 Stormboard Mobile Apps and Desktop clients \n", - "106 Box Mobile Apps and Desktop clients \n", - "107 CallPlease Mobile Apps and Desktop clients \n", - "108 Foko Retail Browser \n", - "\n", - " deviceDetailbrowser deviceDetaildisplayName \\\n", - "0 Rich Client 3.19.8.16603 THOMASPRICE-LT \n", - "1 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "2 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "3 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "4 Chrome 100.0.4896 ATTACKTARGET-LT \n", - ".. ... ... \n", - "104 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "105 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "106 Edge 87.11424 DAVIDJOHNSON-LT \n", - "107 Chrome 100.0.4896 ATTACKTARGET-LT \n", - "108 Rich Client 4.39.0.0 \n", - "\n", - " deviceDetailoperatingSystem statusfailureReason logcount ... \\\n", - "0 Windows 10 3 ... \n", - "1 Windows 10 0 ... 
\n", - "2 Windows 10 2 ... \n", - "3 Windows 10 3 ... \n", - "4 Windows 10 4 ... \n", - ".. ... ... ... ... \n", - "104 Windows 10 16 ... \n", - "105 Windows 10 9 ... \n", - "106 Windows 10 0 ... \n", - "107 Windows 10 10 ... \n", - "108 Windows10 0 ... \n", - "\n", - " locincrement_loss logcount_z_loss clientAppUsed_pred \\\n", - "0 0.610670 0.027359 Mobile Apps and Desktop clients \n", - "1 5.914863 0.633756 Mobile Apps and Desktop clients \n", - "2 5.936103 0.109216 Mobile Apps and Desktop clients \n", - "3 5.942259 0.376158 Mobile Apps and Desktop clients \n", - "4 5.936529 0.562854 Mobile Apps and Desktop clients \n", - ".. ... ... ... \n", - "104 6.032503 2.419621 Mobile Apps and Desktop clients \n", - "105 0.503085 0.252396 Mobile Apps and Desktop clients \n", - "106 0.514629 0.155598 Mobile Apps and Desktop clients \n", - "107 0.535288 0.019980 Mobile Apps and Desktop clients \n", - "108 0.507879 0.192614 Mobile Apps and Desktop clients \n", - "\n", - " clientAppUsed_loss appDisplayName_loss statusfailureReason_loss \\\n", - "0 1.174857 2.780367 1.605560 \n", - "1 0.591306 1.697818 0.607544 \n", - "2 0.603926 1.713131 0.621300 \n", - "3 0.599671 1.697026 0.614225 \n", - "4 0.589449 1.683457 0.599916 \n", - ".. ... ... ... \n", - "104 0.418121 1.608195 0.420876 \n", - "105 0.807381 4.478906 1.358579 \n", - "106 0.799728 4.384770 1.430044 \n", - "107 0.783910 4.378347 1.345962 \n", - "108 1.292141 4.207778 1.424070 \n", - "\n", - " appincrement_loss appDisplayName_z_loss \\\n", - "0 0.735984 1.979742 \n", - "1 2.277889 2.968441 \n", - "2 3.988265 2.393491 \n", - "3 14.026583 2.998179 \n", - "4 30.273712 3.507625 \n", - ".. ... ... 
\n", - "104 619.997925 6.333486 \n", - "105 31.562899 1.308199 \n", - "106 36.937981 0.934225 \n", - "107 43.152668 0.908708 \n", - "108 42.751469 0.231083 \n", - "\n", - " model_version event_time \n", - "0 DFP-azure-tprice@domain.com:1 2023-07-20T20:24:22Z \n", - "1 DFP-azure-attacktarget@domain.com:2 2023-07-20T20:25:09Z \n", - "2 DFP-azure-attacktarget@domain.com:2 2023-07-20T20:25:09Z \n", - "3 DFP-azure-attacktarget@domain.com:2 2023-07-20T20:25:09Z \n", - "4 DFP-azure-attacktarget@domain.com:2 2023-07-20T20:25:09Z \n", - ".. ... ... \n", - "104 DFP-azure-attacktarget@domain.com:2 2023-07-20T20:26:07Z \n", - "105 DFP-azure-generic_user:53 2023-07-20T20:26:10Z \n", - "106 DFP-azure-generic_user:53 2023-07-20T20:26:10Z \n", - "107 DFP-azure-generic_user:53 2023-07-20T20:26:10Z \n", - "108 DFP-azure-generic_user:53 2023-07-20T20:26:10Z \n", - "\n", - "[109 rows x 44 columns]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = cudf.read_csv(\"dfp_detections_azure.csv\")\n", "df" diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_training.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_training.ipynb index 8bc19d88b0..fc49b736b6 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_training.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_azure_training.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ -33,32 +33,10 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: 
Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import functools\n", "import logging\n", @@ -125,7 +103,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -175,7 +153,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "30c80a17", "metadata": {}, "outputs": [], @@ -204,7 +182,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -221,7 +199,7 @@ "config.ae = ConfigAutoEncoder()\n", "\n", "config.ae.feature_columns = [\n", - " \"appDisplayName\", \"clientAppUsed\", \"deviceDetailbrowser\", \"deviceDetaildisplayName\", \"deviceDetailoperatingSystem\", \"statusfailureReason\", \"appincrement\", \"locincrement\", \"logcount\", \n", + " \"appDisplayName\", \"clientAppUsed\", \"deviceDetailbrowser\", \"deviceDetaildisplayName\", \"deviceDetailoperatingSystem\", \"statusfailureReason\", \"appincrement\", \"locincrement\", \"logcount\",\n", "]\n", "config.ae.userid_column_name = \"username\"\n", "config.ae.timestamp_column_name = \"timestamp\"" @@ -229,7 +207,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], @@ -261,7 +239,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "f7a0cb0a-e65a-444a-a06c-a4525d543790", "metadata": {}, "outputs": [], @@ -392,97 +370,10 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 
null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "====Registering Pipeline====\u001b[0m\n", - "====Building Pipeline====\u001b[0m\n", - "====Building Pipeline Complete!====\u001b[0m\n", - "\u001b[2mStarting! Time: 1689884454.0088277\u001b[0m\n", - "====Registering Pipeline Complete!====\u001b[0m\n", - "====Starting Pipeline====\u001b[0m\n", - "====Pipeline Started====\u001b[0m\n", - "====Building Segment: linear_segment_0====\u001b[0m\n", - "Added source: \n", - " └─> fsspec.OpenFiles\u001b[0m\n", - "Added stage: , filename_regex=re.compile('(?P\\\\d{4})-(?P\\\\d{1,2})-(?P\\\\d{1,2})T(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(?P\\\\.\\\\d{1,6})?Z')), period=D, sampling_rate_s=None, start_time=None, end_time=None, sampling=None)>\n", - " └─ fsspec.OpenFiles -> Tuple[fsspec.core.OpenFiles, int]\u001b[0m\n", - "Added stage: \n", - " └─ Tuple[fsspec.core.OpenFiles, int] -> pandas.DataFrame\u001b[0m\n", - "Added stage: \n", - " └─ pandas.DataFrame -> dfp.DFPMessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ dfp.DFPMessageMeta -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "====Building Segment Complete!====\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 88, Cache: hit, Duration: 3.065347671508789 ms, Rate: 28707.999688885433 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 88 rows from 2022-08-01 00:03:56.207532 to 2022-08-01 23:54:11.248402. Output: 20 users, rows/user min: 1, max: 88, avg: 8.80. Duration: 4.29 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. 
Rows: 110, Cache: hit, Duration: 16.76177978515625 ms, Rate: 6562.548930359581 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 110 rows from 2022-08-02 00:03:57.781586 to 2022-08-02 23:58:42.803775. Output: 19 users, rows/user min: 1, max: 110, avg: 11.58. Duration: 9.93 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 97, Cache: hit, Duration: 33.37359428405762 ms, Rate: 2906.489459133156 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 97 rows from 2022-08-03 00:10:42.770060 to 2022-08-03 23:23:43.932133. Output: 16 users, rows/user min: 1, max: 97, avg: 12.12. Duration: 4.68 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 126, Cache: hit, Duration: 22.55558967590332 ms, Rate: 5586.198446170921 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 126 rows from 2022-08-04 00:47:51.564611 to 2022-08-04 23:50:26.072379. Output: 21 users, rows/user min: 1, max: 126, avg: 12.00. Duration: 5.77 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 109, Cache: hit, Duration: 20.061969757080078 ms, Rate: 5433.165402990041 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 109 rows from 2022-08-05 00:14:48.503160 to 2022-08-05 23:45:07.826898. Output: 16 users, rows/user min: 1, max: 109, avg: 13.62. Duration: 10.13 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 107, Cache: hit, Duration: 29.167890548706055 ms, Rate: 3668.4174956473407 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 107 rows from 2022-08-06 00:08:48.348649 to 2022-08-06 23:53:00.392382. Output: 17 users, rows/user min: 1, max: 107, avg: 12.59. Duration: 10.10 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 107, Cache: hit, Duration: 47.4393367767334 ms, Rate: 2255.5121397160447 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 107 rows from 2022-08-07 00:00:23.959795 to 2022-08-07 23:56:24.809043. 
Output: 17 users, rows/user min: 1, max: 107, avg: 12.59. Duration: 16.20 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 119, Cache: hit, Duration: 23.746728897094727 ms, Rate: 5011.216513890423 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 119 rows from 2022-08-08 00:16:13.439012 to 2022-08-08 23:58:43.815912. Output: 18 users, rows/user min: 1, max: 119, avg: 13.22. Duration: 8.41 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 102, Cache: hit, Duration: 21.516084671020508 ms, Rate: 4740.63945924982 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 102 rows from 2022-08-09 00:23:17.790393 to 2022-08-09 23:59:49.626250. Output: 16 users, rows/user min: 2, max: 102, avg: 12.75. Duration: 14.24 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 18.05 ms. Input: 126 rows from 2022-08-04 00:47:51.564611 to 2022-08-04 23:50:26.072379. Output: 421 rows from 2022-08-01 00:03:56.207532 to 2022-08-04 23:50:26.072379\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 14.97 ms. Input: 107 rows from 2022-08-07 00:00:23.959795 to 2022-08-07 23:56:24.809043. Output: 744 rows from 2022-08-01 00:03:56.207532 to 2022-08-07 23:56:24.809043\u001b[0m\n", - "\u001b[2mPreprocessed 421 data for logs in 2022-08-01 00:03:56.207532 to 2022-08-04 23:50:26.072379 in 3386.120080947876 ms\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n", - "\u001b[2mPreprocessed 744 data for logs in 2022-08-01 00:03:56.207532 to 2022-08-07 23:56:24.809043 in 906.4126014709473 ms\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:21:00 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. 
Model name: DFP-azure-generic_user, version 51\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-azure-generic_user:51\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:21:02 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-azure-generic_user, version 52\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-azure-generic_user:52\u001b[0m\n", - "====Pipeline Complete====\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ "# Create a linear pipeline object\n", "pipeline = LinearPipeline(config)\n", @@ -551,9 +442,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python [conda env:morpheus] *", + "display_name": "morpheus", "language": "python", - "name": "conda-env-morpheus-py" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -565,12 +456,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" - } + "version": "3.10.14" } }, "nbformat": 4, diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_inference.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_inference.ipynb index 1d4adf907a..8bb35d5f78 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_inference.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_inference.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ 
-33,32 +33,10 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import functools\n", "import logging\n", @@ -130,7 +108,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -177,7 +155,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "2a01ceb8", "metadata": {}, "outputs": [], @@ -206,7 +184,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -231,7 +209,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], @@ -266,7 +244,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "f7a0cb0a-e65a-444a-a06c-a4525d543790", "metadata": {}, "outputs": [], @@ -419,158 +397,10 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mUpdating list of available models...\u001b[0m\n", - "\u001b[2mUpdating list of available models... 
Done.\u001b[0m\n", - "====Registering Pipeline====\u001b[0m\n", - "====Building Pipeline====\u001b[0m\n", - "====Building Pipeline Complete!====\u001b[0m\n", - "\u001b[2mStarting! Time: 1689884131.5528107\u001b[0m\n", - "====Registering Pipeline Complete!====\u001b[0m\n", - "====Starting Pipeline====\u001b[0m\n", - "====Pipeline Started====\u001b[0m\n", - "====Building Segment: linear_segment_0====\u001b[0m\n", - "Added source: \n", - " └─> fsspec.OpenFiles\u001b[0m\n", - "Added stage: , filename_regex=re.compile('(?P\\\\d{4})-(?P\\\\d{1,2})-(?P\\\\d{1,2})T(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(?P\\\\.\\\\d{1,6})?Z')), period=D, sampling_rate_s=None, start_time=None, end_time=None, sampling=None)>\n", - " └─ fsspec.OpenFiles -> Tuple[fsspec.core.OpenFiles, int]\u001b[0m\n", - "Added stage: \n", - " └─ Tuple[fsspec.core.OpenFiles, int] -> pandas.DataFrame\u001b[0m\n", - "Added stage: \n", - " └─ pandas.DataFrame -> dfp.DFPMessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ dfp.DFPMessageMeta -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MessageMeta -> morpheus.MessageMeta\u001b[0m\n", - "====Building Segment Complete!====\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 97, Cache: hit, Duration: 3.4716129302978516 ms, Rate: 27940.902959961542 rows/s\u001b[0m\n", - "\u001b[2mPreallocating column event_time[TypeId.STRING]\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 97 rows from 2022-08-30 00:26:18 to 2022-08-30 22:54:43. 
Output: 14 users, rows/user min: 2, max: 16, avg: 6.93. Duration: 3.33 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 569, Cache: hit, Duration: 12.414693832397461 ms, Rate: 45832.78554281654 rows/s\u001b[0m\n", - "\u001b[2mPreallocating column event_time[TypeId.STRING]\u001b[0m\n", - "\u001b[2mRolling window complete for anthony in 17.77 ms. Input: 16 rows from 2022-08-30 00:32:47 to 2022-08-30 21:59:35. Output: 16 rows from 2022-08-30 00:32:47 to 2022-08-30 21:59:35\u001b[0m\n", - "\u001b[2mRolling window complete for attacktarget in 17.02 ms. Input: 15 rows from 2022-08-30 02:00:05 to 2022-08-30 22:35:30. Output: 15 rows from 2022-08-30 02:00:05 to 2022-08-30 22:35:30\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 569 rows from 2022-08-31 00:01:31 to 2022-08-31 23:56:43. Output: 14 users, rows/user min: 1, max: 512, avg: 40.64. Duration: 31.87 ms\u001b[0m\n", - "\u001b[2mRolling window complete for bailey in 17.40 ms. Input: 10 rows from 2022-08-30 00:47:16 to 2022-08-30 22:54:43. Output: 10 rows from 2022-08-30 00:47:16 to 2022-08-30 22:54:43\u001b[0m\n", - "\u001b[2mRolling window complete for benjamin in 13.13 ms. Input: 11 rows from 2022-08-30 01:15:59 to 2022-08-30 22:43:23. Output: 11 rows from 2022-08-30 01:15:59 to 2022-08-30 22:43:23\u001b[0m\n", - "\u001b[2mRolling window complete for briana in 17.26 ms. Input: 4 rows from 2022-08-30 04:53:31 to 2022-08-30 19:55:42. Output: 4 rows from 2022-08-30 04:53:31 to 2022-08-30 19:55:42\u001b[0m\n", - "\u001b[2mRolling window complete for cassandra in 12.44 ms. Input: 5 rows from 2022-08-30 02:50:54 to 2022-08-30 22:03:43. Output: 5 rows from 2022-08-30 02:50:54 to 2022-08-30 22:03:43\u001b[0m\n", - "\u001b[2mRolling window complete for christopher in 13.45 ms. Input: 4 rows from 2022-08-30 02:51:39 to 2022-08-30 20:45:12. Output: 4 rows from 2022-08-30 02:51:39 to 2022-08-30 20:45:12\u001b[0m\n", - "\u001b[2mRolling window complete for debbie in 19.31 ms. 
Input: 4 rows from 2022-08-30 02:16:03 to 2022-08-30 20:52:29. Output: 4 rows from 2022-08-30 02:16:03 to 2022-08-30 20:52:29\u001b[0m\n", - "\u001b[2mRolling window complete for erik in 14.43 ms. Input: 2 rows from 2022-08-30 08:34:06 to 2022-08-30 14:11:00. Output: 2 rows from 2022-08-30 08:34:06 to 2022-08-30 14:11:00\u001b[0m\n", - "\u001b[2mRolling window complete for gregory in 14.23 ms. Input: 2 rows from 2022-08-30 06:38:13 to 2022-08-30 06:43:27. Output: 2 rows from 2022-08-30 06:38:13 to 2022-08-30 06:43:27\u001b[0m\n", - "\u001b[2mRolling window complete for juan in 13.74 ms. Input: 13 rows from 2022-08-30 00:33:39 to 2022-08-30 21:02:48. Output: 13 rows from 2022-08-30 00:33:39 to 2022-08-30 21:02:48\u001b[0m\n", - "\u001b[2mRolling window complete for patrick in 13.37 ms. Input: 4 rows from 2022-08-30 01:07:11 to 2022-08-30 22:38:43. Output: 4 rows from 2022-08-30 01:07:11 to 2022-08-30 22:38:43\u001b[0m\n", - "\u001b[2mRolling window complete for paul in 14.28 ms. Input: 4 rows from 2022-08-30 00:26:18 to 2022-08-30 12:12:56. Output: 4 rows from 2022-08-30 00:26:18 to 2022-08-30 12:12:56\u001b[0m\n", - "\u001b[2mRolling window complete for shannon in 13.91 ms. Input: 3 rows from 2022-08-30 03:33:23 to 2022-08-30 21:17:26. Output: 3 rows from 2022-08-30 03:33:23 to 2022-08-30 21:17:26\u001b[0m\n", - "\u001b[2mRolling window complete for amber in 13.10 ms. Input: 4 rows from 2022-08-31 10:28:44 to 2022-08-31 21:22:10. Output: 4 rows from 2022-08-31 10:28:44 to 2022-08-31 21:22:10\u001b[0m\n", - "\u001b[2mRolling window complete for anthony in 13.08 ms. Input: 9 rows from 2022-08-31 01:07:27 to 2022-08-31 21:22:05. Output: 10 rows from 2022-08-30 21:59:35 to 2022-08-31 21:22:05\u001b[0m\n", - "\u001b[2mRolling window complete for attacktarget in 23.57 ms. Input: 512 rows from 2022-08-31 00:01:31 to 2022-08-31 23:56:43. Output: 512 rows from 2022-08-31 00:01:31 to 2022-08-31 23:56:43\u001b[0m\n", - "\u001b[2mRolling window complete for bailey in 15.20 ms. 
Input: 11 rows from 2022-08-31 00:18:16 to 2022-08-31 19:06:18. Output: 16 rows from 2022-08-30 19:26:40 to 2022-08-31 19:06:18\u001b[0m\n", - "\u001b[2mRolling window complete for benjamin in 13.56 ms. Input: 9 rows from 2022-08-31 04:42:57 to 2022-08-31 17:13:08. Output: 13 rows from 2022-08-30 17:14:31 to 2022-08-31 17:13:08\u001b[0m\n", - "\u001b[2mRolling window complete for briana in 15.51 ms. Input: 3 rows from 2022-08-31 08:54:35 to 2022-08-31 20:59:39. Output: 3 rows from 2022-08-31 08:54:35 to 2022-08-31 20:59:39\u001b[0m\n", - "\u001b[2mRolling window complete for cassandra in 14.42 ms. Input: 3 rows from 2022-08-31 05:47:45 to 2022-08-31 16:38:13. Output: 4 rows from 2022-08-30 22:03:43 to 2022-08-31 16:38:13\u001b[0m\n", - "\u001b[2mRolling window complete for debbie in 13.46 ms. Input: 3 rows from 2022-08-31 06:53:07 to 2022-08-31 13:04:02. Output: 4 rows from 2022-08-30 20:52:29 to 2022-08-31 13:04:02\u001b[0m\n", - "\u001b[2mRolling window complete for erik in 13.69 ms. Input: 4 rows from 2022-08-31 04:32:45 to 2022-08-31 18:15:47. Output: 4 rows from 2022-08-31 04:32:45 to 2022-08-31 18:15:47\u001b[0m\n", - "\u001b[2mRolling window complete for gregory in 14.36 ms. Input: 1 rows from 2022-08-31 01:06:11 to 2022-08-31 01:06:11. Output: 3 rows from 2022-08-30 06:38:13 to 2022-08-31 01:06:11\u001b[0m\n", - "\u001b[2mRolling window complete for juan in 13.53 ms. Input: 3 rows from 2022-08-31 07:19:09 to 2022-08-31 08:29:45. Output: 10 rows from 2022-08-30 08:34:47 to 2022-08-31 08:29:45\u001b[0m\n", - "\u001b[2mRolling window complete for patrick in 14.06 ms. Input: 2 rows from 2022-08-31 07:24:40 to 2022-08-31 13:09:00. Output: 4 rows from 2022-08-30 17:39:27 to 2022-08-31 13:09:00\u001b[0m\n", - "\u001b[2mRolling window complete for paul in 14.35 ms. Input: 3 rows from 2022-08-31 00:55:31 to 2022-08-31 07:30:12. Output: 6 rows from 2022-08-30 08:09:44 to 2022-08-31 07:30:12\u001b[0m\n", - "\u001b[2mRolling window complete for shannon in 14.20 ms. 
Input: 2 rows from 2022-08-31 05:23:34 to 2022-08-31 08:16:36. Output: 4 rows from 2022-08-30 11:42:57 to 2022-08-31 08:16:36\u001b[0m\n", - "\u001b[2mPreprocessed 16 data for logs in 2022-08-30 00:32:47 to 2022-08-30 21:59:35 in 2787.977933883667 ms\u001b[0m\n", - "\u001b[2mPreprocessed 15 data for logs in 2022-08-30 02:00:05 to 2022-08-30 22:35:30 in 174.47781562805176 ms\u001b[0m\n", - "\u001b[2mPreprocessed 10 data for logs in 2022-08-30 00:47:16 to 2022-08-30 22:54:43 in 167.16289520263672 ms\u001b[0m\n", - "\u001b[2mPreprocessed 11 data for logs in 2022-08-30 01:15:59 to 2022-08-30 22:43:23 in 140.0158405303955 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 04:53:31 to 2022-08-30 19:55:42 in 160.80236434936523 ms\u001b[0m\n", - "\u001b[2mPreprocessed 5 data for logs in 2022-08-30 02:50:54 to 2022-08-30 22:03:43 in 153.3806324005127 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-anthony:11' in 770.7610130310059 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 02:51:39 to 2022-08-30 20:45:12 in 584.4273567199707 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 02:16:03 to 2022-08-30 20:52:29 in 344.77710723876953 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user anthony. Model load: 803.2019138336182 ms, Model infer: 1001.3659000396729 ms. 
Start: 2022-08-30 00:32:47, End: 2022-08-30 21:59:35\u001b[0m\n", - "\u001b[2mPreprocessed 2 data for logs in 2022-08-30 08:34:06 to 2022-08-30 14:11:00 in 191.02168083190918 ms\u001b[0m\n", - "\u001b[2mPreprocessed 2 data for logs in 2022-08-30 06:38:13 to 2022-08-30 06:43:27 in 158.5981845855713 ms\u001b[0m\n", - "\u001b[2mPreprocessed 13 data for logs in 2022-08-30 00:33:39 to 2022-08-30 21:02:48 in 160.74442863464355 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 01:07:11 to 2022-08-30 22:38:43 in 154.65855598449707 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 00:26:18 to 2022-08-30 12:12:56 in 142.5483226776123 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-attacktarget:11' in 826.9455432891846 ms\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-30 03:33:23 to 2022-08-30 21:17:26 in 206.42590522766113 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user attacktarget. Model load: 868.920087814331 ms, Model infer: 195.8291530609131 ms. 
Start: 2022-08-30 02:00:05, End: 2022-08-30 22:35:30\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-31 10:28:44 to 2022-08-31 21:22:10 in 163.65337371826172 ms\u001b[0m\n", - "\u001b[2mPreprocessed 10 data for logs in 2022-08-30 21:59:35 to 2022-08-31 21:22:05 in 150.89797973632812 ms\u001b[0m\n", - "\u001b[2mPreprocessed 512 data for logs in 2022-08-31 00:01:31 to 2022-08-31 23:56:43 in 164.02316093444824 ms\u001b[0m\n", - "\u001b[2mPreprocessed 16 data for logs in 2022-08-30 19:26:40 to 2022-08-31 19:06:18 in 164.55817222595215 ms\u001b[0m\n", - "\u001b[2mPreprocessed 13 data for logs in 2022-08-30 17:14:31 to 2022-08-31 17:13:08 in 177.25563049316406 ms\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-31 08:54:35 to 2022-08-31 20:59:39 in 176.85627937316895 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-bailey:6' in 959.5987796783447 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 22:03:43 to 2022-08-31 16:38:13 in 188.13014030456543 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user bailey. Model load: 996.5829849243164 ms, Model infer: 201.33042335510254 ms. 
Start: 2022-08-30 00:47:16, End: 2022-08-30 22:54:43\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 20:52:29 to 2022-08-31 13:04:02 in 167.58084297180176 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-31 04:32:45 to 2022-08-31 18:15:47 in 157.8218936920166 ms\u001b[0m\n", - "\u001b[2mPreprocessed 3 data for logs in 2022-08-30 06:38:13 to 2022-08-31 01:06:11 in 152.5866985321045 ms\u001b[0m\n", - "\u001b[2mPreprocessed 10 data for logs in 2022-08-30 08:34:47 to 2022-08-31 08:29:45 in 153.0296802520752 ms\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 17:39:27 to 2022-08-31 13:09:00 in 282.3915481567383 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-benjamin:11' in 893.4438228607178 ms\u001b[0m\n", - "\u001b[2mPreprocessed 6 data for logs in 2022-08-30 08:09:44 to 2022-08-31 07:30:12 in 162.02831268310547 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user benjamin. Model load: 927.8731346130371 ms, Model infer: 202.12268829345703 ms. Start: 2022-08-30 01:15:59, End: 2022-08-30 22:43:23\u001b[0m\n", - "\u001b[2mPreprocessed 4 data for logs in 2022-08-30 11:42:57 to 2022-08-31 08:16:36 in 184.88121032714844 ms\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-briana:6' in 160.63833236694336 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user briana. Model load: 186.61856651306152 ms, Model infer: 65.44113159179688 ms. Start: 2022-08-30 04:53:31, End: 2022-08-30 19:55:42\u001b[0m\n", - "\u001b[2mCompleted postprocessing for user briana in 13.521909713745117 ms. Event count: 1. Start: 2022-08-30 04:53:31, End: 2022-08-30 04:53:31\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-cassandra:6' in 141.81804656982422 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user cassandra. Model load: 174.58534240722656 ms, Model infer: 53.93695831298828 ms. 
Start: 2022-08-30 02:50:54, End: 2022-08-30 22:03:43\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-christopher:6' in 132.56025314331055 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user christopher. Model load: 143.65291595458984 ms, Model infer: 51.56564712524414 ms. Start: 2022-08-30 02:51:39, End: 2022-08-30 20:45:12\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-debbie:6' in 157.55987167358398 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user debbie. Model load: 169.3258285522461 ms, Model infer: 51.676273345947266 ms. Start: 2022-08-30 02:16:03, End: 2022-08-30 20:52:29\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-erik:6' in 127.00200080871582 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user erik. Model load: 138.45252990722656 ms, Model infer: 51.18727684020996 ms. Start: 2022-08-30 08:34:06, End: 2022-08-30 14:11:00\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-gregory:6' in 135.29348373413086 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user gregory. Model load: 149.23477172851562 ms, Model infer: 50.29773712158203 ms. Start: 2022-08-30 06:38:13, End: 2022-08-30 06:43:27\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-juan:11' in 151.63159370422363 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user juan. Model load: 162.81437873840332 ms, Model infer: 55.87410926818848 ms. Start: 2022-08-30 00:33:39, End: 2022-08-30 21:02:48\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-patrick:6' in 140.61880111694336 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user patrick. Model load: 152.35590934753418 ms, Model infer: 51.64742469787598 ms. Start: 2022-08-30 01:07:11, End: 2022-08-30 22:38:43\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-paul:6' in 138.47756385803223 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user paul. Model load: 149.69849586486816 ms, Model infer: 51.1174201965332 ms. 
Start: 2022-08-30 00:26:18, End: 2022-08-30 12:12:56\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-shannon:6' in 130.6006908416748 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user shannon. Model load: 142.17257499694824 ms, Model infer: 51.09739303588867 ms. Start: 2022-08-30 03:33:23, End: 2022-08-30 21:17:26\u001b[0m\n", - "\u001b[2mDownloaded model 'DFP-duo-amber:6' in 130.63478469848633 ms\u001b[0m\n", - "\u001b[2mCompleted inference for user amber. Model load: 144.36626434326172 ms, Model infer: 58.923959732055664 ms. Start: 2022-08-31 10:28:44, End: 2022-08-31 21:22:10\u001b[0m\n", - "\u001b[2mCompleted inference for user anthony. Model load: 0.9205341339111328 ms, Model infer: 59.65065956115723 ms. Start: 2022-08-30 21:59:35, End: 2022-08-31 21:22:05\u001b[0m\n", - "\u001b[2mCompleted inference for user attacktarget. Model load: 1.8947124481201172 ms, Model infer: 65.05608558654785 ms. Start: 2022-08-31 00:01:31, End: 2022-08-31 23:56:43\u001b[0m\n", - "\u001b[2mCompleted postprocessing for user attacktarget in 14.00613784790039 ms. Event count: 512. Start: 2022-08-31 00:01:31, End: 2022-08-31 23:56:43\u001b[0m\n", - "\u001b[2mCompleted inference for user bailey. Model load: 0.8950233459472656 ms, Model infer: 90.78335762023926 ms. Start: 2022-08-30 19:26:40, End: 2022-08-31 19:06:18\u001b[0m\n", - "\u001b[2mCompleted inference for user benjamin. Model load: 1.0097026824951172 ms, Model infer: 59.13424491882324 ms. Start: 2022-08-30 17:14:31, End: 2022-08-31 17:13:08\u001b[0m\n", - "\u001b[2mCompleted inference for user briana. Model load: 1.065969467163086 ms, Model infer: 55.05728721618652 ms. Start: 2022-08-31 08:54:35, End: 2022-08-31 20:59:39\u001b[0m\n", - "\u001b[2mCompleted inference for user cassandra. Model load: 0.8597373962402344 ms, Model infer: 49.3316650390625 ms. Start: 2022-08-30 22:03:43, End: 2022-08-31 16:38:13\u001b[0m\n", - "\u001b[2mCompleted inference for user debbie. 
Model load: 1.3566017150878906 ms, Model infer: 50.60410499572754 ms. Start: 2022-08-30 20:52:29, End: 2022-08-31 13:04:02\u001b[0m\n", - "\u001b[2mCompleted inference for user erik. Model load: 0.8389949798583984 ms, Model infer: 49.55720901489258 ms. Start: 2022-08-31 04:32:45, End: 2022-08-31 18:15:47\u001b[0m\n", - "\u001b[2mCompleted inference for user gregory. Model load: 0.8082389831542969 ms, Model infer: 51.20587348937988 ms. Start: 2022-08-30 06:38:13, End: 2022-08-31 01:06:11\u001b[0m\n", - "\u001b[2mCompleted inference for user juan. Model load: 1.3794898986816406 ms, Model infer: 49.3009090423584 ms. Start: 2022-08-30 08:34:47, End: 2022-08-31 08:29:45\u001b[0m\n", - "\u001b[2mCompleted inference for user patrick. Model load: 0.9067058563232422 ms, Model infer: 50.65441131591797 ms. Start: 2022-08-30 17:39:27, End: 2022-08-31 13:09:00\u001b[0m\n", - "\u001b[2mCompleted inference for user paul. Model load: 1.5044212341308594 ms, Model infer: 49.71170425415039 ms. Start: 2022-08-30 08:09:44, End: 2022-08-31 07:30:12\u001b[0m\n", - "\u001b[2mCompleted inference for user shannon. Model load: 1.4891624450683594 ms, Model infer: 51.0106086730957 ms. 
Start: 2022-08-30 11:42:57, End: 2022-08-31 08:16:36\u001b[0m\n", - "====Pipeline Complete====\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ "# Create a linear pipeline object\n", "pipeline = LinearPipeline(config)\n", @@ -644,9 +474,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python [conda env:morpheus] *", + "display_name": "morpheus", "language": "python", - "name": "conda-env-morpheus-py" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -658,7 +488,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" + "version": "3.10.14" } }, "nbformat": 4, diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_integrated_training.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_integrated_training.ipynb index 16fff565cc..db57a85a1e 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_integrated_training.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_integrated_training.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ -33,39 +33,17 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": 
[ "import logging\n", "import typing\n", "import cudf\n", "from datetime import datetime\n", "\n", - "# When segment modules are imported, they're added to the module registry. \n", + "# When segment modules are imported, they're added to the module registry.\n", "# To avoid flake8 warnings about unused code, the noqa flag is used during import.\n", "import dfp.modules # noqa: F401\n", "from morpheus import modules # noqa: F401\n", @@ -118,7 +96,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -177,7 +155,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "6f5b67b8", "metadata": {}, "outputs": [], @@ -202,7 +180,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -359,7 +337,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], @@ -400,7 +378,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, "outputs": [], @@ -480,413 +458,12 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "id": "960f14ef", "metadata": { "tags": [] }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Unnamed: 0timestampusernameaccessdevicebrowseraccessdeviceosauthdevicenameresultreasonlogcountlocincrement...logcount_lossaccessdevicebrowser_predlocincrement_predmax_abs_zresult_predauthdevicename_lossaccessdevicebrowser_z_lossauthdevicename_predmodel_versionevent_time
0152022-08-31T00:01:31.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode01...1.823248<NA>2.0702446.703687True0.9705186.703687<NA>DFP-duo-attacktarget:122023-07-20T20:18:56Z
1162022-08-31T00:03:25.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode11...1.248076<NA>2.0708856.696225True0.9723196.696225<NA>DFP-duo-attacktarget:122023-07-20T20:18:56Z
2172022-08-31T00:10:20.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode21...0.781832<NA>2.0706996.741863True0.9735706.674241<NA>DFP-duo-attacktarget:122023-07-20T20:18:56Z
3182022-08-31T00:12:13.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode31...0.422212<NA>2.0698486.844139True0.9751076.626240<NA>DFP-duo-attacktarget:122023-07-20T20:18:56Z
4192022-08-31T00:18:26.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode41...0.171310<NA>2.0683746.959468False0.9763216.544739<NA>DFP-duo-attacktarget:122023-07-20T20:18:56Z
..................................................................
10066612022-08-31T23:42:07.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode5071...19528.199220<NA>1.63453312834.770510False0.21381633.274307<NA>DFP-duo-generic_user:502023-07-20T20:18:56Z
10076622022-08-31T23:42:19.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode5082...19606.263670<NA>1.63855712886.080080False0.21032533.348270<NA>DFP-duo-generic_user:502023-07-20T20:18:56Z
10086632022-08-31T23:50:20.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode5092...19684.128910<NA>1.63828512937.258790False0.20913633.416397<NA>DFP-duo-generic_user:502023-07-20T20:18:56Z
10096642022-08-31T23:55:22.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode5101...19761.792970<NA>1.63371412988.305660False0.21021433.478695<NA>DFP-duo-generic_user:502023-07-20T20:18:56Z
10106652022-08-31T23:56:43.000000000ZattacktargetSafariWindows<NA>Falseinvalid_passcode5111...19839.966800<NA>1.63344013039.687500False0.20902733.546825<NA>DFP-duo-generic_user:502023-07-20T20:18:56Z
\n", - "

1011 rows × 36 columns

\n", - "
" - ], - "text/plain": [ - " Unnamed: 0 timestamp username \\\n", - "0 15 2022-08-31T00:01:31.000000000Z attacktarget \n", - "1 16 2022-08-31T00:03:25.000000000Z attacktarget \n", - "2 17 2022-08-31T00:10:20.000000000Z attacktarget \n", - "3 18 2022-08-31T00:12:13.000000000Z attacktarget \n", - "4 19 2022-08-31T00:18:26.000000000Z attacktarget \n", - "... ... ... ... \n", - "1006 661 2022-08-31T23:42:07.000000000Z attacktarget \n", - "1007 662 2022-08-31T23:42:19.000000000Z attacktarget \n", - "1008 663 2022-08-31T23:50:20.000000000Z attacktarget \n", - "1009 664 2022-08-31T23:55:22.000000000Z attacktarget \n", - "1010 665 2022-08-31T23:56:43.000000000Z attacktarget \n", - "\n", - " accessdevicebrowser accessdeviceos authdevicename result \\\n", - "0 Safari Windows False \n", - "1 Safari Windows False \n", - "2 Safari Windows False \n", - "3 Safari Windows False \n", - "4 Safari Windows False \n", - "... ... ... ... ... \n", - "1006 Safari Windows False \n", - "1007 Safari Windows False \n", - "1008 Safari Windows False \n", - "1009 Safari Windows False \n", - "1010 Safari Windows False \n", - "\n", - " reason logcount locincrement ... logcount_loss \\\n", - "0 invalid_passcode 0 1 ... 1.823248 \n", - "1 invalid_passcode 1 1 ... 1.248076 \n", - "2 invalid_passcode 2 1 ... 0.781832 \n", - "3 invalid_passcode 3 1 ... 0.422212 \n", - "4 invalid_passcode 4 1 ... 0.171310 \n", - "... ... ... ... ... ... \n", - "1006 invalid_passcode 507 1 ... 19528.199220 \n", - "1007 invalid_passcode 508 2 ... 19606.263670 \n", - "1008 invalid_passcode 509 2 ... 19684.128910 \n", - "1009 invalid_passcode 510 1 ... 19761.792970 \n", - "1010 invalid_passcode 511 1 ... 19839.966800 \n", - "\n", - " accessdevicebrowser_pred locincrement_pred max_abs_z result_pred \\\n", - "0 2.070244 6.703687 True \n", - "1 2.070885 6.696225 True \n", - "2 2.070699 6.741863 True \n", - "3 2.069848 6.844139 True \n", - "4 2.068374 6.959468 False \n", - "... ... ... ... ... 
\n", - "1006 1.634533 12834.770510 False \n", - "1007 1.638557 12886.080080 False \n", - "1008 1.638285 12937.258790 False \n", - "1009 1.633714 12988.305660 False \n", - "1010 1.633440 13039.687500 False \n", - "\n", - " authdevicename_loss accessdevicebrowser_z_loss authdevicename_pred \\\n", - "0 0.970518 6.703687 \n", - "1 0.972319 6.696225 \n", - "2 0.973570 6.674241 \n", - "3 0.975107 6.626240 \n", - "4 0.976321 6.544739 \n", - "... ... ... ... \n", - "1006 0.213816 33.274307 \n", - "1007 0.210325 33.348270 \n", - "1008 0.209136 33.416397 \n", - "1009 0.210214 33.478695 \n", - "1010 0.209027 33.546825 \n", - "\n", - " model_version event_time \n", - "0 DFP-duo-attacktarget:12 2023-07-20T20:18:56Z \n", - "1 DFP-duo-attacktarget:12 2023-07-20T20:18:56Z \n", - "2 DFP-duo-attacktarget:12 2023-07-20T20:18:56Z \n", - "3 DFP-duo-attacktarget:12 2023-07-20T20:18:56Z \n", - "4 DFP-duo-attacktarget:12 2023-07-20T20:18:56Z \n", - "... ... ... \n", - "1006 DFP-duo-generic_user:50 2023-07-20T20:18:56Z \n", - "1007 DFP-duo-generic_user:50 2023-07-20T20:18:56Z \n", - "1008 DFP-duo-generic_user:50 2023-07-20T20:18:56Z \n", - "1009 DFP-duo-generic_user:50 2023-07-20T20:18:56Z \n", - "1010 DFP-duo-generic_user:50 2023-07-20T20:18:56Z \n", - "\n", - "[1011 rows x 36 columns]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = cudf.read_csv(\"dfp_detections_duo.csv\")\n", "df" diff --git a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_training.ipynb b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_training.ipynb index c45065ae09..1b1837d3e5 100644 --- a/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_training.ipynb +++ b/examples/digital_fingerprinting/production/morpheus/notebooks/dfp_duo_training.ipynb @@ -17,7 +17,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": 
"b6c1cb50-74f2-445d-b865-8c22c3b3798b", "metadata": {}, "outputs": [], @@ -33,32 +33,10 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "102ce011-3ca3-4f96-a72d-de28fad32003", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/opt/conda/envs/morpheus/lib/python3.10/site-packages/merlin/dtypes/mappings/tf.py:52: UserWarning: Tensorflow dtype mappings did not load successfully due to an error: No module named 'tensorflow'\n", - " warn(f\"Tensorflow dtype mappings did not load successfully due to an error: {exc.msg}\")\n" - ] - }, - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "import functools\n", "import logging\n", @@ -126,7 +104,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "9ee00703-75c5-46fc-890c-86733da906c4", "metadata": {}, "outputs": [], @@ -176,7 +154,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "586e6b5e", "metadata": {}, "outputs": [], @@ -205,7 +183,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "01abd537-9162-49dc-8e83-d9465592f1d5", "metadata": {}, "outputs": [], @@ -230,7 +208,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "id": "a73a4d53-32b6-4ab8-a5d7-c0104b31c69b", "metadata": {}, "outputs": [], @@ -265,7 +243,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "id": "f7a0cb0a-e65a-444a-a06c-a4525d543790", "metadata": {}, "outputs": [], @@ -389,299 +367,10 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "825390ad-ce64-4949-b324-33039ffdf264", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "====Registering Pipeline====\u001b[0m\n", - "====Building 
Pipeline====\u001b[0m\n", - "====Building Pipeline Complete!====\u001b[0m\n", - "\u001b[2mStarting! Time: 1689884085.1433885\u001b[0m\n", - "====Registering Pipeline Complete!====\u001b[0m\n", - "====Starting Pipeline====\u001b[0m\n", - "====Pipeline Started====\u001b[0m\n", - "====Building Segment: linear_segment_0====\u001b[0m\n", - "Added source: \n", - " └─> fsspec.OpenFiles\u001b[0m\n", - "Added stage: , filename_regex=re.compile('(?P\\\\d{4})-(?P\\\\d{1,2})-(?P\\\\d{1,2})T(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(:|_|\\\\.)(?P\\\\d{1,2})(?P\\\\.\\\\d{1,6})?Z')), period=D, sampling_rate_s=None, start_time=None, end_time=None, sampling=None)>\n", - " └─ fsspec.OpenFiles -> Tuple[fsspec.core.OpenFiles, int]\u001b[0m\n", - "Added stage: \n", - " └─ Tuple[fsspec.core.OpenFiles, int] -> pandas.DataFrame\u001b[0m\n", - "Added stage: \n", - " └─ pandas.DataFrame -> dfp.DFPMessageMeta\u001b[0m\n", - "Added stage: \n", - " └─ dfp.DFPMessageMeta -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> dfp.MultiDFPMessage\u001b[0m\n", - "Added stage: \n", - " └─ dfp.MultiDFPMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "Added stage: \n", - " └─ morpheus.MultiAEMessage -> morpheus.MultiAEMessage\u001b[0m\n", - "====Building Segment Complete!====\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 79, Cache: hit, Duration: 4.380464553833008 ms, Rate: 18034.61688347031 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 79 rows from 2022-08-01 03:02:04 to 2022-08-01 23:46:51. Output: 17 users, rows/user min: 2, max: 79, avg: 9.29. Duration: 5.04 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 102, Cache: hit, Duration: 12.609720230102539 ms, Rate: 8088.997863449867 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 102 rows from 2022-08-02 00:37:03 to 2022-08-02 23:30:48. Output: 17 users, rows/user min: 1, max: 102, avg: 12.00. 
Duration: 12.84 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 100, Cache: hit, Duration: 17.3947811126709 ms, Rate: 5748.850724379446 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 100 rows from 2022-08-03 00:01:48 to 2022-08-03 23:58:52. Output: 16 users, rows/user min: 2, max: 100, avg: 12.50. Duration: 6.92 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 88, Cache: hit, Duration: 29.299020767211914 ms, Rate: 3003.5133494454344 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 88 rows from 2022-08-04 00:16:45 to 2022-08-04 23:50:20. Output: 17 users, rows/user min: 1, max: 88, avg: 10.35. Duration: 7.86 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 79, Cache: hit, Duration: 17.94719696044922 ms, Rate: 4401.801583506031 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 79 rows from 2022-08-05 00:39:29 to 2022-08-05 23:48:15. Output: 17 users, rows/user min: 1, max: 79, avg: 9.29. Duration: 4.53 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 111, Cache: hit, Duration: 42.70601272583008 ms, Rate: 2599.1656189636115 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 111 rows from 2022-08-06 00:04:07 to 2022-08-06 23:55:21. Output: 20 users, rows/user min: 1, max: 111, avg: 11.10. Duration: 5.32 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 72, Cache: hit, Duration: 15.46335220336914 ms, Rate: 4656.1702180147395 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 72 rows from 2022-08-07 01:30:43 to 2022-08-07 23:13:44. Output: 16 users, rows/user min: 1, max: 72, avg: 9.00. Duration: 3.87 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 87, Cache: hit, Duration: 29.29830551147461 ms, Rate: 2969.4550070797322 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 85, Cache: hit, Duration: 17.986774444580078 ms, Rate: 4725.69444076244 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. 
Input: 87 rows from 2022-08-08 00:03:47 to 2022-08-08 23:53:14. Output: 18 users, rows/user min: 1, max: 87, avg: 9.67. Duration: 7.40 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 85 rows from 2022-08-09 00:02:38 to 2022-08-09 23:36:30. Output: 17 users, rows/user min: 1, max: 85, avg: 10.00. Duration: 14.84 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 105, Cache: hit, Duration: 49.715518951416016 ms, Rate: 2112.016573790775 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 105 rows from 2022-08-10 00:06:51 to 2022-08-10 23:49:15. Output: 19 users, rows/user min: 1, max: 105, avg: 11.05. Duration: 8.98 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 103, Cache: hit, Duration: 25.65789222717285 ms, Rate: 4014.359367014505 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 93, Cache: hit, Duration: 28.0303955078125 ms, Rate: 3317.826891671203 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 89, Cache: hit, Duration: 22.166013717651367 ms, Rate: 4015.1558658076174 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 87, Cache: hit, Duration: 4.830837249755859 ms, Rate: 18009.300562629553 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 68, Cache: hit, Duration: 17.812252044677734 ms, Rate: 3817.5970017400614 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 103 rows from 2022-08-11 00:14:46 to 2022-08-11 23:41:19. Output: 17 users, rows/user min: 1, max: 103, avg: 12.12. Duration: 8.70 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 89, Cache: hit, Duration: 28.520584106445312 ms, Rate: 3120.5532000267503 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 90, Cache: hit, Duration: 16.939640045166016 ms, Rate: 5312.981843771991 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. 
Rows: 101, Cache: hit, Duration: 28.87582778930664 ms, Rate: 3497.735224664366 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 90, Cache: hit, Duration: 14.493703842163086 ms, Rate: 6209.592867365235 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 108, Cache: hit, Duration: 11.70969009399414 ms, Rate: 9223.130512684775 rows/s\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 19.36 ms. Input: 88 rows from 2022-08-04 00:16:45 to 2022-08-04 23:50:20. Output: 369 rows from 2022-08-01 03:02:04 to 2022-08-04 23:50:20\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 100, Cache: hit, Duration: 24.25980567932129 ms, Rate: 4122.044558882785 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 93 rows from 2022-08-12 00:00:36 to 2022-08-12 23:48:12. Output: 16 users, rows/user min: 1, max: 93, avg: 11.62. Duration: 17.04 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 92, Cache: hit, Duration: 24.21116828918457 ms, Rate: 3799.899240760618 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 65, Cache: hit, Duration: 18.159151077270508 ms, Rate: 3579.4624827676753 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 97, Cache: hit, Duration: 14.293909072875977 ms, Rate: 6786.107250679699 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 88, Cache: hit, Duration: 25.652408599853516 ms, Rate: 3430.4770898005463 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 89 rows from 2022-08-13 00:01:19 to 2022-08-13 23:34:01. Output: 18 users, rows/user min: 1, max: 89, avg: 9.89. Duration: 8.38 ms\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 90, Cache: hit, Duration: 35.315513610839844 ms, Rate: 2548.455078177743 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 88, Cache: hit, Duration: 28.81002426147461 ms, Rate: 3054.492394776478 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. 
Rows: 99, Cache: hit, Duration: 18.238544464111328 ms, Rate: 5428.064733718528 rows/s\u001b[0m\n", - "\u001b[2mS3 objects to DF complete. Rows: 99, Cache: hit, Duration: 33.796072006225586 ms, Rate: 2929.3345091039923 rows/s\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 87 rows from 2022-08-14 00:28:59 to 2022-08-14 23:42:24. Output: 18 users, rows/user min: 1, max: 87, avg: 9.67. Duration: 3.66 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 68 rows from 2022-08-15 00:15:04 to 2022-08-15 23:14:56. Output: 15 users, rows/user min: 1, max: 68, avg: 9.07. Duration: 8.06 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 89 rows from 2022-08-16 00:18:15 to 2022-08-16 23:58:07. Output: 17 users, rows/user min: 1, max: 89, avg: 10.47. Duration: 12.71 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 17.32 ms. Input: 87 rows from 2022-08-08 00:03:47 to 2022-08-08 23:53:14. Output: 718 rows from 2022-08-01 03:02:04 to 2022-08-08 23:53:14\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 90 rows from 2022-08-17 00:04:52 to 2022-08-17 23:48:15. Output: 18 users, rows/user min: 1, max: 90, avg: 10.00. Duration: 9.08 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 101 rows from 2022-08-18 00:00:50 to 2022-08-18 23:53:08. Output: 17 users, rows/user min: 1, max: 101, avg: 11.88. Duration: 6.97 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 90 rows from 2022-08-19 00:01:58 to 2022-08-19 23:21:45. Output: 17 users, rows/user min: 1, max: 90, avg: 10.59. Duration: 24.54 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 108 rows from 2022-08-20 00:03:51 to 2022-08-20 23:54:30. Output: 17 users, rows/user min: 1, max: 108, avg: 12.71. Duration: 4.33 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 18.55 ms. Input: 93 rows from 2022-08-12 00:00:36 to 2022-08-12 23:48:12. 
Output: 1104 rows from 2022-08-01 03:02:04 to 2022-08-12 23:48:12\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 100 rows from 2022-08-21 00:12:40 to 2022-08-21 23:47:09. Output: 19 users, rows/user min: 1, max: 100, avg: 10.53. Duration: 7.92 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 92 rows from 2022-08-22 00:00:32 to 2022-08-22 23:46:41. Output: 17 users, rows/user min: 2, max: 92, avg: 10.82. Duration: 8.62 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 65 rows from 2022-08-23 00:45:51 to 2022-08-23 23:52:47. Output: 17 users, rows/user min: 1, max: 65, avg: 7.65. Duration: 13.93 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 13.41 ms. Input: 89 rows from 2022-08-16 00:18:15 to 2022-08-16 23:58:07. Output: 1437 rows from 2022-08-01 03:02:04 to 2022-08-16 23:58:07\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 97 rows from 2022-08-24 00:35:12 to 2022-08-24 23:44:42. Output: 17 users, rows/user min: 1, max: 97, avg: 11.41. Duration: 8.25 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 88 rows from 2022-08-25 00:38:52 to 2022-08-25 23:28:42. Output: 18 users, rows/user min: 1, max: 88, avg: 9.78. Duration: 10.12 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 90 rows from 2022-08-26 00:03:13 to 2022-08-26 23:33:39. Output: 18 users, rows/user min: 1, max: 90, avg: 10.00. Duration: 16.07 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 88 rows from 2022-08-27 00:17:07 to 2022-08-27 23:50:48. Output: 16 users, rows/user min: 1, max: 88, avg: 11.00. Duration: 14.66 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 13.56 ms. Input: 108 rows from 2022-08-20 00:03:51 to 2022-08-20 23:54:30. Output: 1826 rows from 2022-08-01 03:02:04 to 2022-08-20 23:54:30\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 99 rows from 2022-08-28 00:26:28 to 2022-08-28 23:45:18. 
Output: 17 users, rows/user min: 1, max: 99, avg: 11.65. Duration: 14.04 ms\u001b[0m\n", - "\u001b[2mBatch split users complete. Input: 99 rows from 2022-08-29 00:07:32 to 2022-08-29 23:44:47. Output: 17 users, rows/user min: 1, max: 99, avg: 11.65. Duration: 6.24 ms\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 13.31 ms. Input: 97 rows from 2022-08-24 00:35:12 to 2022-08-24 23:44:42. Output: 2180 rows from 2022-08-01 03:02:04 to 2022-08-24 23:44:42\u001b[0m\n", - "\u001b[2mRolling window complete for attacktarget in 12.42 ms. Input: 16 rows from 2022-08-25 03:23:50 to 2022-08-25 22:31:15. Output: 302 rows from 2022-08-01 04:38:16 to 2022-08-25 22:31:15\u001b[0m\n", - "\u001b[2mRolling window complete for generic_user in 13.32 ms. Input: 99 rows from 2022-08-28 00:26:28 to 2022-08-28 23:45:18. Output: 2545 rows from 2022-08-01 03:02:04 to 2022-08-28 23:45:18\u001b[0m\n", - "\u001b[2mRolling window complete for juan in 13.01 ms. Input: 18 rows from 2022-08-28 00:26:28 to 2022-08-28 22:28:05. Output: 306 rows from 2022-08-01 04:18:06 to 2022-08-28 22:28:05\u001b[0m\n", - "\u001b[2mRolling window complete for anthony in 12.51 ms. Input: 9 rows from 2022-08-29 04:07:05 to 2022-08-29 23:33:24. Output: 303 rows from 2022-08-01 04:54:59 to 2022-08-29 23:33:24\u001b[0m\n", - "\u001b[2mRolling window complete for benjamin in 12.23 ms. Input: 11 rows from 2022-08-29 00:07:32 to 2022-08-29 23:17:15. 
Output: 300 rows from 2022-08-01 04:52:20 to 2022-08-29 23:17:15\u001b[0m\n", - "\u001b[2mPreprocessed 369 data for logs in 2022-08-01 03:02:04 to 2022-08-04 23:50:20 in 3404.006004333496 ms\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n", - "\u001b[2mPreprocessed 718 data for logs in 2022-08-01 03:02:04 to 2022-08-08 23:53:14 in 633.9912414550781 ms\u001b[0m\n", - "\u001b[2mPreprocessed 1104 data for logs in 2022-08-01 03:02:04 to 2022-08-12 23:48:12 in 668.6468124389648 ms\u001b[0m\n", - "\u001b[2mPreprocessed 1437 data for logs in 2022-08-01 03:02:04 to 2022-08-16 23:58:07 in 683.5386753082275 ms\u001b[0m\n", - "\u001b[2mPreprocessed 1826 data for logs in 2022-08-01 03:02:04 to 2022-08-20 23:54:30 in 654.6437740325928 ms\u001b[0m\n", - "\u001b[2mPreprocessed 2180 data for logs in 2022-08-01 03:02:04 to 2022-08-24 23:44:42 in 642.3821449279785 ms\u001b[0m\n", - "\u001b[2mPreprocessed 302 data for logs in 2022-08-01 04:38:16 to 2022-08-25 22:31:15 in 177.2785186767578 ms\u001b[0m\n", - "\u001b[2mPreprocessed 2545 data for logs in 2022-08-01 03:02:04 to 2022-08-28 23:45:18 in 670.7031726837158 ms\u001b[0m\n", - "\u001b[2mPreprocessed 306 data for logs in 2022-08-01 04:18:06 to 2022-08-28 22:28:05 in 190.5825138092041 ms\u001b[0m\n", - "\u001b[2mPreprocessed 303 data for logs in 2022-08-01 04:54:59 to 2022-08-29 23:33:24 in 213.50598335266113 ms\u001b[0m\n", - "\u001b[2mPreprocessed 300 data for logs in 2022-08-01 04:52:20 to 2022-08-29 23:17:15 in 201.75838470458984 ms\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... 
Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:14:55 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 43\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:43\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:14:56 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 44\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:44\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:14:57 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 45\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:45\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:14:58 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. 
Model name: DFP-duo-generic_user, version 46\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:46\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:00 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 47\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:47\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'attacktarget'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:02 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 48\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:48\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'attacktarget'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:03 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-attacktarget, version 11\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: attacktarget:DFP-duo-attacktarget:11\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'generic_user'... 
Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'juan'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:05 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-generic_user, version 49\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: generic_user:DFP-duo-generic_user:49\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'juan'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'anthony'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:06 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-juan, version 11\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: juan:DFP-duo-juan:11\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'anthony'... Complete.\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'benjamin'...\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:07 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: DFP-duo-anthony, version 11\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: anthony:DFP-duo-anthony:11\u001b[0m\n", - "\u001b[2mTraining AE model for user: 'benjamin'... Complete.\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023/07/20 20:15:08 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. 
Model name: DFP-duo-benjamin, version 11\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2mML Flow model upload complete: benjamin:DFP-duo-benjamin:11\u001b[0m\n", - "====Pipeline Complete====\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ "# Create a linear pipeline object\n", "pipeline = LinearPipeline(config)\n", @@ -750,9 +439,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python [conda env:morpheus] *", + "display_name": "morpheus", "language": "python", - "name": "conda-env-morpheus-py" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -764,12 +453,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" - } + "version": "3.10.14" } }, "nbformat": 4, diff --git a/examples/digital_fingerprinting/starter/README.md b/examples/digital_fingerprinting/starter/README.md index 013300ceed..89c2e60a66 100644 --- a/examples/digital_fingerprinting/starter/README.md +++ b/examples/digital_fingerprinting/starter/README.md @@ -131,13 +131,13 @@ The following table shows mapping between the main Morpheus CLI commands and und | `sort_glob` | If true the list of files matching `input_glob` will be processed in sorted order. Default is False. | `models_output_filename` | Can be used with `--train_data_glob` to save trained user models to file using provided file path. Models can be loaded later using `--pretrained_filename`. -The `PreprocessAEStage` is responsible for creating a Morpheus message that contains everything needed by the inference stage. For DFP inference, this stage must pass a `MultiInferenceAEMessage` to the inference stage. Each message will correspond to a single user and include the input feature columns, the user's model and training data anomaly scores. 
+The `PreprocessAEStage` is responsible for creating a Morpheus message that contains everything needed by the inference stage. For DFP inference, this stage must pass a `ControlMessage` to the inference stage. Each message will correspond to a single user and include the input feature columns, the user's model, and training data anomaly scores. **Inference stage** - `AutoEncoderInferenceStage` calculates anomaly scores (specifically, reconstruction loss) and z-scores for each user input dataset. **Post-processing stage** - The DFP pipeline uses the `AddScoresStage` for post-processing to add anomaly scores and z-scores from previous inference stage with matching labels. -**Serialize stage** - `SerializeStage` is used to convert `MultiResponseMessage` from previous stage to a `MessageMeta` to make it suitable for output (for example writing to file or Kafka). +**Serialize stage** - `SerializeStage` is used to convert the `ControlMessage` from the previous stage to a `MessageMeta` to make it suitable for output (for example writing to file or Kafka).
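The stage chain described above can be pictured as a `ControlMessage` carrying a tabular payload plus per-user metadata from stage to stage, with the serialize step handing back the bare `MessageMeta` for output. The toy classes below are a minimal sketch of that hand-off only; they are hypothetical stand-ins, not the real `morpheus.messages` implementation (the real classes wrap cuDF DataFrames and carry far more state):

```python
# Toy stand-ins for MessageMeta / ControlMessage -- illustrative only, NOT the
# real morpheus.messages API. They model just enough to show the hand-off:
# a ControlMessage wraps a MessageMeta payload plus per-message metadata,
# and the serialize step returns the bare MessageMeta for the write stage.

class ToyMessageMeta:
    """Tabular payload: a dict of column name -> list of values."""

    def __init__(self, df):
        self.df = df

    def get_data(self, column):
        return self.df[column]

    def set_data(self, column, values):
        self.df[column] = values


class ToyControlMessage:
    """Wraps a payload and carries stage-to-stage metadata (e.g. the user id)."""

    def __init__(self, payload, metadata=None):
        self._payload = payload
        self._metadata = dict(metadata or {})

    def payload(self):
        return self._payload

    def get_metadata(self, key):
        return self._metadata[key]

    def set_metadata(self, key, value):
        self._metadata[key] = value


# "deserialize": wrap the raw table for a single user
msg = ToyControlMessage(ToyMessageMeta({"logcount": [3, 7]}), {"user": "generic_user"})

# "inference" + "add scores": attach results as new payload columns
msg.payload().set_data("mean_abs_z", [0.4, 2.9])

# "serialize": unwrap back to the bare payload for the write stage
meta_out = msg.payload()
print(meta_out.df)  # {'logcount': [3, 7], 'mean_abs_z': [0.4, 2.9]}
```

The point of the pattern is that per-user bookkeeping (the user id, model reference, column lists) rides in metadata while the DataFrame itself stays in one shared payload, so downstream stages never need message subclasses the way the old `MultiMessage` hierarchy did.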
diff --git a/examples/doca/README.md b/examples/doca/README.md index c7c5d21edf..2e79a256e4 100644 --- a/examples/doca/README.md +++ b/examples/doca/README.md @@ -151,21 +151,21 @@ Added source: └─ morpheus.MessageMeta -> morpheus.MessageMeta Added stage: - └─ morpheus.MessageMeta -> morpheus.MultiMessage + └─ morpheus.MessageMeta -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiInferenceNLPMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiInferenceNLPMessage -> morpheus.MultiInferenceNLPMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiInferenceNLPMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage ====Building Segment Complete!==== Stopping pipeline. Please wait... Press Ctrl+C again to kill. DOCA GPUNetIO rate: 0 pkts [00:09, ? pkts/s] diff --git a/examples/gnn_fraud_detection_pipeline/README.md b/examples/gnn_fraud_detection_pipeline/README.md index 9084471400..8bb1ab1570 100644 --- a/examples/gnn_fraud_detection_pipeline/README.md +++ b/examples/gnn_fraud_detection_pipeline/README.md @@ -72,38 +72,38 @@ python examples/gnn_fraud_detection_pipeline/run.py ``` ====Registering Pipeline==== ====Building Pipeline==== -Graph construction rate: 0 messages [00:00, ? me====Building Pipeline Complete!==== -Inference rate: 0 messages [00:00, ? 
messages/s]====Registering Pipeline Complete!==== +====Building Pipeline Complete!==== +====Registering Pipeline Complete!==== ====Starting Pipeline==== -====Pipeline Started==== 0 messages [00:00, ? messages/s] -====Building Segment: linear_segment_0====ges/s] -Added source: +====Pipeline Started==== +====Building Segment: linear_segment_0==== +Added source: └─> morpheus.MessageMeta -Added stage: - └─ morpheus.MessageMeta -> morpheus.MultiMessage -Added stage: - └─ morpheus.MultiMessage -> stages.FraudGraphMultiMessage +Added stage: + └─ morpheus.MessageMeta -> morpheus.ControlMessage +Added stage: + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ stages.FraudGraphMultiMessage -> stages.FraudGraphMultiMessage -Added stage: - └─ stages.FraudGraphMultiMessage -> stages.GraphSAGEMultiMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage +Added stage: + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ stages.GraphSAGEMultiMessage -> stages.GraphSAGEMultiMessage -Added stage: - └─ stages.GraphSAGEMultiMessage -> morpheus.MultiMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage +Added stage: + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiMessage -Added stage: - └─ morpheus.MultiMessage -> morpheus.MessageMeta + └─ morpheus.ControlMessage -> morpheus.ControlMessage +Added stage: + └─ morpheus.ControlMessage -> morpheus.MessageMeta Added stage: └─ morpheus.MessageMeta -> morpheus.MessageMeta Added stage: └─ morpheus.MessageMeta -> morpheus.MessageMeta ====Building Segment Complete!==== -Graph construction rate[Complete]: 265 messages [00:00, 1218.88 messages/s] -Inference rate[Complete]: 265 messages [00:01, 174.04 messages/s] -Add classification rate[Complete]: 265 messages [00:01, 170.69 messages/s] -Serialize rate[Complete]: 265 messages [00:01, 166.36 messages/s] +Graph construction rate[Complete]: 265 messages [00:00, 1016.18 
messages/s] +Inference rate[Complete]: 265 messages [00:00, 545.08 messages/s] +Add classification rate[Complete]: 265 messages [00:00, 492.11 messages/s] +Serialize rate[Complete]: 265 messages [00:00, 480.77 messages/s] ====Pipeline Complete==== ``` diff --git a/examples/gnn_fraud_detection_pipeline/stages/classification_stage.py b/examples/gnn_fraud_detection_pipeline/stages/classification_stage.py index d4daad44df..013034dcef 100644 --- a/examples/gnn_fraud_detection_pipeline/stages/classification_stage.py +++ b/examples/gnn_fraud_detection_pipeline/stages/classification_stage.py @@ -13,6 +13,8 @@ # See the License for the specific language governing permissions and # limitations under the License. +import typing + import mrc from mrc.core import operators as ops @@ -22,12 +24,10 @@ from morpheus.common import TypeId from morpheus.config import Config from morpheus.config import PipelineModes -from morpheus.messages import MultiMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema -from .graph_sage_stage import GraphSAGEMultiMessage - @register_stage("gnn-fraud-classification", modes=[PipelineModes.OTHER]) class ClassificationStage(SinglePortStage): @@ -53,25 +53,29 @@ def __init__(self, c: Config, model_xgb_file: str): def name(self) -> str: return "gnn-fraud-classification" - def accepted_types(self) -> (GraphSAGEMultiMessage, ): - return (GraphSAGEMultiMessage, ) + def accepted_types(self) -> typing.Tuple: + return (ControlMessage, ) def compute_schema(self, schema: StageSchema): - schema.output_schema.set_type(MultiMessage) + schema.output_schema.set_type(ControlMessage) def supports_cpp_node(self) -> bool: return False - def _process_message(self, message: GraphSAGEMultiMessage) -> GraphSAGEMultiMessage: - ind_emb_columns = message.get_meta(message.inductive_embedding_column_names) - message.set_meta("node_id", message.node_identifiers) + 
def _process_message(self, message: ControlMessage) -> ControlMessage: + + inductive_embedding_column_names = message.get_metadata("inductive_embedding_column_names") + ind_emb_columns = message.payload().get_data(inductive_embedding_column_names) + + node_identifiers = message.get_metadata("node_identifiers") + message.payload().set_data("node_id", node_identifiers) # The XGBoost model is returning two probabilities for the binary classification. The first (column 0) is # probability that the transaction is in the benign class, and the second (column 1) is the probability that # the transaction is in the fraudulent class. Added together the two values will always equal 1. prediction = self._xgb_model.predict_proba(ind_emb_columns).iloc[:, 1] - message.set_meta("prediction", prediction) + message.payload().set_data("prediction", prediction) return message diff --git a/examples/gnn_fraud_detection_pipeline/stages/graph_construction_stage.py b/examples/gnn_fraud_detection_pipeline/stages/graph_construction_stage.py index da00dacf73..c7526a3d4c 100644 --- a/examples/gnn_fraud_detection_pipeline/stages/graph_construction_stage.py +++ b/examples/gnn_fraud_detection_pipeline/stages/graph_construction_stage.py @@ -13,10 +13,9 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import dataclasses import pathlib +import typing -import dgl import mrc import torch from mrc.core import operators as ops @@ -26,8 +25,7 @@ from morpheus.cli.register_stage import register_stage from morpheus.config import Config from morpheus.config import PipelineModes -from morpheus.messages import MultiMessage -from morpheus.messages.message_meta import MessageMeta +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema @@ -35,24 +33,6 @@ from .model import prepare_data -@dataclasses.dataclass -class FraudGraphMultiMessage(MultiMessage): - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - graph: dgl.DGLHeteroGraph, - node_features=torch.tensor, - test_index: torch.tensor): - super().__init__(meta=meta, mess_offset=mess_offset, mess_count=mess_count) - - self.graph = graph - self.node_features = node_features - self.test_index = test_index - - @register_stage("fraud-graph-construction", modes=[PipelineModes.OTHER]) class FraudGraphConstructionStage(SinglePortStage): @@ -75,18 +55,19 @@ def __init__(self, config: Config, training_file: pathlib.Path): def name(self) -> str: return "fraud-graph-construction" - def accepted_types(self) -> (MultiMessage, ): - return (MultiMessage, ) + def accepted_types(self) -> typing.Tuple: + return (ControlMessage, ) def compute_schema(self, schema: StageSchema): - schema.output_schema.set_type(FraudGraphMultiMessage) + schema.output_schema.set_type(ControlMessage) def supports_cpp_node(self) -> bool: return False - def _process_message(self, message: MultiMessage) -> FraudGraphMultiMessage: + def _process_message(self, message: ControlMessage) -> ControlMessage: - _, _, _, test_index, _, graph_data = prepare_data(self._training_data, message.get_meta(self._column_names)) + _, _, _, test_index, _, graph_data = prepare_data(self._training_data, + 
message.payload().get_data(self._column_names)) # meta columns to remove as node features meta_cols = ['client_node', 'merchant_node', 'index'] @@ -96,10 +77,11 @@ def _process_message(self, message: MultiMessage) -> FraudGraphMultiMessage: test_index = torch.from_dlpack(test_index.values.toDlpack()).long() node_features = node_features.float() - return FraudGraphMultiMessage.from_message(message, - graph=graph, - node_features=node_features.float(), - test_index=test_index) + message.set_metadata("graph", graph) + message.set_metadata("node_features", node_features.float()) + message.set_metadata("test_index", test_index) + + return message def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject: node = builder.make_node(self.unique_name, ops.map(self._process_message)) diff --git a/examples/gnn_fraud_detection_pipeline/stages/graph_sage_stage.py b/examples/gnn_fraud_detection_pipeline/stages/graph_sage_stage.py index b3c3c360f6..44e67e1f7d 100644 --- a/examples/gnn_fraud_detection_pipeline/stages/graph_sage_stage.py +++ b/examples/gnn_fraud_detection_pipeline/stages/graph_sage_stage.py @@ -13,7 +13,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import dataclasses +import typing import mrc from mrc.core import operators as ops @@ -23,33 +23,13 @@ from morpheus.cli.register_stage import register_stage from morpheus.config import Config from morpheus.config import PipelineModes -from morpheus.messages import MultiMessage -from morpheus.messages.message_meta import MessageMeta +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.pipeline.stage_schema import StageSchema -from .graph_construction_stage import FraudGraphMultiMessage from .model import load_model -@dataclasses.dataclass -class GraphSAGEMultiMessage(MultiMessage): - node_identifiers: list[int] - inductive_embedding_column_names: list[str] - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - node_identifiers: list[int], - inductive_embedding_column_names: list[str]): - super().__init__(meta=meta, mess_offset=mess_offset, mess_count=mess_count) - - self.node_identifiers = node_identifiers - self.inductive_embedding_column_names = inductive_embedding_column_names - - @register_stage("gnn-fraud-sage", modes=[PipelineModes.OTHER]) class GraphSAGEStage(SinglePortStage): @@ -70,23 +50,23 @@ def __init__(self, def name(self) -> str: return "gnn-fraud-sage" - def accepted_types(self) -> (FraudGraphMultiMessage, ): - return (FraudGraphMultiMessage, ) + def accepted_types(self) -> typing.Tuple: + return (ControlMessage, ) def compute_schema(self, schema: StageSchema): - schema.output_schema.set_type(GraphSAGEMultiMessage) + schema.output_schema.set_type(ControlMessage) def supports_cpp_node(self) -> bool: return False - def _process_message(self, message: FraudGraphMultiMessage) -> GraphSAGEMultiMessage: + def _process_message(self, message: ControlMessage) -> ControlMessage: - node_identifiers = list(message.get_meta(self._record_id).to_pandas()) + node_identifiers = list(message.payload().get_data(self._record_id).to_pandas()) # Perform 
inference - inductive_embedding, _ = self._dgl_model.inference(message.graph, - message.node_features, - message.test_index, + inductive_embedding, _ = self._dgl_model.inference(message.get_metadata("graph"), + message.get_metadata("node_features"), + message.get_metadata("test_index"), batch_size=self._batch_size) inductive_embedding = cudf.DataFrame(inductive_embedding) @@ -94,17 +74,16 @@ def _process_message(self, message: FraudGraphMultiMessage) -> GraphSAGEMultiMes # Rename the columns to be more descriptive inductive_embedding.rename(lambda x: "ind_emb_" + str(x), axis=1, inplace=True) - for col in inductive_embedding.columns.values.tolist(): - # without `to_pandas`, all values in the meta become `` - message.set_meta(col, inductive_embedding[col].to_pandas()) + with message.payload().mutable_dataframe() as df: + for col in inductive_embedding.columns.values.tolist(): + df[col] = inductive_embedding[col] + + assert (message.payload().count == len(inductive_embedding)) - assert (message.mess_count == len(inductive_embedding)) + message.set_metadata("node_identifiers", node_identifiers) + message.set_metadata("inductive_embedding_column_names", inductive_embedding.columns.values.tolist()) - return GraphSAGEMultiMessage(meta=message.meta, - node_identifiers=node_identifiers, - inductive_embedding_column_names=inductive_embedding.columns.values.tolist(), - mess_offset=message.mess_offset, - mess_count=message.mess_count) + return message def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject: node = builder.make_node(self.unique_name, ops.map(self._process_message)) diff --git a/examples/gnn_fraud_detection_pipeline/stages/model.py b/examples/gnn_fraud_detection_pipeline/stages/model.py index d82419e87e..c3c8c9a8f6 100644 --- a/examples/gnn_fraud_detection_pipeline/stages/model.py +++ b/examples/gnn_fraud_detection_pipeline/stages/model.py @@ -15,6 +15,7 @@ import os import pickle +import typing import cupy import dgl 
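The `GraphSAGEStage` rewrite above attaches embedding columns through `message.payload().mutable_dataframe()` instead of per-column `set_meta` calls. The snippet below is a toy sketch of that checkout-style pattern, using a hypothetical `ToyMessageMeta` rather than the real Morpheus class: scoping mutation to a `with` block lets the owner enforce invariants (here, a lock) for the duration of the edit.

```python
# Toy illustration of the mutable_dataframe() checkout pattern. This is NOT
# the morpheus MessageMeta implementation; it is a minimal stand-in showing
# why mutation happens inside a `with` block: the context manager hands out
# the underlying table and can enforce invariants (such as locking) that are
# released only when the block exits.
from contextlib import contextmanager
import threading


class ToyMessageMeta:
    def __init__(self, df):
        self._df = df                  # column name -> list of values
        self._lock = threading.Lock()  # serialize concurrent mutation

    @property
    def count(self):
        # Row count of the payload, analogous to MessageMeta.count
        return len(next(iter(self._df.values()), []))

    @contextmanager
    def mutable_dataframe(self):
        # Hold the lock for the whole edit, mirroring how the real API
        # scopes mutation to the `with` block.
        with self._lock:
            yield self._df


meta = ToyMessageMeta({"record_id": [1, 2, 3]})
with meta.mutable_dataframe() as df:
    df["ind_emb_0"] = [0.1, 0.2, 0.3]  # add an embedding column in place

assert meta.count == 3
```

Writing all columns inside one checkout also avoids the per-column conversion cost that the old code worked around with `to_pandas` on every `set_meta` call.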
@@ -472,8 +473,8 @@ def build_fsi_graph(train_data: cf.DataFrame, col_drop: list[str]) -> (dgl.DGLHe def prepare_data( - training_data: cf.DataFrame, - test_data: cf.DataFrame) -> (cf.DataFrame, cf.DataFrame, cf.Series, cf.Series, cupy.ndarray, cf.DataFrame): + training_data: cf.DataFrame, test_data: cf.DataFrame +) -> typing.Tuple[cf.DataFrame, cf.DataFrame, cf.Series, cf.Series, cupy.ndarray, cf.DataFrame]: """Process data for training/inference operation Parameters diff --git a/examples/nlp_si_detection/README.md b/examples/nlp_si_detection/README.md index d08df4ffed..1d24fea105 100644 --- a/examples/nlp_si_detection/README.md +++ b/examples/nlp_si_detection/README.md @@ -176,17 +176,17 @@ CPP Enabled: True Added source: └─> morpheus.MessageMeta Added stage: - └─ morpheus.MessageMeta -> morpheus.MultiMessage + └─ morpheus.MessageMeta -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiInferenceNLPMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiInferenceNLPMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MessageMeta + └─ morpheus.ControlMessage -> morpheus.MessageMeta Added stage: └─ morpheus.MessageMeta -> morpheus.MessageMeta ====Building Pipeline Complete!==== diff --git a/examples/root_cause_analysis/README.md b/examples/root_cause_analysis/README.md index c7273f7761..5d038fa959 100644 --- a/examples/root_cause_analysis/README.md +++ b/examples/root_cause_analysis/README.md @@ -166,17 +166,17 @@ Starting!
Time: 1668537665.9479523 Added source: └─> morpheus.MessageMeta Added stage: - └─ morpheus.MessageMeta -> morpheus.MultiMessage + └─ morpheus.MessageMeta -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiMessage -> morpheus.MultiInferenceNLPMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiInferenceNLPMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MultiResponseMessage + └─ morpheus.ControlMessage -> morpheus.ControlMessage Added stage: - └─ morpheus.MultiResponseMessage -> morpheus.MessageMeta + └─ morpheus.ControlMessage -> morpheus.MessageMeta Added stage: └─ morpheus.MessageMeta -> morpheus.MessageMeta Inference rate[Complete]: 473 inf [00:01, 340.43 inf/s] diff --git a/python/morpheus/morpheus/_lib/cmake/libmorpheus.cmake b/python/morpheus/morpheus/_lib/cmake/libmorpheus.cmake index f7e8cd6860..62c33b96c4 100644 --- a/python/morpheus/morpheus/_lib/cmake/libmorpheus.cmake +++ b/python/morpheus/morpheus/_lib/cmake/libmorpheus.cmake @@ -35,13 +35,6 @@ add_library(morpheus src/messages/memory/response_memory.cpp src/messages/memory/tensor_memory.cpp src/messages/meta.cpp - src/messages/multi_inference_fil.cpp - src/messages/multi_inference_nlp.cpp - src/messages/multi_inference.cpp - src/messages/multi_response_probs.cpp - src/messages/multi_response.cpp - src/messages/multi_tensor.cpp - src/messages/multi.cpp src/messages/raw_packet.cpp src/modules/data_loader_module.cpp src/objects/data_table.cpp diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/control.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/control.hpp index 4cd66c6370..22e668cfe3 100644 --- a/python/morpheus/morpheus/_lib/include/morpheus/messages/control.hpp +++
b/python/morpheus/morpheus/_lib/include/morpheus/messages/control.hpp @@ -19,10 +19,10 @@ #include "morpheus/export.h" // for MORPHEUS_EXPORT #include "morpheus/messages/meta.hpp" // for MessageMeta -#include "morpheus/types.hpp" // for TensorIndex #include "morpheus/utilities/json_types.hpp" // for json_t #include // for object, dict, list +#include // IWYU pragma: keep #include // for system_clock, time_point #include // for map diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi.hpp deleted file mode 100644 index b4ba86b39a..0000000000 --- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi.hpp +++ /dev/null @@ -1,392 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -#pragma once - -#include "morpheus/export.h" -#include "morpheus/messages/meta.hpp" -#include "morpheus/objects/table_info.hpp" -#include "morpheus/objects/tensor_object.hpp" -#include "morpheus/types.hpp" // for TensorIndex - -#include // for MRC_PTR_CAST -#include -#include // IWYU pragma: keep - -#include -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -/****** MultiMessage****************************************/ - -/** - * @addtogroup messages - * @{ - * @file - */ - -class MORPHEUS_EXPORT MultiMessage; - -/** - * @brief All classes that are derived from MultiMessage should use this class. It will automatically add the - * `get_slice` function with the correct return type. Uses the CRTP pattern. This supports multiple base classes but in - * reality it should not be used. Multiple base classes are used to make template specialization easier. - * - * @tparam DerivedT The deriving class. Should be used like `class MyDerivedMultiMessage: public - * DerivedMultiMessage` - * @tparam BasesT The base classes that the class should derive from. If the class should function like `class - * MyDerivedMultiMessage: public MyBaseMultiMessage`, then `class MyDerivedMultiMessage: public - * DerivedMultiMessage` shoud be used. - */ -template -class MORPHEUS_EXPORT DerivedMultiMessage : public BasesT... -{ - public: - virtual ~DerivedMultiMessage() = default; - - /** - * @brief Creates a copy of the current message calculating new `mess_offset` and `mess_count` values based on the - * given `start` & `stop` values. This method is reletively light-weight as it does not copy the underlying `meta` - * and the actual slicing of the dataframe is applied later when `get_meta` is called. 
- * - * @param start - * @param stop - * @return std::shared_ptr - */ - std::shared_ptr get_slice(TensorIndex start, TensorIndex stop) const - { - std::shared_ptr new_message = this->clone_impl(); - - this->get_slice_impl(new_message, start, stop); - - return MRC_PTR_CAST(DerivedT, new_message); - } - - /** - * @brief Creates a deep copy of the current message along with a copy of the underlying `meta` selecting the rows - * of `meta` defined by pairs of start, stop rows expressed in the `ranges` argument. - * - * This allows for copying several non-contiguous rows from the underlying dataframe into a new dataframe, however - * this comes at a much higher cost compared to the `get_slice` method. - * - * @param ranges - * @param num_selected_rows - * @return std::shared_ptr - */ - std::shared_ptr copy_ranges(const std::vector& ranges, TensorIndex num_selected_rows) const - { - std::shared_ptr new_message = this->clone_impl(); - - this->copy_ranges_impl(new_message, ranges, num_selected_rows); - - return MRC_PTR_CAST(DerivedT, new_message); - } - - protected: - /** - * @brief Applies a slice of the attribures contained in `new_message`. Subclasses need only be concerned with their - * own attributes, and can safely avoid overriding this method if they don't add any new attributes to their base. - * - * @param new_message - * @param start - * @param stop - */ - virtual void get_slice_impl(std::shared_ptr new_message, - TensorIndex start, - TensorIndex stop) const = 0; - - /** - * @brief Similar to `get_slice_impl`, performs a copy of all attributes in `new_message` according to the rows - * specified by `ranges`. Subclasses need only be concerned with copying their own attributes, and can safely avoid - * overriding this method if they don't add any new attributes to their base. 
- * - * @param new_message - * @param ranges - * @param num_selected_rows - */ - virtual void copy_ranges_impl(std::shared_ptr new_message, - const std::vector& ranges, - TensorIndex num_selected_rows) const = 0; - - private: - virtual std::shared_ptr clone_impl() const - { - // Cast `this` to the derived type - auto derived_this = static_cast(this); - - // Use copy constructor to make a clone - return std::make_shared(*derived_this); - } -}; - -// Single base class version. Should be the version used by default -template -class MORPHEUS_EXPORT DerivedMultiMessage : public BaseT -{ - public: - using BaseT::BaseT; - ~DerivedMultiMessage() override = default; - - std::shared_ptr get_slice(TensorIndex start, TensorIndex stop) const - { - std::shared_ptr new_message = this->clone_impl(); - - this->get_slice_impl(new_message, start, stop); - - return MRC_PTR_CAST(DerivedT, new_message); - } - - std::shared_ptr copy_ranges(const std::vector& ranges, TensorIndex num_selected_rows) const - { - std::shared_ptr new_message = this->clone_impl(); - - this->copy_ranges_impl(new_message, ranges, num_selected_rows); - - return MRC_PTR_CAST(DerivedT, new_message); - } - - protected: - void get_slice_impl(std::shared_ptr new_message, TensorIndex start, TensorIndex stop) const override - { - return BaseT::get_slice_impl(new_message, start, stop); - } - - void copy_ranges_impl(std::shared_ptr new_message, - const std::vector& ranges, - TensorIndex num_selected_rows) const override - { - return BaseT::copy_ranges_impl(new_message, ranges, num_selected_rows); - } - - private: - std::shared_ptr clone_impl() const override - { - // Cast `this` to the derived type - auto derived_this = static_cast(this); - - // Use copy constructor to make a clone - return std::make_shared(*derived_this); - } -}; - -// No base class version. This should only be used by `MultiMessage` itself. 
-template <typename DerivedT>
-class MORPHEUS_EXPORT DerivedMultiMessage<DerivedT>
-{
-  public:
-    virtual ~DerivedMultiMessage() = default;
-
-    std::shared_ptr<DerivedT> get_slice(TensorIndex start, TensorIndex stop) const
-    {
-        std::shared_ptr<MultiMessage> new_message = this->clone_impl();
-
-        this->get_slice_impl(new_message, start, stop);
-
-        return MRC_PTR_CAST(DerivedT, new_message);
-    }
-
-    std::shared_ptr<DerivedT> copy_ranges(const std::vector<RangeType>& ranges, TensorIndex num_selected_rows) const
-    {
-        std::shared_ptr<MultiMessage> new_message = this->clone_impl();
-
-        this->copy_ranges_impl(new_message, ranges, num_selected_rows);
-
-        return MRC_PTR_CAST(DerivedT, new_message);
-    }
-
-  protected:
-    virtual void get_slice_impl(std::shared_ptr<MultiMessage> new_message,
-                                TensorIndex start,
-                                TensorIndex stop) const = 0;
-
-    virtual void copy_ranges_impl(std::shared_ptr<MultiMessage> new_message,
-                                  const std::vector<RangeType>& ranges,
-                                  TensorIndex num_selected_rows) const = 0;
-
-  private:
-    virtual std::shared_ptr<MultiMessage> clone_impl() const
-    {
-        // Cast `this` to the derived type
-        auto derived_this = static_cast<const DerivedT*>(this);
-
-        // Use copy constructor to make a clone
-        return std::make_shared<DerivedT>(*derived_this);
-    }
-};
-
-/**
- * @brief This class holds data for multiple messages (rows in a DataFrame) at a time. To avoid copying data for
- * slicing operations, it holds a reference to a batched metadata object and stores the offset and count into that
- * batch.
- *
- */
-class MORPHEUS_EXPORT MultiMessage : public DerivedMultiMessage<MultiMessage>
-{
-  public:
-    /**
-     * @brief Default copy constructor
-     */
-    MultiMessage(const MultiMessage& other) = default;
-    /**
-     * @brief Construct a new Multi Message object
-     *
-     * @param m : Deserialized messages metadata for large batch
-     * @param o : Offset into the metadata batch
-     * @param c : Messages count
-     */
-    MultiMessage(std::shared_ptr<MessageMeta> m, TensorIndex offset = 0, TensorIndex count = -1);
-
-    std::shared_ptr<MessageMeta> meta;
-    TensorIndex mess_offset{0};
-    TensorIndex mess_count{0};
-
-    std::vector<std::string> get_meta_column_names() const;
-
-    /**
-     * @brief Get the meta object
-     *
-     * @return TableInfo
-     */
-    TableInfo get_meta();
-
-    /**
-     * @brief Returns column value from a meta object.
-     *
-     * @param col_name
-     * @throws std::runtime_error
-     * @return TableInfo
-     */
-    TableInfo get_meta(const std::string& col_name);
-
-    /**
-     * @brief Returns columns value from a meta object. When `columns_names` is empty all columns are returned.
-     *
-     * @param column_names
-     * @throws std::runtime_error
-     * @return TableInfo
-     */
-    TableInfo get_meta(const std::vector<std::string>& column_names);
-
-    /**
-     * @brief Set the meta object with a given column name
-     *
-     * @param col_name
-     * @param tensor
-     */
-    void set_meta(const std::string& col_name, TensorObject tensor);
-
-    /**
-     * @brief Set the meta object with the given column names
-     *
-     * @param column_names
-     * @param tensors
-     */
-    void set_meta(const std::vector<std::string>& column_names, const std::vector<TensorObject>& tensors);
-
-  protected:
-    void get_slice_impl(std::shared_ptr<MultiMessage> new_message, TensorIndex start, TensorIndex stop) const override;
-
-    void copy_ranges_impl(std::shared_ptr<MultiMessage> new_message,
-                          const std::vector<RangeType>& ranges,
-                          TensorIndex num_selected_rows) const override;
-
-    /**
-     * @brief Creates a deep copy of `meta` with the specified ranges.
-     *
-     * @param ranges
-     * @return std::shared_ptr<MessageMeta>
-     */
-    virtual std::shared_ptr<MessageMeta> copy_meta_ranges(const std::vector<RangeType>& ranges) const;
-
-    /**
-     * @brief Applies the message offset to the elements in `ranges` casting the results to `TensorIndex`
-     *
-     * @param offset
-     * @param ranges
-     * @return std::vector<RangeType>
-     */
-    std::vector<RangeType> apply_offset_to_ranges(TensorIndex offset, const std::vector<RangeType>& ranges) const;
-};
-
-/****** MultiMessageInterfaceProxy**************************/
-/**
- * @brief Interface proxy, used to insulate python bindings.
- */
-struct MORPHEUS_EXPORT MultiMessageInterfaceProxy
-{
-    /**
-     * TODO(Documentation)
-     */
-    static std::shared_ptr<MultiMessage> init(std::shared_ptr<MessageMeta> meta,
-                                              TensorIndex mess_offset,
-                                              TensorIndex mess_count);
-
-    /**
-     * TODO(Documentation)
-     */
-    static std::shared_ptr<MessageMeta> meta(const MultiMessage& self);
-
-    /**
-     * TODO(Documentation)
-     */
-    static TensorIndex mess_offset(const MultiMessage& self);
-
-    /**
-     * TODO(Documentation)
-     */
-    static TensorIndex mess_count(const MultiMessage& self);
-
-    static std::vector<std::string> get_meta_column_names(const MultiMessage& self);
-
-    /**
-     * TODO(Documentation)
-     */
-    static pybind11::object get_meta(MultiMessage& self);
-
-    /**
-     * TODO(Documentation)
-     */
-    static pybind11::object get_meta(MultiMessage& self, std::string col_name);
-
-    /**
-     * TODO(Documentation)
-     */
-    static pybind11::object get_meta(MultiMessage& self, std::vector<std::string> columns);
-
-    // This overload is necessary to match the python signature where you can call self.get_meta(None)
-    static pybind11::object get_meta(MultiMessage& self, pybind11::none none_obj);
-
-    static pybind11::object get_meta_list(MultiMessage& self, pybind11::object col_name);
-
-    /**
-     * TODO(Documentation)
-     */
-    static void set_meta(MultiMessage& self, pybind11::object columns, pybind11::object value);
-
-    /**
-     * TODO(Documentation)
-     */
-    static std::shared_ptr<MultiMessage> get_slice(MultiMessage& self, TensorIndex start, TensorIndex stop);
-
-    static std::shared_ptr<MultiMessage>
copy_ranges(MultiMessage& self, const std::vector<RangeType>& ranges, pybind11::object num_selected_rows);
-};
-/** @} */  // end of group
-}  // namespace morpheus
diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference.hpp
deleted file mode 100644
index bbd93785e3..0000000000
--- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference.hpp
+++ /dev/null
@@ -1,126 +0,0 @@
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- * SPDX-License-Identifier: Apache-2.0
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#pragma once
-
-#include "morpheus/export.h"
-#include "morpheus/messages/memory/tensor_memory.hpp"
-#include "morpheus/messages/meta.hpp"
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/messages/multi_tensor.hpp"
-#include "morpheus/objects/tensor_object.hpp"
-#include "morpheus/types.hpp"  // for TensorIndex
-
-#include <memory>
-#include <string>
-
-namespace morpheus {
-/****** Component public implementations********************/
-/****** MultiInferenceMessage*******************************/
-
-/**
- * @addtogroup messages
- * @{
- * @file
- */
-
-/**
- * This is a container class that holds a pointer to an instance of the TensorMemory container and the metadata
- * of the data contained within it.
Builds on top of the `MultiInferenceMessage` and `MultiTensorMessage` class
- * to add additional data for inferencing.
- */
-
-class MORPHEUS_EXPORT MultiInferenceMessage : public DerivedMultiMessage<MultiInferenceMessage, MultiTensorMessage>
-{
-  public:
-    /**
-     * @brief Default copy constructor
-     */
-    MultiInferenceMessage(const MultiInferenceMessage& other) = default;
-    /**
-     * @brief Construct a new Multi Inference Message object
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     */
-    MultiInferenceMessage(std::shared_ptr<MessageMeta> meta,
-                          TensorIndex mess_offset = 0,
-                          TensorIndex mess_count = -1,
-                          std::shared_ptr<TensorMemory> memory = nullptr,
-                          TensorIndex offset = 0,
-                          TensorIndex count = -1,
-                          std::string id_tensor_name = "seq_ids");
-
-    /**
-     * @brief Returns the input tensor for the given `name`.
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor matching `name` exists
-     */
-    const TensorObject get_input(const std::string& name) const;
-
-    /**
-     * @brief Returns the input tensor for the given `name`.
-     *
-     * @param name
-     * @return TensorObject
-     * @throws std::runtime_error If no tensor matching `name` exists
-     */
-    TensorObject get_input(const std::string& name);
-
-    /**
-     * Update the value of an input tensor. The tensor must already exist, otherwise this will halt on a fatal error.
-     */
-    void set_input(const std::string& name, const TensorObject& value);
-};
-
-/****** MultiInferenceMessageInterfaceProxy****************/
-/**
- * @brief Interface proxy, used to insulate python bindings.
- */
-struct MORPHEUS_EXPORT MultiInferenceMessageInterfaceProxy : public MultiTensorMessageInterfaceProxy
-{
-    /**
-     * @brief Create and initialize a MultiInferenceMessage object, and return a shared pointer to the result
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @return std::shared_ptr<MultiInferenceMessage>
-     */
-    static std::shared_ptr<MultiInferenceMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                       TensorIndex mess_offset,
-                                                       TensorIndex mess_count,
-                                                       std::shared_ptr<TensorMemory> memory,
-                                                       TensorIndex offset,
-                                                       TensorIndex count,
-                                                       std::string id_tensor_name);
-};
-/** @} */  // end of group
-}  // namespace morpheus
diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_fil.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_fil.hpp
deleted file mode 100644
index 9908ec0ea6..0000000000
--- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_fil.hpp
+++ /dev/null
@@ -1,152 +0,0 @@
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- * SPDX-License-Identifier: Apache-2.0
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#pragma once
-
-#include "morpheus/export.h"
-#include "morpheus/messages/memory/tensor_memory.hpp"
-#include "morpheus/messages/meta.hpp"  // for MessageMeta
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/messages/multi_inference.hpp"
-#include "morpheus/objects/tensor_object.hpp"
-#include "morpheus/types.hpp"  // for TensorIndex
-
-#include <pybind11/pytypes.h>  // for object
-
-#include <memory>
-#include <string>
-
-namespace morpheus {
-/****** Component public implementations *******************/
-/****** MultiInferenceFILMessage****************************************/
-
-/**
- * @addtogroup messages
- * @{
- * @file
- */
-
-/**
- * A stronger typed version of `MultiInferenceMessage` that is used for FIL workloads. Helps ensure the
- * proper inputs are set and eases debugging.
- *
- */
-
-class MORPHEUS_EXPORT MultiInferenceFILMessage
-  : public DerivedMultiMessage<MultiInferenceFILMessage, MultiInferenceMessage>
-{
-  public:
-    /**
-     * @brief Construct a new Multi Inference FIL Message object
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory object
-     * @param count Message count in inference memory object
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     */
-    MultiInferenceFILMessage(std::shared_ptr<MessageMeta> meta,
-                             TensorIndex mess_offset = 0,
-                             TensorIndex mess_count = -1,
-                             std::shared_ptr<TensorMemory> memory = nullptr,
-                             TensorIndex offset = 0,
-                             TensorIndex count = -1,
-                             std::string id_tensor_name = "seq_ids");
-
-    /**
-     * @brief Returns the 'input__0' tensor, throws a `std::runtime_error` if it does not exist
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor named "input__0" exists
-     */
-    const TensorObject get_input__0() const;
-
-    /**
-     * @brief Sets a tensor named 'input__0'
-     *
-     * @param input__0
-     */
-    void set_input__0(const TensorObject& input__0);
-
-    /**
-     * @brief Returns the 'seq_ids' tensor, throws a `std::runtime_error` if it does not exist
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor named "seq_ids" exists
-     */
-    const TensorObject get_seq_ids() const;
-
-    /**
-     * @brief Sets a tensor named 'seq_ids'
-     *
-     * @param seq_ids
-     */
-    void set_seq_ids(const TensorObject& seq_ids);
-};
-
-/****** MultiInferenceFILMessageInterfaceProxy *************************/
-/**
- * @brief Interface proxy, used to insulate python bindings.
- */
-struct MORPHEUS_EXPORT MultiInferenceFILMessageInterfaceProxy : public MultiInferenceMessageInterfaceProxy
-{
-    /**
-     * @brief Create and initialize a MultiInferenceFILMessage, and return a shared pointer to the result
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory object
-     * @param count Message count in inference memory object
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @return std::shared_ptr<MultiInferenceFILMessage>
-     */
-    static std::shared_ptr<MultiInferenceFILMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                          TensorIndex mess_offset,
-                                                          TensorIndex mess_count,
-                                                          std::shared_ptr<TensorMemory> memory,
-                                                          TensorIndex offset,
-                                                          TensorIndex count,
-                                                          std::string id_tensor_name);
-
-    /**
-     * @brief Get 'input__0' tensor as a python object
-     *
-     * @param self
-     * @return pybind11::object
-     * @throws pybind11::attribute_error When no tensor named "input__0" exists.
-     */
-    static pybind11::object input__0(MultiInferenceFILMessage& self);
-
-    /**
-     * @brief Get 'seq_ids' tensor as a python object
-     *
-     * @param self
-     * @return pybind11::object
-     * @throws pybind11::attribute_error When no tensor named "seq_ids" exists.
-     */
-    static pybind11::object seq_ids(MultiInferenceFILMessage& self);
-};
-/** @} */  // end of group
-}  // namespace morpheus
diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_nlp.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_nlp.hpp
deleted file mode 100644
index c4936f9d68..0000000000
--- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_inference_nlp.hpp
+++ /dev/null
@@ -1,175 +0,0 @@
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- * SPDX-License-Identifier: Apache-2.0
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-#pragma once
-
-#include "morpheus/export.h"
-#include "morpheus/messages/memory/tensor_memory.hpp"
-#include "morpheus/messages/meta.hpp"  // for MessageMeta
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/messages/multi_inference.hpp"
-#include "morpheus/objects/tensor_object.hpp"
-#include "morpheus/types.hpp"  // for TensorIndex
-
-#include <pybind11/pytypes.h>  // for object
-
-#include <memory>
-#include <string>
-
-namespace morpheus {
-/****** Component public implementations *******************/
-/****** MultiInferenceNLPMessage****************************************/
-
-/**
- * @addtogroup messages
- * @{
- * @file
- */
-
-/**
- * A stronger typed version of `MultiInferenceMessage` that is used for NLP workloads. Helps ensure the
- * proper inputs are set and eases debugging.
- *
- */
-class MORPHEUS_EXPORT MultiInferenceNLPMessage
-  : public DerivedMultiMessage<MultiInferenceNLPMessage, MultiInferenceMessage>
-{
-  public:
-    /**
-     * @brief Construct a new Multi Inference NLP Message object
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory object
-     * @param count Message count in inference memory object
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     */
-    MultiInferenceNLPMessage(std::shared_ptr<MessageMeta> meta,
-                             TensorIndex mess_offset = 0,
-                             TensorIndex mess_count = -1,
-                             std::shared_ptr<TensorMemory> memory = nullptr,
-                             TensorIndex offset = 0,
-                             TensorIndex count = -1,
-                             std::string id_tensor_name = "seq_ids");
-
-    /**
-     * @brief Returns the 'input_ids' tensor, throws a `std::runtime_error` if it does not exist.
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor named "input_ids" exists
-     */
-    const TensorObject get_input_ids() const;
-
-    /**
-     * @brief Sets a tensor named 'input_ids'.
-     *
-     * @param input_ids
-     */
-    void set_input_ids(const TensorObject& input_ids);
-
-    /**
-     * @brief Returns the 'input_mask' tensor, throws a `std::runtime_error` if it does not exist.
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor named "input_mask" exists
-     */
-    const TensorObject get_input_mask() const;
-
-    /**
-     * @brief Sets a tensor named 'input_mask'.
-     *
-     * @param input_mask
-     */
-    void set_input_mask(const TensorObject& input_mask);
-
-    /**
-     * @brief Returns the 'seq_ids' tensor, throws a `std::runtime_error` if it does not exist.
-     *
-     * @param name
-     * @return const TensorObject
-     * @throws std::runtime_error If no tensor named "seq_ids" exists
-     */
-    const TensorObject get_seq_ids() const;
-
-    /**
-     * @brief Sets a tensor named 'seq_ids'.
-     *
-     * @param seq_ids
-     */
-    void set_seq_ids(const TensorObject& seq_ids);
-};
-
-/****** MultiInferenceNLPMessageInterfaceProxy *************************/
-/**
- * @brief Interface proxy, used to insulate python bindings.
- */
-struct MORPHEUS_EXPORT MultiInferenceNLPMessageInterfaceProxy : public MultiInferenceMessageInterfaceProxy
-{
-    /**
-     * @brief Create and initialize a MultiInferenceNLPMessage, and return a shared pointer to the result
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the generic tensor data in cupy arrays that will be used for inference stages
-     * @param offset Message offset in inference memory object
-     * @param count Message count in inference memory object
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @return std::shared_ptr<MultiInferenceNLPMessage>
-     */
-    static std::shared_ptr<MultiInferenceNLPMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                          TensorIndex mess_offset,
-                                                          TensorIndex mess_count,
-                                                          std::shared_ptr<TensorMemory> memory,
-                                                          TensorIndex offset,
-                                                          TensorIndex count,
-                                                          std::string id_tensor_name);
-
-    /**
-     * @brief Get 'input_ids' tensor as a python object
-     *
-     * @param self
-     * @return pybind11::object
-     * @throws pybind11::attribute_error When no tensor named "input_ids" exists.
-     */
-    static pybind11::object input_ids(MultiInferenceNLPMessage& self);
-
-    /**
-     * @brief Get 'input_mask' tensor as a python object
-     *
-     * @param self
-     * @return pybind11::object
-     * @throws pybind11::attribute_error When no tensor named "input_mask" exists.
- */ - static pybind11::object input_mask(MultiInferenceNLPMessage& self); - - /** - * @brief Get 'seq_ids' tensor as a python object - * - * @param self - * @return pybind11::object - * @throws pybind11::attribute_error When no tensor named "seq_ids" exists. - */ - static pybind11::object seq_ids(MultiInferenceNLPMessage& self); -}; -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response.hpp deleted file mode 100644 index b3234e2408..0000000000 --- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response.hpp +++ /dev/null @@ -1,180 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */
-
-#pragma once
-
-#include "morpheus/export.h"
-#include "morpheus/messages/memory/tensor_memory.hpp"
-#include "morpheus/messages/meta.hpp"
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/messages/multi_tensor.hpp"
-#include "morpheus/objects/tensor_object.hpp"
-#include "morpheus/types.hpp"  // for TensorIndex
-
-#include <pybind11/pytypes.h>
-
-#include <memory>
-#include <string>
-
-namespace morpheus {
-/****** Component public implementations *******************/
-/****** MultiResponseMessage****************************************/
-
-/**
- * @addtogroup messages
- * @{
- * @file
- */
-
-/**
- * This class is used to get or set the inference output from message containers derived
- * from ResponseMemory.
- *
- */
-
-class MORPHEUS_EXPORT MultiResponseMessage : public DerivedMultiMessage<MultiResponseMessage, MultiTensorMessage>
-{
-  public:
-    /**
-     * @brief Default copy constructor
-     */
-    MultiResponseMessage(const MultiResponseMessage& other) = default;
-
-    /**
-     * @brief Construct a new Multi Response Message object
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Shared pointer of a tensor memory
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @param probs_tensor_name Name of the tensor that holds output probabilities
-     */
-    MultiResponseMessage(std::shared_ptr<MessageMeta> meta,
-                         TensorIndex mess_offset = 0,
-                         TensorIndex mess_count = -1,
-                         std::shared_ptr<TensorMemory> memory = nullptr,
-                         TensorIndex offset = 0,
-                         TensorIndex count = -1,
-                         std::string id_tensor_name = "seq_ids",
-                         std::string probs_tensor_name = "probs");
-
-    std::string probs_tensor_name;
-
-    /**
-     * @brief Returns the output tensor with the given name.
- * - * @param name - * @return const TensorObject - * @throws std::runtime_error If no tensor matching `name` exists - */ - const TensorObject get_output(const std::string& name) const; - - /** - * @brief Returns the output tensor with the given name. - * - * @param name - * @return TensorObject - * @throws std::runtime_error If no tensor matching `name` exists - */ - TensorObject get_output(const std::string& name); - - /** - * @brief Update the value of a given output tensor. The tensor must already exist, otherwise this will halt on a - * fatal error. - * - * @param name - * @param value - */ - void set_output(const std::string& name, const TensorObject& value); - - /** - * @brief Get the tensor that holds output probabilities. Equivalent to `get_tensor(probs_tensor_name)` - * - * @return const TensorObject - */ - TensorObject get_probs_tensor() const; -}; - -/****** MultiResponseMessageInterfaceProxy *************************/ -/** - * @brief Interface proxy, used to insulate python bindings. 
- */
-struct MORPHEUS_EXPORT MultiResponseMessageInterfaceProxy : public MultiTensorMessageInterfaceProxy
-{
-    /**
-     * @brief Create and initialize a MultiResponseMessage, and return a shared pointer to the result
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Shared pointer of a tensor memory
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @param probs_tensor_name Name of the tensor that holds output probabilities
-     * @return std::shared_ptr<MultiResponseMessage>
-     */
-    static std::shared_ptr<MultiResponseMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                      TensorIndex mess_offset,
-                                                      TensorIndex mess_count,
-                                                      std::shared_ptr<TensorMemory> memory,
-                                                      TensorIndex offset,
-                                                      TensorIndex count,
-                                                      std::string id_tensor_name,
-                                                      std::string probs_tensor_name);
-
-    /**
-     * @brief Gets the `probs_tensor_name` property
-     *
-     * @param self
-     * @return std::string Name of `probs_tensor_name`
-     */
-    static std::string probs_tensor_name_getter(MultiResponseMessage& self);
-
-    /**
-     * @brief Sets the `probs_tensor_name` property
-     *
-     * @param self
-     * @param probs_tensor_name New name of `probs_tensor_name` property
-     */
-    static void probs_tensor_name_setter(MultiResponseMessage& self, std::string probs_tensor_name);
-
-    /**
-     * @brief Returns the output tensor for a given name
-     *
-     * @param self
-     * @param name : Tensor name
-     * @return pybind11::object
-     * @throws pybind11::key_error When no matching tensor exists.
-     */
-    static pybind11::object get_output(MultiResponseMessage& self, const std::string& name);
-
-    /**
-     * @brief Get the tensor that holds output probabilities.
Equivalent to `get_tensor(probs_tensor_name)` - * - * @param self - * @return pybind11::object A cupy.ndarray object - */ - static pybind11::object get_probs_tensor(MultiResponseMessage& self); -}; -/** @} */ // end of group -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response_probs.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response_probs.hpp deleted file mode 100644 index 7449a1f5ee..0000000000 --- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_response_probs.hpp +++ /dev/null @@ -1,134 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */
-
-#pragma once
-
-#include "morpheus/export.h"
-#include "morpheus/messages/memory/tensor_memory.hpp"
-#include "morpheus/messages/meta.hpp"
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/messages/multi_response.hpp"
-#include "morpheus/objects/tensor_object.hpp"
-#include "morpheus/types.hpp"  // for TensorIndex
-
-#include <pybind11/pytypes.h>
-
-#include <memory>
-#include <string>
-
-namespace morpheus {
-/****** Component public implementations *******************/
-/****** MultiResponseProbsMessage****************************************/
-
-/**
- * @addtogroup messages
- * @{
- * @file
- */
-
-/**
- * A stronger typed version of `MultiResponseMessage` that is used for inference workloads that return a probability
- * array. Helps ensure the proper outputs are set and eases debugging
- *
- */
-
-class MORPHEUS_EXPORT MultiResponseProbsMessage
-  : public DerivedMultiMessage<MultiResponseProbsMessage, MultiResponseMessage>
-{
-  public:
-    /**
-     * @brief Default copy constructor
-     */
-    MultiResponseProbsMessage(const MultiResponseProbsMessage& other) = default;
-
-    /**
-     * Construct a new Multi Response Probs Message object
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the inference response probabilities as a tensor
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @param probs_tensor_name Name of the tensor that holds output probabilities
-     */
-    MultiResponseProbsMessage(std::shared_ptr<MessageMeta> meta,
-                              TensorIndex mess_offset = 0,
-                              TensorIndex mess_count = -1,
-                              std::shared_ptr<TensorMemory> memory = nullptr,
-                              TensorIndex offset = 0,
-                              TensorIndex count = -1,
-                              std::string id_tensor_name = "seq_ids",
-                              std::string probs_tensor_name = "probs");
-
-    /**
-     * @brief Returns the
`probs` (probabilities) output tensor
-     *
-     * @return const TensorObject
-     */
-    const TensorObject get_probs() const;
-
-    /**
-     * @brief Update the `probs` output tensor. Will halt on a fatal error if the `probs` output tensor does not exist.
-     *
-     * @param probs
-     */
-    void set_probs(const TensorObject& probs);
-};
-
-/****** MultiResponseProbsMessageInterfaceProxy *************************/
-/**
- * @brief Interface proxy, used to insulate python bindings.
- */
-struct MORPHEUS_EXPORT MultiResponseProbsMessageInterfaceProxy : public MultiResponseMessageInterfaceProxy
-{
-    /**
-     * @brief Create and initialize a MultiResponseProbsMessage object, and return a shared pointer to the result
-     *
-     * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and
-     * C++ representations of the table
-     * @param mess_offset Offset into the metadata batch
-     * @param mess_count Messages count
-     * @param memory Holds the inference response probabilities as a tensor
-     * @param offset Message offset in inference memory instance
-     * @param count Message count in inference memory instance
-     * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs
-     * @param probs_tensor_name Name of the tensor that holds output probabilities
-     * @return std::shared_ptr<MultiResponseProbsMessage>
-     */
-    static std::shared_ptr<MultiResponseProbsMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                           TensorIndex mess_offset,
-                                                           TensorIndex mess_count,
-                                                           std::shared_ptr<TensorMemory> memory,
-                                                           TensorIndex offset,
-                                                           TensorIndex count,
-                                                           std::string id_tensor_name,
-                                                           std::string probs_tensor_name);
-
-    /**
-     * @brief Return the `probs` (probabilities) output tensor
-     *
-     * @param self
-     * @return pybind11::object
-     * @throws pybind11::attribute_error When no tensor named "probs" exists.
- */ - static pybind11::object probs(MultiResponseProbsMessage& self); -}; -/** @} */ // end of group -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_tensor.hpp b/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_tensor.hpp deleted file mode 100644 index 67ce442163..0000000000 --- a/python/morpheus/morpheus/_lib/include/morpheus/messages/multi_tensor.hpp +++ /dev/null @@ -1,232 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#pragma once - -#include "morpheus/export.h" -#include "morpheus/messages/memory/tensor_memory.hpp" -#include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" -#include "morpheus/objects/tensor_object.hpp" -#include "morpheus/types.hpp" // for TensorIndex, RangeType - -#include // for object - -#include -#include -#include - -namespace morpheus { - -/****** MultiTensorMessage*******************************/ - -/** - * @addtogroup messages - * @{ - * @file - */ - -/** - * Base class for MultiInferenceMessage & MultiResponseMessage - * Contains a pointer to an instance of TensorMemory along with an - * offset & count to those tensors - * - * mess_offset & mess_count refer to the range of records in meta. 
- * offset & count refer to the range of records in TensorMemory - * - * While TensorMemory can contain multiple tensors, it is a requirement that - * they are all of the same length and that element N in each tensor refers - * to the same record - * - */ -class MORPHEUS_EXPORT MultiTensorMessage : public DerivedMultiMessage -{ - public: - /** - * @brief Default copy constructor - */ - MultiTensorMessage(const MultiTensorMessage& other) = default; - - /** - * Construct a new Multi Tensor Message object. - * - * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and - * C++ representations of the table - * @param mess_offset Offset into the metadata batch - * @param mess_count Messages count - * @param memory Shared pointer of a tensor memory - * @param offset Message offset in tensor memory instance - * @param count Message count in tensor memory instance - * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs - */ - MultiTensorMessage(std::shared_ptr meta, - TensorIndex mess_offset = 0, - TensorIndex mess_count = -1, - std::shared_ptr memory = nullptr, - TensorIndex offset = 0, - TensorIndex count = -1, - std::string id_tensor_name = "seq_ids"); - - std::shared_ptr memory; - TensorIndex offset{0}; - TensorIndex count{0}; - std::string id_tensor_name; - - /** - * @brief Returns a tensor with the given name. - * - * @param name - * @return const TensorObject - * @throws std::runtime_error If no tensor matching `name` exists - */ - const TensorObject get_tensor(const std::string& name) const; - - /** - * @brief Returns a tensor with the given name. - * - * @param name - * @return TensorObject - * @throws std::runtime_error If no tensor matching `name` exists - */ - TensorObject get_tensor(const std::string& name); - - /** - * @brief Update the value of a given tensor. The tensor must already exist, otherwise a runtime_error is thrown. 
- * error - * - * @param name - * @param value - * @throws std::runtime_error If no tensor matching `name` exists - */ - void set_tensor(const std::string& name, const TensorObject& value); - - /** - * @brief Get the tensor that holds message ID information. Equivalent to `get_tensor(id_tensor_name)` - * - * @return const TensorObject - */ - TensorObject get_id_tensor() const; - - protected: - void get_slice_impl(std::shared_ptr new_message, TensorIndex start, TensorIndex stop) const override; - - void copy_ranges_impl(std::shared_ptr new_message, - const std::vector& ranges, - TensorIndex num_selected_rows) const override; - - std::shared_ptr copy_input_ranges(const std::vector& ranges, - TensorIndex num_selected_rows) const; - - TensorObject get_tensor_impl(const std::string& name) const; -}; - -/****** MultiTensorMessageInterfaceProxy *************************/ -/** - * @brief Interface proxy, used to insulate python bindings. - */ -struct MORPHEUS_EXPORT MultiTensorMessageInterfaceProxy -{ - /** - * @brief Create and initialize a MultiTensorMessage, and return a shared pointer to the result - * - * @param meta Holds a data table, in practice a cudf DataFrame, with the ability to return both Python and - * C++ representations of the table - * @param mess_offset Offset into the metadata batch - * @param mess_count Messages count - * @param memory Shared pointer of a tensor memory - * @param offset Message offset in inference memory instance - * @param count Message count in inference memory instance - * @param id_tensor_name Name of the tensor that correlates tensor rows to message IDs - * @return std::shared_ptr - */ - static std::shared_ptr init(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name); - - /** - * @brief Returns a shared pointer of a tensor memory object - * - * @return std::shared_ptr - */ - static std::shared_ptr 
memory(MultiTensorMessage& self); - - /** - * @brief Message offset in tensor memory object - * - * @param self - * @return TensorIndex - */ - static TensorIndex offset(MultiTensorMessage& self); - - /** - * @brief Messages count in tensor memory object - * - * @param self - * @return TensorIndex - */ - static TensorIndex count(MultiTensorMessage& self); - - /** - * @brief Gets the `id_tensor_name` property - * - * @param self - * @return std::string Name of `id_tensor_name` - */ - static std::string id_tensor_name_getter(MultiTensorMessage& self); - - /** - * @brief Sets the `id_tensor_name` property - * - * @param self - * @param id_tensor_name New name of `id_tensor_name` property - */ - static void id_tensor_name_setter(MultiTensorMessage& self, std::string id_tensor_name); - - /** - * @brief Returns the tensor tensor for a given name - * - * @param self - * @param name : Tensor name - * @return pybind11::object - * @throws pybind11::key_error When no matching tensor exists. - */ - static pybind11::object get_tensor(MultiTensorMessage& self, const std::string& name); - - /** - * @brief Get the tensor that holds message ID information. Equivalent to `get_tensor(id_tensor_name)` - * - * @param self - * @return pybind11::object A cupy.ndarray object - */ - static pybind11::object get_id_tensor(MultiTensorMessage& self); - - /** - * @brief Same as `get_tensor` but used when the method is being bound to a python property - * - * @param self - * @param name - * @return pybind11::object - * @throws pybind11::attribute_error When no matching tensor exists. 
- */ - static pybind11::object get_tensor_property(MultiTensorMessage& self, const std::string name); -}; -/** @} */ // end of group -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/include/morpheus/stages/preallocate.hpp b/python/morpheus/morpheus/_lib/include/morpheus/stages/preallocate.hpp index ff9c95ed87..f3566c934b 100644 --- a/python/morpheus/morpheus/_lib/include/morpheus/stages/preallocate.hpp +++ b/python/morpheus/morpheus/_lib/include/morpheus/stages/preallocate.hpp @@ -20,7 +20,6 @@ #include "morpheus/export.h" #include "morpheus/messages/control.hpp" #include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" #include "morpheus/objects/dtype.hpp" // for TypeId #include diff --git a/python/morpheus/morpheus/_lib/messages/module.cpp b/python/morpheus/morpheus/_lib/messages/module.cpp index 270e52d0a5..fdc5fce73b 100644 --- a/python/morpheus/morpheus/_lib/messages/module.cpp +++ b/python/morpheus/morpheus/_lib/messages/module.cpp @@ -26,13 +26,6 @@ #include "morpheus/messages/memory/response_memory_probs.hpp" #include "morpheus/messages/memory/tensor_memory.hpp" #include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" -#include "morpheus/messages/multi_inference.hpp" -#include "morpheus/messages/multi_inference_fil.hpp" -#include "morpheus/messages/multi_inference_nlp.hpp" -#include "morpheus/messages/multi_response.hpp" -#include "morpheus/messages/multi_response_probs.hpp" -#include "morpheus/messages/multi_tensor.hpp" #include "morpheus/messages/raw_packet.hpp" #include "morpheus/objects/data_table.hpp" #include "morpheus/objects/mutable_table_ctx_mgr.hpp" @@ -149,22 +142,6 @@ PYBIND11_MODULE(messages, _module) // Add type registrations for all our common types reg_py_type_helper(); reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - reg_py_type_helper(); - - // EdgeConnectors for 
derived classes of MultiMessage to MultiMessage - register_permutations(); // Tensor Memory classes py::class_>(_module, "TensorMemory") @@ -272,131 +249,6 @@ PYBIND11_MODULE(messages, _module) py::arg("stop")) .def_static("make_from_file", &MessageMetaInterfaceProxy::init_cpp); - py::class_>(_module, "MultiMessage") - .def(py::init<>(&MultiMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1) - .def_property_readonly("meta", &MultiMessageInterfaceProxy::meta) - .def_property_readonly("mess_offset", &MultiMessageInterfaceProxy::mess_offset) - .def_property_readonly("mess_count", &MultiMessageInterfaceProxy::mess_count) - .def("get_meta_column_names", &MultiMessageInterfaceProxy::get_meta_column_names) - .def("get_meta", - static_cast(&MultiMessageInterfaceProxy::get_meta), - py::return_value_policy::move) - .def("get_meta", - static_cast(&MultiMessageInterfaceProxy::get_meta), - py::return_value_policy::move, - py::arg("columns")) - .def("get_meta", - static_cast)>( - &MultiMessageInterfaceProxy::get_meta), - py::return_value_policy::move, - py::arg("columns")) - .def("get_meta", - static_cast(&MultiMessageInterfaceProxy::get_meta), - py::return_value_policy::move, - py::arg("columns")) - .def("set_meta", &MultiMessageInterfaceProxy::set_meta, py::return_value_policy::move) - .def("get_slice", &MultiMessageInterfaceProxy::get_slice, py::return_value_policy::reference_internal) - .def("copy_ranges", - &MultiMessageInterfaceProxy::copy_ranges, - py::arg("ranges"), - py::arg("num_selected_rows") = py::none(), - py::return_value_policy::move) - .def("get_meta_list", &MultiMessageInterfaceProxy::get_meta_list, py::return_value_policy::move); - - py::class_>(_module, "MultiTensorMessage") - .def(py::init<>(&MultiTensorMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - 
py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids") - .def_property_readonly("memory", &MultiTensorMessageInterfaceProxy::memory) - .def_property_readonly("offset", &MultiTensorMessageInterfaceProxy::offset) - .def_property_readonly("count", &MultiTensorMessageInterfaceProxy::count) - .def("get_tensor", &MultiTensorMessageInterfaceProxy::get_tensor) - .def("get_id_tensor", &MultiResponseMessageInterfaceProxy::get_id_tensor); - - py::class_>( - _module, "MultiInferenceMessage") - .def(py::init<>(&MultiInferenceMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids") - .def("get_input", &MultiInferenceMessageInterfaceProxy::get_tensor); - - py::class_>( - _module, "MultiInferenceNLPMessage") - .def(py::init<>(&MultiInferenceNLPMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids") - .def_property_readonly("input_ids", &MultiInferenceNLPMessageInterfaceProxy::input_ids) - .def_property_readonly("input_mask", &MultiInferenceNLPMessageInterfaceProxy::input_mask) - .def_property_readonly("seq_ids", &MultiInferenceNLPMessageInterfaceProxy::seq_ids); - - py::class_>( - _module, "MultiInferenceFILMessage") - .def(py::init<>(&MultiInferenceFILMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids") - .def_property_readonly("input__0", &MultiInferenceFILMessageInterfaceProxy::input__0) - .def_property_readonly("seq_ids", &MultiInferenceFILMessageInterfaceProxy::seq_ids); - - py::class_>(_module, - "MultiResponseMessage") - 
.def(py::init<>(&MultiResponseMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids", - py::arg("probs_tensor_name") = "probs") - .def_property("probs_tensor_name", - &MultiResponseMessageInterfaceProxy::probs_tensor_name_getter, - &MultiResponseMessageInterfaceProxy::probs_tensor_name_setter) - .def("get_output", &MultiResponseMessageInterfaceProxy::get_tensor) - .def("get_probs_tensor", &MultiResponseMessageInterfaceProxy::get_probs_tensor); - - py::class_>( - _module, "MultiResponseProbsMessage") - .def(py::init<>(&MultiResponseProbsMessageInterfaceProxy::init), - py::kw_only(), - py::arg("meta"), - py::arg("mess_offset") = 0, - py::arg("mess_count") = -1, - py::arg("memory"), - py::arg("offset") = 0, - py::arg("count") = -1, - py::arg("id_tensor_name") = "seq_ids", - py::arg("probs_tensor_name") = "probs") - .def_property_readonly("probs", &MultiResponseProbsMessageInterfaceProxy::probs); - py::enum_(_module, "ControlMessageType") .value("INFERENCE", ControlMessageType::INFERENCE) .value("NONE", ControlMessageType::INFERENCE) diff --git a/python/morpheus/morpheus/_lib/src/messages/multi.cpp b/python/morpheus/morpheus/_lib/src/messages/multi.cpp deleted file mode 100644 index 6e42e839d7..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi.cpp +++ /dev/null @@ -1,474 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. 
- * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#include "morpheus/messages/multi.hpp" - -#include "morpheus/messages/meta.hpp" -#include "morpheus/objects/dtype.hpp" // for TypeId, DType -#include "morpheus/objects/table_info.hpp" -#include "morpheus/objects/tensor_object.hpp" -#include "morpheus/utilities/cudf_util.hpp" - -#include // for cudaMemcpy, cudaMemcpy2D, cudaMemcpyDeviceToDevice -#include // for column_view -#include -#include -#include -#include -#include -#include // for CHECK -#include // for MRC_CHECK_CUDA -#include // IWYU pragma: keep -#include -#include -#include - -#include // for transform -#include // for size_t -#include // for uint8_t -#include -#include // for runtime_error -#include -#include -// IWYU pragma: no_include - -namespace morpheus { - -namespace py = pybind11; -using namespace py::literals; - -/****** Component public implementations *******************/ -/****** MultiMessage****************************************/ -MultiMessage::MultiMessage(std::shared_ptr meta, TensorIndex offset, TensorIndex count) : - meta(std::move(meta)), - mess_offset(offset) -{ - if (!this->meta) - { - throw std::invalid_argument("Must define `meta` when creating MultiMessage"); - } - - // Default to using the count from the meta if it is unset - if (count == -1) - { - count = this->meta->count() - offset; - } - - this->mess_count = count; - - if (this->mess_offset < 0 || this->mess_offset >= this->meta->count()) - { - throw std::invalid_argument("Invalid message offset value"); - } - if (this->mess_count <= 0 || (this->mess_offset + this->mess_count > this->meta->count())) 
- { - throw std::invalid_argument("Invalid message count value"); - } -} - -std::vector MultiMessage::get_meta_column_names() const -{ - return this->meta->get_column_names(); -} - -TableInfo MultiMessage::get_meta() -{ - auto table_info = this->get_meta(std::vector{}); - - return table_info; -} - -TableInfo MultiMessage::get_meta(const std::string& col_name) -{ - auto table_view = this->get_meta(std::vector{col_name}); - - return table_view; -} - -TableInfo MultiMessage::get_meta(const std::vector& column_names) -{ - TableInfo info = this->meta->get_info(); - - TableInfo sliced_info = info.get_slice(this->mess_offset, - this->mess_offset + this->mess_count, - column_names.empty() ? info.get_column_names() : column_names); - - return sliced_info; -} - -void MultiMessage::get_slice_impl(std::shared_ptr new_message, TensorIndex start, TensorIndex stop) const -{ - // Start must be between [0, mess_count) - if (start < 0 || start >= this->mess_count) - { - throw std::out_of_range("Invalid `start` argument"); - } - - // Stop must be between (start, mess_count] - if (stop <= start or stop > this->mess_count) - { - throw std::out_of_range("Invalid `stop` argument"); - } - - new_message->mess_offset = this->mess_offset + start; - new_message->mess_count = this->mess_offset + stop - new_message->mess_offset; -} - -void MultiMessage::copy_ranges_impl(std::shared_ptr new_message, - const std::vector& ranges, - TensorIndex num_selected_rows) const -{ - new_message->mess_offset = 0; - new_message->mess_count = num_selected_rows; - new_message->meta = copy_meta_ranges(ranges); -} - -std::shared_ptr MultiMessage::copy_meta_ranges(const std::vector& ranges) const -{ - // copy ranges into a sequntial list of values - // https://github.com/rapidsai/cudf/issues/11223 - std::vector cudf_ranges; - for (const auto& p : ranges) - { - // Append the message offset to the range here - cudf_ranges.push_back(p.first + this->mess_offset); - cudf_ranges.push_back(p.second + 
this->mess_offset); - } - - auto table_info = this->meta->get_info(); - auto column_names = table_info.get_column_names(); - auto metadata = cudf::io::table_metadata{}; - - metadata.schema_info.reserve(column_names.size() + 1); - metadata.schema_info.emplace_back(""); - - for (auto column_name : column_names) - { - metadata.schema_info.emplace_back(column_name); - } - - auto table_view = table_info.get_view(); - auto sliced_views = cudf::slice(table_view, cudf_ranges); - cudf::io::table_with_metadata table = {cudf::concatenate(sliced_views), std::move(metadata)}; - - return MessageMeta::create_from_cpp(std::move(table), 1); -} - -void MultiMessage::set_meta(const std::string& col_name, TensorObject tensor) -{ - set_meta(std::vector{col_name}, std::vector{tensor}); -} - -void MultiMessage::set_meta(const std::vector& column_names, const std::vector& tensors) -{ - TableInfo table_meta; - try - { - table_meta = this->get_meta(column_names); - } catch (const std::runtime_error& e) - { - std::ostringstream err_msg; - err_msg << e.what() << " Ensure that the stage that needs this column has populated the '_needed_columns' " - << "attribute and that at least one stage in the current segment is using the PreallocatorMixin to " - << "ensure all needed columns have been allocated."; - throw std::runtime_error(err_msg.str()); - } - - for (std::size_t i = 0; i < tensors.size(); ++i) - { - const auto& cv = table_meta.get_column(i); - const auto table_type_id = cv.type().id(); - const auto tensor_type = DType(tensors[i].dtype()); - const auto tensor_type_id = tensor_type.cudf_type_id(); - const auto row_stride = tensors[i].stride(0); - - CHECK(tensors[i].count() == cv.size() && - (table_type_id == tensor_type_id || - (table_type_id == cudf::type_id::BOOL8 && tensor_type_id == cudf::type_id::UINT8))); - - const auto item_size = tensors[i].dtype().item_size(); - - // Dont use cv.data<>() here since that does not account for the size of each element - auto data_start = 
const_cast(cv.head()) + cv.offset() * item_size; - - if (row_stride == 1) - { - // column major just use cudaMemcpy - MRC_CHECK_CUDA(cudaMemcpy(data_start, tensors[i].data(), tensors[i].bytes(), cudaMemcpyDeviceToDevice)); - } - else - { - MRC_CHECK_CUDA(cudaMemcpy2D(data_start, - item_size, - tensors[i].data(), - row_stride * item_size, - item_size, - cv.size(), - cudaMemcpyDeviceToDevice)); - } - } -} - -std::vector MultiMessage::apply_offset_to_ranges(TensorIndex offset, - const std::vector& ranges) const -{ - std::vector offset_ranges(ranges.size()); - std::transform(ranges.cbegin(), ranges.cend(), offset_ranges.begin(), [offset](const RangeType range) { - return std::pair{offset + range.first, offset + range.second}; - }); - - return offset_ranges; -} - -/****** MultiMessageInterfaceProxy *************************/ -std::shared_ptr MultiMessageInterfaceProxy::init(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count) -{ - return std::make_shared(std::move(meta), mess_offset, mess_count); -} - -std::shared_ptr MultiMessageInterfaceProxy::meta(const MultiMessage& self) -{ - return self.meta; -} - -TensorIndex MultiMessageInterfaceProxy::mess_offset(const MultiMessage& self) -{ - return self.mess_offset; -} - -TensorIndex MultiMessageInterfaceProxy::mess_count(const MultiMessage& self) -{ - return self.mess_count; -} - -std::vector MultiMessageInterfaceProxy::get_meta_column_names(const MultiMessage& self) -{ - return self.get_meta_column_names(); -} - -pybind11::object MultiMessageInterfaceProxy::get_meta(MultiMessage& self) -{ - // Need to release the GIL before calling `get_meta()` - pybind11::gil_scoped_release no_gil; - - // Get the column and convert to cudf - auto info = self.get_meta(); - - // Convert to a python datatable. 
Automatically gets the GIL - return CudfHelper::table_from_table_info(info); -} - -pybind11::object MultiMessageInterfaceProxy::get_meta(MultiMessage& self, std::string col_name) -{ - TableInfo info; - - { - // Need to release the GIL before calling `get_meta()` - pybind11::gil_scoped_release no_gil; - - // Get the column and convert to cudf - info = self.get_meta(); - } - - auto py_table = CudfHelper::table_from_table_info(info); - - // Now convert it to a series by selecting only the column - return py_table[col_name.c_str()]; -} - -pybind11::object MultiMessageInterfaceProxy::get_meta(MultiMessage& self, std::vector columns) -{ - // Need to release the GIL before calling `get_meta()` - pybind11::gil_scoped_release no_gil; - - // Get the column and convert to cudf - auto info = self.get_meta(columns); - - // Convert to a python datatable. Automatically gets the GIL - return CudfHelper::table_from_table_info(info); -} - -pybind11::object MultiMessageInterfaceProxy::get_meta(MultiMessage& self, pybind11::none none_obj) -{ - // Just offload to the overload without columns. 
This overload is needed to match the python interface - return MultiMessageInterfaceProxy::get_meta(self); -} - -pybind11::object MultiMessageInterfaceProxy::get_meta_list(MultiMessage& self, pybind11::object col_name) -{ - std::vector column_names; - if (!col_name.is_none()) - { - column_names.emplace_back(col_name.cast()); - } - - // Need to release the GIL before calling `get_meta()` - pybind11::gil_scoped_release no_gil; - - auto info = self.get_meta(column_names); - - // Need the GIL for the remainder - pybind11::gil_scoped_acquire gil; - - auto meta = CudfHelper::table_from_table_info(info); - - if (!col_name.is_none()) - { // needed to slice off the id column - meta = meta[col_name]; - } - - auto arrow_tbl = meta.attr("to_arrow")(); - pybind11::object py_list = arrow_tbl.attr("to_pylist")(); - - return py_list; -} - -std::tuple get_indexers(MultiMessage& self, py::object df, py::object columns) -{ - auto row_indexer = pybind11::slice( - pybind11::int_(self.mess_offset), pybind11::int_(self.mess_offset + self.mess_count), pybind11::none()); - - if (columns.is_none()) - { - columns = df.attr("columns").attr("to_list")(); - } - else if (pybind11::isinstance(columns)) - { - // Convert a single string into a list so all versions return tables, not series - pybind11::list col_list; - - col_list.append(columns); - - columns = std::move(col_list); - } - - auto column_indexer = df.attr("columns").attr("get_indexer_for")(columns); - - return std::make_tuple(row_indexer, column_indexer); -} - -void MultiMessageInterfaceProxy::set_meta(MultiMessage& self, pybind11::object columns, pybind11::object value) -{ - // Need to release the GIL before calling `get_meta()` - pybind11::gil_scoped_release no_gil; - - auto mutable_info = self.meta->get_mutable_info(); - - // Need the GIL for the remainder - pybind11::gil_scoped_acquire gil; - - auto pdf = mutable_info.checkout_obj(); - auto& df = *pdf; - - auto [row_indexer, column_indexer] = get_indexers(self, df, columns); - - // 
Check to see if this is adding a column. If so, we need to use .loc instead of .iloc - if (column_indexer.contains(-1)) - { - // cudf is really bad at adding new columns. Need to use loc with a unique and monotonic index - py::object saved_index = df.attr("index"); - - // Check to see if we can use slices - if (!(saved_index.attr("is_unique").cast() && (saved_index.attr("is_monotonic_increasing").cast() || - saved_index.attr("is_monotonic_decreasing").cast()))) - { - df.attr("reset_index")("drop"_a = true, "inplace"_a = true); - } - else - { - // Erase the saved index so we dont reset it - saved_index = py::none(); - } - - // Perform the update via slices - df.attr("loc")[pybind11::make_tuple(df.attr("index")[row_indexer], columns)] = value; - - // Reset the index if we changed it - if (!saved_index.is_none()) - { - df.attr("set_index")(saved_index, "inplace"_a = true); - } - } - else - { - // If we only have one column, convert it to a series (broadcasts work with more types on a series) - if (pybind11::len(column_indexer) == 1) - { - column_indexer = column_indexer.cast()[0]; - } - - try - { - // Use iloc - df.attr("iloc")[pybind11::make_tuple(row_indexer, column_indexer)] = value; - } catch (py::error_already_set) - { - // Try this as a fallback. Works better for strings. 
See issue #286 - df[columns].attr("iloc")[row_indexer] = value; - } - } - - mutable_info.return_obj(std::move(pdf)); -} - -std::shared_ptr MultiMessageInterfaceProxy::get_slice(MultiMessage& self, - TensorIndex start, - TensorIndex stop) -{ - if (start < 0) - { - throw std::out_of_range("Invalid message `start` argument"); - } - - if (stop < 0) - { - throw std::out_of_range("Invalid message `stop` argument"); - } - - // Need to drop the GIL before calling any methods on the C++ object - pybind11::gil_scoped_release no_gil; - - // Returns shared_ptr - return self.get_slice(start, stop); -} - -std::shared_ptr MultiMessageInterfaceProxy::copy_ranges(MultiMessage& self, - const std::vector& ranges, - pybind11::object num_selected_rows) -{ - TensorIndex num_rows = 0; - if (num_selected_rows.is_none()) - { - for (const auto& range : ranges) - { - num_rows += range.second - range.first; - } - } - else - { - num_rows = num_selected_rows.cast(); - } - - // Need to drop the GIL before calling any methods on the C++ object - pybind11::gil_scoped_release no_gil; - - return self.copy_ranges(ranges, num_rows); -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_inference.cpp b/python/morpheus/morpheus/_lib/src/messages/multi_inference.cpp deleted file mode 100644 index 6664e27550..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_inference.cpp +++ /dev/null @@ -1,68 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. 
- * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#include "morpheus/messages/multi_inference.hpp" - -#include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" - -#include -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -/****** ****************************************/ -MultiInferenceMessage::MultiInferenceMessage(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) : - DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count, std::move(id_tensor_name)) -{} - -const TensorObject MultiInferenceMessage::get_input(const std::string& name) const -{ - return get_tensor(name); -} - -TensorObject MultiInferenceMessage::get_input(const std::string& name) -{ - return get_tensor(name); -} - -void MultiInferenceMessage::set_input(const std::string& name, const TensorObject& value) -{ - set_tensor(name, value); -} - -/****** InterfaceProxy *************************/ -std::shared_ptr MultiInferenceMessageInterfaceProxy::init(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) -{ - return std::make_shared( - std::move(meta), mess_offset, mess_count, std::move(memory), offset, count, std::move(id_tensor_name)); -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_inference_fil.cpp 
b/python/morpheus/morpheus/_lib/src/messages/multi_inference_fil.cpp deleted file mode 100644 index af255778b7..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_inference_fil.cpp +++ /dev/null @@ -1,87 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#include "morpheus/messages/multi_inference_fil.hpp" - -#include "morpheus/messages/memory/tensor_memory.hpp" -#include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" -#include "morpheus/messages/multi_inference.hpp" - -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -/****** MultiInferenceFILMessage****************************************/ -MultiInferenceFILMessage::MultiInferenceFILMessage(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) : - DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count, std::move(id_tensor_name)) -{} - -const TensorObject MultiInferenceFILMessage::get_input__0() const -{ - return this->get_input("input__0"); -} - -void MultiInferenceFILMessage::set_input__0(const TensorObject& input__0) -{ - this->set_input("input__0", input__0); -} - -const TensorObject 
MultiInferenceFILMessage::get_seq_ids() const -{ - return this->get_input("seq_ids"); -} - -void MultiInferenceFILMessage::set_seq_ids(const TensorObject& seq_ids) -{ - this->set_input("seq_ids", seq_ids); -} - -/****** MultiInferenceFILMessageInterfaceProxy *************************/ -std::shared_ptr MultiInferenceFILMessageInterfaceProxy::init( - std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) -{ - return std::make_shared( - std::move(meta), mess_offset, mess_count, std::move(memory), offset, count, std::move(id_tensor_name)); -} - -pybind11::object MultiInferenceFILMessageInterfaceProxy::input__0(MultiInferenceFILMessage& self) -{ - return get_tensor_property(self, "input__0"); -} - -pybind11::object MultiInferenceFILMessageInterfaceProxy::seq_ids(MultiInferenceFILMessage& self) -{ - return get_tensor_property(self, "seq_ids"); -} - -} // namespace morpheus -// Created by drobison on 3/17/22. -// diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_inference_nlp.cpp b/python/morpheus/morpheus/_lib/src/messages/multi_inference_nlp.cpp deleted file mode 100644 index 0c0e85d560..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_inference_nlp.cpp +++ /dev/null @@ -1,100 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
- * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#include "morpheus/messages/multi_inference_nlp.hpp" - -#include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi_inference.hpp" - -#include - -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -/****** MultiInferenceNLPMessage****************************************/ -MultiInferenceNLPMessage::MultiInferenceNLPMessage(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) : - DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count, std::move(id_tensor_name)) -{} - -const TensorObject MultiInferenceNLPMessage::get_input_ids() const -{ - return this->get_input("input_ids"); -} - -void MultiInferenceNLPMessage::set_input_ids(const TensorObject& input_ids) -{ - this->set_input("input_ids", input_ids); -} - -const TensorObject MultiInferenceNLPMessage::get_input_mask() const -{ - return this->get_input("input_mask"); -} - -void MultiInferenceNLPMessage::set_input_mask(const TensorObject& input_mask) -{ - this->set_input("input_mask", input_mask); -} - -const TensorObject MultiInferenceNLPMessage::get_seq_ids() const -{ - return this->get_input("seq_ids"); -} - -void MultiInferenceNLPMessage::set_seq_ids(const TensorObject& seq_ids) -{ - this->set_input("seq_ids", seq_ids); -} - -/****** MultiInferenceNLPMessageInterfaceProxy *************************/ -std::shared_ptr MultiInferenceNLPMessageInterfaceProxy::init( - std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) -{ - return std::make_shared( - std::move(meta), mess_offset, mess_count, std::move(memory), offset, count, std::move(id_tensor_name)); -} - -pybind11::object 
MultiInferenceNLPMessageInterfaceProxy::input_ids(MultiInferenceNLPMessage& self) -{ - return get_tensor_property(self, "input_ids"); -} - -pybind11::object MultiInferenceNLPMessageInterfaceProxy::input_mask(MultiInferenceNLPMessage& self) -{ - return get_tensor_property(self, "input_mask"); -} - -pybind11::object MultiInferenceNLPMessageInterfaceProxy::seq_ids(MultiInferenceNLPMessage& self) -{ - return get_tensor_property(self, "seq_ids"); -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_response.cpp b/python/morpheus/morpheus/_lib/src/messages/multi_response.cpp deleted file mode 100644 index bb1f7205c7..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_response.cpp +++ /dev/null @@ -1,111 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -#include "morpheus/messages/multi_response.hpp" - -#include "morpheus/messages/meta.hpp" -#include "morpheus/messages/multi.hpp" -#include "morpheus/objects/tensor_object.hpp" -#include "morpheus/utilities/cupy_util.hpp" -#include "morpheus/utilities/string_util.hpp" - -#include -#include -#include -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -MultiResponseMessage::MultiResponseMessage(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name, - std::string probs_tensor_name) : - DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count, std::move(id_tensor_name)), - probs_tensor_name(std::move(probs_tensor_name)) -{} - -const TensorObject MultiResponseMessage::get_output(const std::string& name) const -{ - return get_tensor(name); -} - -TensorObject MultiResponseMessage::get_output(const std::string& name) -{ - return get_tensor(name); -} - -void MultiResponseMessage::set_output(const std::string& name, const TensorObject& value) -{ - set_tensor(name, value); -} - -TensorObject MultiResponseMessage::get_probs_tensor() const -{ - try - { - return this->get_tensor(this->probs_tensor_name); - } catch (std::runtime_error) - { - // Throw a better error here if we are missing the ID tensor - throw pybind11::key_error{MORPHEUS_CONCAT_STR("Cannot get probabilities tensor. 
Tensor with name '" - << this->probs_tensor_name - << "' does not exist in the memory object")}; - } -} - -/****** MultiResponseMessageInterfaceProxy *************************/ -std::shared_ptr MultiResponseMessageInterfaceProxy::init(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name, - std::string probs_tensor_name) -{ - return std::make_shared(std::move(meta), - mess_offset, - mess_count, - std::move(memory), - offset, - count, - std::move(id_tensor_name), - std::move(probs_tensor_name)); -} - -std::string MultiResponseMessageInterfaceProxy::probs_tensor_name_getter(MultiResponseMessage& self) -{ - return self.probs_tensor_name; -} - -void MultiResponseMessageInterfaceProxy::probs_tensor_name_setter(MultiResponseMessage& self, - std::string probs_tensor_name) -{ - self.probs_tensor_name = probs_tensor_name; -} - -pybind11::object MultiResponseMessageInterfaceProxy::get_probs_tensor(MultiResponseMessage& self) -{ - return CupyUtil::tensor_to_cupy(self.get_probs_tensor()); -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_response_probs.cpp b/python/morpheus/morpheus/_lib/src/messages/multi_response_probs.cpp deleted file mode 100644 index 4b643ceb94..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_response_probs.cpp +++ /dev/null @@ -1,81 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. 
- * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -#include "morpheus/messages/multi_response_probs.hpp" - -#include "morpheus/messages/meta.hpp" - -#include - -#include -#include - -namespace morpheus { -/****** Component public implementations *******************/ -/****** MultiResponseProbsMessage****************************************/ -MultiResponseProbsMessage::MultiResponseProbsMessage(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name, - std::string probs_tensor_name) : - DerivedMultiMessage( - meta, mess_offset, mess_count, memory, offset, count, std::move(id_tensor_name), std::move(probs_tensor_name)) -{} - -const TensorObject MultiResponseProbsMessage::get_probs() const -{ - return this->get_output("probs"); -} - -void MultiResponseProbsMessage::set_probs(const TensorObject& probs) -{ - this->set_output("probs", probs); -} - -/****** MultiResponseProbsMessageInterfaceProxy *************************/ -/** - * @brief Interface proxy, used to insulate python bindings. 
- */ -std::shared_ptr MultiResponseProbsMessageInterfaceProxy::init( - std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name, - std::string probs_tensor_name) -{ - return std::make_shared(std::move(meta), - mess_offset, - mess_count, - std::move(memory), - offset, - count, - std::move(id_tensor_name), - std::move(probs_tensor_name)); -} - -pybind11::object MultiResponseProbsMessageInterfaceProxy::probs(MultiResponseProbsMessage& self) -{ - return get_tensor_property(self, "probs"); -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/messages/multi_tensor.cpp b/python/morpheus/morpheus/_lib/src/messages/multi_tensor.cpp deleted file mode 100644 index 1d9db00efd..0000000000 --- a/python/morpheus/morpheus/_lib/src/messages/multi_tensor.cpp +++ /dev/null @@ -1,310 +0,0 @@ -/* - * SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. - * SPDX-License-Identifier: Apache-2.0 - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */
-
-#include "morpheus/messages/multi_tensor.hpp"
-
-#include "morpheus/objects/dtype.hpp"
-#include "morpheus/types.hpp"                // for TensorIndex, TensorMap
-#include "morpheus/utilities/cupy_util.hpp"  // for CupyUtil::tensor_to_cupy
-#include "morpheus/utilities/string_util.hpp"
-
-#include  // IWYU pragma: keep
-#include  // for MRC_PTR_CAST
-#include  // for key_error
-
-#include
-#include
-#include  // for runtime_error
-#include  // for move
-
-namespace {
-// MatX works best with C-Style arrays so ignore this warning
-// NOLINTNEXTLINE(modernize-avoid-c-arrays)
-using namespace morpheus;
-TensorIndex read_idx_from_tensor(const TensorObject& tensor, const TensorIndex (&idx)[2])
-{
-    switch (tensor.dtype().type_id())
-    {
-    case TypeId::INT8:
-        return tensor.read_element<int8_t>(idx);
-    case TypeId::INT16:
-        return tensor.read_element<int16_t>(idx);
-    case TypeId::INT32:
-        return tensor.read_element<int32_t>(idx);
-    case TypeId::INT64:
-        return tensor.read_element<int64_t>(idx);
-    case TypeId::UINT8:
-        return tensor.read_element<uint8_t>(idx);
-    case TypeId::UINT16:
-        return tensor.read_element<uint16_t>(idx);
-    case TypeId::UINT32:
-        return tensor.read_element<uint32_t>(idx);
-    case TypeId::UINT64:
-        return tensor.read_element<uint64_t>(idx);
-    default:
-        CHECK(false) << "Unsupported index type" << tensor.dtype().type_str();
-        return -1;
-    }
-}
-}  // namespace
-
-namespace morpheus {
-
-/****** Component public implementations *******************/
-/****** <MultiTensorMessage>****************************************/
-MultiTensorMessage::MultiTensorMessage(std::shared_ptr<MessageMeta> meta,
-                                       TensorIndex mess_offset,
-                                       TensorIndex mess_count,
-                                       std::shared_ptr<TensorMemory> memory,
-                                       TensorIndex offset,
-                                       TensorIndex count,
-                                       std::string id_tensor_name) :
-  DerivedMultiMessage(meta, mess_offset, mess_count),
-  memory(std::move(memory)),
-  offset(offset),
-  id_tensor_name(std::move(id_tensor_name))
-{
-    if (!this->memory)
-    {
-        throw std::invalid_argument("Must define `memory` when creating MultiTensorMessage");
-    }
-
-    // Default to using the count from the meta if it is unset
-    if
(count == -1) - { - count = this->memory->count - offset; - } - - this->count = count; - - if (this->offset < 0 || this->offset >= this->memory->count) - { - throw std::invalid_argument("Invalid offset value"); - } - if (this->count <= 0 || (this->offset + this->count > this->memory->count)) - { - throw std::invalid_argument("Invalid count value"); - } - if (this->count < this->mess_count) - { - throw std::invalid_argument("Invalid count value. Must have a count greater than or equal to mess_count"); - } - - // Finally, perform a consistency check on the seq_ids - if (this->memory->has_tensor(this->id_tensor_name)) - { - auto id_tensor = this->memory->get_tensor(this->id_tensor_name); - - TensorIndex first_element = read_idx_from_tensor(id_tensor, {this->offset, 0}); - TensorIndex last_element = read_idx_from_tensor(id_tensor, {this->offset + this->count - 1, 0}); - - if (first_element != this->mess_offset) - { - throw std::runtime_error(MORPHEUS_CONCAT_STR("Inconsistent ID column. First element in '" - << this->id_tensor_name << "' tensor, [" << first_element - << "], must match mess_offset, [" << this->mess_offset - << "]")); - } - - if (last_element != this->mess_offset + this->mess_count - 1) - { - throw std::runtime_error(MORPHEUS_CONCAT_STR("Inconsistent ID column. 
Last element in '" - << this->id_tensor_name << "' tensor, [" << last_element - << "], must not extend beyond last message, [" - << (this->mess_offset + this->mess_count - 1) << "]")); - } - } -} - -const TensorObject MultiTensorMessage::get_tensor(const std::string& name) const -{ - return get_tensor_impl(name); -} - -TensorObject MultiTensorMessage::get_tensor(const std::string& name) -{ - return get_tensor_impl(name); -} - -TensorObject MultiTensorMessage::get_tensor_impl(const std::string& name) const -{ - auto& tensor = this->memory->get_tensor(name); - - // check if we are getting the entire input - if (this->offset == 0 && this->count == this->memory->count) - { - return tensor; - } - - return tensor.slice({this->offset, 0}, {this->offset + this->count, -1}); -} - -void MultiTensorMessage::set_tensor(const std::string& name, const TensorObject& value) -{ - // Get the input slice first - auto slice = this->get_tensor(name); - - // Set the value to use assignment - slice = value; -} - -TensorObject MultiTensorMessage::get_id_tensor() const -{ - try - { - return this->get_tensor(this->id_tensor_name); - } catch (std::runtime_error) - { - // Throw a better error here if we are missing the ID tensor - throw pybind11::key_error{MORPHEUS_CONCAT_STR("Cannot get ID tensor. 
Tensor with name '" - << this->id_tensor_name - << "' does not exist in the memory object")}; - } -} - -void MultiTensorMessage::get_slice_impl(std::shared_ptr new_message, - TensorIndex start, - TensorIndex stop) const -{ - auto sliced_message = MRC_PTR_CAST(MultiTensorMessage, new_message); - - // Start must be between [0, mess_count) - if (start < 0 || start >= this->count) - { - throw std::out_of_range("Invalid memory `start` argument"); - } - - // Stop must be between (start, mess_count] - if (stop <= start || stop > this->count) - { - throw std::out_of_range("Invalid memory `stop` argument"); - } - - sliced_message->memory = this->memory; - sliced_message->offset = this->offset + start; - sliced_message->count = stop - start; - sliced_message->id_tensor_name = this->id_tensor_name; - - if (this->count != this->mess_count) - { - // If we have more tensor rows than message rows, we need to use the seq_ids to figure out the slicing. This - // will be slow and should be avoided at all costs - if (!this->memory->has_tensor(this->id_tensor_name)) - { - throw std::runtime_error( - "The tensor memory object is missing the required ID tensor 'seq_ids' this tensor is required to make " - "slices of MultiTensorMessages"); - } - - auto id_tensor = this->get_id_tensor(); - - // Determine the new start and stop before passing onto the base - start = read_idx_from_tensor(id_tensor, {start, 0}) - this->mess_offset; - stop = read_idx_from_tensor(id_tensor, {stop - 1, 0}) + 1 - this->mess_offset; - } - - // Pass onto the base - DerivedMultiMessage::get_slice_impl(new_message, start, stop); -} - -void MultiTensorMessage::copy_ranges_impl(std::shared_ptr new_message, - const std::vector& ranges, - TensorIndex num_selected_rows) const -{ - auto copied_message = MRC_PTR_CAST(MultiTensorMessage, new_message); - DerivedMultiMessage::copy_ranges_impl(copied_message, ranges, num_selected_rows); - - copied_message->offset = 0; - copied_message->count = num_selected_rows; - 
copied_message->memory = copy_input_ranges(ranges, num_selected_rows); -} - -std::shared_ptr MultiTensorMessage::copy_input_ranges(const std::vector& ranges, - TensorIndex num_selected_rows) const -{ - auto offset_ranges = apply_offset_to_ranges(offset, ranges); - auto tensors = memory->copy_tensor_ranges(offset_ranges, num_selected_rows); - return std::make_shared(num_selected_rows, std::move(tensors)); -} - -/****** MultiTensorMessageInterfaceProxy *************************/ -std::shared_ptr MultiTensorMessageInterfaceProxy::init(std::shared_ptr meta, - TensorIndex mess_offset, - TensorIndex mess_count, - std::shared_ptr memory, - TensorIndex offset, - TensorIndex count, - std::string id_tensor_name) -{ - return std::make_shared( - std::move(meta), mess_offset, mess_count, std::move(memory), offset, count, std::move(id_tensor_name)); -} - -std::shared_ptr MultiTensorMessageInterfaceProxy::memory(MultiTensorMessage& self) -{ - return MRC_PTR_CAST(morpheus::TensorMemory, self.memory); -} - -TensorIndex MultiTensorMessageInterfaceProxy::offset(MultiTensorMessage& self) -{ - return self.offset; -} - -TensorIndex MultiTensorMessageInterfaceProxy::count(MultiTensorMessage& self) -{ - return self.count; -} - -std::string MultiTensorMessageInterfaceProxy::id_tensor_name_getter(MultiTensorMessage& self) -{ - return self.id_tensor_name; -} - -void MultiTensorMessageInterfaceProxy::id_tensor_name_setter(MultiTensorMessage& self, std::string id_tensor_name) -{ - self.id_tensor_name = id_tensor_name; -} - -pybind11::object MultiTensorMessageInterfaceProxy::get_tensor(MultiTensorMessage& self, const std::string& name) -{ - try - { - auto tensor = self.get_tensor(name); - return CupyUtil::tensor_to_cupy(tensor); - } catch (const std::runtime_error& e) - { - throw pybind11::key_error{e.what()}; - } -} - -pybind11::object MultiTensorMessageInterfaceProxy::get_id_tensor(MultiTensorMessage& self) -{ - return CupyUtil::tensor_to_cupy(self.get_id_tensor()); -} - -pybind11::object 
MultiTensorMessageInterfaceProxy::get_tensor_property(MultiTensorMessage& self, const std::string name) -{ - try - { - return get_tensor(self, std::move(name)); - } catch (const pybind11::key_error& e) - { - throw pybind11::attribute_error{e.what()}; - } -} - -} // namespace morpheus diff --git a/python/morpheus/morpheus/_lib/src/stages/inference_client_stage.cpp b/python/morpheus/morpheus/_lib/src/stages/inference_client_stage.cpp index efa1b62df1..c5baa2fa25 100644 --- a/python/morpheus/morpheus/_lib/src/stages/inference_client_stage.cpp +++ b/python/morpheus/morpheus/_lib/src/stages/inference_client_stage.cpp @@ -17,19 +17,16 @@ #include "morpheus/stages/inference_client_stage.hpp" -#include "morpheus/messages/control.hpp" // for ControlMessage -#include "morpheus/messages/memory/response_memory.hpp" // for ResponseMemory -#include "morpheus/messages/memory/tensor_memory.hpp" // for TensorMemory -#include "morpheus/messages/meta.hpp" // for MessageMeta -#include "morpheus/messages/multi_inference.hpp" // for MultiInferenceMessage -#include "morpheus/messages/multi_response.hpp" // for MultiResponseMessage -#include "morpheus/objects/data_table.hpp" // for morpheus -#include "morpheus/objects/dev_mem_info.hpp" // for DevMemInfo -#include "morpheus/objects/dtype.hpp" // for DType -#include "morpheus/objects/tensor.hpp" // for Tensor -#include "morpheus/objects/tensor_object.hpp" // for TensorObject -#include "morpheus/stages/triton_inference.hpp" // for HttpTritonClient, TritonInferenceClient -#include "morpheus/utilities/matx_util.hpp" // for MatxUtil +#include "morpheus/messages/control.hpp" // for ControlMessage +#include "morpheus/messages/memory/tensor_memory.hpp" // for TensorMemory +#include "morpheus/messages/meta.hpp" // for MessageMeta +#include "morpheus/objects/data_table.hpp" // for morpheus +#include "morpheus/objects/dev_mem_info.hpp" // for DevMemInfo +#include "morpheus/objects/dtype.hpp" // for DType +#include "morpheus/objects/tensor.hpp" // for 
Tensor
+#include "morpheus/objects/tensor_object.hpp"    // for TensorObject
+#include "morpheus/stages/triton_inference.hpp"  // for HttpTritonClient, TritonInferenceClient
+#include "morpheus/utilities/matx_util.hpp"      // for MatxUtil
 
 #include  // for launch
 #include  // for cudaMemcpy2D, cudaMemcpyKind
@@ -175,16 +172,6 @@ struct ExponentialBackoff
     }
 };
 
-static std::shared_ptr<MultiResponseMessage> make_response(std::shared_ptr<MultiInferenceMessage> message,
-                                                           TensorMap&& output_tensor_map)
-{
-    // Final output of all mini-batches
-    auto response_mem = std::make_shared<ResponseMemory>(message->mess_count, std::move(output_tensor_map));
-
-    return std::make_shared<MultiResponseMessage>(
-        message->meta, message->mess_offset, message->mess_count, std::move(response_mem), 0, response_mem->count);
-}
-
 static std::shared_ptr<ControlMessage> make_response(std::shared_ptr<ControlMessage> message,
                                                      TensorMap&& output_tensor_map)
 {
diff --git a/python/morpheus/morpheus/_lib/tests/messages/test_dev_doc_ex3.cpp b/python/morpheus/morpheus/_lib/tests/messages/test_dev_doc_ex3.cpp
index 397cd21c26..780ad48b37 100644
--- a/python/morpheus/morpheus/_lib/tests/messages/test_dev_doc_ex3.cpp
+++ b/python/morpheus/morpheus/_lib/tests/messages/test_dev_doc_ex3.cpp
@@ -17,8 +17,8 @@
 
 #include "../test_utils/common.hpp"  // IWYU pragma: associated
 
+#include "morpheus/messages/control.hpp"     // for ControlMessage
 #include "morpheus/messages/meta.hpp"        // for MessageMeta
-#include "morpheus/messages/multi.hpp"       // for MultiMessage
 #include "morpheus/objects/table_info.hpp"   // for MutableTableInfo
 #include "morpheus/utilities/cudf_util.hpp"  // for CudfHelper
@@ -57,8 +57,8 @@ TEST_F(TestDevDocEx3, TestPyObjFromMultiMesg)
     using namespace pybind11::literals;
     pybind11::gil_scoped_release no_gil;
 
-    auto doc_fn = [](std::shared_ptr<MultiMessage> msg) {
-        auto mutable_info = msg->meta->get_mutable_info();
+    auto doc_fn = [](std::shared_ptr<ControlMessage> msg) {
+        auto mutable_info = msg->payload()->get_mutable_info();
 
         std::shared_ptr<MessageMeta> new_meta;
         {
@@ -79,7 +79,8 @@ TEST_F(TestDevDocEx3, TestPyObjFromMultiMesg)
     };
 
     auto msg_meta =
create_mock_msg_meta({"col1", "col2", "col3"}, {"int32", "float32", "string"}, 5);
-    auto msg = std::make_shared<MultiMessage>(msg_meta);
+    auto msg = std::make_shared<ControlMessage>();
+    msg->payload(msg_meta);
 
     auto result = doc_fn(msg);
diff --git a/python/morpheus/morpheus/_lib/tests/stages/test_add_classification.cpp b/python/morpheus/morpheus/_lib/tests/stages/test_add_classification.cpp
index 62b13d0f2d..95be858d87 100644
--- a/python/morpheus/morpheus/_lib/tests/stages/test_add_classification.cpp
+++ b/python/morpheus/morpheus/_lib/tests/stages/test_add_classification.cpp
@@ -61,7 +61,7 @@ auto convert_to_host(rmm::device_buffer& buffer)
     return host_buffer;
 }
 
-TEST_F(TestAddClassification, TestProcessControlMessageAndMultiResponseMessage)
+TEST_F(TestAddClassification, TestProcessControlMessage)
 {
     pybind11::gil_scoped_release no_gil;
     auto test_data_dir = test::get_morpheus_root() / "tests/tests_data";
@@ -90,12 +90,9 @@ TEST_F(TestAddClassification, TestProcessControlMessageAndMultiResponseMessage)
     cudf::io::csv_reader_options read_opts = cudf::io::csv_reader_options::builder(cudf::io::source_info(input_file))
                                                  .dtypes({cudf::data_type(cudf::data_type{cudf::type_to_id<bool>()})})
                                                  .header(0);
-    auto meta_mm = MessageMeta::create_from_cpp(cudf::io::read_csv(read_opts));
 
     std::map<std::size_t, std::string> idx2label = {{0, "bool"}};
 
-    // Create a separate dataframe from a file (otherwise they will overwrite
-    // eachother)
     auto meta_cm = MessageMeta::create_from_cpp(cudf::io::read_csv(read_opts));
 
     // Create ControlMessage
diff --git a/python/morpheus/morpheus/_lib/tests/stages/test_add_scores.cpp b/python/morpheus/morpheus/_lib/tests/stages/test_add_scores.cpp
index 386a16e569..06cc4f55fe 100644
--- a/python/morpheus/morpheus/_lib/tests/stages/test_add_scores.cpp
+++ b/python/morpheus/morpheus/_lib/tests/stages/test_add_scores.cpp
@@ -49,7 +49,7 @@ using namespace morpheus;
 
 TEST_CLASS_WITH_PYTHON(AddScores);
 
-TEST_F(TestAddScores, TestProcessControlMessageAndMultiResponseMessage)
+TEST_F(TestAddScores,
TestProcessControlMessage)
 {
     pybind11::gil_scoped_release no_gil;
     auto test_data_dir = test::get_morpheus_root() / "tests/tests_data";
@@ -70,14 +70,8 @@ TEST_F(TestAddScores, TestProcessControlMessageAndMultiResponseMessage)
     auto packed_data = std::make_shared<rmm::device_buffer>(
         packed_data_host.data(), cols_size * mess_count * sizeof(double), rmm::cuda_stream_per_thread);
 
-    // Create a dataframe from a file
-    auto meta_mm = MessageMeta::create_from_cpp(load_table_from_file(input_file));
-    preallocate(meta_mm, {{"colA", TypeId::FLOAT64}, {"colB", TypeId::FLOAT64}});
-
     std::map<std::size_t, std::string> idx2label = {{0, "colA"}, {1, "colB"}};
 
-    // Create a separate dataframe from a file (otherwise they will overwrite
-    // eachother)
     auto meta_cm = MessageMeta::create_from_cpp(load_table_from_file(input_file));
     preallocate(meta_cm, {{"colA", TypeId::FLOAT64}, {"colB", TypeId::FLOAT64}});
diff --git a/python/morpheus/morpheus/_lib/tests/stages/test_triton_inference_stage.cpp b/python/morpheus/morpheus/_lib/tests/stages/test_triton_inference_stage.cpp
index 53346e74cc..cbefd8355e 100644
--- a/python/morpheus/morpheus/_lib/tests/stages/test_triton_inference_stage.cpp
+++ b/python/morpheus/morpheus/_lib/tests/stages/test_triton_inference_stage.cpp
@@ -426,7 +426,7 @@ TEST_F(TestTritonInferenceStage, ForceConvert)
     auto tensors = TensorMap();
     tensors["seq_ids"].swap(Tensor::create(seq_ids_buffer, dtype, {count, 3}, {}));
 
-    // create the MultiInferenceMessage using the sequence id tensor.
+    // create the ControlMessage using the sequence id tensor.
auto memory = std::make_shared(count, std::move(tensors)); auto table = create_test_table_with_metadata(count); auto meta = morpheus::MessageMeta::create_from_cpp(std::move(table), 1); diff --git a/python/morpheus/morpheus/controllers/mlflow_model_writer_controller.py b/python/morpheus/morpheus/controllers/mlflow_model_writer_controller.py index 582c98b5ad..2f81401e94 100644 --- a/python/morpheus/morpheus/controllers/mlflow_model_writer_controller.py +++ b/python/morpheus/morpheus/controllers/mlflow_model_writer_controller.py @@ -33,7 +33,7 @@ import cudf -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.models.dfencoder import AutoEncoder logger = logging.getLogger(__name__) @@ -203,24 +203,24 @@ def _apply_model_permissions(self, reg_model_name: str): reg_model_name, exc_info=True) - def on_data(self, message: MultiAEMessage): + def on_data(self, message: ControlMessage) -> ControlMessage: """ Stores incoming models into MLflow. Parameters ---------- - message : MultiAEMessage + message : ControlMessage The incoming message containing the model and related metadata. Returns ------- - MultiAEMessage + ControlMessage The processed message. 
""" - user = message.meta.user_id + user = message.get_metadata("user_id") - model: AutoEncoder = message.model + model: AutoEncoder = message.get_metadata("model") model_path = "dfencoder" reg_model_name = self.user_id_to_model(user_id=user) @@ -245,9 +245,9 @@ def on_data(self, message: MultiAEMessage): "Epochs": model.learning_rate_decay.state_dict().get("last_epoch", "unknown"), "Learning rate": model.learning_rate, "Batch size": model.batch_size, - "Start Epoch": message.get_meta(self._timestamp_column_name).min(), - "End Epoch": message.get_meta(self._timestamp_column_name).max(), - "Log Count": message.mess_count, + "Start Epoch": message.payload().get_data(self._timestamp_column_name).min(), + "End Epoch": message.payload().get_data(self._timestamp_column_name).max(), + "Log Count": message.payload().count, }) metrics_dict: typing.Dict[str, float] = {} @@ -266,7 +266,7 @@ def on_data(self, message: MultiAEMessage): # Use the prepare_df function to setup the direct inputs to the model. 
Only include features returned by # prepare_df to show the actual inputs to the model (any extra are discarded) - input_df = message.get_meta().iloc[0:1] + input_df = message.payload().get_data().iloc[0:1] if isinstance(input_df, cudf.DataFrame): input_df = input_df.to_pandas() @@ -309,9 +309,9 @@ def on_data(self, message: MultiAEMessage): model_src = RunsArtifactRepository.get_underlying_uri(model_info.model_uri) tags = { - "start": message.get_meta(self._timestamp_column_name).min(), - "end": message.get_meta(self._timestamp_column_name).max(), - "count": message.get_meta(self._timestamp_column_name).count() + "start": message.payload().get_data(self._timestamp_column_name).min(), + "end": message.payload().get_data(self._timestamp_column_name).max(), + "count": message.payload().get_data(self._timestamp_column_name).count() } # Now create the model version diff --git a/python/morpheus/morpheus/messages/__init__.py b/python/morpheus/morpheus/messages/__init__.py index a0e4e92953..867c41fefc 100644 --- a/python/morpheus/morpheus/messages/__init__.py +++ b/python/morpheus/morpheus/messages/__init__.py @@ -31,7 +31,6 @@ from morpheus.messages.memory.response_memory import ResponseMemoryProbs from morpheus.messages.message_base import MessageBase from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_message import MultiMessage from morpheus.messages.message_meta import UserMessageMeta __all__ = [ @@ -43,7 +42,6 @@ "InferenceMemoryNLP", "MessageBase", "MessageMeta", - "MultiMessage", "RawPacketMessage", "ResponseMemory", "ResponseMemoryAE", diff --git a/python/morpheus/morpheus/messages/message_base.py b/python/morpheus/morpheus/messages/message_base.py index 465584e4c5..b858d2f607 100644 --- a/python/morpheus/morpheus/messages/message_base.py +++ b/python/morpheus/morpheus/messages/message_base.py @@ -68,7 +68,7 @@ class has an associated C++ implementation (`cpp_class`), returns the Python imp @dataclasses.dataclass class 
MessageData(MessageBase): """ - Base class for MultiMessage, defining serialization methods + Base class for TensorMemory, defining serialization methods """ def __getstate__(self): diff --git a/python/morpheus/morpheus/messages/multi_ae_message.py b/python/morpheus/morpheus/messages/multi_ae_message.py deleted file mode 100644 index 7f56978775..0000000000 --- a/python/morpheus/morpheus/messages/multi_ae_message.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright (c) 2021-2024, NVIDIA CORPORATION. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import logging -import typing - -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_message import MultiMessage - -if (typing.TYPE_CHECKING): - from morpheus.models import dfencoder - -logger = logging.getLogger(__name__) - - -@dataclasses.dataclass(init=False) -class MultiAEMessage(MultiMessage): - """ - Subclass of `MultiMessage` specific to the AutoEncoder pipeline, which contains the model. 
- """ - - model: "dfencoder.AutoEncoder" - train_scores_mean: float - train_scores_std: float - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - model: "dfencoder.AutoEncoder", - train_scores_mean: float = 0.0, - train_scores_std: float = 1.0): - super().__init__(meta=meta, mess_offset=mess_offset, mess_count=mess_count) - - self.model = model - self.train_scores_mean = train_scores_mean - self.train_scores_std = train_scores_std diff --git a/python/morpheus/morpheus/messages/multi_inference_ae_message.py b/python/morpheus/morpheus/messages/multi_inference_ae_message.py deleted file mode 100644 index e1605f1bcb..0000000000 --- a/python/morpheus/morpheus/messages/multi_inference_ae_message.py +++ /dev/null @@ -1,97 +0,0 @@ -# Copyright (c) 2021-2024, NVIDIA CORPORATION. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import typing - -from morpheus.messages.memory.tensor_memory import TensorMemory -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.message_meta import UserMessageMeta -from morpheus.messages.multi_inference_message import MultiInferenceMessage -from morpheus.models.dfencoder.autoencoder import AutoEncoder - - -@dataclasses.dataclass -class MultiInferenceAEMessage(MultiInferenceMessage): - """ - A stronger typed version of `MultiInferenceMessage` that is used for AE workloads. Helps ensure the - proper inputs are set and eases debugging. 
Associates a user ID with a message.
-    """
-
-    required_tensors: typing.ClassVar[typing.List[str]] = ["seq_ids"]
-
-    model: AutoEncoder
-    # train_loss_scores: cp.ndarray
-    train_scores_mean: float
-    train_scores_std: float
-
-    def __init__(self,
-                 *,
-                 meta: MessageMeta,
-                 mess_offset: int = 0,
-                 mess_count: int = -1,
-                 memory: TensorMemory = None,
-                 offset: int = 0,
-                 count: int = -1,
-                 model: AutoEncoder = None,
-                 train_scores_mean: float = float("NaN"),
-                 train_scores_std: float = float("NaN")):
-
-        super().__init__(meta=meta,
-                         mess_offset=mess_offset,
-                         mess_count=mess_count,
-                         memory=memory,
-                         offset=offset,
-                         count=count)
-
-        self.model = model
-        self.train_scores_mean = train_scores_mean
-        self.train_scores_std = train_scores_std
-
-    @property
-    def user_id(self):
-        """
-        Returns the user ID associated with this message.
-
-        """
-
-        return typing.cast(UserMessageMeta, self.meta).user_id
-
-    @property
-    def input(self):
-        """
-        Returns autoencoder input tensor.
-
-        Returns
-        -------
-        cupy.ndarray
-            The autoencoder input tensor.
-
-        """
-
-        return self.get_input("input")
-
-    @property
-    def seq_ids(self):
-        """
-        Returns sequence ids, which are used to keep track of messages in a multi-threaded environment.
-
-        Returns
-        -------
-        cupy.ndarray
-            seq_ids
-
-        """
-
-        return self.get_input("seq_ids")
diff --git a/python/morpheus/morpheus/messages/multi_inference_message.py b/python/morpheus/morpheus/messages/multi_inference_message.py
deleted file mode 100644
index a3b563ca44..0000000000
--- a/python/morpheus/morpheus/messages/multi_inference_message.py
+++ /dev/null
@@ -1,210 +0,0 @@
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import typing - -import morpheus._lib.messages as _messages -from morpheus.messages.memory.tensor_memory import TensorMemory -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_tensor_message import MultiTensorMessage - - -@dataclasses.dataclass -class MultiInferenceMessage(MultiTensorMessage, cpp_class=_messages.MultiInferenceMessage): - """ - This is a container class that holds the InferenceMemory container and the metadata of the data contained - within it. Builds on top of the `MultiTensorMessage` class to add additional data for inferencing. - - This class requires two separate memory blocks for a batch. One for the message metadata (i.e., start time, - IP address, etc.) and another for the raw inference inputs (i.e., input_ids, seq_ids). Since there can be - more inference input requests than messages (This happens when some messages get broken into multiple - inference requests) this class stores two different offset and count values. `mess_offset` and - `mess_count` refer to the offset and count in the message metadata batch and `offset` and `count` index - into the inference batch data. - """ - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - memory: TensorMemory = None, - offset: int = 0, - count: int = -1): - - super().__init__(meta=meta, - mess_offset=mess_offset, - mess_count=mess_count, - memory=memory, - offset=offset, - count=count) - - @property - def inputs(self): - """ - Get inputs stored in the InferenceMemory container. 
Alias for `MultiInferenceMessage.tensors`. - - Returns - ------- - cupy.ndarray - Inference inputs. - - """ - return self.tensors - - def get_input(self, name: str): - """ - Get input stored in the InferenceMemory container. Alias for `MultiInferenceMessage.get_tensor`. - - Parameters - ---------- - name : str - Input key name. - - Returns - ------- - cupy.ndarray - Inference input. - - Raises - ------ - KeyError - When no matching input tensor exists. - """ - return self.get_tensor(name) - - -@dataclasses.dataclass -class MultiInferenceNLPMessage(MultiInferenceMessage, cpp_class=_messages.MultiInferenceNLPMessage): - """ - A stronger typed version of `MultiInferenceMessage` that is used for NLP workloads. Helps ensure the - proper inputs are set and eases debugging. - """ - - required_tensors: typing.ClassVar[typing.List[str]] = ["input_ids", "input_mask", "seq_ids"] - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - memory: TensorMemory = None, - offset: int = 0, - count: int = -1): - - super().__init__(meta=meta, - mess_offset=mess_offset, - mess_count=mess_count, - memory=memory, - offset=offset, - count=count) - - @property - def input_ids(self): - """ - Returns token-ids for each string padded with 0s to max_length. - - Returns - ------- - cupy.ndarray - The token-ids for each string padded with 0s to max_length. - - """ - - return self._get_tensor_prop("input_ids") - - @property - def input_mask(self): - """ - Returns mask for token-ids result where corresponding positions identify valid token-id values. - - Returns - ------- - cupy.ndarray - The mask for token-ids result where corresponding positions identify valid token-id values. - - """ - - return self._get_tensor_prop("input_mask") - - @property - def seq_ids(self): - """ - Returns sequence ids, which are used to keep track of which inference requests belong to each message. 
- - Returns - ------- - cupy.ndarray - Ids used to index from an inference input to a message. Necessary since there can be more - inference inputs than messages (i.e., if some messages get broken into multiple inference requests). - - """ - - return self._get_tensor_prop("seq_ids") - - -@dataclasses.dataclass -class MultiInferenceFILMessage(MultiInferenceMessage, cpp_class=_messages.MultiInferenceFILMessage): - """ - A stronger typed version of `MultiInferenceMessage` that is used for FIL workloads. Helps ensure the - proper inputs are set and eases debugging. - """ - - required_tensors: typing.ClassVar[typing.List[str]] = ["input__0", "seq_ids"] - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - memory: TensorMemory = None, - offset: int = 0, - count: int = -1): - - super().__init__(meta=meta, - mess_offset=mess_offset, - mess_count=mess_count, - memory=memory, - offset=offset, - count=count) - - @property - def input__0(self): - """ - Input to FIL model inference. - - Returns - ------- - cupy.ndarray - Input data. - - """ - - return self._get_tensor_prop("input__0") - - @property - def seq_ids(self): - """ - Returns sequence ids, which are used to keep track of messages in a multi-threaded environment. - - Returns - ------- - cupy.ndarray - Sequence ids. - - """ - - return self._get_tensor_prop("seq_ids") diff --git a/python/morpheus/morpheus/messages/multi_message.py b/python/morpheus/morpheus/messages/multi_message.py deleted file mode 100644 index 44e1bb6cba..0000000000 --- a/python/morpheus/morpheus/messages/multi_message.py +++ /dev/null @@ -1,484 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import inspect -import typing - -import cupy as cp -import numpy as np -import pandas as pd - -import cudf - -import morpheus._lib.messages as _messages -from morpheus.messages.message_base import MessageData -from morpheus.messages.message_meta import MessageMeta - -# Needed to provide the return type of `@classmethod` -Self = typing.TypeVar("Self", bound="MultiMessage") - - -@dataclasses.dataclass -class MultiMessage(MessageData, cpp_class=_messages.MultiMessage): - """ - This class holds data for multiple messages at a time. To avoid copying data for slicing operations, it - holds a reference to a batched metadata object and stores the offset and count into that batch. - - Parameters - ---------- - meta : `MessageMeta` - Deserialized messages metadata for large batch. - mess_offset : int - Offset into the metadata batch. - mess_count : int - Messages count. 
- - """ - meta: MessageMeta = dataclasses.field(repr=False) - mess_offset: int - mess_count: int - - def __init__(self, *, meta: MessageMeta, mess_offset: int = 0, mess_count: int = -1): - - if meta is None: - raise ValueError("Must define `meta` when creating MultiMessage") - - # Use the meta count if not supplied - if (mess_count == -1): - mess_count = meta.count - mess_offset - - # Check for valid offsets and counts - if mess_offset < 0 or mess_offset >= meta.count: - raise ValueError("Invalid message offset value") - if mess_count <= 0 or (mess_offset + mess_count > meta.count): - raise ValueError("Invalid message count value") - - self.meta = meta - self.mess_offset = mess_offset - self.mess_count = mess_count - - self._base_init_run = True - - def __init_subclass__(cls, **kwargs): - super().__init_subclass__(**kwargs) - - # Until we migrate to python 3.10, its impossible to do keyword only dataclasses. - # Simple article here: https://medium.com/@aniscampos/python-dataclass-inheritance-finally-686eaf60fbb5 - # Once we migrate, we can use `__post_init__` like normal - if (cls.__init__ is MultiMessage.__init__): - raise ValueError(f"Class `{cls}` is improperly configured. " - f"All derived classes of `MultiMessage` must define an `__init__` function which " - f"calls `super().__init__(*args, **kwargs)`.") - - @property - def id_col(self): - """ - Returns ID column values from `morpheus.pipeline.messages.MessageMeta.df`. - - Returns - ------- - pandas.Series - ID column values from the dataframe. - - """ - return self.get_meta("ID") - - @property - def id(self) -> typing.List[int]: # pylint: disable=invalid-name - """ - Returns ID column values from `morpheus.pipeline.messages.MessageMeta.df` as list. - - Returns - ------- - List[int] - ID column values from the dataframe as list. 
- - """ - - return self.get_meta_list("ID") - - @property - def timestamp(self) -> typing.List[int]: - """ - Returns timestamp column values from morpheus.messages.MessageMeta.df as list. - - Returns - ------- - List[int] - Timestamp column values from the dataframe as list. - - """ - - return self.get_meta_list("timestamp") - - def _get_indexers(self, df, columns: typing.Union[None, str, typing.List[str]] = None): - row_indexer = slice(self.mess_offset, self.mess_offset + self.mess_count, 1) - - if (columns is None): - columns = df.columns.to_list() - elif (isinstance(columns, str)): - # Convert a single string into a list so all versions return tables, not series - columns = [columns] - - column_indexer = df.columns.get_indexer_for(columns) - - return row_indexer, column_indexer - - def _calc_message_slice_bounds(self, start: int, stop: int): - - # Start must be between [0, mess_count) - if (start < 0 or start >= self.mess_count): - raise IndexError("Invalid message `start` argument") - - # Stop must be between (start, mess_count] - if (stop <= start or stop > self.mess_count): - raise IndexError("Invalid message `stop` argument") - - # Calculate the new offset and count - offset = self.mess_offset + start - count = stop - start - - return offset, count - - def get_meta_column_names(self) -> list[str]: - """ - Return column names available in the underlying DataFrame. - - Returns - ------- - list[str] - Column names from the dataframe. - - """ - - return self.meta.get_column_names() - - @typing.overload - def get_meta(self) -> cudf.DataFrame: - ... - - @typing.overload - def get_meta(self, columns: str) -> cudf.Series: - ... - - @typing.overload - def get_meta(self, columns: typing.List[str]) -> cudf.DataFrame: - ... - - def get_meta(self, columns: typing.Union[None, str, typing.List[str]] = None): - """ - Return column values from `morpheus.pipeline.messages.MessageMeta.df`. 
-
-        Parameters
-        ----------
-        columns : typing.Union[None, str, typing.List[str]]
-            Input column names. Returns all columns if `None` is specified. When a string is passed, a `Series` is
-            returned. Otherwise, a `Dataframe` is returned.
-
-        Returns
-        -------
-        Series or Dataframe
-            Column values from the dataframe.
-
-        """
-
-        with self.meta.mutable_dataframe() as df:
-            row_indexer, column_indexer = self._get_indexers(df, columns=columns)
-
-            if (-1 in column_indexer):
-                missing_columns = [columns[i] for i, index_value in enumerate(column_indexer) if index_value == -1]
-                raise KeyError(f"Requested columns {missing_columns} does not exist in the dataframe")
-
-            if (isinstance(columns, str) and len(column_indexer) == 1):
-                # Make sure to return a series for a single column
-                column_indexer = column_indexer[0]
-
-            return df.iloc[row_indexer, column_indexer]
-
-    def get_meta_list(self, col_name: str = None):
-        """
-        Return column values from morpheus.messages.MessageMeta.df as a list.
-
-        Parameters
-        ----------
-        col_name : str
-            Column name in the dataframe.
-
-        Returns
-        -------
-        List[str]
-            Column values from the dataframe.
-
-        """
-        return self.get_meta(col_name).to_arrow().to_pylist()
-
-    def set_meta(self, columns: typing.Union[None, str, typing.List[str]], value):
-        """
-        Set column values to `morpheus.pipelines.messages.MessageMeta.df`.
-
-        Parameters
-        ----------
-        columns : typing.Union[None, str, typing.List[str]]
-            Input column names. Sets the value for the corresponding column names. If `None` is specified, all columns
-            will be used. If the column does not exist, a new one will be created.
-        value : Any
-            Value to apply to the specified columns. If a single value is passed, it will be broadcast to all rows. If a
-            `Series` or `Dataframe` is passed, rows will be matched by index.
-
-        """
-
-        # Get exclusive access to the dataframe
-        with self.meta.mutable_dataframe() as df:
-            # First try to set the values on just our slice if the columns exist
-            row_indexer, column_indexer = self._get_indexers(df, columns=columns)
-
-            # Check if the value is a cupy array and we have a pandas dataframe, convert to numpy
-            if (isinstance(value, cp.ndarray) and isinstance(df, pd.DataFrame)):
-                value = value.get()
-
-            # Check to see if we are adding a column. If so, we need to use df.loc instead of df.iloc
-            if (-1 not in column_indexer):
-
-                # If we only have one column, convert it to a series (broadcasts work with more types on a series)
-                if (len(column_indexer) == 1):
-                    column_indexer = column_indexer[0]
-
-                try:
-                    # Now update the slice
-                    df.iloc[row_indexer, column_indexer] = value
-                except (ValueError, TypeError):
-                    # Try this as a fallback. Works better for strings. See issue #286
-                    df[columns].iloc[row_indexer] = value
-
-            else:
-                # Columns should never be empty if we get here
-                assert columns is not None
-
-                # cudf is really bad at adding new columns
-                if (isinstance(df, cudf.DataFrame)):
-
-                    # TODO(morpheus#1487): This logic no longer works in CUDF 24.04.
-                    # We should find a way to re-enable the no-dropped-index path as
-                    # that should be more performant than dropping the index.
- # # saved_index = None - - # # # Check to see if we can use slices - # # if (not (df.index.is_unique and - # # (df.index.is_monotonic_increasing or df.index.is_monotonic_decreasing))): - # # # Save the index and reset - # # saved_index = df.index - # # df.reset_index(drop=True, inplace=True) - - # # # Perform the update via slices - # # df.loc[df.index[row_indexer], columns] = value - - # # # Reset the index if we changed it - # # if (saved_index is not None): - # # df.set_index(saved_index, inplace=True) - - saved_index = df.index - df.reset_index(drop=True, inplace=True) - df.loc[df.index[row_indexer], columns] = value - df.set_index(saved_index, inplace=True) - else: - # Need to determine the boolean mask to use indexes with df.loc - row_mask = self._ranges_to_mask(df, [(self.mess_offset, self.mess_offset + self.mess_count)]) - - # Now set the slice - df.loc[row_mask, columns] = value - - def get_slice(self, start, stop): - """ - Returns sliced batches based on offsets supplied. Automatically calculates the correct `mess_offset` - and `mess_count`. - - Parameters - ---------- - start : int - Start offset address. - stop : int - Stop offset address. - - Returns - ------- - `MultiInferenceMessage` - A new `MultiInferenceMessage` with sliced offset and count. - - """ - - # Calc the offset and count. This checks the bounds for us - offset, count = self._calc_message_slice_bounds(start=start, stop=stop) - - return self.from_message(self, meta=self.meta, mess_offset=offset, mess_count=count) - - def _ranges_to_mask(self, df, ranges): - if isinstance(df, cudf.DataFrame): - zeros_fn = cp.zeros - else: - zeros_fn = np.zeros - - mask = zeros_fn(len(df), bool) - - for range_ in ranges: - mask[range_[0]:range_[1]] = True - - return mask - - def copy_meta_ranges(self, - ranges: typing.List[typing.Tuple[int, int]], - mask: typing.Union[None, cp.ndarray, np.ndarray] = None): - """ - Perform a copy of the underlying dataframe for the given `ranges` of rows. 
- - Parameters - ---------- - ranges : typing.List[typing.Tuple[int, int]] - Rows to include in the copy in the form of `[(`start_row`, `stop_row`),...]` - The `stop_row` isn't included. For example to copy rows 1-2 & 5-7 `ranges=[(1, 3), (5, 8)]` - - mask : typing.Union[None, cupy.ndarray, numpy.ndarray] - Optionally specify rows as a cupy array (when using cudf Dataframes) or a numpy array (when using pandas - Dataframes) of booleans. When not-None `ranges` will be ignored. This is useful as an optimization as this - avoids needing to generate the mask on it's own. - - Returns - ------- - `Dataframe` - """ - df = self.get_meta() - - if mask is None: - mask = self._ranges_to_mask(df, ranges=ranges) - - return df.loc[mask, :] - - def copy_ranges(self, ranges: typing.List[typing.Tuple[int, int]]): - """ - Perform a copy of the current message instance for the given `ranges` of rows. - - Parameters - ---------- - ranges : typing.List[typing.Tuple[int, int]] - Rows to include in the copy in the form of `[(`start_row`, `stop_row`),...]` - The `stop_row` isn't included. For example to copy rows 1-2 & 5-7 `ranges=[(1, 3), (5, 8)]` - - Returns - ------- - `MultiMessage` - """ - sliced_rows = self.copy_meta_ranges(ranges) - - return self.from_message(self, meta=MessageMeta(sliced_rows), mess_offset=0, mess_count=len(sliced_rows)) - - @classmethod - def from_message(cls: type[Self], - message: "MultiMessage", - *, - meta: MessageMeta = None, - mess_offset: int = -1, - mess_count: int = -1, - **kwargs) -> Self: - """ - Creates a new instance of a derived class from `MultiMessage` using an existing message as the template. This is - very useful when a new message needs to be created with a single change to an existing `MessageMeta`. - - When creating the new message, all required arguments for the class specified by `cls` will be pulled from - `message` unless otherwise specified in the `kwargs`. 
Special handling is performed depending on
-        whether or not a new `meta` object is supplied. If one is supplied, the offset and count defaults will be 0 and
-        `meta.count` respectively. Otherwise offset and count will be pulled from the input `message`.
-
-        Parameters
-        ----------
-        cls : typing.Type[Self]
-            The class to create
-        message : MultiMessage
-            An existing message to use as a template. Can be a base or derived from `cls` as long as all arguments can
-            be pulled from `message` or provided in `kwargs`
-        meta : MessageMeta, optional
-            A new `MessageMeta` to use, by default None
-        mess_offset : int, optional
-            A new `mess_offset` to use, by default -1
-        mess_count : int, optional
-            A new `mess_count` to use, by default -1
-        **kwargs : `dict`
-            Keyword arguments to use when creating the new instance.
-
-        Returns
-        -------
-        Self
-            A new instance of type `cls`
-
-        Raises
-        ------
-        ValueError
-            If the incoming `message` is None
-        AttributeError
-            If some required arguments were not supplied by `kwargs` and could not be pulled from `message`
-        """
-
-        if (message is None):
-            raise ValueError("Must define `message` when creating a MultiMessage with `from_message`")
-
-        if (mess_offset == -1):
-            if (meta is not None):
-                mess_offset = 0
-            else:
-                mess_offset = message.mess_offset
-
-        if (mess_count == -1):
-            if (meta is not None):
-                # Subtract offset here so we don't go over the end
-                mess_count = meta.count - mess_offset
-            else:
-                mess_count = message.mess_count
-
-        # Do meta last
-        if meta is None:
-            meta = message.meta
-
-        # Update the kwargs
-        kwargs.update({
-            "meta": meta,
-            "mess_offset": mess_offset,
-            "mess_count": mess_count,
-        })
-
-        signature = inspect.signature(cls.__init__)
-
-        for p_name, param in signature.parameters.items():
-
-            if (p_name == "self"):
-                # Skip self until this is fixed (python 3.9) https://github.com/python/cpython/issues/85074
-                # After that, switch to using inspect.signature(cls)
-                continue
-
-            # Skip if it's already defined
-
if (p_name in kwargs): - continue - - if (not hasattr(message, p_name)): - # Check for a default - if (param.default == inspect.Parameter.empty): - raise AttributeError( - f"Cannot create message of type {cls}, from {message}. Missing property '{p_name}'") - - # Otherwise, we can ignore - continue - - kwargs[p_name] = getattr(message, p_name) - - # Create a new instance using the kwargs - return cls(**kwargs) diff --git a/python/morpheus/morpheus/messages/multi_response_message.py b/python/morpheus/morpheus/messages/multi_response_message.py deleted file mode 100644 index de7a2fb881..0000000000 --- a/python/morpheus/morpheus/messages/multi_response_message.py +++ /dev/null @@ -1,200 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-
-import dataclasses
-import logging
-import typing
-
-import morpheus._lib.messages as _messages
-from morpheus.messages.memory.tensor_memory import TensorMemory
-from morpheus.messages.message_meta import MessageMeta
-from morpheus.messages.multi_tensor_message import MultiTensorMessage
-from morpheus.utils import logger as morpheus_logger
-
-logger = logging.getLogger(__name__)
-
-
-@dataclasses.dataclass
-class MultiResponseMessage(MultiTensorMessage, cpp_class=_messages.MultiResponseMessage):
-    """
-    This class contains several inference responses as well as the corresponding message metadata.
-    """
-
-    probs_tensor_name: typing.ClassVar[str] = "probs"
-    """Name of the tensor that holds output probabilities"""
-
-    def __init__(self,
-                 *,
-                 meta: MessageMeta,
-                 mess_offset: int = 0,
-                 mess_count: int = -1,
-                 memory: TensorMemory = None,
-                 offset: int = 0,
-                 count: int = -1,
-                 id_tensor_name: str = "seq_ids",
-                 probs_tensor_name: str = "probs"):
-
-        if probs_tensor_name is None:
-            raise ValueError("Cannot use None for `probs_tensor_name`")
-
-        self.probs_tensor_name = probs_tensor_name
-
-        # Add the tensor name to the required list
-        if (self.probs_tensor_name not in self.required_tensors):
-            # Make sure to set a new variable here instead of append otherwise you change all classes
-            self.required_tensors = self.required_tensors + [self.probs_tensor_name]
-
-        super().__init__(meta=meta,
-                         mess_offset=mess_offset,
-                         mess_count=mess_count,
-                         memory=memory,
-                         offset=offset,
-                         count=count,
-                         id_tensor_name=id_tensor_name)
-
-    @property
-    def outputs(self):
-        """
-        Get outputs stored in the TensorMemory container. Alias for `MultiResponseMessage.tensors`.
-
-        Returns
-        -------
-        cupy.ndarray
-            Inference outputs.
-
-        """
-        return self.tensors
-
-    def get_output(self, name: str):
-        """
-        Get output stored in the TensorMemory container. Alias for `MultiResponseMessage.get_tensor`.
-
-        Parameters
-        ----------
-        name : str
-            Output key name.
-
-        Returns
-        -------
-        cupy.ndarray
-            Inference output.
-
-        """
-        return self.get_tensor(name)
-
-    def get_probs_tensor(self):
-        """
-        Get the tensor that holds output probabilities. Equivalent to `get_tensor(probs_tensor_name)`
-
-        Returns
-        -------
-        cupy.ndarray
-            The probabilities tensor
-
-        Raises
-        ------
-        KeyError
-            If `self.probs_tensor_name` is not found in the tensors
-        """
-
-        try:
-            return self.get_tensor(self.probs_tensor_name)
-        except KeyError as exc:
-            raise KeyError(f"Cannot get probs tensor. Tensor with name '{self.probs_tensor_name}' "
-                           "does not exist in the memory object") from exc
-
-
-@dataclasses.dataclass
-class MultiResponseProbsMessage(MultiResponseMessage, cpp_class=_messages.MultiResponseProbsMessage):
-    """
-    A stronger typed version of `MultiResponseMessage` that is used for inference workloads that return a probability
-    array. Helps ensure the proper outputs are set and eases debugging.
-    """
-
-    required_tensors: typing.ClassVar[typing.List[str]] = ["probs"]
-
-    def __new__(cls, *args, **kwargs):
-        morpheus_logger.deprecated_message_warning(cls, MultiResponseMessage)
-        return super(MultiResponseMessage, cls).__new__(cls, *args, **kwargs)
-
-    def __init__(self,
-                 *,
-                 meta: MessageMeta,
-                 mess_offset: int = 0,
-                 mess_count: int = -1,
-                 memory: TensorMemory,
-                 offset: int = 0,
-                 count: int = -1,
-                 id_tensor_name: str = "seq_ids",
-                 probs_tensor_name: str = "probs"):
-
-        super().__init__(meta=meta,
-                         mess_offset=mess_offset,
-                         mess_count=mess_count,
-                         memory=memory,
-                         offset=offset,
-                         count=count,
-                         id_tensor_name=id_tensor_name,
-                         probs_tensor_name=probs_tensor_name)
-
-    @property
-    def probs(self):
-        """
-        Probabilities of prediction.
- - Returns - ------- - cupy.ndarray - probabilities - - """ - - return self._get_tensor_prop("probs") - - -@dataclasses.dataclass -class MultiResponseAEMessage(MultiResponseMessage, cpp_class=None): - """ - A stronger typed version of `MultiResponseProbsMessage` that is used for inference workloads that return a - probability array. Helps ensure the proper outputs are set and eases debugging. - """ - - user_id: str = None - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - memory: TensorMemory = None, - offset: int = 0, - count: int = -1, - id_tensor_name: str = "seq_ids", - probs_tensor_name: str = "probs", - user_id: str = None): - - if (user_id is None): - raise ValueError(f"Must define `user_id` when creating {self.__class__.__name__}") - - self.user_id = user_id - - super().__init__(meta=meta, - mess_offset=mess_offset, - mess_count=mess_count, - memory=memory, - offset=offset, - count=count, - id_tensor_name=id_tensor_name, - probs_tensor_name=probs_tensor_name) diff --git a/python/morpheus/morpheus/messages/multi_tensor_message.py b/python/morpheus/morpheus/messages/multi_tensor_message.py deleted file mode 100644 index 573eee9bd8..0000000000 --- a/python/morpheus/morpheus/messages/multi_tensor_message.py +++ /dev/null @@ -1,409 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2023-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -import dataclasses -import typing - -import morpheus._lib.messages as _messages -from morpheus.messages.memory.tensor_memory import TensorMemory -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_message import MultiMessage - -# Needed to provide the return type of `@classmethod` -Self = typing.TypeVar("Self", bound="MultiTensorMessage") - - -@dataclasses.dataclass -class MultiTensorMessage(MultiMessage, cpp_class=_messages.MultiTensorMessage): - """ - This class contains several inference responses as well as the corresponding message metadata. - - Parameters - ---------- - memory : `TensorMemory` - Container holding generic tensor data in cupy arrays - offset : int - Offset of each message into the `TensorMemory` block. - count : int - Number of rows in the `TensorMemory` block. - """ - - memory: TensorMemory = dataclasses.field(repr=False) - offset: int - count: int - - required_tensors: typing.ClassVar[typing.List[str]] = [] - """The tensor names that are required for instantiation""" - id_tensor_name: typing.ClassVar[str] = "seq_ids" - """Name of the tensor that correlates tensor rows to message IDs""" - - def __init__(self, - *, - meta: MessageMeta, - mess_offset: int = 0, - mess_count: int = -1, - memory: TensorMemory, - offset: int = 0, - count: int = -1, - id_tensor_name: str = "seq_ids"): - - if memory is None: - raise ValueError(f"Must define `memory` when creating {self.__class__.__name__}") - - # Use the meta count if not supplied - if (count == -1): - count = memory.count - offset - - # Check for valid offsets and counts - if offset < 0 or offset >= memory.count: - raise ValueError("Invalid offset value") - if count <= 0 or (offset + count > memory.count): - raise ValueError("Invalid count value") - - self.memory = memory - self.offset = offset - self.count = count - self.id_tensor_name = id_tensor_name - - # 
Call the base class last because the properties need to be initialized first - super().__init__(meta=meta, mess_offset=mess_offset, mess_count=mess_count) - - if (self.count < self.mess_count): - raise ValueError("Invalid count value. Must have a count greater than or equal to mess_count") - - # Check the ID tensor for consistency - self._check_id_tensor() - - # Finally, check for the required tensors class attribute - if (hasattr(self.__class__, "required_tensors")): - for tensor_name in self.__class__.required_tensors: - if (not memory.has_tensor(tensor_name)): - raise ValueError((f"`TensorMemory` object must have a '{tensor_name}' " - f"tensor to create `{self.__class__.__name__}`").format(self.__class__.__name__)) - - @property - def tensors(self): - """ - Get tensors stored in the TensorMemory container sliced according to `offset` and `count`. - - Returns - ------- - cupy.ndarray - Inference tensors. - - """ - tensors = self.memory.get_tensors() - return {key: self.get_tensor(key) for key in tensors.keys()} - - def __getattr__(self, name: str) -> typing.Any: - if ("memory" in self.__dict__ and self.memory.has_tensor(name)): - return self._get_tensor_prop(name) - - if hasattr(super(), "__getattr__"): - return super().__getattr__(name) - raise AttributeError(f'No attribute named "{name}" exists') - - def _check_id_tensor(self): - - if (self.memory.has_tensor(self.id_tensor_name)): - # Check the bounds against the elements in the array - id_tensor = self.memory.get_tensor(self.id_tensor_name) - - first_element = id_tensor[self.offset, 0].item() - last_element = id_tensor[self.offset + self.count - 1, 0].item() - - if (first_element != self.mess_offset): - raise RuntimeError(f"Inconsistent ID column. First element in '{self.id_tensor_name}' tensor, " - f"[{first_element}], must match mess_offset, [{self.mess_offset}]") - - if (last_element != self.mess_offset + self.mess_count - 1): - raise RuntimeError(f"Inconsistent ID column. 
Last element in '{self.id_tensor_name}' tensor, " - f"[{last_element}], must not extend beyond last message, " - f"[{self.mess_offset + self.mess_count - 1}]") - - def _calc_message_slice_bounds(self, start: int, stop: int): - - mess_start = start - mess_stop = stop - - if (self.count != self.mess_count): - - if (not self.memory.has_tensor(self.id_tensor_name)): - raise RuntimeError( - f"The tensor memory object is missing the required ID tensor '{self.id_tensor_name}' " - f"this tensor is required to make slices of MultiTensorMessages") - - id_tensor = self.get_tensor(self.id_tensor_name) - - # Now determine the new mess_start and mess_stop - mess_start = id_tensor[start, 0].item() - self.mess_offset - mess_stop = id_tensor[stop - 1, 0].item() + 1 - self.mess_offset - - # Return the base calculation now - return super()._calc_message_slice_bounds(start=mess_start, stop=mess_stop) - - def _calc_memory_slice_bounds(self, start: int, stop: int): - - # Start must be between [0, mess_count) - if (start < 0 or start >= self.count): - raise IndexError("Invalid memory `start` argument") - - # Stop must be between (start, mess_count] - if (stop <= start or stop > self.count): - raise IndexError("Invalid memory `stop` argument") - - # Calculate the new offset and count - offset = self.offset + start - count = stop - start - - return offset, count - - def get_tensor(self, name: str): - """ - Get tensor stored in the TensorMemory container. - - Parameters - ---------- - name : str - tensor key name. - - Returns - ------- - cupy.ndarray - Inference tensor. - - """ - return self.memory.get_tensor(name)[self.offset:self.offset + self.count, :] - - def get_id_tensor(self): - """ - Get the tensor that holds message ID information. 
- Equivalent to `get_tensor(id_tensor_name)` - - Returns - ------- - cupy.ndarray - Array containing the ID information - - Raises - ------ - KeyError - If `self.id_tensor_name` is not found in the tensors - """ - - try: - return self.get_tensor(self.id_tensor_name) - except KeyError as exc: - raise KeyError(f"Cannot get ID tensor. Tensor with name '{self.id_tensor_name}' " - "does not exist in the memory object") from exc - - def _get_tensor_prop(self, name: str): - """ - This method is intended to be used by property methods in subclasses - - Parameters - ---------- - name : str - Tensor key name. - - Returns - ------- - cupy.ndarray - Tensor. - - Raises - ------ - AttributeError - If tensor name does not exist in the container. - """ - try: - return self.get_tensor(name) - except KeyError as e: - raise AttributeError(f'No attribute named "{name}" exists') from e - - def copy_tensor_ranges(self, ranges, mask=None): - """ - Perform a copy of the underlying tensors for the given `ranges` of rows. - - Parameters - ---------- - ranges : typing.List[typing.Tuple[int, int]] - Rows to include in the copy in the form of `[(`start_row`, `stop_row`),...]` - The `stop_row` isn't included. For example to copy rows 1-2 & 5-7 `ranges=[(1, 3), (5, 8)]` - - mask : typing.Union[None, cupy.ndarray, numpy.ndarray] - Optionally specify rows as a cupy array (when using cudf Dataframes) or a numpy array (when using pandas - Dataframes) of booleans. When not None, `ranges` will be ignored. This is useful as an optimization as this - avoids needing to generate the mask on its own. 
- - Returns - ------- - typing.Dict[str, cupy.ndarray] - """ - if mask is None: - mask = self._ranges_to_mask(self.get_meta(), ranges=ranges) - - # The tensors property method returns a copy with the offsets applied - tensors = self.tensors - return {key: tensor[mask] for (key, tensor) in tensors.items()} - - def copy_ranges(self, ranges: typing.List[typing.Tuple[int, int]]): - """ - Perform a copy of the current message, dataframe and tensors for the given `ranges` of rows. - - Parameters - ---------- - ranges : typing.List[typing.Tuple[int, int]] - Rows to include in the copy in the form of `[(`start_row`, `stop_row`),...]` - The `stop_row` isn't included. For example to copy rows 1-2 & 5-7 `ranges=[(1, 3), (5, 8)]` - - Returns - ------- - `MultiTensorMessage` - """ - mask = self._ranges_to_mask(self.get_meta(), ranges) - sliced_rows = self.copy_meta_ranges(ranges, mask=mask) - sliced_count = len(sliced_rows) - sliced_tensors = self.copy_tensor_ranges(ranges, mask=mask) - - mem = TensorMemory(count=sliced_count, tensors=sliced_tensors) - - return self.from_message(self, - meta=MessageMeta(sliced_rows), - mess_offset=0, - mess_count=sliced_count, - memory=mem, - offset=0, - count=sliced_count) - - def get_slice(self: Self, start, stop) -> Self: - """ - Perform a slice of the current message from `start`:`stop` (excluding `stop`) - - For example to slice from rows 1-3 use `m.get_slice(1, 4)`. The returned `MultiTensorMessage` will contain - references to the same underlying DataFrame and tensors, and calling this method is relatively low - cost compared to `MultiTensorMessage.copy_ranges` - - Parameters - ---------- - start : int - Starting row of the slice - - stop : int - Stop of the slice - - Returns - ------- - `MultiTensorMessage` - """ - - # Calc the offset and count. 
This checks the bounds for us - mess_offset, mess_count = self._calc_message_slice_bounds(start=start, stop=stop) - offset, count = self._calc_memory_slice_bounds(start=start, stop=stop) - - kwargs = { - "meta": self.meta, - "mess_offset": mess_offset, - "mess_count": mess_count, - "memory": self.memory, - "offset": offset, - "count": count, - } - - return self.from_message(self, **kwargs) - - @classmethod - def from_message(cls: type[Self], - message: "MultiTensorMessage", - *, - meta: MessageMeta = None, - mess_offset: int = -1, - mess_count: int = -1, - memory: TensorMemory = None, - offset: int = -1, - count: int = -1, - **kwargs) -> Self: - """ - Creates a new instance of a derived class from `MultiMessage` using an existing message as the template. This is - very useful when a new message needs to be created with a single change to an existing `MessageMeta`. - - When creating the new message, all required arguments for the class specified by `cls` will be pulled from - `message` unless otherwise specified in the `kwargs`. Special handling is performed depending on - whether or not a new `meta` object is supplied. If one is supplied, the offset and count defaults will be 0 and - `meta.count` respectively. Otherwise offset and count will be pulled from the input `message`. - - - Parameters - ---------- - cls : typing.Type[Self] - The class to create - message : MultiMessage - An existing message to use as a template. Can be a base or derived from `cls` as long as all arguments can - be pulled from `message` or provided in `kwargs` - meta : MessageMeta, optional - A new `MessageMeta` to use, by default None - mess_offset : int, optional - A new `mess_offset` to use, by default -1 - mess_count : int, optional - A new `mess_count` to use, by default -1 - memory : TensorMemory, optional - A new `TensorMemory` to use. If supplied, `offset` and `count` default to `0` and `memory.count` - respectively. 
By default None - offset : int, optional - A new `offset` to use, by default -1 - count : int, optional - A new `count` to use, by default -1 - **kwargs : `dict` - Keyword arguments to use when creating the new instance. - - Returns - ------- - Self - A new instance of type `cls` - - Raises - ------ - ValueError - If the incoming `message` is None - """ - - if (message is None): - raise ValueError("Must define `message` when creating a MultiMessage with `from_message`") - - if (offset == -1): - if (memory is not None): - offset = 0 - else: - offset = message.offset - - if (count == -1): - if (memory is not None): - # Subtract offset here so we don't go over the end - count = memory.count - offset - else: - count = message.count - - # Do meta last - if memory is None: - memory = message.memory - - # Update the kwargs - kwargs.update({ - "meta": meta, - "mess_offset": mess_offset, - "mess_count": mess_count, - "memory": memory, - "offset": offset, - "count": count, - }) - - return super().from_message(message, **kwargs) diff --git a/tests/_utils/stages/multi_message_pass_thru.py b/tests/_utils/stages/control_message_pass_thru.py similarity index 82% rename from tests/_utils/stages/multi_message_pass_thru.py rename to tests/_utils/stages/control_message_pass_thru.py index eb0b5c3789..659606d38c 100644 --- a/tests/_utils/stages/multi_message_pass_thru.py +++ b/tests/_utils/stages/control_message_pass_thru.py @@ -17,24 +17,24 @@ import mrc from mrc.core import operators as ops -from morpheus.messages import MultiMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.pass_thru_type_mixin import PassThruTypeMixin from morpheus.pipeline.single_port_stage import SinglePortStage -class MultiMessagePassThruStage(PassThruTypeMixin, SinglePortStage): +class ControlMessagePassThruStage(PassThruTypeMixin, SinglePortStage): @property def name(self) -> str: return "mm-pass-thru" - def accepted_types(self) -> (MultiMessage, ): - return (MultiMessage, ) + def 
accepted_types(self): + return (ControlMessage, ) - def supports_cpp_node(self) -> bool: + def supports_cpp_node(self): return False - def on_data(self, message: MultiMessage): + def on_data(self, message: ControlMessage): return message def _build_single(self, builder: mrc.Builder, input_node: mrc.SegmentObject) -> mrc.SegmentObject: diff --git a/tests/examples/developer_guide/test_pass_thru.py b/tests/examples/developer_guide/test_pass_thru.py index e8ae2d0086..f98451f318 100644 --- a/tests/examples/developer_guide/test_pass_thru.py +++ b/tests/examples/developer_guide/test_pass_thru.py @@ -20,8 +20,8 @@ from _utils import TEST_DIRS from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.messages import MessageMeta -from morpheus.messages import MultiMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.utils.type_aliases import DataFrameType @@ -34,10 +34,11 @@ def _check_pass_thru(config: Config, assert isinstance(stage, SinglePortStage) meta = MessageMeta(filter_probs_df) - multi = MultiMessage(meta=meta) + msg = ControlMessage() + msg.payload(meta) on_data_fn = getattr(stage, on_data_fn_name) - assert on_data_fn(multi) is multi + assert on_data_fn(msg) is msg @pytest.mark.import_mod(os.path.join(TEST_DIRS.examples_dir, 'developer_guide/1_simple_python_stage/pass_thru.py')) diff --git a/tests/examples/digital_fingerprinting/conftest.py b/tests/examples/digital_fingerprinting/conftest.py index 2e7aeb0622..25fb2b54c0 100644 --- a/tests/examples/digital_fingerprinting/conftest.py +++ b/tests/examples/digital_fingerprinting/conftest.py @@ -91,7 +91,7 @@ def dfp_prod_in_sys_path( @pytest.fixture(name="dfp_message_meta") def dfp_message_meta_fixture(config, dataset_pandas: DatasetManager): import pandas as pd - from dfp.messages.multi_dfp_message import DFPMessageMeta + from dfp.messages.dfp_message_meta import DFPMessageMeta user_id = 'test_user' df = dataset_pandas['filter_probs.csv'] @@ 
-101,12 +101,11 @@ def dfp_message_meta_fixture(config, dataset_pandas: DatasetManager): @pytest.fixture -def dfp_multi_message(dfp_message_meta): - from dfp.messages.multi_dfp_message import MultiDFPMessage - yield MultiDFPMessage(meta=dfp_message_meta) - - -@pytest.fixture -def dfp_multi_ae_message(dfp_message_meta): - from morpheus.messages.multi_ae_message import MultiAEMessage - yield MultiAEMessage(meta=dfp_message_meta, model=mock.MagicMock()) +def control_message(dfp_message_meta): + from morpheus.messages import ControlMessage + message = ControlMessage() + message.payload(dfp_message_meta) + message.set_metadata("user_id", dfp_message_meta.user_id) + message.set_metadata("model", mock.MagicMock()) + + yield message diff --git a/tests/examples/digital_fingerprinting/test_dfp_inference_stage.py b/tests/examples/digital_fingerprinting/test_dfp_inference_stage.py index 46defbbbee..1175d0a61e 100644 --- a/tests/examples/digital_fingerprinting/test_dfp_inference_stage.py +++ b/tests/examples/digital_fingerprinting/test_dfp_inference_stage.py @@ -21,6 +21,7 @@ from _utils.dataset_manager import DatasetManager from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.utils.logger import set_log_level @@ -76,17 +77,16 @@ def test_on_data( config: Config, mock_mlflow_client: mock.MagicMock, # pylint: disable=unused-argument mock_model_manager: mock.MagicMock, - dfp_multi_message: "MultiDFPMessage", # noqa: F821 + control_message: "ControlMessage", # noqa: F821 log_level: int, dataset_pandas: DatasetManager): - from dfp.messages.multi_dfp_message import MultiDFPMessage from dfp.stages.dfp_inference_stage import DFPInferenceStage set_log_level(log_level) - expected_results = list(range(1000, dfp_multi_message.mess_count + 1000)) + expected_results = list(range(1000, control_message.payload().count + 1000)) - expected_df = 
dfp_multi_message.get_meta_dataframe().copy(deep=True) + expected_df = control_message.payload().copy_dataframe() expected_df["results"] = expected_results expected_df["model_version"] = "test_model_name:test_model_version" @@ -101,13 +101,12 @@ def test_on_data( mock_model_manager.load_user_model.return_value = mock_model_cache stage = DFPInferenceStage(config, model_name_formatter="test_model_name-{user_id}") - results = stage.on_data(dfp_multi_message) + results = stage.on_data(control_message) - assert isinstance(results, MultiDFPMessage) - assert results.meta is dfp_multi_message.meta - assert results.mess_offset == dfp_multi_message.mess_offset - assert results.mess_count == dfp_multi_message.mess_count - dataset_pandas.assert_compare_df(results.get_meta(), expected_df) + assert isinstance(results, ControlMessage) + assert results.payload() is control_message.payload() + assert results.payload().count == control_message.payload().count + dataset_pandas.assert_compare_df(results.payload().get_data(), expected_df) @pytest.mark.parametrize("raise_error", [True, False]) @@ -115,7 +114,7 @@ def test_on_data_get_model_error( config: Config, mock_mlflow_client: mock.MagicMock, # pylint: disable=unused-argument mock_model_manager: mock.MagicMock, - dfp_multi_message: "MultiDFPMessage", # noqa: F821 + control_message: "ControlMessage", # noqa: F821 raise_error: bool): from dfp.stages.dfp_inference_stage import DFPInferenceStage @@ -126,4 +125,4 @@ def test_on_data_get_model_error( mock_model_manager.load_user_model.return_value = None stage = DFPInferenceStage(config, model_name_formatter="test_model_name-{user_id}") - assert stage.on_data(dfp_multi_message) is None + assert stage.on_data(control_message) is None diff --git a/tests/examples/digital_fingerprinting/test_dfp_mlflow_model_writer.py b/tests/examples/digital_fingerprinting/test_dfp_mlflow_model_writer.py index 561f94f823..49fc093ba4 100644 --- 
a/tests/examples/digital_fingerprinting/test_dfp_mlflow_model_writer.py +++ b/tests/examples/digital_fingerprinting/test_dfp_mlflow_model_writer.py @@ -26,7 +26,7 @@ from _utils import TEST_DIRS from _utils.dataset_manager import DatasetManager from morpheus.config import Config -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage MockedRequests = namedtuple("MockedRequests", ["get", "patch", "response"]) @@ -238,7 +238,7 @@ def test_on_data( databricks_env: dict, databricks_permissions: dict, tracking_uri: str): - from dfp.messages.multi_dfp_message import DFPMessageMeta + from dfp.messages.dfp_message_meta import DFPMessageMeta from dfp.stages.dfp_mlflow_model_writer import DFPMLFlowModelWriterStage from dfp.stages.dfp_mlflow_model_writer import conda_env @@ -273,7 +273,10 @@ def test_on_data( mock_model.get_anomaly_score.return_value = pd.Series(float(i) for i in range(len(df))) meta = DFPMessageMeta(df, 'Account-123456789') - msg = MultiAEMessage(meta=meta, model=mock_model) + msg = ControlMessage() + msg.payload(meta) + msg.set_metadata("model", mock_model) + msg.set_metadata("user_id", meta.user_id) stage = DFPMLFlowModelWriterStage(config, databricks_permissions=databricks_permissions, timeout=10) assert stage._controller.on_data(msg) is msg # Should be a pass-thru diff --git a/tests/examples/digital_fingerprinting/test_dfp_postprocessing_stage.py b/tests/examples/digital_fingerprinting/test_dfp_postprocessing_stage.py index 6eed4c0d9e..b173c145dc 100644 --- a/tests/examples/digital_fingerprinting/test_dfp_postprocessing_stage.py +++ b/tests/examples/digital_fingerprinting/test_dfp_postprocessing_stage.py @@ -21,7 +21,7 @@ from morpheus.common import TypeId from morpheus.config import Config -from morpheus.messages.multi_ae_message import MultiAEMessage +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage 
import SinglePortStage from morpheus.utils.logger import set_log_level @@ -39,7 +39,7 @@ def test_constructor(config: Config): @mock.patch('dfp.stages.dfp_postprocessing_stage.datetime') def test_process_events_on_data(mock_datetime: mock.MagicMock, config: Config, - dfp_multi_ae_message: MultiAEMessage, + control_message: ControlMessage, use_on_data: bool, log_level: int): from dfp.stages.dfp_postprocessing_stage import DFPPostprocessingStage @@ -49,7 +49,7 @@ def test_process_events_on_data(mock_datetime: mock.MagicMock, mock_datetime.now.return_value = mock_dt_obj # post-process should replace nans, lets add a nan to the DF - with dfp_multi_ae_message.meta.mutable_dataframe() as df: + with control_message.payload().mutable_dataframe() as df: df.loc[10, 'v2'] = np.nan df['event_time'] = '' @@ -58,18 +58,23 @@ def test_process_events_on_data(mock_datetime: mock.MagicMock, # on_data is a thin wrapper around process_events, tests should be the same for non-empty messages if use_on_data: - assert stage.on_data(dfp_multi_ae_message) is dfp_multi_ae_message + assert stage.on_data(control_message) is control_message else: - stage._process_events(dfp_multi_ae_message) + stage._process_events(control_message) - assert isinstance(dfp_multi_ae_message, MultiAEMessage) - result_df = dfp_multi_ae_message.meta.copy_dataframe() + assert isinstance(control_message, ControlMessage) + result_df = control_message.payload().copy_dataframe() assert (result_df['event_time'] == '2021-01-01T00:00:00Z').all() - assert result_df['v2'][10] == 'NaN' def test_on_data_none(config: Config): from dfp.stages.dfp_postprocessing_stage import DFPPostprocessingStage stage = DFPPostprocessingStage(config) assert stage.on_data(None) is None - assert stage.on_data(mock.MagicMock(mess_count=0)) is None + mock_payload = mock.MagicMock() + mock_payload.count = 0 + + mock_msg = mock.MagicMock() + mock_msg.payload.return_value = mock_payload + + assert stage.on_data(mock_msg) is None diff --git 
a/tests/examples/digital_fingerprinting/test_dfp_preprocessing_stage.py b/tests/examples/digital_fingerprinting/test_dfp_preprocessing_stage.py index c7859cd90c..538e20425e 100644 --- a/tests/examples/digital_fingerprinting/test_dfp_preprocessing_stage.py +++ b/tests/examples/digital_fingerprinting/test_dfp_preprocessing_stage.py @@ -19,6 +19,7 @@ from _utils.dataset_manager import DatasetManager from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.pipeline.single_port_stage import SinglePortStage from morpheus.utils.column_info import ColumnInfo from morpheus.utils.column_info import CustomColumn @@ -39,15 +40,14 @@ def test_constructor(config: Config): @pytest.mark.parametrize('log_level', [logging.CRITICAL, logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]) def test_process_features( config: Config, - dfp_multi_message: "MultiDFPMessage", # noqa: F821 + control_message: "ControlMessage", # noqa: F821 dataset_pandas: DatasetManager, log_level: int): - from dfp.messages.multi_dfp_message import MultiDFPMessage from dfp.stages.dfp_preprocessing_stage import DFPPreprocessingStage set_log_level(log_level) - expected_df = dfp_multi_message.get_meta_dataframe().copy(deep=True) + expected_df = control_message.payload().copy_dataframe() expected_df['v210'] = expected_df['v2'] + 10 expected_df['v3'] = expected_df['v3'].astype(str) @@ -57,7 +57,7 @@ def test_process_features( ]) stage = DFPPreprocessingStage(config, input_schema=schema) - results = stage.process_features(dfp_multi_message) + results = stage.process_features(control_message) - assert isinstance(results, MultiDFPMessage) - dataset_pandas.assert_compare_df(results.get_meta_dataframe(), expected_df) + assert isinstance(results, ControlMessage) + dataset_pandas.assert_compare_df(results.payload().get_data(), expected_df) diff --git a/tests/examples/digital_fingerprinting/test_dfp_rolling_window_stage.py 
b/tests/examples/digital_fingerprinting/test_dfp_rolling_window_stage.py index a499ba6f32..01504d7d47 100644 --- a/tests/examples/digital_fingerprinting/test_dfp_rolling_window_stage.py +++ b/tests/examples/digital_fingerprinting/test_dfp_rolling_window_stage.py @@ -165,9 +165,10 @@ def test_build_window( use_on_data: bool, dfp_message_meta: "DFPMessageMeta", # noqa: F821 dataset_pandas: DatasetManager): - from dfp.messages.multi_dfp_message import MultiDFPMessage from dfp.stages.dfp_rolling_window_stage import DFPRollingWindowStage + from morpheus.messages import ControlMessage + stage = DFPRollingWindowStage(config, min_history=5, min_increment=7, max_history=100, cache_dir='/test/path/cache') # Create an overlap @@ -183,11 +184,7 @@ def test_build_window( else: msg = stage._build_window(dfp_message_meta) - assert isinstance(msg, MultiDFPMessage) - assert msg.user_id == dfp_message_meta.user_id - assert msg.meta.user_id == dfp_message_meta.user_id - assert msg.mess_offset == 0 - assert msg.mess_count == len(dataset_pandas['filter_probs.csv']) - dataset_pandas.assert_df_equal(msg.get_meta(), train_df) - dataset_pandas.assert_df_equal(msg.meta.get_df(), train_df) - dataset_pandas.assert_df_equal(msg.get_meta_dataframe(), train_df) + assert isinstance(msg, ControlMessage) + assert msg.get_metadata("user_id") == dfp_message_meta.user_id + assert msg.payload().count == len(dataset_pandas['filter_probs.csv']) + dataset_pandas.assert_df_equal(msg.payload().df, train_df) diff --git a/tests/examples/digital_fingerprinting/test_dfp_training.py b/tests/examples/digital_fingerprinting/test_dfp_training.py index bd06f3ecda..60cd545eab 100644 --- a/tests/examples/digital_fingerprinting/test_dfp_training.py +++ b/tests/examples/digital_fingerprinting/test_dfp_training.py @@ -21,7 +21,6 @@ from _utils import TEST_DIRS from _utils.dataset_manager import DatasetManager from morpheus.config import Config -from morpheus.messages.multi_ae_message import MultiAEMessage from 
morpheus.pipeline.single_port_stage import SinglePortStage @@ -51,10 +50,11 @@ def test_on_data(mock_train_test_split: mock.MagicMock, config: Config, dataset_pandas: DatasetManager, validation_size: float): - from dfp.messages.multi_dfp_message import DFPMessageMeta - from dfp.messages.multi_dfp_message import MultiDFPMessage + from dfp.messages.dfp_message_meta import DFPMessageMeta from dfp.stages.dfp_training import DFPTraining + from morpheus.messages import ControlMessage + mock_ae.return_value = mock_ae input_file = os.path.join(TEST_DIRS.validation_data_dir, "dfp-cloudtrail-role-g-validation-data-input.csv") @@ -65,16 +65,16 @@ def test_on_data(mock_train_test_split: mock.MagicMock, mock_train_test_split.return_value = (train_df, mock_validation_df) meta = DFPMessageMeta(df, 'Account-123456789') - msg = MultiDFPMessage(meta=meta) + msg = ControlMessage() + msg.payload(meta) + msg.set_metadata("user_id", meta.user_id) stage = DFPTraining(config, validation_size=validation_size) results = stage.on_data(msg) - assert isinstance(results, MultiAEMessage) - assert results.meta is meta - assert results.mess_offset == msg.mess_offset - assert results.mess_count == msg.mess_count - assert results.model is mock_ae + assert isinstance(results, ControlMessage) + assert results.payload().count == msg.payload().count + assert results.get_metadata("model") is mock_ae # Pandas doesn't like the comparison that mock will make if we called MagicMock.assert_called_once_with(df) # Checking the call args manually @@ -99,4 +99,4 @@ def test_on_data(mock_train_test_split: mock.MagicMock, } # The stage shouldn't be modifying the dataframe - dataset_pandas.assert_compare_df(results.get_meta(), dataset_pandas[input_file]) + dataset_pandas.assert_compare_df(results.payload().df, dataset_pandas[input_file]) diff --git a/tests/examples/digital_fingerprinting/test_dfp_viz_postproc.py b/tests/examples/digital_fingerprinting/test_dfp_viz_postproc.py index b7cadaff49..571f976712 100644 --- 
a/tests/examples/digital_fingerprinting/test_dfp_viz_postproc.py +++ b/tests/examples/digital_fingerprinting/test_dfp_viz_postproc.py @@ -25,22 +25,23 @@ # pylint: disable=redefined-outer-name -@pytest.fixture(name="dfp_multi_message") -def dfp_multi_message_fixture(config: Config, dfp_multi_message: "MultiDFPMessage"): # noqa F821 +@pytest.fixture(name="control_message") +def control_message_fixture(config: Config, control_message: "ControlMessage"): # noqa F821 # Fill in some values for columns that the stage is looking for - with dfp_multi_message.meta.mutable_dataframe() as df: + with control_message.payload().mutable_dataframe() as df: step = (len(df) + 1) * 100 df["mean_abs_z"] = list(range(0, len(df) * step, step)) for (i, col) in enumerate(sorted(config.ae.feature_columns)): step = i + 1 * 100 df[f"{col}_z_loss"] = list(range(0, len(df) * step, step)) - yield dfp_multi_message + yield control_message @pytest.fixture(name="expected_df") -def expected_df_fixture(config: Config, dfp_multi_message: "MultiDFPMessage"): # noqa F821 - df = dfp_multi_message.meta.copy_dataframe() +def expected_df_fixture(config: Config, control_message: "ControlMessage"): # noqa F821 + df = control_message.payload().copy_dataframe() + df = df.to_pandas() expected_df = pd.DataFrame() expected_df["user"] = df[config.ae.userid_column_name] expected_df["time"] = df[config.ae.timestamp_column_name] @@ -68,14 +69,14 @@ def test_constructor(config: Config): def test_postprocess( config: Config, - dfp_multi_message: "MultiDFPMessage", # noqa: F821 + control_message: "ControlMessage", # noqa: F821 expected_df: pd.DataFrame, dataset_pandas: DatasetManager): from dfp.stages.dfp_viz_postproc import DFPVizPostprocStage # _postprocess doesn't write to disk, so the fake output_dir, shouldn't be an issue stage = DFPVizPostprocStage(config, period='min', output_dir='/fake/test/dir', output_prefix='test_prefix') - results = stage._postprocess(dfp_multi_message) + results = 
stage._postprocess(control_message) assert isinstance(results, pd.DataFrame) dataset_pandas.assert_compare_df(results, expected_df) @@ -84,13 +85,13 @@ def test_postprocess( def test_write_to_files( config: Config, tmp_path: str, - dfp_multi_message: "MultiDFPMessage", # noqa: F821 + control_message: "ControlMessage", # noqa: F821 expected_df: pd.DataFrame, dataset_pandas: DatasetManager): from dfp.stages.dfp_viz_postproc import DFPVizPostprocStage stage = DFPVizPostprocStage(config, period='min', output_dir=tmp_path, output_prefix='test_prefix_') - assert stage._write_to_files(dfp_multi_message) is dfp_multi_message + assert stage._write_to_files(control_message) is control_message # The times in the DF have a 30 second step, so the number of unique minutes is half the length of the DF num_expected_periods = len(expected_df) // 2 diff --git a/tests/examples/gnn_fraud_detection_pipeline/test_classification_stage.py b/tests/examples/gnn_fraud_detection_pipeline/test_classification_stage.py index e4935ae592..c597c430ca 100644 --- a/tests/examples/gnn_fraud_detection_pipeline/test_classification_stage.py +++ b/tests/examples/gnn_fraud_detection_pipeline/test_classification_stage.py @@ -19,6 +19,7 @@ from _utils.dataset_manager import DatasetManager from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.messages import MessageMeta # pylint: disable=no-name-in-module @@ -35,7 +36,6 @@ def test_constructor(self, config: Config, xgb_model: str, cuml: types.ModuleTyp def test_process_message(self, config: Config, xgb_model: str, dataset_cudf: DatasetManager): from stages.classification_stage import ClassificationStage - from stages.graph_sage_stage import GraphSAGEMultiMessage df = dataset_cudf['examples/gnn_fraud_detection_pipeline/inductive_emb.csv'] df.rename(lambda x: f"ind_emb_{x}", axis=1, inplace=True) @@ -50,12 +50,13 @@ def test_process_message(self, config: Config, xgb_model: str, dataset_cudf: Dat ind_emb_columns = 
list(df.columns) meta = MessageMeta(df) - msg = GraphSAGEMultiMessage(meta=meta, - node_identifiers=node_identifiers, - inductive_embedding_column_names=ind_emb_columns) + msg = ControlMessage() + msg.payload(meta) + msg.set_metadata("node_identifiers", node_identifiers) + msg.set_metadata("inductive_embedding_column_names", ind_emb_columns) stage = ClassificationStage(config, xgb_model) results = stage._process_message(msg) # The stage actually edits the message in place, and returns it, but we don't need to assert that - dataset_cudf.assert_compare_df(results.get_meta(['prediction', 'node_id']), expected_df) + dataset_cudf.assert_compare_df(results.payload().get_data(['prediction', 'node_id']), expected_df) diff --git a/tests/examples/gnn_fraud_detection_pipeline/test_graph_construction_stage.py b/tests/examples/gnn_fraud_detection_pipeline/test_graph_construction_stage.py index e1785076eb..ee278ef549 100644 --- a/tests/examples/gnn_fraud_detection_pipeline/test_graph_construction_stage.py +++ b/tests/examples/gnn_fraud_detection_pipeline/test_graph_construction_stage.py @@ -22,8 +22,8 @@ import cudf from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.messages import MessageMeta -from morpheus.messages import MultiMessage # pylint: disable=no-name-in-module @@ -52,19 +52,19 @@ def test_process_message(self, dgl: types.ModuleType, config: Config, test_data: # Since we used the first 5 rows as the training data, send the second 5 as inference data meta = MessageMeta(cudf.DataFrame(df).tail(5)) - multi_msg = MultiMessage(meta=meta) - fgmm = stage._process_message(multi_msg) + control_msg = ControlMessage() + control_msg.payload(meta) - assert isinstance(fgmm, graph_construction_stage.FraudGraphMultiMessage) - assert fgmm.meta is meta - assert fgmm.mess_offset == 0 - assert fgmm.mess_count == 5 + fgmm = stage._process_message(control_msg) - assert isinstance(fgmm.graph, dgl.DGLGraph) + assert isinstance(fgmm, ControlMessage) 
+ assert fgmm.payload().count == 5 + + assert isinstance(fgmm.get_metadata("graph"), dgl.DGLGraph) # Since the graph has a reverse edge for each edge, one edge comparison is enough. - buy_edges = fgmm.graph.edges(etype='buy') - sell_edges = fgmm.graph.edges(etype='sell') + buy_edges = fgmm.get_metadata("graph").edges(etype='buy') + sell_edges = fgmm.get_metadata("graph").edges(etype='sell') # expected edges, convert [(u,v)] format to [u, v] of DGL edge format. exp_buy_edges = [torch.LongTensor(e).cuda() for e in zip(*expected_edges['buy'])] @@ -76,4 +76,4 @@ def test_process_message(self, dgl: types.ModuleType, config: Config, test_data: # Compare nodes. for node in ['client', 'merchant']: - assert fgmm.graph.nodes(node).tolist() == list(expected_nodes[node + "_node"]) + assert fgmm.get_metadata("graph").nodes(node).tolist() == list(expected_nodes[node + "_node"]) diff --git a/tests/examples/gnn_fraud_detection_pipeline/test_graph_sage_stage.py b/tests/examples/gnn_fraud_detection_pipeline/test_graph_sage_stage.py index 886c339962..f272098a7d 100644 --- a/tests/examples/gnn_fraud_detection_pipeline/test_graph_sage_stage.py +++ b/tests/examples/gnn_fraud_detection_pipeline/test_graph_sage_stage.py @@ -19,8 +19,8 @@ from _utils.dataset_manager import DatasetManager from morpheus.config import Config +from morpheus.messages import ControlMessage from morpheus.messages import MessageMeta -from morpheus.messages import MultiMessage # pylint: disable=no-name-in-module @@ -45,27 +45,26 @@ def test_process_message(self, test_data: dict, dataset_pandas: DatasetManager): from stages.graph_construction_stage import FraudGraphConstructionStage - from stages.graph_sage_stage import GraphSAGEMultiMessage from stages.graph_sage_stage import GraphSAGEStage expected_df = dataset_pandas['examples/gnn_fraud_detection_pipeline/inductive_emb.csv'] df = test_data['df'] meta = MessageMeta(cudf.DataFrame(df)) - multi_msg = MultiMessage(meta=meta) + control_msg = ControlMessage() +
control_msg.payload(meta) + construction_stage = FraudGraphConstructionStage(config, training_file) - fgmm_msg = construction_stage._process_message(multi_msg) + fgmm_msg = construction_stage._process_message(control_msg) stage = GraphSAGEStage(config, model_dir=model_dir) results = stage._process_message(fgmm_msg) - assert isinstance(results, GraphSAGEMultiMessage) - assert results.meta is meta - assert results.mess_offset == 0 - assert results.mess_count == len(df) - assert results.node_identifiers == test_data['index'] + assert isinstance(results, ControlMessage) + assert results.payload().count == len(df) + assert results.get_metadata("node_identifiers") == test_data['index'] - cols = results.inductive_embedding_column_names + ['index'] + cols = results.get_metadata("inductive_embedding_column_names") + ['index'] assert sorted(cols) == sorted(expected_df.columns) - ind_emb_df = results.get_meta(cols) + ind_emb_df = results.payload().get_data(cols) dataset_pandas.assert_compare_df(ind_emb_df.to_pandas(), expected_df, abs_tol=1, rel_tol=1) diff --git a/tests/pipeline/test_pipeline.py b/tests/pipeline/test_pipeline.py index bf666fa406..aded507af6 100755 --- a/tests/pipeline/test_pipeline.py +++ b/tests/pipeline/test_pipeline.py @@ -19,7 +19,7 @@ import pytest -from _utils import assert_results +from _utils.stages.control_message_pass_thru import ControlMessagePassThruStage from _utils.stages.conv_msg import ConvMsg from _utils.stages.in_memory_multi_source_stage import InMemoryMultiSourceStage from _utils.stages.in_memory_source_x_stage import InMemSourceXStage @@ -29,14 +29,12 @@ from morpheus.messages import MessageMeta from morpheus.pipeline import LinearPipeline from morpheus.pipeline import Pipeline +from morpheus.pipeline.stage_decorator import source +from morpheus.pipeline.stage_decorator import stage from morpheus.stages.boundary.linear_boundary_stage import LinearBoundaryEgressStage
from morpheus.stages.boundary.linear_boundary_stage import LinearBoundaryIngressStage from morpheus.stages.input.in_memory_source_stage import InMemorySourceStage -from morpheus.stages.output.compare_dataframe_stage import CompareDataFrameStage from morpheus.stages.output.in_memory_sink_stage import InMemorySinkStage -from morpheus.stages.postprocess.add_scores_stage import AddScoresStage -from morpheus.stages.postprocess.serialize_stage import SerializeStage -from morpheus.stages.preprocess.deserialize_stage import DeserializeStage from morpheus.utils.type_aliases import DataFrameType @@ -168,29 +166,32 @@ def update_state_dict(key: str): @pytest.mark.use_cudf -def test_pipeline_narrowing_types(config: Config, filter_probs_df: DataFrameType): +def test_pipeline_narrowing_types(config: Config): """ Test to ensure that we aren't narrowing the types of messages in the pipeline. - - In this case, `ConvMsg` emits `MultiResponseMessage` messages which are a subclass of `MultiMessage`, - which is the accepted type for `MultiMessagePassThruStage`. We want to ensure that the type is retained allowing us - to place a stage after `MultiMessagePassThruStage` requring `MultiResponseMessage` like `AddScoresStage`. + In this case, `derived_control_message_source` emits `DerivedControlMessage` messages which are a (dummy) + subclass of `ControlMessage`, which is the accepted type for `ControlMessagePassThruStage`. + We want to ensure that the type is retained allowing us to place a stage after `ControlMessagePassThruStage` + requiring `DerivedControlMessage`.
""" - config.class_labels = ['frogs', 'lizards', 'toads', 'turtles'] - expected_df = filter_probs_df.to_pandas() - expected_df = expected_df.rename(columns=dict(zip(expected_df.columns, config.class_labels))) - pipe = LinearPipeline(config) - pipe.set_source(InMemorySourceStage(config, [filter_probs_df])) - pipe.add_stage(DeserializeStage(config, ensure_sliceable_index=True)) - pipe.add_stage(ConvMsg(config)) - # pipe.add_stage(MultiMessagePassThruStage(config)) - pipe.add_stage(AddScoresStage(config)) - pipe.add_stage(SerializeStage(config, include=[f"^{c}$" for c in config.class_labels])) - compare_stage = pipe.add_stage(CompareDataFrameStage(config, compare_df=expected_df)) - pipe.run() - assert_results(compare_stage.get_results()) + class DerivedControlMessage(ControlMessage): + pass + + @source + def derived_control_message_source() -> DerivedControlMessage: + yield DerivedControlMessage() + + @stage + def derived_control_message_sink(msg: DerivedControlMessage) -> DerivedControlMessage: + return msg + + pipe = LinearPipeline(config) + pipe.set_source(derived_control_message_source(config)) # pylint: disable=E1121 + pipe.add_stage(ControlMessagePassThruStage(config)) + pipe.add_stage(derived_control_message_sink(config)) + pipe.run() @pytest.mark.parametrize("num_outputs", [0, 2, 3]) diff --git a/tests/pipeline/test_stage_schema.py b/tests/pipeline/test_stage_schema.py index 2bc187f367..edd6750826 100644 --- a/tests/pipeline/test_stage_schema.py +++ b/tests/pipeline/test_stage_schema.py @@ -26,7 +26,7 @@ # Fixtures cannot be used directly as paramertize values, but we can fetch them by name @pytest.mark.parametrize("stage_fixture_name,num_inputs,num_outputs", [("in_mem_source_stage", 0, 1), ("in_mem_multi_source_stage", 0, 3), ("stage", 1, 1), - ("split_stage", 1, 2), ("multi_pass_thru_stage", 3, 3)]) + ("split_stage", 1, 2)]) def test_constructor(request: pytest.FixtureRequest, stage_fixture_name: str, num_inputs: int, num_outputs: int): stage = request.getfixturevalue(stage_fixture_name)
schema = StageSchema(stage) @@ -72,7 +72,7 @@ def test_multi_port_output_schemas(split_stage: SplitStage): assert port_schema.get_type() is MessageMeta -@pytest.mark.parametrize("stage_fixture_name", ["split_stage", "multi_pass_thru_stage"]) +@pytest.mark.parametrize("stage_fixture_name", ["split_stage"]) def test_output_schema_multi_error(request: pytest.FixtureRequest, stage_fixture_name: str): """ Test confirms that the output_schema property raises an error when there are multiple output schemas diff --git a/tests/stages/test_generate_viz_frames_stage.py b/tests/stages/test_generate_viz_frames_stage.py index 30ddece85d..879220d204 100644 --- a/tests/stages/test_generate_viz_frames_stage.py +++ b/tests/stages/test_generate_viz_frames_stage.py @@ -45,7 +45,7 @@ def test_constructor(config: Config): assert typing_utils.issubtype(ControlMessage, accepted_union) -def test_process_control_message_and_multi_message(config: Config): +def test_process_control_message(config: Config): stage = GenerateVizFramesStage(config) df = cudf.DataFrame({ diff --git a/tests/test_messages.py b/tests/test_messages.py index 7370cd5268..6c376f7e54 100644 --- a/tests/test_messages.py +++ b/tests/test_messages.py @@ -52,12 +52,6 @@ def check_all_messages(should_be_cpp: bool, no_cpp_class: bool): # always received the python impl check_message(messages.UserMessageMeta, None, should_be_cpp, no_cpp_class, *(None, None)) - check_message(messages.MultiMessage, - _messages.MultiMessage, - should_be_cpp, - no_cpp_class, - **{"meta": messages.MessageMeta(df)}) - check_message(tensor_memory.TensorMemory, _messages.TensorMemory, should_be_cpp, no_cpp_class, **{"count": 1}) check_message(messages.InferenceMemory, _messages.InferenceMemory, should_be_cpp, no_cpp_class, **{"count": 1}) diff --git a/tests/test_multi_message.py b/tests/test_multi_message.py deleted file mode 100644 index 4c6fd5e70d..0000000000 --- a/tests/test_multi_message.py +++ /dev/null @@ -1,802 +0,0 @@ -#!/usr/bin/env python -# 
SPDX-FileCopyrightText: Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# pylint: disable=redefined-outer-name - -import dataclasses -import string -import typing - -import cupy as cp -import numpy as np -import pandas as pd -import pytest - -import cudf - -from _utils.dataset_manager import DatasetManager -from morpheus.messages.memory.inference_memory import InferenceMemory -from morpheus.messages.memory.response_memory import ResponseMemory -from morpheus.messages.memory.response_memory import ResponseMemoryProbs -from morpheus.messages.memory.tensor_memory import TensorMemory -from morpheus.messages.message_meta import MessageMeta -from morpheus.messages.multi_ae_message import MultiAEMessage -from morpheus.messages.multi_inference_ae_message import MultiInferenceAEMessage -from morpheus.messages.multi_inference_message import MultiInferenceFILMessage -from morpheus.messages.multi_inference_message import MultiInferenceMessage -from morpheus.messages.multi_inference_message import MultiInferenceNLPMessage -from morpheus.messages.multi_message import MultiMessage -from morpheus.messages.multi_response_message import MultiResponseMessage -from morpheus.messages.multi_response_message import MultiResponseProbsMessage -from morpheus.messages.multi_tensor_message import MultiTensorMessage - - -@pytest.mark.use_python -def test_missing_explicit_init(): 
- - with pytest.raises(ValueError, match="improperly configured"): - - @dataclasses.dataclass - class BadMultiMessage(MultiMessage): - - value: float - - BadMultiMessage(meta=None, value=5) - - -def test_constructor_empty(filter_probs_df: cudf.DataFrame): - - meta = MessageMeta(filter_probs_df) - - multi = MultiMessage(meta=meta) - - assert multi.meta is meta - assert multi.mess_offset == 0 - assert multi.mess_count == meta.count - - -def test_constructor_values(filter_probs_df: cudf.DataFrame): - - meta = MessageMeta(filter_probs_df) - - # No count - multi = MultiMessage(meta=meta, mess_offset=2) - assert multi.meta is meta - assert multi.mess_offset == 2 - assert multi.mess_count == meta.count - multi.mess_offset - - # No offset - multi = MultiMessage(meta=meta, mess_count=9) - assert multi.meta is meta - assert multi.mess_offset == 0 - assert multi.mess_count == 9 - - # Both - multi = MultiMessage(meta=meta, mess_offset=4, mess_count=5) - assert multi.meta is meta - assert multi.mess_offset == 4 - assert multi.mess_count == 5 - - -def test_constructor_invalid(filter_probs_df: cudf.DataFrame): - - meta = MessageMeta(filter_probs_df) - - # Negative offset - with pytest.raises(ValueError): - MultiMessage(meta=meta, mess_offset=-1, mess_count=5) - - # Offset beyond start - with pytest.raises(ValueError): - MultiMessage(meta=meta, mess_offset=meta.count, mess_count=5) - - # Too large of count - with pytest.raises(ValueError): - MultiMessage(meta=meta, mess_offset=0, mess_count=meta.count + 1) - - # Count extends beyond end of dataframe - with pytest.raises(ValueError): - MultiMessage(meta=meta, mess_offset=5, mess_count=(meta.count - 5) + 1) - - -def _test_get_meta(df: typing.Union[cudf.DataFrame, pd.DataFrame]): - meta = MessageMeta(df) - - multi = MultiMessage(meta=meta, mess_offset=3, mess_count=5) - - # Manually slice the dataframe according to the multi settings - df_sliced: cudf.DataFrame = df.iloc[multi.mess_offset:multi.mess_offset + multi.mess_count, :] - - 
DatasetManager.assert_df_equal(multi.get_meta(), df_sliced) - - # Make sure we return a table here, not a series - col_name = df_sliced.columns[0] - DatasetManager.assert_df_equal(multi.get_meta(col_name), df_sliced[col_name]) - - col_name = [df_sliced.columns[0], df_sliced.columns[2]] - DatasetManager.assert_df_equal(multi.get_meta(col_name), df_sliced[col_name]) - - # Out of order columns - col_name = [df_sliced.columns[3], df_sliced.columns[0]] - DatasetManager.assert_df_equal(multi.get_meta(col_name), df_sliced[col_name]) - - # Should fail with missing column - with pytest.raises(KeyError): - multi.get_meta("column_that_does_not_exist") - - # Finally, check that we dont overwrite the original dataframe - multi.get_meta(col_name).iloc[:] = 5 - DatasetManager.assert_df_equal(multi.get_meta(col_name), df_sliced[col_name]) - - -def test_get_meta(filter_probs_df: typing.Union[cudf.DataFrame, pd.DataFrame]): - _test_get_meta(filter_probs_df) - - -# Ignore unused arguments warnigns due to using the `use_cpp` fixture -# pylint:disable=unused-argument - - -@pytest.mark.usefixtures("use_cpp") -def test_get_meta_dup_index(dataset: DatasetManager): - - # Duplicate some indices before creating the meta - df = dataset.replace_index(dataset["filter_probs.csv"], replace_ids={3: 1, 5: 4}) - - # Now just run the other test to reuse code - _test_get_meta(df) - - -@pytest.mark.usefixtures("use_cpp") -def test_set_meta(dataset: DatasetManager): - df_saved = dataset.pandas["filter_probs.csv"] - - meta = MessageMeta(dataset["filter_probs.csv"]) - - multi = MultiMessage(meta=meta, mess_offset=3, mess_count=5) - - saved_mask = np.ones(len(df_saved), bool) - saved_mask[multi.mess_offset:multi.mess_offset + multi.mess_count] = False - - def test_value(columns, value): - multi.set_meta(columns, value) - dataset.assert_df_equal(multi.get_meta(columns), value) - - # Now make sure the original dataframe is untouched - dataset.assert_df_equal(df_saved[saved_mask], meta.df[saved_mask]) - - 
single_column = "v2" - two_columns = ["v1", "v3"] - multi_columns = ["v4", "v2", "v3"] # out of order as well - - # Setting an integer - test_value(None, 0) - test_value(single_column, 1) - test_value(two_columns, 2) - test_value(multi_columns, 3) - - # Setting a list (Single column only) - test_value(single_column, list(range(0, 0 + multi.mess_count))) - - # Setting numpy arrays (single column) - test_value(None, np.random.randn(multi.mess_count, 1)) - test_value(single_column, np.random.randn(multi.mess_count)) # Must be single dimension - test_value(two_columns, np.random.randn(multi.mess_count, 1)) - test_value(multi_columns, np.random.randn(multi.mess_count, 1)) - - # Setting numpy arrays (multi column) - test_value(None, np.random.randn(multi.mess_count, len(dataset["filter_probs.csv"].columns))) - test_value(two_columns, np.random.randn(multi.mess_count, len(two_columns))) - test_value(multi_columns, np.random.randn(multi.mess_count, len(multi_columns))) - - -def _test_set_meta_new_column(df: typing.Union[cudf.DataFrame, pd.DataFrame], df_type: typing.Literal['cudf', - 'pandas']): - - meta = MessageMeta(df) - - multi = MultiMessage(meta=meta, mess_offset=3, mess_count=5) - - # Set a list - val_to_set = list(range(multi.mess_count)) - multi.set_meta("list_column", val_to_set) - DatasetManager.assert_df_equal(multi.get_meta("list_column"), val_to_set) - - # Set a string - val_to_set = "string to set" - multi.set_meta("string_column", val_to_set) - DatasetManager.assert_df_equal(multi.get_meta("string_column"), val_to_set) - - # Set a date - val_to_set = pd.date_range("2018-01-01", periods=multi.mess_count, freq="H") - multi.set_meta("date_column", val_to_set) - DatasetManager.assert_df_equal(multi.get_meta("date_column"), val_to_set) - - if (df_type == "cudf"): - # cudf isnt capable of setting more than one new column at a time - return - - # Now set one with new and old columns - val_to_set = np.random.randn(multi.mess_count, 2) - multi.set_meta(["v2", 
"new_column2"], val_to_set) - DatasetManager.assert_df_equal(multi.get_meta(["v2", "new_column2"]), val_to_set) - - -@pytest.mark.usefixtures("use_cpp") -def test_set_meta_new_column(dataset: DatasetManager): - _test_set_meta_new_column(dataset["filter_probs.csv"], dataset.default_df_type) - - -@pytest.mark.usefixtures("use_cpp") -def test_set_meta_new_column_dup_index(dataset: DatasetManager): - # Duplicate some indices before creating the meta - df = dataset.replace_index(dataset["filter_probs.csv"], replace_ids={3: 4, 5: 4}) - - _test_set_meta_new_column(df, dataset.default_df_type) - - -@pytest.mark.use_cudf -@pytest.mark.parametrize('use_series', [True, False]) -def test_set_meta_issue_286(filter_probs_df: cudf.DataFrame, use_series: bool): - """ - Explicitly calling set_meta on two different non-overlapping slices. - """ - - meta = MessageMeta(filter_probs_df) - mm1 = MultiMessage(meta=meta, mess_offset=0, mess_count=5) - mm2 = MultiMessage(meta=meta, mess_offset=5, mess_count=5) - - values = list(string.ascii_letters) - if use_series: - values = cudf.Series(values) - - mm1.set_meta('letters', values[0:5]) - mm2.set_meta('letters', values[5:10]) - - -def _test_copy_ranges(df: typing.Union[cudf.DataFrame, pd.DataFrame]): - meta = MessageMeta(df) - - mm1 = MultiMessage(meta=meta) - - mm2 = mm1.copy_ranges([(2, 6)]) - assert len(mm2.meta.df) == 4 - assert mm2.meta.count == 4 - assert len(mm2.get_meta()) == 4 - assert mm2.meta is not meta - assert mm2.meta.df is not df - assert mm2.mess_offset == 0 - assert mm2.mess_count == 6 - 2 - DatasetManager.assert_df_equal(mm2.get_meta(), df.iloc[2:6]) - - # slice two different ranges of rows - mm3 = mm1.copy_ranges([(2, 6), (12, 15)]) - assert len(mm3.meta.df) == 7 - assert mm3.meta.count == 7 - assert len(mm3.get_meta()) == 7 - assert mm3.meta is not meta - assert mm3.meta is not mm2.meta - assert mm3.meta.df is not df - assert mm3.meta.df is not mm2.meta.df - assert mm3.mess_offset == 0 - assert mm3.mess_count == (6 - 
2) + (15 - 12) - - if isinstance(df, pd.DataFrame): - concat_fn = pd.concat - else: - concat_fn = cudf.concat - - expected_df = concat_fn([df.iloc[2:6], df.iloc[12:15]]) - - DatasetManager.assert_df_equal(mm3.get_meta(), expected_df) - - -def test_copy_ranges(filter_probs_df: typing.Union[cudf.DataFrame, pd.DataFrame]): - _test_copy_ranges(filter_probs_df) - - -@pytest.mark.usefixtures("use_cpp") -def test_copy_ranges_dup_index(dataset: DatasetManager): - - # Duplicate some indices before creating the meta - df = dataset.dup_index(dataset["filter_probs.csv"], count=4) - - # Now just run the other test to reuse code - _test_copy_ranges(df) - - -def test_get_slice_ranges(filter_probs_df: cudf.DataFrame): - - meta = MessageMeta(filter_probs_df) - - multi_full = MultiMessage(meta=meta) - - # Get the whole thing - slice1 = multi_full.get_slice(multi_full.mess_offset, multi_full.mess_count) - assert slice1.meta is meta - assert slice1.mess_offset == slice1.mess_offset - assert slice1.mess_count == slice1.mess_count - - # Smaller slice - slice2 = multi_full.get_slice(2, 18) - assert slice2.mess_offset == 2 - assert slice2.mess_count == 18 - 2 - - # Chained slice - slice4 = multi_full.get_slice(3, 19).get_slice(1, 10) - assert slice4.mess_offset == 3 + 1 - assert slice4.mess_count == 10 - 1 - - # Negative start - with pytest.raises(IndexError): - multi_full.get_slice(-1, multi_full.mess_count) - - # Past the end - with pytest.raises(IndexError): - multi_full.get_slice(0, multi_full.mess_count + 1) - - # Stop before start - with pytest.raises(IndexError): - multi_full.get_slice(5, 4) - - # Empty slice - with pytest.raises(IndexError): - multi_full.get_slice(5, 5) - - # Offset + Count past end - with pytest.raises(IndexError): - multi_full.get_slice(13, 13 + (multi_full.mess_count - 13) + 1) - - # Invalid chain, stop past end - with pytest.raises(IndexError): - multi_full.get_slice(13, 16).get_slice(1, 5) - - # Invalid chain, start past end - with pytest.raises(IndexError): 
- multi_full.get_slice(13, 16).get_slice(4, 5) - - -def _test_get_slice_values(df: typing.Union[cudf.DataFrame, pd.DataFrame]): - - meta = MessageMeta(df) - - multi_full = MultiMessage(meta=meta) - - # Single slice - DatasetManager.assert_df_equal(multi_full.get_slice(3, 8).get_meta(), df.iloc[3:8]) - - # Single slice with one columns - DatasetManager.assert_df_equal(multi_full.get_slice(3, 8).get_meta("v1"), df.iloc[3:8]["v1"]) - - # Single slice with multiple columns - DatasetManager.assert_df_equal( - multi_full.get_slice(3, 8).get_meta(["v4", "v3", "v1"]), df.iloc[3:8][["v4", "v3", "v1"]]) - - # Chained slice - DatasetManager.assert_df_equal( - multi_full.get_slice(2, 18).get_slice(5, 9).get_meta(), df.iloc[2 + 5:(2 + 5) + (9 - 5)]) - - # Chained slice one column - DatasetManager.assert_df_equal( - multi_full.get_slice(2, 18).get_slice(5, 9).get_meta("v1"), df.iloc[2 + 5:(2 + 5) + (9 - 5)]["v1"]) - - # Chained slice multi column - DatasetManager.assert_df_equal( - multi_full.get_slice(2, 18).get_slice(5, 9).get_meta(["v4", "v3", "v1"]), - df.iloc[2 + 5:(2 + 5) + (9 - 5)][["v4", "v3", "v1"]]) - - # Set values - multi_full.get_slice(4, 10).set_meta(None, 1.15) - DatasetManager.assert_df_equal(multi_full.get_slice(4, 10).get_meta(), df.iloc[4:10]) - - # Set values one column - multi_full.get_slice(1, 6).set_meta("v3", 5.3) - DatasetManager.assert_df_equal(multi_full.get_slice(1, 6).get_meta("v3"), df.iloc[1:6]["v3"]) - - # Set values multi column - multi_full.get_slice(5, 8).set_meta(["v4", "v1", "v3"], 7) - DatasetManager.assert_df_equal( - multi_full.get_slice(5, 8).get_meta(["v4", "v1", "v3"]), df.iloc[5:8][["v4", "v1", "v3"]]) - - # Chained Set values - multi_full.get_slice(10, 20).get_slice(1, 4).set_meta(None, 8) - DatasetManager.assert_df_equal( - multi_full.get_slice(10, 20).get_slice(1, 4).get_meta(), df.iloc[10 + 1:(10 + 1) + (4 - 1)]) - - # Chained Set values one column - multi_full.get_slice(10, 20).get_slice(3, 5).set_meta("v4", 112) - 
DatasetManager.assert_df_equal( - multi_full.get_slice(10, 20).get_slice(3, 5).get_meta("v4"), df.iloc[10 + 3:(10 + 3) + (5 - 3)]["v4"]) - - # Chained Set values multi column - multi_full.get_slice(10, 20).get_slice(5, 8).set_meta(["v4", "v1", "v2"], 22) - DatasetManager.assert_df_equal( - multi_full.get_slice(10, 20).get_slice(5, 8).get_meta(["v4", "v1", "v2"]), - df.iloc[10 + 5:(10 + 5) + (8 - 5)][["v4", "v1", "v2"]]) - - -def test_get_slice_values(filter_probs_df: cudf.DataFrame): - _test_get_slice_values(filter_probs_df) - - -@pytest.mark.usefixtures("use_cpp") -def test_get_slice_values_dup_index(dataset: DatasetManager): - - # Duplicate some indices before creating the meta - df = dataset.dup_index(dataset["filter_probs.csv"], count=4) - - # Now just run the other test to reuse code - _test_get_slice_values(df) - - -def test_get_slice_derived(filter_probs_df: cudf.DataFrame): - - multi_tensor_message_tensors = { - "input_ids": cp.zeros((20, 2)), - "input_mask": cp.zeros((20, 2)), - "seq_ids": cp.expand_dims(cp.arange(0, 20, dtype=int), axis=1), - "input__0": cp.zeros((20, 2)), - "probs": cp.zeros((20, 2)), - } - - def compare_slice(message_class, **kwargs): - multi = message_class(**kwargs) - assert isinstance(multi.get_slice(0, 20), message_class) - - meta = MessageMeta(filter_probs_df) - - # Base MultiMessages - compare_slice(MultiMessage, meta=meta) - compare_slice(MultiAEMessage, meta=meta, model=None, train_scores_mean=0.0, train_scores_std=1.0) - - # Tensor messages - compare_slice(MultiTensorMessage, meta=meta, memory=TensorMemory(count=20, tensors=multi_tensor_message_tensors)) - - # Inference messages - compare_slice(MultiInferenceMessage, - meta=meta, - memory=InferenceMemory(count=20, tensors=multi_tensor_message_tensors)) - compare_slice(MultiInferenceNLPMessage, - meta=meta, - memory=InferenceMemory(count=20, tensors=multi_tensor_message_tensors)) - compare_slice(MultiInferenceFILMessage, - meta=meta, - memory=InferenceMemory(count=20, 
tensors=multi_tensor_message_tensors)) - compare_slice(MultiInferenceAEMessage, - meta=meta, - memory=InferenceMemory(count=20, tensors=multi_tensor_message_tensors)) - - # Response messages - compare_slice(MultiResponseMessage, - meta=meta, - memory=ResponseMemory(count=20, tensors=multi_tensor_message_tensors)) - compare_slice(MultiResponseProbsMessage, - meta=meta, - memory=ResponseMemoryProbs(count=20, probs=multi_tensor_message_tensors["probs"])) - - -def test_from_message(filter_probs_df: cudf.DataFrame): - - # Pylint currently fails to work with classmethod: https://github.com/pylint-dev/pylint/issues/981 - # pylint: disable=no-member - - meta = MessageMeta(filter_probs_df) - - multi = MultiMessage(meta=meta, mess_offset=3, mess_count=10) - - # Once for the base multi-message class - multi2 = MultiMessage.from_message(multi) - assert multi2.meta is multi.meta - assert multi2.mess_offset == multi.mess_offset - assert multi2.mess_count == multi.mess_count - - multi2 = MultiMessage.from_message(multi, mess_offset=5) - assert multi2.meta is multi.meta - assert multi2.mess_offset == 5 - assert multi2.mess_count == multi.mess_count - - multi2 = MultiMessage.from_message(multi, mess_count=7) - assert multi2.meta is multi.meta - assert multi2.mess_offset == multi.mess_offset - assert multi2.mess_count == 7 - - multi2 = MultiMessage.from_message(multi, mess_offset=6, mess_count=9) - assert multi2.meta is multi.meta - assert multi2.mess_offset == 6 - assert multi2.mess_count == 9 - - meta2 = MessageMeta(filter_probs_df[7:14]) - multi2 = MultiMessage.from_message(multi, meta=meta2) - assert multi2.meta is meta2 - assert multi2.mess_offset == 0 - assert multi2.mess_count == meta2.count - - multi2 = MultiMessage.from_message(multi, meta=meta2, mess_offset=4) - assert multi2.meta is meta2 - assert multi2.mess_offset == 4 - assert multi2.mess_count == meta2.count - 4 - - multi2 = MultiMessage.from_message(multi, meta=meta2, mess_count=4) - assert multi2.meta is meta2 
-    assert multi2.mess_offset == 0
-    assert multi2.mess_count == 4
-
-    # Repeat for tensor memory
-    memory = TensorMemory(count=20)
-    multi_tensor = MultiTensorMessage(meta=meta, mess_offset=3, mess_count=10, memory=memory, offset=5, count=10)
-
-    # Create from a base class
-    multi3: MultiTensorMessage = MultiTensorMessage.from_message(multi, memory=memory)
-    assert multi3.memory is memory
-    assert multi3.offset == 0
-    assert multi3.count == memory.count
-
-    # Create from existing instance
-    multi3 = MultiTensorMessage.from_message(multi_tensor)
-    assert multi3.memory is memory
-    assert multi3.offset == multi_tensor.offset
-    assert multi3.count == multi_tensor.count
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, offset=5)
-    assert multi3.memory is memory
-    assert multi3.offset == 5
-    assert multi3.count == multi_tensor.count
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, count=12)
-    assert multi3.memory is memory
-    assert multi3.offset == multi_tensor.offset
-    assert multi3.count == 12
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, offset=7, count=11)
-    assert multi3.memory is memory
-    assert multi3.offset == 7
-    assert multi3.count == 11
-
-    memory3 = TensorMemory(count=20)
-    multi3 = MultiTensorMessage.from_message(multi_tensor, memory=memory3)
-    assert multi3.memory is memory3
-    assert multi3.offset == 0
-    assert multi3.count == memory3.count
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, memory=memory3, offset=2)
-    assert multi3.memory is memory3
-    assert multi3.offset == 2
-    assert multi3.count == memory3.count - 2
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, memory=memory3, count=14)
-    assert multi3.memory is memory3
-    assert multi3.offset == 0
-    assert multi3.count == 14
-
-    multi3 = MultiTensorMessage.from_message(multi_tensor, memory=memory3, offset=4, count=13)
-    assert multi3.memory is memory3
-    assert multi3.offset == 4
-    assert multi3.count == 13
-
-    # Test missing memory
-    with pytest.raises(AttributeError):
-        MultiTensorMessage.from_message(multi)
-
-    # Finally, test a class with extra arguments
-    multi4 = MultiAEMessage.from_message(multi, model=None, train_scores_mean=0.0, train_scores_std=1.0)
-    assert multi4.meta is meta
-    assert multi4.mess_offset == multi.mess_offset
-    assert multi4.mess_count == multi.mess_count
-
-    multi5 = MultiAEMessage.from_message(multi4)
-    assert multi5.model is multi4.model
-    assert multi5.train_scores_mean == multi4.train_scores_mean
-    assert multi5.train_scores_std == multi4.train_scores_std
-
-    multi5 = MultiAEMessage.from_message(multi4, train_scores_mean=7.0)
-    assert multi5.model is multi4.model
-    assert multi5.train_scores_mean == 7.0
-    assert multi5.train_scores_std == multi4.train_scores_std
-
-    # Test missing other options
-    with pytest.raises(AttributeError):
-        MultiAEMessage.from_message(multi)
-
-
-def test_tensor_constructor(filter_probs_df: cudf.DataFrame):
-
-    mess_len = len(filter_probs_df)
-    ten_len = mess_len * 2
-
-    meta = MessageMeta(filter_probs_df)
-
-    memory = TensorMemory(count=ten_len)
-
-    # Default constructor
-    multi_tensor = MultiTensorMessage(meta=meta, memory=memory)
-    assert multi_tensor.meta is meta
-    assert multi_tensor.mess_offset == 0
-    assert multi_tensor.mess_count == meta.count
-    assert multi_tensor.memory is memory
-    assert multi_tensor.offset == 0
-    assert multi_tensor.count == memory.count
-
-    # All constructor values
-    multi_tensor = MultiTensorMessage(meta=meta, mess_offset=3, mess_count=5, memory=memory, offset=5, count=10)
-    assert multi_tensor.meta is meta
-    assert multi_tensor.mess_offset == 3
-    assert multi_tensor.mess_count == 5
-    assert multi_tensor.memory is memory
-    assert multi_tensor.offset == 5
-    assert multi_tensor.count == 10
-
-    # Larger tensor count
-    multi_tensor = MultiTensorMessage(meta=meta, memory=TensorMemory(count=21))
-    assert multi_tensor.meta is meta
-    assert multi_tensor.mess_offset == 0
-    assert multi_tensor.mess_count == meta.count
-    assert multi_tensor.offset == 0
-    assert multi_tensor.count == multi_tensor.memory.count
-
-    # Negative offset
-    with pytest.raises(ValueError):
-        MultiTensorMessage(meta=meta, memory=memory, offset=-1)
-
-    # Offset beyond start
-    with pytest.raises(ValueError):
-        MultiTensorMessage(meta=meta, memory=memory, offset=memory.count, count=25)
-
-    # Too large of count
-    with pytest.raises(ValueError):
-        MultiTensorMessage(meta=meta, memory=memory, offset=0, count=memory.count + 1)
-
-    # Count extends beyond end of memory
-    with pytest.raises(ValueError):
-        MultiTensorMessage(meta=meta, memory=memory, offset=5, count=(memory.count - 5) + 1)
-
-    # Count smaller than mess_count
-    with pytest.raises(ValueError):
-        MultiTensorMessage(meta=meta, mess_count=10, memory=memory, count=9)
-
-    # === ID Tensors ===
-    id_tensor = cp.expand_dims(cp.arange(0, mess_len, dtype=int), axis=1)
-
-    # With valid ID tensor
-    multi_tensor = MultiTensorMessage(meta=meta, memory=TensorMemory(count=mess_len, tensors={"seq_ids": id_tensor}))
-    assert cp.all(multi_tensor.get_id_tensor() == id_tensor)
-
-    # With different ID name
-    multi_tensor = MultiTensorMessage(meta=meta,
-                                      memory=TensorMemory(count=mess_len, tensors={"other_seq_ids": id_tensor}),
-                                      id_tensor_name="other_seq_ids")
-    assert cp.all(multi_tensor.get_id_tensor() == id_tensor)
-
-    # With message offset
-    multi_tensor = MultiTensorMessage(meta=meta,
-                                      mess_offset=4,
-                                      memory=TensorMemory(count=mess_len, tensors={"seq_ids": id_tensor}),
-                                      offset=4)
-    assert cp.all(multi_tensor.get_id_tensor() == id_tensor[4:])
-
-    # Incorrect start ID
-    invalid_id_tensor = cp.copy(id_tensor)
-    invalid_id_tensor[0] = -1
-    with pytest.raises(RuntimeError):
-        multi_tensor = MultiTensorMessage(meta=meta,
-                                          memory=TensorMemory(count=mess_len, tensors={"seq_ids": invalid_id_tensor}))
-
-    # Incorrect end ID
-    invalid_id_tensor = cp.copy(id_tensor)
-    invalid_id_tensor[-1] = invalid_id_tensor[-1] + 1
-    with pytest.raises(RuntimeError):
-        multi_tensor = MultiTensorMessage(meta=meta,
-                                          memory=TensorMemory(count=mess_len, tensors={"seq_ids": invalid_id_tensor}))
-
-    # Incorrect end ID, different id tensor name
-    invalid_id_tensor = cp.copy(id_tensor)
-    invalid_id_tensor[-1] = invalid_id_tensor[-1] + 1
-    with pytest.raises(RuntimeError):
-        multi_tensor = MultiTensorMessage(meta=meta,
-                                          id_tensor_name="id_tensor",
-                                          memory=TensorMemory(count=mess_len, tensors={"id_tensor": invalid_id_tensor}))
-
-    # Doesnt check with invalid due to different name
-    multi_tensor = MultiTensorMessage(meta=meta,
-                                      memory=TensorMemory(count=mess_len, tensors={"id_tensor": invalid_id_tensor}))
-
-
-@pytest.mark.usefixtures("use_cpp")
-def test_tensor_slicing(dataset: DatasetManager):
-
-    # Pylint currently fails to work with classmethod: https://github.com/pylint-dev/pylint/issues/981
-    # pylint: disable=no-member
-
-    filter_probs_df = dataset["filter_probs.csv"]
-    mess_len = len(filter_probs_df)
-
-    repeat_counts = [1] * mess_len
-    repeat_counts[1] = 2
-    repeat_counts[4] = 5
-    repeat_counts[5] = 3
-    repeat_counts[7] = 6
-    tensor_count = sum(repeat_counts)
-
-    probs = cp.random.rand(tensor_count, 2)
-    seq_ids = cp.zeros((tensor_count, 3), dtype=cp.int32)
-
-    for i, repeat_count in enumerate(repeat_counts):
-        seq_ids[sum(repeat_counts[:i]):sum(repeat_counts[:i]) + repeat_count] = cp.ones((repeat_count, 3), int) * i
-
-    # First with no offsets
-    memory = InferenceMemory(count=tensor_count, tensors={"seq_ids": seq_ids, "probs": probs})
-    multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df), memory=memory)
-    multi_slice = multi.get_slice(3, 10)
-    assert multi_slice.mess_offset == seq_ids[3, 0].item()
-    assert multi_slice.mess_count == seq_ids[10, 0].item() - seq_ids[3, 0].item()
-    assert multi_slice.offset == 3
-    assert multi_slice.count == 10 - 3
-    assert cp.all(multi_slice.get_tensor("probs") == probs[3:10, :])
-
-    # Offset on memory
-    multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df),
-                                  mess_offset=seq_ids[4, 0].item(),
-                                  memory=memory,
-                                  offset=4)
-    multi_slice = multi.get_slice(6, 13)
-    assert multi_slice.mess_offset == seq_ids[multi.offset + 6, 0].item()
-    assert multi_slice.mess_count == seq_ids[multi.offset + 13 - 1, 0].item() + 1 - seq_ids[multi.offset + 6, 0].item()
-    assert multi_slice.offset == 6 + 4
-    assert multi_slice.count == 13 - 6
-    assert cp.all(multi_slice.get_tensor("probs") == probs[multi.offset + 6:multi.offset + 13, :])
-
-    # Should be equivalent to shifting the input tensors and having no offset
-    equiv_memory = InferenceMemory(count=tensor_count - 4, tensors={"seq_ids": seq_ids[4:], "probs": probs[4:]})
-    equiv_multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df),
-                                        mess_offset=seq_ids[4, 0].item(),
-                                        memory=equiv_memory)
-    equiv_slice = equiv_multi.get_slice(6, 13)
-    assert multi_slice.mess_offset == equiv_slice.mess_offset
-    assert multi_slice.mess_count == equiv_slice.mess_count
-    assert multi_slice.offset != equiv_slice.offset
-    assert multi_slice.count == equiv_slice.count
-    assert cp.all(multi_slice.get_tensor("probs") == equiv_slice.get_tensor("probs"))
-
-    # Offset on meta
-    memory = InferenceMemory(count=tensor_count - 3, tensors={"seq_ids": seq_ids[:-3] + 3, "probs": probs[:-3]})
-    multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df), mess_offset=3, memory=memory)
-    multi_slice = multi.get_slice(2, 9)
-    assert multi_slice.mess_offset == seq_ids[multi.offset + 2, 0].item() + 3
-    assert multi_slice.mess_count == seq_ids[multi.offset + 9 - 1, 0].item() + 1 - seq_ids[multi.offset + 2, 0].item()
-    assert multi_slice.offset == 2
-    assert multi_slice.count == 9 - 2
-    assert cp.all(multi_slice.get_tensor("probs") == probs[multi.offset + 2:multi.offset + 9, :])
-
-    # Should be equivalent to shifting the input dataframe and having no offset
-    equiv_memory = InferenceMemory(count=tensor_count - 3, tensors={"seq_ids": seq_ids[:-3], "probs": probs[:-3]})
-    equiv_multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df.iloc[3:, :]), memory=equiv_memory)
-    equiv_slice = equiv_multi.get_slice(2, 9)
-    assert multi_slice.mess_offset == equiv_slice.mess_offset + 3
-    assert multi_slice.mess_count == equiv_slice.mess_count
-    assert multi_slice.offset == equiv_slice.offset
-    assert multi_slice.count == equiv_slice.count
-    dataset.assert_df_equal(multi_slice.get_meta(), equiv_slice.get_meta())
-
-    # Finally, compare a double slice to a single
-    memory = InferenceMemory(count=tensor_count, tensors={"seq_ids": seq_ids, "probs": probs})
-    multi = MultiInferenceMessage(meta=MessageMeta(filter_probs_df), memory=memory)
-    double_slice = multi.get_slice(4, 17).get_slice(3, 10)
-    single_slice = multi.get_slice(4 + 3, 4 + 10)
-    assert double_slice.mess_offset == single_slice.mess_offset
-    assert double_slice.mess_count == single_slice.mess_count
-    assert double_slice.offset == single_slice.offset
-    assert double_slice.count == single_slice.count
-    assert cp.all(double_slice.get_tensor("probs") == single_slice.get_tensor("probs"))
-    dataset.assert_df_equal(double_slice.get_meta(), single_slice.get_meta())