Enhancement to regex step - Add option for regex replace and null value on input flow #907

Open
glepod opened this issue Oct 23, 2024 · 2 comments · May be fixed by #908

Comments

@glepod
Contributor

glepod commented Oct 23, 2024

The WHY

  1. Currently the regex transformer only returns the text that matches the regex. Adding the ability to perform a regex replace will make this step flexible for data manipulation.

  2. Currently a null value on the input is ignored and the "field" parameter itself is returned. It may be desirable in some cases to pass this to the next step. Options could be added to the flow config to map null to an alternate value, for example an empty string.

The HOW

These are two issues I am facing with a data import I am creating.

  1. With the use of the SED style for the regex pattern, the option to perform a preg_replace and return the result can be achieved.
    Currently '/<match regex>/' is supported. This returns the matched strings from the input.

If we add the ability to interpret 's/<search regex>/<replacement>/<flags>' the step can perform a data manipulation on the input and will return the result.

  2. The first null mapping I require is 'replace with empty string'. A checkbox beneath the input field would trigger the transformation. This option could be added to any step, as it can be passed in the step's config.

Symfony 6.4 introduces the Null-Coalescing Operator, which may resolve this issue; however, it requires PHP 8.1.0.
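To make the sed-style proposal in item 1 concrete, here is a rough sketch of how an expression could be interpreted. It is written in Python purely for illustration (the plugin itself is PHP and would use preg_match / preg_replace), and every name in it (`apply_regex`, `_to_re_flags`) is hypothetical. It also assumes sed semantics for the `g` flag, i.e. only the first occurrence is replaced unless `g` is given, and that match mode returns the first matched string; both are assumptions, not settled behaviour.

```python
import re

# Hypothetical mapping of sed/PCRE-style flag letters to Python regex flags.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def _to_re_flags(letters):
    """Translate flag letters to re flags; 'g' is handled by the caller."""
    result = 0
    for letter in letters:
        if letter == "g":
            continue
        if letter not in _FLAGS:
            raise ValueError(f"unsupported flag: {letter!r}")
        result |= _FLAGS[letter]
    return result


def apply_regex(expression, value):
    """Apply '/match/flags' or 's/search/replacement/flags' to value."""
    # Split on unescaped '/' only, then turn '\/' back into a literal '/'.
    parts = [p.replace("\\/", "/") for p in re.split(r"(?<!\\)/", expression)]

    if len(parts) == 3 and parts[0] == "":
        # Match mode: return the first matched string, or None.
        _, pattern, flags = parts
        found = re.search(pattern, value, _to_re_flags(flags))
        return found.group(0) if found else None

    if len(parts) == 4 and parts[0] == "s":
        # Replace mode: return the input with the substitution applied.
        _, search, replacement, flags = parts
        count = 0 if "g" in flags else 1  # 0 means "replace all" in re.sub
        return re.sub(search, replacement, value, count=count,
                      flags=_to_re_flags(flags))

    raise ValueError(f"unrecognised expression: {expression!r}")
```

For example, `apply_regex(r"s/^\s+|\s+$//g", "\t CODE ")` strips the leading and trailing whitespace, while `apply_regex("/[A-Z]+/", "\t CODE ")` keeps the existing match behaviour.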

@keevan
Contributor

keevan commented Oct 25, 2024

It may be desirable in some cases to pass this to the next step

Can you provide an example of when this would be desirable?

Adding the ability to perform a regex replace will make this step flexible for data manipulation.

Adding this extra functionality will make the step type too complicated and hard to maintain. It should probably be a separate step type. Perhaps something similar to transformer_regex_replace.

With the use of the SED style for the regex pattern

Though this is pretty well known for developers, dataflows is a tool that should be usable by admins as well. Therefore splitting up the fields into separate fields might be easier for them to understand and enter.

The first null mapping I require is for a 'replace with empty string'

Can you give a detailed example? I'm not sure what you're after and if this is the best approach for your problem.

@glepod
Contributor Author

glepod commented Oct 29, 2024

Can you provide an example of when this would be desirable?
Can you give a detailed example? I'm not sure what you're after and if this is the best approach for your problem.

Both of these relate to a null value in a field of the input, e.g.:
{"SchoolCode":"1234","SchoolName":"A School","coursecode":"/t CODE ","courseid":"1","teacher":"[email protected]"}
{"SchoolCode":"1234","SchoolName":"A School","coursecode":null,"courseid":"1","teacher":"[email protected]"}

For the field coursecode the first record has whitespace that needs to be removed which can be done with the "Regex Transform" step.

However, when a record has "coursecode":null, as in the second record, the step returns the input field value from the step config (e.g. ${{steps.json_reader.record.coursecode}}):
{"SchoolCode":"1234","SchoolName":"A School","coursecode":"${{steps.json_reader.record.coursecode}}","courseid":"1","teacher":"[email protected]"}

The cause of this is here: https://github.com/catalyst/moodle-tool_dataflows/blob/MOODLE_401_STABLE/classes/local/variables/var_value.php#L196

This would, in turn, be passed to the next step and if that was a web service, for example, probably would not be an expected value.

The ability to substitute a null for an empty string will fix this. It may not always be possible to fix the data before it gets to the Dataflow tool.

Alternatively, since the issue occurs in var_value.php when checking whether a value is set, is_null may not be the correct test, and this could be a bug.
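To illustrate the distinction I mean, here is a small Python sketch. It is not the code in var_value.php; it only mimics the behaviour described above, and all names (`resolve`, `null_as`, the flat `variables` dict) are made up for the example. The point is that "name not defined" and "name defined but null" are different cases, and only the second one needs the optional null mapping.

```python
import re

# Sentinel to tell "name not present" apart from "present but null".
_MISSING = object()

# Matches expressions such as ${{steps.json_reader.record.coursecode}}.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def resolve(template, variables, null_as=None):
    """Substitute ${{name}} expressions using a flat dict of dotted names.

    With null_as=None a null value leaves the expression untouched, which
    reproduces the reported behaviour. Passing null_as="" maps null to an
    empty string instead.
    """
    def substitute(match):
        value = variables.get(match.group(1), _MISSING)
        if value is _MISSING:
            return match.group(0)  # unknown name: keep the expression
        if value is None:
            return match.group(0) if null_as is None else null_as
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)
```

With `{"steps.json_reader.record.coursecode": None}` as the variables, the default call returns the expression text unchanged, and `null_as=""` returns an empty string.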

What do you think of a new step transform_value_map that would substitute a list of values? The null issue could also be fixed with this.
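As a rough idea of what such a value-map step might do (Python for illustration only; `map_value` and the pair-list config shape are assumptions, not an existing API), the mapping is given as a list of (from, to) pairs so that null can appear as a source value:

```python
def map_value(value, pairs):
    """Return the mapped value for the first matching pair, else the input.

    pairs is a list of (source, target) tuples. Types must match as well as
    values, so that 0, False, "" and None are never confused with each other.
    """
    for source, target in pairs:
        if type(source) is type(value) and source == value:
            return target
    return value


# Example config: map null and the literal "N/A" to an empty string.
EXAMPLE_PAIRS = [(None, ""), ("N/A", "")]
```

`map_value(None, EXAMPLE_PAIRS)` gives an empty string, and any value not listed passes through unchanged.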

Adding this extra functionality will make the step type too complicated and hard to maintain. It should probably be a separate step type. Perhaps something similar to transformer_regex_replace.

I think that a mode selector for the "Regex transformer" step (flow_transformer_regex) to choose either Match or Replace in the config would be simple enough. This would be similar to the Compression/Decompression options.

However, I am not against having two separate steps. In that case flow_transformer_regex would probably need to be renamed to flow_transformer_regex_match.

Though this is pretty well known for developers, dataflows is a tool that should be usable by admins as well. Therefore splitting up the fields into separate fields might be easier for them to understand and enter.

I agree, separate fields would be better.
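A sketch of what the separate-fields variant with a mode selector could look like, again in Python and with hypothetical names (`RegexStepConfig`, `run_step`); the real step would be PHP and the field names are not decided:

```python
import re
from dataclasses import dataclass


@dataclass
class RegexStepConfig:
    """Hypothetical step config with one form field per value."""
    pattern: str
    mode: str = "match"      # "match" or "replace"
    replacement: str = ""    # only used when mode == "replace"


def run_step(config, value):
    """Run the step in the configured mode against a single input value."""
    if config.mode == "match":
        found = re.search(config.pattern, value)
        return found.group(0) if found else None
    if config.mode == "replace":
        return re.sub(config.pattern, config.replacement, value)
    raise ValueError(f"unknown mode: {config.mode!r}")
```

An admin would then enter the pattern `^\s+|\s+$`, leave the replacement empty and pick "replace", with no sed syntax to learn.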
