Merge branch 'refs/heads/dev_' into pr/7484

# Conflicts:
#	docs/en/connector-v2/source/PostgreSQL.md
apache · Aug 28, 2024 · 747af46
2 parents 94c9bd3 + 696f294
Showing 103 changed files with 7,472 additions and 237 deletions.
diff --git a/config/plugin_config b/config/plugin_config
@@ -88,4 +88,5 @@ connector-web3j
 connector-milvus
 connector-activemq
 connector-sls
---end--
+connector-cdc-opengauss
+--end--
diff --git a/docs/en/connector-v2/sink/Kafka.md b/docs/en/connector-v2/sink/Kafka.md
@@ -43,6 +43,9 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor
 | format               | String | No       | json    | Data format. The default format is json. Optional text format, canal_json, debezium_json, ogg_json and avro.If you use json or text format. The default field separator is ", ". If you customize the delimiter, add the "field_delimiter" option.If you use canal format, please refer to [canal-json](../formats/canal-json.md) for details.If you use debezium format, please refer to [debezium-json](../formats/debezium-json.md) for details. |
 | field_delimiter      | String | No       | ,       | Customize the field delimiter for data format.                                                                                                                                                                                                                                                                                                                                                                                                      |
 | common-options       |        | No       | -       | Source plugin common parameters, please refer to [Source Common Options](../sink-common-options.md) for details                                                                                                                                                                                                                                                                                                                                     |
+| protobuf_message_name | String | No       | -       | Takes effect when `format` is set to `protobuf`. Specifies the message name. |
+| protobuf_schema       | String | No       | -       | Takes effect when `format` is set to `protobuf`. Specifies the schema definition. |
+
 
 ## Parameter Interpretation
 
@@ -213,3 +216,56 @@ sink {
 }
 ```
 
+
+### Protobuf Configuration
+
+Set the `format` to `protobuf` and configure the `protobuf` data structure using the `protobuf_message_name` and `protobuf_schema` parameters.
+
+Example Usage:
+
+```hocon
+sink {
+  kafka {
+      topic = "test_protobuf_topic_fake_source"
+      bootstrap.servers = "kafkaCluster:9092"
+      format = protobuf
+      kafka.request.timeout.ms = 60000
+      kafka.config = {
+        acks = "all"
+        request.timeout.ms = 60000
+        buffer.memory = 33554432
+      }
+      protobuf_message_name = Person
+      protobuf_schema = """
+              syntax = "proto3";
+
+              package org.apache.seatunnel.format.protobuf;
+
+              option java_outer_classname = "ProtobufE2E";
+
+              message Person {
+                int32 c_int32 = 1;
+                int64 c_int64 = 2;
+                float c_float = 3;
+                double c_double = 4;
+                bool c_bool = 5;
+                string c_string = 6;
+                bytes c_bytes = 7;
+
+                message Address {
+                  string street = 1;
+                  string city = 2;
+                  string state = 3;
+                  string zip = 4;
+                }
+
+                Address address = 8;
+
+                map<string, float> attributes = 9;
+
+                repeated string phone_numbers = 10;
+              }
+              """
+  }
+}
+```
diff --git a/docs/en/connector-v2/source/Jdbc.md b/docs/en/connector-v2/source/Jdbc.md
@@ -39,7 +39,7 @@ supports query SQL and can achieve projection effect.
 
 ## Options
 
-|                    name                    |  type   | required |  default value  |                                                                                                                                                                                                                                                                                                                    description                                                                                                                                                                                                                                                                                                                     |
+| name                                       | type    | required | default value   | description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
 |--------------------------------------------|---------|----------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | url                                        | String  | Yes      | -               | The URL of the JDBC connection. Refer to a case: jdbc:postgresql://localhost/test                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
 | driver                                     | String  | Yes      | -               | The jdbc class name used to connect to the remote data source, if you use MySQL the value is `com.mysql.cj.jdbc.Driver`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
@@ -52,6 +52,7 @@ supports query SQL and can achieve projection effect.
 | partition_upper_bound                      | Long    | No       | -               | The partition_column max value for scan, if not set SeaTunnel will query database get max value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 | partition_lower_bound                      | Long    | No       | -               | The partition_column min value for scan, if not set SeaTunnel will query database get min value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 | partition_num                              | Int     | No       | job parallelism | Not recommended for use, The correct approach is to control the number of split through `split.size`<br/> How many splits do we need to split into, only support positive integer. default value is job parallelism.                                                                                                                                                                                                                                                                                                                                                                                                                               |
+| decimal_type_narrowing                     | Boolean | No       | true            | Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently supported only for Oracle. See the `decimal_type_narrowing` section below. |
 | use_select_count                           | Boolean | No       | false           | Use select count for table count rather then other methods in dynamic chunk split stage. This is currently only available for jdbc-oracle.In this scenario, select count directly is used when it is faster to update statistics using sql from analysis table                                                                                                                                                                                                                                                                                                                                                                                     |
 | skip_analyze                               | Boolean | No       | false           | Skip the analysis of table count in dynamic chunk split stage. This is currently only available for jdbc-oracle.In this scenario, you schedule analysis table sql to update related table statistics periodically or your table data does not change frequently                                                                                                                                                                                                                                                                                                                                                                                    |
 | fetch_size                                 | Int     | No       | 0               | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number database hits required to satisfy the selection criteria. Zero means use jdbc default value.                                                                                                                                                                                                                                                                                                                                                                                               |
@@ -66,6 +67,28 @@ supports query SQL and can achieve projection effect.
 | split.inverse-sampling.rate                | Int     | No       | 1000            | The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000.                                                                                                                                                                                            |
 | common-options                             |         | No       | -               | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
 
+### decimal_type_narrowing
+
+Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently supported only for Oracle.
+
+For example:
+
+decimal_type_narrowing = true
+
+| Oracle        | SeaTunnel |
+|---------------|-----------|
+| NUMBER(1, 0)  | Boolean   |
+| NUMBER(6, 0)  | INT       |
+| NUMBER(10, 0) | BIGINT    |
+
+decimal_type_narrowing = false
+
+| Oracle        | SeaTunnel      |
+|---------------|----------------|
+| NUMBER(1, 0)  | Decimal(1, 0)  |
+| NUMBER(6, 0)  | Decimal(6, 0)  |
+| NUMBER(10, 0) | Decimal(10, 0) |
+
 ## Parallel Reader
 
 The JDBC Source connector supports parallel reading of data from tables. SeaTunnel will use certain rules to split the data in the table, which will be handed over to readers for reading. The number of readers is determined by the `parallelism` option.
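
To illustrate the `decimal_type_narrowing` option documented in the Jdbc.md hunks above, here is a minimal sketch of an Oracle source block that turns narrowing off, so that, per the second mapping table, a column such as `NUMBER(6, 0)` is read as `Decimal(6, 0)` rather than `INT`. The URL, credentials, and query are placeholder assumptions, and `user`, `password`, and `query` are Jdbc source options that do not appear in this excerpt.

```hocon
source {
  Jdbc {
    # Placeholder connection details (assumptions, not taken from this diff)
    url = "jdbc:oracle:thin:@localhost:1521/XE"
    driver = "oracle.jdbc.OracleDriver"
    user = "example_user"
    password = "example_password"
    query = "SELECT * FROM EXAMPLE_SCHEMA.EXAMPLE_TABLE"

    # Default is true; false keeps NUMBER(p, 0) columns as decimals
    decimal_type_narrowing = false
  }
}
```

Leaving the option unset keeps the default of `true`, which yields the narrowed types shown in the first mapping table.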