Merge branch 'refs/heads/dev_' into pr/7484
# Conflicts:
#	docs/en/connector-v2/source/PostgreSQL.md
Hisoka-X committed Aug 28, 2024
2 parents 94c9bd3 + 696f294 commit 747af46
Showing 103 changed files with 7,472 additions and 237 deletions.
3 changes: 2 additions & 1 deletion config/plugin_config
@@ -88,4 +88,5 @@ connector-web3j
connector-milvus
connector-activemq
connector-sls
--end--
connector-cdc-opengauss
--end--
56 changes: 56 additions & 0 deletions docs/en/connector-v2/sink/Kafka.md
@@ -43,6 +43,9 @@ They can be downloaded via install-plugin.sh or from the Maven central repositor
| format | String | No | json | Data format. The default format is json. Other supported formats are text, canal_json, debezium_json, ogg_json, avro and protobuf. For the json and text formats, the default field delimiter is ", "; to customize it, set the `field_delimiter` option. For the canal format, see [canal-json](../formats/canal-json.md). For the debezium format, see [debezium-json](../formats/debezium-json.md). |
| field_delimiter | String | No | , | Customize the field delimiter for data format. |
| common-options | | No | - | Sink plugin common parameters, please refer to [Sink Common Options](../sink-common-options.md) for details. |
| protobuf_message_name | String | No | - | Takes effect only when `format` is set to `protobuf`. Specifies the name of the protobuf message. |
| protobuf_schema | String | No | - | Takes effect only when `format` is set to `protobuf`. Specifies the protobuf schema definition. |


## Parameter Interpretation

@@ -213,3 +216,56 @@ sink {
}
```


### Protobuf Configuration

Set the `format` to `protobuf` and configure the `protobuf` data structure using the `protobuf_message_name` and `protobuf_schema` parameters.

Example Usage:

```hocon
sink {
  kafka {
    topic = "test_protobuf_topic_fake_source"
    bootstrap.servers = "kafkaCluster:9092"
    format = protobuf
    kafka.request.timeout.ms = 60000
    kafka.config = {
      acks = "all"
      request.timeout.ms = 60000
      buffer.memory = 33554432
    }
    protobuf_message_name = Person
    protobuf_schema = """
      syntax = "proto3";
      package org.apache.seatunnel.format.protobuf;
      option java_outer_classname = "ProtobufE2E";
      message Person {
        int32 c_int32 = 1;
        int64 c_int64 = 2;
        float c_float = 3;
        double c_double = 4;
        bool c_bool = 5;
        string c_string = 6;
        bytes c_bytes = 7;
        message Address {
          string street = 1;
          string city = 2;
          string state = 3;
          string zip = 4;
        }
        Address address = 8;
        map<string, float> attributes = 9;
        repeated string phone_numbers = 10;
      }
    """
  }
}
```
25 changes: 24 additions & 1 deletion docs/en/connector-v2/source/Jdbc.md
@@ -39,7 +39,7 @@ supports query SQL and can achieve projection effect.

## Options

| name | type | required | default value | description |
|--------------------------------------------|---------|----------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| url | String | Yes | - | The URL of the JDBC connection, for example `jdbc:postgresql://localhost/test`. |
| driver | String | Yes | - | The JDBC driver class name used to connect to the remote data source. For MySQL, the value is `com.mysql.cj.jdbc.Driver`. |
@@ -52,6 +52,7 @@ supports query SQL and can achieve projection effect.
| partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the maximum value. |
| partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the minimum value. |
| partition_num | Int | No | job parallelism | Not recommended. The preferred approach is to control the number of splits through `split.size`.<br/> The number of splits to create; only positive integers are supported. The default value is the job parallelism. |
| decimal_type_narrowing | Boolean | No | true | Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently only supported for Oracle. See `decimal_type_narrowing` below. |
| use_select_count | Boolean | No | false | Use select count for the table count rather than other methods in the dynamic chunk split stage. This is currently only available for jdbc-oracle. In this scenario, select count is used directly when it is faster than updating statistics with the analyze table SQL. |
| skip_analyze | Boolean | No | false | Skip the analysis of the table count in the dynamic chunk split stage. This is currently only available for jdbc-oracle. Use it when you schedule the analyze table SQL to update the related table statistics periodically, or when your table data does not change frequently. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
@@ -66,6 +67,28 @@ supports query SQL and can achieve projection effect.
| split.inverse-sampling.rate | Int | No | 1000 | The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000. |
| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

### decimal_type_narrowing

Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently only supported for Oracle.

For example:

decimal_type_narrowing = true

| Oracle | SeaTunnel |
|---------------|-----------|
| NUMBER(1, 0) | Boolean |
| NUMBER(6, 0) | INT |
| NUMBER(10, 0) | BIGINT |

decimal_type_narrowing = false

| Oracle | SeaTunnel |
|---------------|----------------|
| NUMBER(1, 0) | Decimal(1, 0) |
| NUMBER(6, 0) | Decimal(6, 0) |
| NUMBER(10, 0) | Decimal(10, 0) |
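
A minimal sketch of how the option might be set on an Oracle source to keep the original decimal types. The connection URL, credentials, and query are hypothetical placeholders, and the `user`, `password`, and `query` options are assumed from the part of the options table that this diff does not show.

```hocon
source {
  Jdbc {
    # Hypothetical connection details; replace with your own.
    url = "jdbc:oracle:thin:@localhost:1521:xe"
    driver = "oracle.jdbc.OracleDriver"
    user = "test_user"
    password = "test_password"
    query = "select * from TEST_SCHEMA.TEST_TABLE"
    # Keep NUMBER(p, 0) columns as Decimal(p, 0) instead of narrowing them.
    decimal_type_narrowing = false
  }
}
```

With this setting, a `NUMBER(6, 0)` column would be read as `Decimal(6, 0)` rather than `INT`, matching the second table above. Leaving the option unset keeps the default of `true`.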

## Parallel Reader

The JDBC Source connector supports parallel reading of data from tables. SeaTunnel will use certain rules to split the data in the table, which will be handed over to readers for reading. The number of readers is determined by the `parallelism` option.