Merge branch 'dev' into dev
Anush008 authored Aug 29, 2024
2 parents e30ba47 + f49b263 commit 5637abc
Showing 75 changed files with 4,176 additions and 157 deletions.
8 changes: 8 additions & 0 deletions .github/workflows/labeler/label-scope-conf.yml
@@ -257,11 +257,19 @@ activemq:
  - changed-files:
      - any-glob-to-any-file: seatunnel-connectors-v2/connector-activemq/**
      - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(activemq)/**'

qdrant:
  - all:
      - changed-files:
          - any-glob-to-any-file: seatunnel-connectors-v2/connector-qdrant/**
          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(qdrant)/**'

typesense:
  - all:
      - changed-files:
          - any-glob-to-any-file: seatunnel-connectors-v2/connector-typesense/**
          - all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(typesense)/**'

Zeta Rest API:
  - changed-files:
      - any-glob-to-any-file: seatunnel-engine/**/server/rest/**
2 changes: 1 addition & 1 deletion config/plugin_config
@@ -89,5 +89,5 @@ connector-milvus
connector-activemq
connector-sls
connector-qdrant
connector-typesense
connector-cdc-opengauss
--end--
39 changes: 39 additions & 0 deletions docs/en/connector-v2/sink/Rabbitmq.md
@@ -57,6 +57,21 @@ convenience method for setting the fields in an AMQP URI: host, port, username,

the queue to write the message to

### durable [boolean]

- `true`: The queue will survive a server restart.
- `false`: The queue will be deleted on server restart.

### exclusive [boolean]

- `true`: The queue is used only by the current connection and will be deleted when the connection closes.
- `false`: The queue can be used by multiple connections.

### auto_delete [boolean]

- `true`: The queue will be deleted automatically when the last consumer unsubscribes.
- `false`: The queue will not be automatically deleted.

### schema [Config]

#### fields [Config]
@@ -112,6 +127,30 @@ sink {
}
```

### Example 2

A queue declared with `durable`, `exclusive`, and `auto_delete` set explicitly:

```hocon
sink {
RabbitMQ {
host = "rabbitmq-e2e"
port = 5672
virtual_host = "/"
username = "guest"
password = "guest"
queue_name = "test1"
durable = "true"
exclusive = "false"
auto_delete = "false"
rabbitmq.config = {
requested-heartbeat = 10
connection-timeout = 10
}
}
}
```

## Changelog

### next version
93 changes: 93 additions & 0 deletions docs/en/connector-v2/sink/Typesense.md
@@ -0,0 +1,93 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default Value |
|------------------|--------|----------|------------------------------|
| hosts | array | Yes | - |
| collection | string | Yes | - |
| schema_save_mode | string | Yes | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode | string | Yes | APPEND_DATA |
| primary_keys | array | No | |
| key_delimiter | string | No | `_` |
| api_key | string | No | |
| max_retry_count | int | No | 3 |
| max_batch_size | int | No | 10 |
| common-options | | No | - |

### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., "seatunnel".

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter for composite keys (default is `_`).
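As an illustration, the composite document `id` can be thought of as the configured primary-key values joined by `key_delimiter`. The sketch below is a hypothetical Python rendering of that scheme (field names are taken from the example in this page); it is not the connector's actual implementation.

```python
# Hypothetical sketch: derive a composite document id from primary_keys
# and key_delimiter. The exact scheme used by the connector is assumed,
# not verified against its source.
def build_document_id(row: dict, primary_keys: list, key_delimiter: str = "_") -> str:
    # Join the values of the configured primary key fields, in order.
    return key_delimiter.join(str(row[key]) for key in primary_keys)

row = {"num_employees": 9000, "id": "emp_01", "country": "US"}
print(build_document_id(row, ["num_employees", "id"], "="))  # 9000=emp_01
```

With `primary_keys = ["num_employees","id"]` and `key_delimiter = "="`, such a row would map to the id `9000=emp_01`.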

### api_key [string]

The `api_key` for secure access to Typesense.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum size of document batches.

### common options

Common parameters for Sink plugins. Refer to [Common Sink Options](../sink-common-options.md) for more details.

### schema_save_mode

Choose how to handle the target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn’t exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn’t exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn’t exist.

### data_save_mode

Choose how to handle existing data on the target side before starting the synchronization task:
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.
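The `schema_save_mode` decision table above can be sketched as a small function. This is an illustrative rendering only; the function name and return values are hypothetical, not SeaTunnel API.

```python
# Hypothetical sketch of the schema_save_mode semantics described above.
# "create", "recreate", "skip", and "use_existing" are illustrative labels.
def resolve_schema_action(mode: str, table_exists: bool) -> str:
    if mode == "RECREATE_SCHEMA":
        # Create if missing; drop and recreate if present.
        return "recreate" if table_exists else "create"
    if mode == "CREATE_SCHEMA_WHEN_NOT_EXIST":
        # Create if missing; skip creation if present.
        return "skip" if table_exists else "create"
    if mode == "ERROR_WHEN_SCHEMA_NOT_EXIST":
        # Fail fast when the target does not exist.
        if not table_exists:
            raise RuntimeError("target collection does not exist")
        return "use_existing"
    raise ValueError(f"unknown schema_save_mode: {mode}")

print(resolve_schema_action("CREATE_SCHEMA_WHEN_NOT_EXIST", True))  # skip
print(resolve_schema_action("RECREATE_SCHEMA", True))               # recreate
```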

## Example

Simple example:

```hocon
sink {
Typesense {
source_table_name = "typesense_test_table"
hosts = ["localhost:8108"]
collection = "typesense_to_typesense_sink_with_query"
max_retry_count = 3
max_batch_size = 10
api_key = "xyz"
primary_keys = ["num_employees","id"]
key_delimiter = "="
schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
data_save_mode = "APPEND_DATA"
}
}
```

79 changes: 79 additions & 0 deletions docs/en/connector-v2/source/Typesense.md
@@ -0,0 +1,79 @@
# Typesense

> Typesense Source Connector

## Description

Reads data from Typesense.

## Key Features

- [x] [Batch Processing](../../concept/connector-v2-features.md)
- [ ] [Stream Processing](../../concept/connector-v2-features.md)
- [ ] [Exactly-Once](../../concept/connector-v2-features.md)
- [x] [Schema](../../concept/connector-v2-features.md)
- [x] [Parallelism](../../concept/connector-v2-features.md)
- [ ] [User-Defined Splits Support](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default |
|------------|--------|----------|---------|
| hosts | array | yes | - |
| collection | string | yes | - |
| schema | config | yes | - |
| api_key | string | no | - |
| query | string | no | - |
| batch_size | int | no | 100 |

### hosts [array]

The access address of Typesense, for example: `["typesense-01:8108"]`.

### collection [string]

The name of the collection to read from, for example: `"seatunnel"`.

### schema [config]

The columns to be read from Typesense. For more information, please refer to the [guide](../../concept/schema-feature.md#how-to-declare-type-supported).

### api_key [string]

The `api_key` for Typesense security authentication.

### batch_size [int]

The number of records to query per batch when reading data.
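Conceptually, the source pages through the collection `batch_size` records at a time until a short page signals the end. The sketch below illustrates that loop in Python; `fetch_page` is a hypothetical stand-in for the actual Typesense query, not a real client call.

```python
# Hypothetical sketch of batched reading: request pages of batch_size
# records until a partial (or empty) page indicates the data is exhausted.
def read_in_batches(fetch_page, batch_size: int = 100):
    page = 1
    while True:
        docs = fetch_page(page=page, per_page=batch_size)
        yield from docs
        if len(docs) < batch_size:  # last (possibly partial) page
            break
        page += 1

# Toy stand-in data source with 250 documents.
data = [{"id": i} for i in range(250)]

def fetch_page(page, per_page):
    start = (page - 1) * per_page
    return data[start:start + per_page]

docs = list(read_in_batches(fetch_page, batch_size=100))
print(len(docs))  # 250
```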

### Common Options

For common parameters of Source plugins, please refer to [Source Common Options](../source-common-options.md).

## Example

```hocon
source {
Typesense {
hosts = ["localhost:8108"]
collection = "companies"
api_key = "xyz"
query = "q=*&filter_by=num_employees:>9000"
schema = {
fields {
company_name_list = array<string>
company_name = string
num_employees = long
country = string
id = string
c_row = {
c_int = int
c_string = string
c_array_int = array<int>
}
}
}
}
}
```

21 changes: 0 additions & 21 deletions docs/en/faq.md
@@ -203,23 +203,6 @@ spark {
}
```

## How do I specify a different JDK version for SeaTunnel on YARN?

For example, if you want to set the JDK version to JDK8, there are two cases:

- The YARN cluster has deployed JDK8, but the default JDK is not JDK8. Add two configurations to the SeaTunnel config file:

```
env {
...
spark.executorEnv.JAVA_HOME="/your/java_8_home/directory"
spark.yarn.appMasterEnv.JAVA_HOME="/your/java_8_home/directory"
...
}
```
- The YARN cluster has not deployed JDK8. In this case, start SeaTunnel with JDK8 attached to the submission. For detailed operations, see:
https://www.cnblogs.com/jasondan/p/spark-specific-jdk-version.html

## What should I do if OOM always appears when running SeaTunnel in Spark local[*] mode?

If you run in local mode, you need to modify the `start-seatunnel.sh` startup script: after `spark-submit`, add the parameter `--driver-memory 4g`. Under normal circumstances, local mode is not used in production, so this parameter generally does not need to be set when running on YARN. See [Application Properties](https://spark.apache.org/docs/latest/configuration.html#application-properties) for details.
@@ -334,10 +317,6 @@ spark-submit --verbose
...
```

## How do I use SeaTunnel to synchronize data across HDFS clusters?

Configure `hdfs-site.xml` properly. For reference, see: https://www.cnblogs.com/suanec/p/7828139.html.

## I want to learn the source code of SeaTunnel. Where should I start?

SeaTunnel has a highly abstracted and well-structured code base, and many people have chosen SeaTunnel as a way to learn Spark. You can start reading the source code from the main program entry point: `SeaTunnel.java`.
95 changes: 95 additions & 0 deletions docs/zh/connector-v2/sink/Typesense.md
@@ -0,0 +1,95 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name             | Type   | Required | Default Value                |
|------------------|--------|----------|------------------------------|
| hosts            | array  | Yes      | -                            |
| collection       | string | Yes      | -                            |
| schema_save_mode | string | Yes      | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode   | string | Yes      | APPEND_DATA                  |
| primary_keys     | array  | No       |                              |
| key_delimiter    | string | No       | `_`                          |
| api_key          | string | No       |                              |
| max_retry_count  | int    | No       | 3                            |
| max_batch_size   | int    | No       | 10                           |
| common-options   |        | No       | -                            |

### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., `"seatunnel"`.

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter for composite keys (default is `_`).

### api_key [string]

The `api_key` for Typesense security authentication.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum size of document batches.

### common options

Common parameters for Sink plugins. Refer to [Common Sink Options](../sink-common-options.md) for more details.

### schema_save_mode

Choose how to handle the target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn't exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn't exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn't exist.

### data_save_mode

Choose how to handle existing data on the target side before starting the synchronization task:
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.

## Example

A simple example:

```hocon
sink {
Typesense {
source_table_name = "typesense_test_table"
hosts = ["localhost:8108"]
collection = "typesense_to_typesense_sink_with_query"
max_retry_count = 3
max_batch_size = 10
api_key = "xyz"
primary_keys = ["num_employees","id"]
key_delimiter = "="
schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
data_save_mode = "APPEND_DATA"
}
}
```
