This guide gives an overview of the current Apache SeaTunnel modules and describes best practices for submitting a high-quality pull request to Apache SeaTunnel.
Module Name | Introduction |
---|---|
seatunnel-api | SeaTunnel connector V2 API module |
seatunnel-common | SeaTunnel common module |
seatunnel-connectors-v2 | SeaTunnel connector V2 module, currently connector V2 is under development and the community will focus on it |
seatunnel-core/seatunnel-spark-starter | SeaTunnel core starter module of connector V2 on Spark engine |
seatunnel-core/seatunnel-flink-starter | SeaTunnel core starter module of connector V2 on Flink engine |
seatunnel-core/seatunnel-starter | SeaTunnel core starter module of connector V2 on SeaTunnel engine |
seatunnel-e2e | SeaTunnel end-to-end test module |
seatunnel-examples | SeaTunnel local examples module, developers can use it to run unit tests and integration tests |
seatunnel-engine | SeaTunnel engine module, seatunnel-engine is a new computational engine developed by the SeaTunnel Community that focuses on data synchronization. |
seatunnel-formats | SeaTunnel formats module, provides data formatting capabilities |
seatunnel-plugin-discovery | SeaTunnel plugin discovery module, loads SPI plugins from the classpath |
seatunnel-transforms-v2 | SeaTunnel transform V2 module, currently transform V2 is under development and the community will focus on it |
seatunnel-translation | SeaTunnel translation module, adapts Connector V2 to other computing engines such as Spark and Flink |
- Create entity classes using the annotations provided by the `lombok` plugin (`@Data`, `@Getter`, `@Setter`, `@NonNull`, etc.) to reduce the amount of code. It is good practice to prefer Lombok annotations in your code.
- If you need to use log4j to print logs in a class, preferably use the `@Slf4j` annotation from the `lombok` plugin.
- SeaTunnel uses issues to track logical issues, including bugs and improvements, and uses GitHub pull requests to manage the review and merge of specific code changes. A clear issue or pull request helps the community better understand the developer's intent. The best practice for creating an issue or pull request is to title it as follows:

  `[purpose] [module name] [sub-module name] Description`

  - Pull request purpose includes: `Hotfix`, `Feature`, `Improve`, `Docs`, `WIP`. Note that if your pull request's purpose is `WIP`, you need to use GitHub's draft pull request.
  - Issue purpose includes: `Feature`, `Bug`, `Docs`, `Discuss`.
  - Module name: the name of the module that the pull request or issue involves, for example `Core`, `Connector-V2`, `Connector-V1`, etc.
  - Sub-module name: the name of the sub-module that the pull request or issue involves, for example `File`, `Redis`, `Hbase`, etc.
  - Description: provide a brief, clear summary of the main goal of the pull request or issue, and aim for a title that conveys the core purpose at a glance.

  Tips: For more details, refer to the Issue Guide and the Pull Request Guide.
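The title convention above can be checked mechanically. The following is only a sketch, not an official SeaTunnel check: the class name `PullRequestTitleCheck`, the regular expression, and the decision to treat the sub-module part as optional are all assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the SeaTunnel code base.
class PullRequestTitleCheck {
    // Pull request purposes listed in this guide.
    private static final List<String> PR_PURPOSES =
            List.of("Hotfix", "Feature", "Improve", "Docs", "WIP");

    // [purpose] [module name] [sub-module name] Description
    // Assumption: the sub-module part may be omitted; the description must not be empty.
    private static final Pattern TITLE = Pattern.compile(
            "^\\[([^\\]]+)\\]\\s*\\[([^\\]]+)\\]\\s*(?:\\[([^\\]]+)\\]\\s*)?(\\S.*)$");

    // Returns true when the title has the bracketed parts and a known purpose.
    static boolean isValid(String title) {
        Matcher matcher = TITLE.matcher(title);
        return matcher.matches() && PR_PURPOSES.contains(matcher.group(1));
    }
}
```

For example, a made-up title such as `[Feature] [Connector-V2] [Redis] Support batch write` passes this check, while `[Bug] [Core] Fix typo` does not, because `Bug` is listed as an issue purpose rather than a pull request purpose.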
- Code segments should never be repeated. If a code segment is used multiple times, defining it multiple times is not a good option; the best practice is to make it a public segment that other modules can use.
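As a small sketch of this rule, the check below is defined once in a shared helper and reused by two callers instead of being copied into each. All class and method names here (`StringChecks`, `SourceOptions`, `SinkOptions`) are hypothetical and chosen only for illustration.

```java
// Hypothetical shared helper; names are illustrative, not SeaTunnel APIs.
final class StringChecks {
    private StringChecks() {}

    // Single public definition that callers in any module can reuse.
    public static String requireNonBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value.trim();
    }
}

// Two hypothetical callers that share the helper instead of duplicating the check.
class SourceOptions {
    private final String host;

    SourceOptions(String host) {
        this.host = StringChecks.requireNonBlank(host, "host");
    }

    String getHost() {
        return host;
    }
}

class SinkOptions {
    private final String table;

    SinkOptions(String table) {
        this.table = StringChecks.requireNonBlank(table, "table");
    }

    String getTable() {
        return table;
    }
}
```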
- When throwing an exception, include a hint message and keep the exception as narrow in scope as possible. Throwing overly broad exceptions promotes complex error-handling code that is more likely to contain security vulnerabilities. For example, if your connector encounters an `IOException` while reading data, a reasonable approach would be the following:

      try {
          // read logic
      } catch (IOException e) {
          throw new SeaTunnelORCFormatException("This orc file is corrupted, please check it", e);
      }
- The Apache project has very strict licensing requirements, so every file in an Apache project should contain a license statement. Check that each new file you add contains the `Apache License Header` before submitting a pull request:

      /*
       * Licensed to the Apache Software Foundation (ASF) under one or more
       * contributor license agreements.  See the NOTICE file distributed with
       * this work for additional information regarding copyright ownership.
       * The ASF licenses this file to You under the Apache License, Version 2.0
       * (the "License"); you may not use this file except in compliance with
       * the License.  You may obtain a copy of the License at
       *
       *    http://www.apache.org/licenses/LICENSE-2.0
       *
       * Unless required by applicable law or agreed to in writing, software
       * distributed under the License is distributed on an "AS IS" BASIS,
       * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       * See the License for the specific language governing permissions and
       * limitations under the License.
       */
- Apache SeaTunnel uses `Spotless` for code style and formatting checks. You can run the following command and `Spotless` will automatically fix the code style and formatting errors for you:

      ./mvnw spotless:apply
- Before you submit your pull request, make sure the project still compiles properly after adding your code. You can use the following commands to package the whole project:

      # multi-threaded compile
      ./mvnw -T 1C clean package

      # single-threaded compile
      ./mvnw clean package
- Before submitting a pull request, run full unit tests and integration tests locally to verify the functionality of your code. The best practice is to use the self-test capability of the `seatunnel-examples` module to confirm that your change runs properly on multiple engines and that the results are correct.
- If you submit a pull request with a feature that requires updated documentation, always remember to update the documentation.
- When you submit a pull request for a connector, write e2e tests to ensure the robustness of the code. The e2e tests should cover the full set of data types and initialize as few Docker images as possible. Write the sink and source test cases together to reduce resource usage, and use asynchronous features to keep the tests stable. A good example can be found at: MongodbIT.java
- By default, declare the properties of a class as `private` and make them `final`. This can be changed reasonably if special circumstances are encountered.
- Prefer primitive types (`int`, `boolean`, `double`, `float`, ...) over wrapper types (`Integer`, `Boolean`, `Double`, `Float`, ...) for class properties and method parameters. This can be changed reasonably if special circumstances are encountered.
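As a minimal sketch of the two preceding points, the hypothetical `RetryPolicy` class below (not part of SeaTunnel) keeps its fields `private` and `final` and uses primitive types. The `multiplyWrapped` method shows one reason to prefer primitives: unboxing a `null` wrapper fails at runtime with a `NullPointerException`.

```java
// Hypothetical class for illustration only; not part of the SeaTunnel code base.
final class RetryPolicy {
    // Fields are private and final, and use primitive types instead of wrappers.
    private final int maxRetries;
    private final long backoffMillis;
    private final boolean failFast;

    RetryPolicy(int maxRetries, long backoffMillis, boolean failFast) {
        this.maxRetries = maxRetries;
        this.backoffMillis = backoffMillis;
        this.failFast = failFast;
    }

    int getMaxRetries() {
        return maxRetries;
    }

    long getBackoffMillis() {
        return backoffMillis;
    }

    boolean isFailFast() {
        return failFast;
    }

    // Total wait time if every retry is used; primitive operands can never be null.
    long totalBackoffMillis() {
        return maxRetries * backoffMillis;
    }

    // With wrapper types, a null argument compiles but fails when it is unboxed.
    static long multiplyWrapped(Integer retries, Long backoff) {
        return retries * backoff; // throws NullPointerException if either is null
    }
}
```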
- When developing a sink connector, be aware that the sink will be serialized. If some properties cannot be serialized, encapsulate them in classes and use the singleton pattern.
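One way to read this advice is sketched below, using plain Java serialization and hypothetical names (`FakeClient`, `FakeClientHolder`, `DemoSink`) that are not SeaTunnel APIs. The sink keeps only serializable state as fields and obtains the non-serializable client from a singleton holder after it has been deserialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable resource, standing in for a real client or connection.
class FakeClient {
    private final String url;

    FakeClient(String url) {
        this.url = url;
    }

    String send(String row) {
        return url + " <- " + row;
    }
}

// Singleton holder: the client lives here instead of in a field of the sink.
// Simplification: the first URL wins; later calls reuse the same instance.
final class FakeClientHolder {
    private static volatile FakeClient instance;

    private FakeClientHolder() {}

    static FakeClient getOrCreate(String url) {
        if (instance == null) {
            synchronized (FakeClientHolder.class) {
                if (instance == null) {
                    instance = new FakeClient(url);
                }
            }
        }
        return instance;
    }
}

// Hypothetical sink: only serializable state (the URL) is stored as a field.
class DemoSink implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String url;

    DemoSink(String url) {
        this.url = url;
    }

    String write(String row) {
        // The client is looked up lazily, so it is never part of the serialized form.
        return FakeClientHolder.getOrCreate(url).send(row);
    }

    // Round-trips the sink through Java serialization to show that it survives.
    static DemoSink roundTrip(DemoSink sink) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sink);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoSink) in.readObject();
        }
    }
}
```

Had `DemoSink` stored the `FakeClient` directly in a non-transient field, `writeObject` would fail with a `NotSerializableException`.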
- If there are multiple `if` checks in the code flow, try to simplify the flow into multiple separate `if` statements instead of an if-else-if chain.
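The difference can be sketched with a hypothetical `PortCheck` helper (illustrative only). Both methods return the same results, but the second replaces the if-else-if chain with independent `if` statements that return early.

```java
// Hypothetical validation helper for illustration only.
class PortCheck {
    // Chained version: each branch depends on the ones before it.
    static String describeChained(int port) {
        String result;
        if (port < 0) {
            result = "negative";
        } else if (port == 0) {
            result = "unset";
        } else if (port > 65535) {
            result = "too large";
        } else {
            result = "ok";
        }
        return result;
    }

    // Flattened version: separate checks, each returning as soon as it applies.
    static String describe(int port) {
        if (port < 0) {
            return "negative";
        }
        if (port == 0) {
            return "unset";
        }
        if (port > 65535) {
            return "too large";
        }
        return "ok";
    }
}
```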
- A pull request should have a single responsibility and must not include code that is irrelevant to its feature. If this happens, clean up your own branch before submitting the pull request; otherwise the Apache SeaTunnel community will actively close the pull request.
- Contributors should be responsible for their own pull requests. If your pull request contains new features or modifies old features, it is good practice to add test cases or e2e tests that demonstrate the soundness and functional integrity of your change.
- If you think part of the community's current code is unreasonable (especially the `core` module and the `api` module) and a function needs to be updated or modified, first open a `discuss issue` or send an `email` to discuss the change with the community. Submit a pull request only after the community agrees. Do not submit an issue and a pull request directly without discussion; the community will consider such a pull request useless and close it.