Skip to content

Latest commit

 

History

History
174 lines (119 loc) · 6.95 KB

README.md

File metadata and controls

174 lines (119 loc) · 6.95 KB

MysqlReader Plugin Documentation

Quick Introduction

The MysqlReader plugin enables data reading from a MySQL database. Internally, MysqlReader connects to a remote MySQL database using github.com/go-sql-driver/mysql and database/sql, executing corresponding SQL statements to retrieve data from the MySQL server.

Implementation Principles

MysqlReader connects to a remote MySQL database using github.com/go-sql-driver/mysql. Based on user-provided configuration information, it generates SQL queries and sends them to the remote MySQL server. The returned results from these SQL executions are assembled into an abstract dataset using go-etl's custom data types and passed to downstream Writers for processing.

MysqlReader utilizes the query processes defined in dbmsreader and calls go-etl's custom storage/database DBWrapper for specific queries. DBWrapper encapsulates numerous interfaces from database/sql and abstracts out a database dialect. For MySQL, it adopts the dialect implemented in storage/database/mysql.

Functionality Description

Configuration Example

Configuring a job to synchronize data from a MySQL database to a local destination:

{
    "job":{
        "content":[
            {
                "reader":{
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "123456",
                        "column": ["*"],
                        "connection": {
                            "url": "tcp(192.168.0.1:3306)/mysql?parseTime=false",
                            "table": {
                                "db":"source",
                                "name":"type_table"
                            }
                        },
                        "split": {
                            "key":"id"
                        },
                        "where": "",
                        "querySql":["select a,b from table_a join table_b on table_a.id = table_b.id"]
                    }
                }
            }
        ]
    }
}

Parameter Explanation

url

  • Description: Primarily used to configure the connection information to the remote server. The basic configuration format is: tcp(ip:port)/db, where ip:port represents the IP address and port of the MySQL server, and db indicates the default database to connect to. It is similar to the connection configuration information of mysql, except that the username and password are extracted from the connection configuration for easier encryption.
  • Required: Yes
  • Default: None

username

  • Description: Primarily used to configure the MySQL database username.
  • Required: Yes
  • Default: None

password

  • Description: Primarily used to configure the MySQL database password.
  • Required: Yes
  • Default: None

table

Describes the MySQL table information.

db
  • Description: Primarily used to configure the database name of the MySQL table.
  • Required: Yes
  • Default: None
name
  • Description: Primarily used to configure the table name of the MySQL table.
  • Required: Yes
  • Default: None

column

  • Description: An array of column names from the configured table that need to be synchronized. Users can use the JSON array format to describe the field information. Using "*" represents selecting all columns by default, e.g., ["*"].

    Supports column pruning, meaning users can choose specific columns for export.

    Supports column reordering, allowing columns to be exported in an order different from the table schema.

    Supports constant configuration. Users need to follow the MySQL SQL syntax format: ["id", "table", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"]. Here, id is a regular column name, table is a column name containing reserved words, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null pointer, to_char(a + 1) is an expression, 2.3 is a floating-point number, and true is a boolean value.

  • Required: Yes

  • Default: None

split

key
  • Description: Primarily used to configure the split key for the MySQL table. The split key must be of type bigInt/string/time, assuming the data distribution based on the split key is uniform.
  • Required: No
  • Default: None
timeAccuracy
  • Description: Primarily used to configure the time split key for the MySQL table, mainly describing the smallest unit of time, such as day, minute, second, millisecond, microsecond, nanosecond.
  • Required: No
  • Default: None
range
type
  • Description: Primarily used to configure the default value type of the split key for the MySQL table. The value can be bigInt/string/time. This will check the type of the split key in the table, so please ensure the type is correct.
  • Required: No
  • Default: None
left
  • Description: Primarily used to configure the default minimum value of the split key for the MySQL table.
  • Required: No
  • Default: None
right
  • Description: Primarily used to configure the default maximum value of the split key for the MySQL table.
  • Required: No
  • Default: None

where

  • Description: Primarily used to configure the WHERE condition for the SELECT statement.
  • Required: No
  • Default: None

querySql

  • Description: In some business scenarios, the "where" configuration item is not sufficient to describe the filtering conditions. Users can use this configuration item to customize the filtering SQL. When users configure this item, the DataX system will ignore the "table", "column", and other configuration items, and directly use the content of this configuration item for data filtering. For example, if you need to perform a multi-table join and then synchronize the data, you can use select a, b from table_a join table_b on table_a.id = table_b.id. When the user configures querySql, MysqlReader directly ignores the configuration of table, column, and where conditions. The priority of querySql is higher than that of table, column, and where options.
  • Required: No
  • Default: None

trimChar

  • Description: Specifies whether to remove leading and trailing spaces for char types in MySQL.
  • Required: No
  • Default: false

Type Conversion

Currently, MysqlReader supports most MySQL types, but there are still some individual types that are not supported. Please check your types carefully.

Below is a conversion table for MysqlReader regarding MySQL types:

go-etl Type MySQL Data Type
bigInt int, tinyint, smallint, mediumint, bigint, year,unsigned int, unsigned bigint, unsigned smallint, unsigned tinyint
decimal float, double, decimal
string varchar, char, tinytext, text, mediumtext, longtext
time date, datetime, timestamp, time
bytes tinyblob, mediumblob, blob, longblob, varbinary, bit

Performance Report

To be tested.

Constraints and Limitations

Database Encoding Issues

Currently, only the utf8 character set is supported.