Table of Contents
Incremental import arguments
Sqoop provides Incremental import which allows us to to retrieve only rows newer than some previously-imported set of rows by specifying the –incremental parameter. But it expects two more mandatory arguments, –check-column and –last-value.
|Argument||Description & Example|
|–incremental <type>||Valid import-types are ‘append’ or ‘lastmodified’.|
|–check-column <col>||Source column to check for incremental change. (not be of type CHAR/ NCHAR/ VARCHAR/ VARNCHAR/ LONGVARCHAR/ LONGNVARCHAR) (suitable types are date , time , datetime , and timestamp)|
|–last-value (value)||Last imported maximum value in the incremental check column|
- When RDBMS table is only getting new rows and the existing rows are not changed, then we need to use the append mode.
- When existing rows also being updated in addition to new rows then we need to use lastmodified incremental mode.
- Internally, the lastmodified incremental import fires two mapreduce jobs. The first job will import the delta of changed data similarly to normal import. This import job will save data in a temporary directory on HDFS. The second job will take both the old and new data and will merge them together into the final output, preserving only the last updated value for each row.
sqoop import \
--connect jdbc:mysql://mysql.server.com/sqoop \
--table employee \
--incremental append \
--check-column id \
Output line formatting arguments
When Importing RDBMS tables into HDFS files, we can specify the field delimiters and line/record delimiter in the output files on HDFS. These delimiters can be specified on the command line. Data imported with –as-textfile will be formatted according to these parameters.
Arguments of the form \0xhhh will be interpreted as a hexadecimal representation of a character with hex number _hhh_. Arguments of the form \0ooo will be treated as an octal representation of a character represented by octal number _ooo_. The special escapes \n, \r, \”, \b, \t, and \\ act as they do inside Java strings. \0 will be treated as NUL. This will insert NUL characters between fields or lines (if used for –fields-terminated-by or –lines-terminated-by), or will disable enclosing/escaping if used for one of the –enclosed-by, –optionally-enclosed-by, or –escaped-by arguments.
The default delimiters are , for fields, \n for records, no quote character, and no escape character. Note that this can lead to ambiguous/un parse-able records if you import database records containing commas or newlines in the field data. For unambiguous parsing, both must be enabled, e.g., via –mysql-delimiters.
|–enclosed-by <char>||Sets a required field enclosing character|
|–escaped-by <char>||Sets the escape character|
|–fields-terminated-by <char>||Sets the field separator character|
|–lines-terminated-by <char>||Sets the end-of-line character|
|–mysql-delimiters||Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ‘|
|–optionally-enclosed-by <char>||Sets a field enclosing character|
Input parsing arguments
These arguments are used for parsing input files to specify the enclosed by/escaped by characters etc when exporting the files from HDFS to RDBMS tables. These delimiters are same as the one set in previous section for storing into HDFS files, which will act as input files for exporting into RDBMS from HDFS.
|–input-enclosed-by <char>||Sets a required field en-closer.|
|–input-optionally-enclosed-by <char>||Sets a field enclosing character|
|–input-escaped-by <char>||Sets the input escape character|
|–input-fields-terminated-by <char>||Sets the input field separator|
|–input-lines-terminated-by <char>||Sets the input end-of-line char|