Sqoop Import Command Arguments 1


Incremental import arguments

Sqoop provides Incremental import which allows us to to retrieve only rows newer than some previously-imported set of rows by specifying the –incremental parameter. But it expects two more mandatory arguments, –check-column and –last-value.

Argument Description & Example
–incremental <type>  Valid import-types are ‘append’ or ‘lastmodified’. 
–check-column <col>  Source column to check for incremental change. (not be of type CHAR/ NCHAR/  VARCHAR/ VARNCHAR/ LONGVARCHAR/ LONGNVARCHAR) (suitable types are date , time , datetime , and timestamp)
–last-value (value)  Last imported maximum value in the incremental check column

Notes:

  • When RDBMS table is only getting new rows and the existing rows are not changed, then we need to use the append mode.
  • When existing rows also being updated in addition to new rows then we need to use lastmodified incremental mode.
  • Internally, the lastmodified incremental import fires two mapreduce jobs. The first job will import the delta of changed data similarly to normal import. This import job will save data in a temporary directory on HDFS. The second job will take both the old and new data and will merge them together into the final output, preserving only the last updated value for each row.

Example:-

Output line formatting arguments

When Importing RDBMS tables into HDFS files, we can specify the field delimiters and line/record delimiter in the output files on HDFS. These delimiters can be specified on the command line. Data imported with –as-textfile will be formatted according to these parameters.

Arguments of the form \0xhhh will be interpreted as a hexadecimal representation of a character with hex number _hhh_. Arguments of the form \0ooo will be treated as an octal representation of a character represented by octal number _ooo_. The special escapes \n, \r, \”, \b, \t, and \\ act as they do inside Java strings. \0 will be treated as NUL. This will insert NUL characters between fields or lines (if used for –fields-terminated-by or –lines-terminated-by), or will disable enclosing/escaping if used for one of the –enclosed-by, –optionally-enclosed-by, or –escaped-by arguments.

The default delimiters are , for fields, \n for records, no quote character, and no escape character. Note that this can lead to ambiguous/un parse-able records if you import database records containing commas or newlines in the field data. For unambiguous parsing, both must be enabled, e.g., via –mysql-delimiters.

Argument Description 
–enclosed-by <char>  Sets a required field enclosing character
–escaped-by <char>  Sets the escape character
–fields-terminated-by <char>  Sets the field separator character
–lines-terminated-by <char>  Sets the end-of-line character
–mysql-delimiters  Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by:  \ optionally-enclosed-by: ‘
–optionally-enclosed-by <char>  Sets a field enclosing character

Input parsing arguments

These arguments are used for parsing input files to specify the enclosed by/escaped by characters etc when exporting the files from HDFS to RDBMS tables. These delimiters are same as the one set in previous section for storing into HDFS files, which will act as input files for exporting into RDBMS from HDFS.

Argument Description 
–input-enclosed-by <char>  Sets a required field en-closer. 
–input-optionally-enclosed-by <char>  Sets a field enclosing character
–input-escaped-by <char>  Sets the input escape character
–input-fields-terminated-by <char>  Sets the input field separator
–input-lines-terminated-by <char>  Sets the input end-of-line char

Profile photo of Siva

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Sqoop Import Command Arguments

  • Joseph

    Hey Siva,

    I have been trying to work with sqoop and i wanted to test connection to mysql and i tried your command
    sqoop list-databases –connect jdbc:mysql://localhost –connection-manager org.apache.sqoop.manager.GenericJdbcManager –driver com.mysql.jdbc.Driver –username root –password MYPASSWORD
    and i end up with an error
    15/09/25 14:25:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/09/25 14:25:03 INFO manager.SqlManager: Using default fetchSize of 1000
    15/09/25 14:25:03 ERROR manager.SqlManager: Generic SqlManager.listDatabases() not implemented.
    Could not retrieve database list from server
    15/09/25 14:25:03 ERROR tool.ListDatabasesTool: manager.listDatabases() returned null

    I have also put the jar file in /var/lib/sqoop ”
    mysql-connector-java-5.0.8-bin.jar”  and when i try it wothout specifying the –connection-manager it automatically defaults to GenericJDBCManager,  any idea why this isnt working ?
    Thanks
    -Joseph