Sqoop Import Command Arguments 1


Hive arguments

Apart from importing RDBMS tables into HDFS files, Sqoop also support importing RDBMS tables directly into Hive tables and it is very simple that just using –hive-import argument to import command. Optionally,Sqoop will generate a Hive script containing a CREATE TABLE operation defining table columns using Hive’s types, and a LOAD DATA INPATH statement to move the data files into Hive’s warehouse directory.

Argument Description 
–create-hive-table  If set, then the job will fail if the target hive table exits. By default  this property is false.
–hive-database <database-name>  Sets the database name to use when importing to hive
–hive-delims-replacement <arg>  Replace Hive record \0x01 and row delimiters (\n\r) from imported  string fields with user-defined string
–hive-drop-import-delims  Drop Hive record \0x01 and row delimiters (\n\r) from imported  string  fields
–hive-home <dir>  Override $HIVE_HOME
–hive-import  Import tables into Hive (Uses Hive’s default delimiters if none  are set.)
–hive-overwrite  Overwrite existing data in the Hive table
–hive-partition-key <partition-key>  Sets the partition key to use when importing to hive
–hive-partition-value <partition-value>  Sets the partition value to use when importing to hive
–hive-table <table-name>  Sets the table name to use when importing to hive
–map-column-hive <arg>  Override mapping for specific column to hive types.

Note:

  • The table name used in Hive is, by default, the same as that of the source RDBMS table. We can change the output table name with the –hive-table option.
  • If the Hive table already exists, we can override it with –hive-overwrite option.
  • Hive Importing doesn’t support –as-avrodatafile and –as-sequencefile clauses.
  • By default, row delimiters in hive are \n and \r characters and column delimiters \001 characters, if source RDBMS tables contain these characters inside fields, then we need to use –hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data, or –hive-delims-replacement option to replace those characters with a user-defined characters.
  • Sqoop can import data for Hive into a particular partition by specifying the –hive-partition-key and –hive-partition-value arguments

Examples:

HBase arguments

Apart from HDFS and Hive, Sqoop also supports importing RDBMS tables into HBase tables. Each row of the input table will be transformed into an HBase Put operation to a row of the output table.

Argument Description 
–column-family <family>  Sets the target column family for the import 
–hbase-bulkload  Enables HBase bulk loading
–hbase-create-table  If specified, create missing HBase tables
–hbase-row-key <col>  Specifies which input column to use as the row key
–hbase-table <table>  Import to <table> in HBase

Note:

  • Hbase row-key is decided by split-by column if it is mentioned, otherwise, sqoop will assign primary key of the source table. We can specify our own row-key with –hbase-row-key option.
  • All output columns will be placed in the same column-family specified by –column-family.
  • If the target table and column family do not exist, the Sqoop job will exit with an error, we can specify –hbase-create-table option to create the target table and column family if they do not exist.
  • HBase Importing doesn’t support direct import.

HCatalog arguments

Recently HCatalog Importing is also supported by Sqoop.

Argument Description 
–hcatalog-database <arg>  HCatalog database name, If not specified, the default database  name default is used
–hcatalog-home <hdir>  Override $HCAT_HOME
–hcatalog-partition-keys <partition-key>  Sets the partition keys to use when importing to hive
–hcatalog-partition-values <partition-value>  Sets the partition values to use when importing to  hive
–hcatalog-table <arg>  HCatalog table name
–hive-home <dir>  Override $HIVE_HOME
–hive-partition-key <partition-key>  Sets the partition key to use when importing to hive
–hive-partition-value <partition-value>  Sets the partition value to use when importing to hive
–map-column-hive <arg>  Override mapping for specific column to hive types.
–create-hcatalog-table  Create HCatalog tables before import, By default, HCatalog tables are  assumed to exist.
–hcatalog-storage-stanza <arg>  HCatalog storage stanza for table creation

Notes:

  • Using –hcatalog-database option without –hcatalog-table will result an error message.
  • The presence of the –hcatalog-table option signifies that the import or export job is done using HCatalog tables, and it is a required option for HCatalog jobs
  • The following Sqoop export and import options are not supported with HCatalog jobs.
    • –direct
    • –export-dir
    • –target-dir
    • –warehouse-dir
    • –append
    • –as-sequencefile
    • –as-avrofile
References

Sqoop Documentationhttp://sqoop.apache.org/docs


Profile photo of Siva

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Sqoop Import Command Arguments

  • Joseph

    Hey Siva,

    I have been trying to work with sqoop and i wanted to test connection to mysql and i tried your command
    sqoop list-databases –connect jdbc:mysql://localhost –connection-manager org.apache.sqoop.manager.GenericJdbcManager –driver com.mysql.jdbc.Driver –username root –password MYPASSWORD
    and i end up with an error
    15/09/25 14:25:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/09/25 14:25:03 INFO manager.SqlManager: Using default fetchSize of 1000
    15/09/25 14:25:03 ERROR manager.SqlManager: Generic SqlManager.listDatabases() not implemented.
    Could not retrieve database list from server
    15/09/25 14:25:03 ERROR tool.ListDatabasesTool: manager.listDatabases() returned null

    I have also put the jar file in /var/lib/sqoop ”
    mysql-connector-java-5.0.8-bin.jar”  and when i try it wothout specifying the –connection-manager it automatically defaults to GenericJDBCManager,  any idea why this isnt working ?
    Thanks
    -Joseph