In this post we will discuss about one of the important commands in Apache Sqoop, Sqoop Import Command Arguments with examples. This documentation is applicable for sqoop versions 1.4.5 or later because earlier versions doesn’t support some of the below mentioned arguments to import command
As of Sqoop 1.4.5 version, Sqoop import command supports various number of arguments to import relational database tables into below tools or services.
There are specific arguments for these services, apart from
- Common arguments
- Control arguments
- Input parsing arguments
- Output line formatting arguments
- Incremental import arguments.
Lets discuss about each of these arguments with a brief explanation about each argument.
Sqoop Import Command Arguments
First, lets discuss about the common arguments to Sqoop Import command. Each argument in sqoop commands is prefixed by ‘–‘ (double-hyphen). In the below table only mandatory arguments are highlighted with red color in description column.
|Argument||Description & Example|
|–connect <jdbc-uri>||To Specify JDBC connect string containing hostname or IP address (optionally port) followed by database name. It is mandatory argument.
In this example we are connecting to MySQL On localhost and database name is test_db
|–connection-manager <class>|| Specify connection manager class name It is optional. Ex:
|–connection-param-file <file>||optionally specify extra JDBC parameters via a property file using this option. This file is parsed as standard Java properties and passed into the driver while creating a connection.|
|–driver <class>||Manually specify JDBC driver class to use. Ex: –driver oracle.jdbc.OracleDriver|
|–hadoop-home <hdir>||By default sqoop will look for $HADOOP_HOME to set to $HADOOP_COMMON_HOME. We can override this value with this argument.Specify this to use custom hadoop installation instead of default from $HADOOP_HOME|
|–hadoop-mapred-home <dir>||Same as above for $HADOOP_MAPRED_HOME Specify the path of new hadoop mapreduce installation directory.|
|–username & (–password or -P)||We can specify the username and password of the RDBMS to retrieve the tables. Example: –username u1 –password 12345|
- We should not use the URL localhost in the connect string, if we intend to use Sqoop with a distributed Hadoop cluster. The connect string we supply will be used on data nodes throughout our cluster; if we specify the literal name localhost, each node will connect to a different database on their localhosts (or more likely, no database at all). Instead, we should use the full hostname or IP address of the database host that can be seen by all our remote nodes.
- The –password parameter is insecure, as other users may be able to read our password from the command-line arguments via the output of programs such as ps. The -P argument (prompts for user password) will read a password from a console prompt ,and is the preferred method of entering credentials. Credentials may still be transferred between nodes of the MapReduce cluster using insecure means. The most secure way is to use, –password-file <file containing the password> method. Set authentication password in this file on the users home directory with 400 permissions.
- The jar containing the driver class (ex: com.mysql.jdbc.Driver) should be copied into $SQOOP_HOME/lib directory otherwise exception will be thrown.
Below is the example sqoop import command covering all the arguments discussed above.
--connect jdbc:mysql://hostname_of_db/test_db \
--connection-manager org.apache.sqoop.manager.GenericJdbcManager \
--driver com.mysql.jdbc.Driver \
--username u1 \
--password 12345 \