Hive CREATE TABLE
statement is used to create a table, it is similar to creating a table in RDBMS using SQL syntax, additionally, Hive has many more features to work with files.
1. Hive Create Table Syntax
By using CREATE TABLE
statement you can create a table in Hive, It is similar to SQL and CREATE TABLE
statement takes multiple optional clauses,
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [column_constraint] [COMMENT col_comment], ...)]
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
[CLUSTERED BY (col_name, col_name,.......]
[COMMENT table_comment]
[ROW FORMAT row_format]
[FIELDS TERMINATED BY char]
[LINES TERMINATED BY char]
[LOCATION 'hdfs_path']
[STORED AS file_format]
Make sure the database you are creating doesn’t existIn Hive, table can be created with or without the database, If you wanted to create in a database, specify database name qualifier. on Hive warehouse, if it exists, it throws Database emp already exists
error.
Note: By default, the Hive tables are created in the warehouse directory, the location is specified in value for the key hive.metastore.warehouse.dir
on the config file $HIVE_HOME/conf/hive-site.xml
, by default, the warehouse location would be /user/hive/warehouse
.
On this warehouse location, Hive creates a directory for every database you create and a subdirectory for every table, table data is stored in the table directory.
1.1 Create Table optional clauses
Below I have explained the most used optional clauses with examples.
IF NOT EXISTS
– You can use IF NOT EXISTS to avoid the error in case the table is already present. Hive checks if the requesting table already presents,
EXTERNAL
– Used to create external table
TEMPORARY
– Used to create temporary table.
ROW FORMAT
– Specifies the format of the row.
FIELDS TERMINATED BY
– By default Hive use ^A field separator, To load a file that has a custom field separator like comma, pipe, tab use this option.
PARTITION BY
– Used to create partition data. Using this improves performance.
CLUSTERED BY
– Dividing the data into a specific number for buckets.
LOCATION
– You can specify the custom location where to store the data on HDFS.
Besides these, Hive also supports many optional clauses.
2. Hive Create Table Example
2.1 Create Table
Here is a simple Hive example that creates an employee table in emp database with id, name, age and gender columns. we have also provided comment to the table.
Creating a Database from Java & Scala
CREATE TABLE IF NOT EXISTS emp.employee (
id int,
name string,
age int,
gender string )
COMMENT 'Employee Table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
CHAR
datatype hence used string
for gender.ROW FORMAT DELIMITED FIELDS TERMINATED BY
optional clause to specify the custom delimiter.2.2 Describe Table
Use DESCRIBE
command to describe the table.
jdbc:hive2://127.0.0.1:10000> DESCRIBE emp.employee;
2.3 Load File into table
Let’s see how to load a data file into the Hive table we just created.LOCAL
without uploading to HDFS.
LOAD DATA INPATH '/user/hive/data/data.txt' INTO TABLE emp.employee;
Note that after loading the data, the source file is not present in the location, and you can check the loaded file at HDFS.
hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
-rw-r--r-- 1 hive supergroup 52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt
2.4 Select table
Use SELECT
command to select the get the data from a table.
SELECT * FROM emp.employee
2.5 Insert Date into Table
Like SQL, you can also use INSERT INTO
to insert rows into Hive table.
INSERT INTO emp.employee values(7,'scott',23,'M');
INSERT INTO emp.employee values(8,'raman',50,'M');
3. Hive Table Types
3.1 Internal or Managed Table
By default, Hive creates an Internal table also known as the Managed table, In the managed table, Hive owns the data/files on the table meaning any data you insert or load files to the table are managed by the Hive process when you drop the table the underlying data or files are also get deleted.
Examples I have used above sections are Managed tables.
Let’s see another example on how to LOAD
file from LOCAL
file system.
LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee;
Unlike loading from HDFS, source file from LOCAL
file system won’t be removed.
3.2 External Table
Using EXTERNAL
option you can create an external table, Hive doesn’t manage the external table, when you drop an external table, only table metadata from Metastore will be removed but the underlying files will not be removed and still they can be accessed via HDFS commands, Pig, Spark or any other Hadoop compatible tools.
CREATE EXTERNAL TABLE emp.employee_external (
id int,
name string,
age int,
gender string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hive/data/employee_external';
LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_external;
This will create a folder /user/hive/data/employee_external/
on HDFS and creates a file data.txt
.
Let’s check if the file created at Hive external location on HDFS.
hdfs dfs -ls /user/hive/data/employee_external/
-rw-r--r-- 1 hive supergroup 52 2020-10-09 20:29 /user/hive/data/employee_external/data.txt
As explained earlier dropping an external table from Hive will just remove the metadata from Metastore but not the actual file on HDFS.
Let’s see this in action by dropping the table emp.employee_external using DROP TABLE emp.employee_external
command and check if the file still exists by running above hdfs -ls
command.
3.3 Temporary Table
A temporary table is created using TEMPORARY
option, these tables exist only within the current session, upon exiting the session the temporary tables will be removed and cannot be accessed in another session.
There are few limitations to the temporary table
- Cannot Create Partitioned Table
- Indexes are not supported
Below is an example to create TEMPORARY
table
CREATE TEMPORARY TABLE emp.employee_tmp (
id int,
name string,
age int,
gender string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_external;
This will create a folder /user/hive/data/employee_external/
on HDFS and creates a file data.txt
.
Let’s check if the file created at Hive external location on HDFS.
hdfs dfs -ls /user/hive/data/employee_external/
-rw-r--r-- 1 hive supergroup 52 2020-10-09 20:29 /user/hive/data/employee_external/data.txt
As explained earlier dropping an external table from Hive will just remove the metadata from Metastore but not the actual file on HDFS.
Let’s see this in action by dropping the table emp.employee_external using DROP TABLE emp.employee_external
command and check if the file still exists by running above hdfs -ls
command.