Hive – Create Managed & External Table

The Hive CREATE TABLE statement is used to create a table. It is similar to creating a table in an RDBMS using SQL syntax; additionally, Hive provides many more features for working with files.

In this article, I will explain Hive CREATE TABLE usage and syntax, the different types of tables Hive supports, where Hive stores table data in HDFS, how to change the default location, how to load data from files into a Hive table, and finally how to use partitions.

1. Hive Create Table Syntax

By using the CREATE TABLE statement you can create a table in Hive. It is similar to SQL, and the CREATE TABLE statement takes several optional clauses:

				
					
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [column_constraint] [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
[CLUSTERED BY (col_name, col_name, ...) INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
  [FIELDS TERMINATED BY char]
  [LINES TERMINATED BY char]
[STORED AS file_format]
[LOCATION 'hdfs_path']

In Hive, a table can be created with or without a database qualifier; if you want to create a table in a specific database, prefix the table name with the database name. When creating a database, make sure it doesn't already exist in the Hive warehouse; if it does, Hive throws a Database emp already exists error.
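For instance, the emp database used throughout this article can be created up front. As a sketch, IF NOT EXISTS suppresses the error when the database is already present:

```sql
-- Create the emp database only if it is not already present
CREATE DATABASE IF NOT EXISTS emp;

-- Without IF NOT EXISTS, re-running the plain statement fails:
-- CREATE DATABASE emp;   -- Error: Database emp already exists
```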

Note: By default, Hive tables are created in the warehouse directory. The location is specified as the value of the key hive.metastore.warehouse.dir in the config file $HIVE_HOME/conf/hive-site.xml; by default, the warehouse location is /user/hive/warehouse.
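A minimal sketch of the relevant hive-site.xml property (the value shown is the stock default; your cluster may use a different path):

```xml
<!-- $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
```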

 

Under this warehouse location, Hive creates a directory for every database you create and a subdirectory for every table; table data is stored in the table's directory.

1.1 Create Table optional clauses

Below I have explained the most used optional clauses with examples.

IF NOT EXISTS – Use IF NOT EXISTS to avoid an error in case the table is already present. Hive checks whether the requested table already exists and, with this clause, silently skips the creation instead of failing.

EXTERNAL – Used to create an external table.

TEMPORARY – Used to create a temporary table.

ROW FORMAT – Specifies the format of the row, e.g. ROW FORMAT DELIMITED.

FIELDS TERMINATED BY – By default Hive uses ^A (Ctrl-A) as the field separator. To load a file that has a custom field separator such as comma, pipe, or tab, use this option.

PARTITIONED BY – Used to partition the table data. Querying by partition column improves performance.

CLUSTERED BY – Divides the data into a specified number of buckets.

LOCATION – Specifies a custom HDFS location where the table data is stored.

Besides these, Hive also supports many optional clauses.
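As a sketch of how several of these clauses combine, here is a hypothetical sales table (the table and column names are illustrative only, not used elsewhere in this article) that is partitioned, bucketed, and stored at a custom HDFS location:

```sql
-- Hypothetical example combining PARTITIONED BY, CLUSTERED BY,
-- ROW FORMAT, and LOCATION in one statement
CREATE TABLE IF NOT EXISTS emp.sales (
  id int,
  amount double )
 PARTITIONED BY (sale_year int)
 CLUSTERED BY (id) INTO 4 BUCKETS
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LOCATION '/user/hive/data/sales';
```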

2. Hive Create Table Example

2.1 Create Table

Here is a simple Hive example that creates an employee table in the emp database with id, name, age, and gender columns. We have also provided a comment on the table.


				
					
CREATE TABLE IF NOT EXISTS emp.employee (
 id int,
 name string,
 age int,
 gender string )
 COMMENT 'Employee Table'
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ',';

				
			
  • For the gender column, string is used; Hive also offers CHAR and VARCHAR types, but string is the most flexible choice for text data.
  • By default, Hive uses ^A as the field separator while loading a file. Since the data file here is comma-separated, the ROW FORMAT DELIMITED FIELDS TERMINATED BY optional clause is used to specify the custom delimiter.
    2.2 Describe Table

    Use the DESCRIBE command to describe the table.

    				
    					jdbc:hive2://127.0.0.1:10000> DESCRIBE emp.employee;
    				
    			

    2.3 Load File into Table

    Let’s see how to load a data file into the Hive table we just created.
  • Create a data file (for our example, I am creating a file with comma-separated fields)
  • Upload the data file (data.txt) to HDFS. Note you can also load the data from LOCAL without uploading to HDFS.
  • Now use the Hive LOAD command to load the file into the table.
    					
    LOAD DATA INPATH '/user/hive/data/data.txt' INTO TABLE emp.employee;
    
    				
    			

    Note that LOAD DATA INPATH moves the file: after loading, the source file is no longer present at its original location, and you can find the loaded file under the table's directory on HDFS.

    				
    					hdfs dfs -ls /user/hive/warehouse/emp.db/employee/
    -rw-r--r--   1 hive supergroup         52 2020-10-09 19:29 /user/hive/warehouse/emp.db/employee/data.txt
    				
    			

    2.4 Select Table

    Use the SELECT command to get the data from a table.

    				
    SELECT * FROM emp.employee;
    				
    			

    2.5 Insert Data into Table

    Like SQL, you can use INSERT INTO to insert rows into a Hive table.

    				
    					
    INSERT INTO emp.employee values(7,'scott',23,'M');
    INSERT INTO emp.employee values(8,'raman',50,'M');
    
    				
    			

    3. Hive Table Types

    3.1 Internal or Managed Table

    By default, Hive creates an internal table, also known as a managed table. For a managed table, Hive owns the data/files of the table, meaning any data you insert or files you load into the table are managed by the Hive process; when you drop the table, the underlying data files are also deleted.

    The examples used in the sections above are managed tables.
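    You can confirm a table's type with DESCRIBE FORMATTED; for a managed table, the output includes a Table Type: MANAGED_TABLE line (external tables show EXTERNAL_TABLE instead):

    ```sql
    DESCRIBE FORMATTED emp.employee;
    ```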

    Let's see another example of how to LOAD a file from the LOCAL file system.

    				
    					
    LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee;
    
    				
    			

    Unlike loading from HDFS, a source file on the LOCAL file system is copied, so it won't be removed after the load.

    3.2 External Table

    Using the EXTERNAL option you can create an external table. Hive doesn't manage an external table: when you drop it, only the table metadata is removed from the Metastore, while the underlying files are left intact and can still be accessed via HDFS commands, Pig, Spark, or any other Hadoop-compatible tool.

    				
    					
    CREATE EXTERNAL TABLE emp.employee_external (
     id int,
     name string,
     age int,
     gender string)
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY ','
     LOCATION '/user/hive/data/employee_external';
    
    				
    			
    				
    					
    LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_external;
    
    				
    			

    This creates the folder /user/hive/data/employee_external/ on HDFS and a file data.txt inside it.

    Let's check whether the file was created at the Hive external location on HDFS.

    				
    					hdfs dfs -ls /user/hive/data/employee_external/
    -rw-r--r--   1 hive supergroup         52 2020-10-09 20:29 /user/hive/data/employee_external/data.txt
    				
    			

    As explained earlier, dropping an external table from Hive just removes the metadata from the Metastore but not the actual file on HDFS.

    Let's see this in action: drop the table with the DROP TABLE emp.employee_external command and check whether the file still exists by re-running the hdfs dfs -ls command above.
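    The drop-and-verify steps above can be sketched as:

    ```sql
    -- Removes only the Metastore entry; the file
    -- /user/hive/data/employee_external/data.txt stays on HDFS
    DROP TABLE emp.employee_external;
    ```

    Re-running hdfs dfs -ls /user/hive/data/employee_external/ should still list data.txt.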

    3.3 Temporary Table

    A temporary table is created using the TEMPORARY option. These tables exist only within the current session; when the session ends, the temporary tables are removed, and they cannot be accessed from another session.

    There are a few limitations on temporary tables: partition columns are not supported, and indexes cannot be created on them.

    Below is an example of creating a TEMPORARY table.

    				
    					
    CREATE TEMPORARY TABLE emp.employee_tmp (
     id int,
     name string,
     age int,
     gender string)
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY ',';
    
    				
    			
    				
    					
    LOAD DATA LOCAL INPATH '/home/hive/data.txt' INTO TABLE emp.employee_tmp;
    
    				
    			

    Since emp.employee_tmp is a temporary table, its data is kept in the user's scratch directory rather than in the warehouse, and both the table and its data are removed automatically when the session ends.
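    A quick way to observe the session-scoped behavior (the queries assume the temporary table created above):

    ```sql
    -- Works in the session that created the temporary table
    SELECT * FROM emp.employee_tmp;

    -- After exiting and starting a new session, the same query fails
    -- because the temporary table no longer exists
    ```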
