Hive

What is Apache Hive?

Apache Hive is an open-source data warehouse system built on top of Hadoop. It is used to query and analyze large, structured datasets held in distributed storage such as HDFS, using a SQL-like language called HiveQL.

What Hive Is Not

  • Hive is not designed for OLTP (online transaction processing).
  • It’s not a relational database (RDBMS).
  • It’s not suited to row-level updates in real-time systems.

Apache Hive Advantages

  • Supports large datasets
  • Runs on Hadoop infrastructure which uses commodity hardware
  • Supports SQL-like syntax (HiveQL)
  • Provides HiveServer2, which accepts JDBC, ODBC, and Thrift connections from Java, Scala, C#, Python, and many other languages, and ships with Beeline, a command-line JDBC client.
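
As a rough illustration of the client side, the sketch below builds a HiveServer2 JDBC URL and the matching Beeline command line. The host, user name, and table name are hypothetical. Port 10000 is the HiveServer2 default, and `-u`, `-n`, and `-e` are Beeline's options for the connection URL, the user name, and an inline query.

```python
import shlex


def hive_jdbc_url(host, port=10000, database="default"):
    """Return a HiveServer2 JDBC URL such as jdbc:hive2://host:10000/default."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url, user, query):
    """Return the argument list for running one HiveQL query through Beeline."""
    return ["beeline", "-u", url, "-n", user, "-e", query]


# Hypothetical host, user, and table, for illustration only.
url = hive_jdbc_url("localhost")
cmd = beeline_command(url, "hiveuser", "SELECT COUNT(*) FROM employees")

# Print a shell-safe version of the command (the query gets quoted).
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run` on a machine where Beeline is installed. Keeping it as a list rather than a single string avoids shell-quoting problems with the query text.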

Hive Clients

Hive Spark Examples

  • Spark Union Hive Tables from different Databases

Hive PySpark Examples

Hive Error or Exceptions

  • Hive – HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  • Why Hive Tables Load with Null Values
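
One common cause of the NULL problem is a delimiter mismatch. Hive's default text format separates fields with the Ctrl-A character (`\x01`), so a comma-separated file loaded into a table declared without `FIELDS TERMINATED BY ','` is read as a single field per line. That field usually fails to convert to the first column's type, and the remaining columns have no data at all. The sketch below is a plain-Python simulation of that behaviour, not Hive code. `parse_row`, the sample record, and the schema are hypothetical.

```python
def parse_row(line, column_types, delimiter="\x01"):
    """Mimic how a delimited text row maps onto typed columns.

    Missing fields and values that fail to convert become None (NULL).
    """
    fields = line.split(delimiter)
    row = []
    for i, cast in enumerate(column_types):
        if i >= len(fields):
            row.append(None)  # fewer fields than columns
            continue
        try:
            row.append(cast(fields[i]))
        except ValueError:
            row.append(None)  # value does not fit the column type
    return row


line = "1,James,3000"      # hypothetical comma-separated record
schema = [int, str, int]   # hypothetical columns: id, name, salary

print(parse_row(line, schema))                 # default Ctrl-A delimiter
print(parse_row(line, schema, delimiter=","))  # delimiter matches the file
```

With the default delimiter every column comes back as `None`. With the matching delimiter the row parses as `[1, 'James', 3000]`.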
