Features & Advantages of Apache Spark

  • In-memory computation
  • Distributed processing using parallelize
  • Can be used with many cluster managers (Spark standalone, YARN, Mesos, etc.)
  • Fault-tolerant
  • Immutable
  • Lazy evaluation
  • Cache & persistence
  • Built-in optimization when using DataFrames
  • Supports ANSI SQL

Spark is a general-purpose, in-memory, fault-tolerant, distributed processing engine that lets you process large datasets efficiently across a cluster. Applications running on Spark can be up to 100x faster than traditional disk-based systems such as Hadoop MapReduce, mainly for workloads that fit in memory. Spark is also well suited to data ingestion pipelines.
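
The lazy evaluation, immutability, and cache behavior listed above can be sketched in plain Python. This is a toy model, not Spark's API: `LazyDataset` and its methods are hypothetical names chosen to mirror the RDD operations `parallelize`, `map`, `filter`, `cache`, and `collect`, and everything runs in a single process rather than on a cluster.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, compute: Callable[[], Iterable[int]]):
        self._compute = compute              # the recipe (lineage), not the data
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None

    @staticmethod
    def parallelize(data: Iterable[int]) -> "LazyDataset":
        items = list(data)
        return LazyDataset(lambda: items)

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a NEW dataset and runs nothing yet.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        # Transformation: the parent dataset is never modified.
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; results are stored on first evaluation.
        self._cache_enabled = True
        return self

    def _iterate(self) -> Iterable[int]:
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._cache_enabled:
            self._cached = list(result)
            return self._cached
        return result

    def collect(self) -> List[int]:
        # Action: this is what finally triggers the computation.
        return list(self._iterate())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)                          # record every real invocation
    return x * x


numbers = LazyDataset.parallelize(range(1, 6))
squares = numbers.map(square).cache()
evens = squares.filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)             # 0: no transformation has run yet
first = evens.collect()                      # [4, 16]; square ran 5 times
second = squares.collect()                   # [1, 4, 9, 16, 25], served from cache
print(calls_before_action, first, second, len(calls))
```

Building `squares` and `evens` calls `square` zero times; the work happens only when `collect()` is invoked. Because `squares` was marked with `cache()`, the second `collect()` reuses the stored results instead of recomputing them, which is the same idea behind caching a Spark dataset that several actions share.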
				
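To enable the Spark History Server, add the following to spark-defaults.conf. spark.eventLog.dir, where applications write their event logs, should point to the same location that the history server reads from:

    spark.eventLog.enabled true
    spark.eventLog.dir file:///c:/logs/path
    spark.history.fs.logDirectory file:///c:/logs/path

Then start the history server:

    $SPARK_HOME/sbin/start-history-server.sh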
				
			
