
Features & Advantages of Apache Spark

In-memory computation
Distributed processing using parallelize
Can be used with many cluster managers (Spark Standalone, YARN, Mesos, etc.)
Fault-tolerant
Immutable
Lazy evaluation
Cache & persistence
Built-in optimization when using DataFrames
Supports ANSI SQL
Spark is a general-purpose, in-memory, fault-tolerant, distributed processing engine that allows you to process data efficiently in a distributed fashion.
Applications running on Spark can be up to 100x faster than on traditional systems such as Hadoop MapReduce, particularly for in-memory workloads.
You will get great benefits from using Spark for data ingestion pipelines.

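To view completed applications in the Spark History Server, enable event logging in $SPARK_HOME/conf/spark-defaults.conf and point the history server at the same directory the applications write their logs to:

    spark.eventLog.enabled          true
    spark.eventLog.dir              file:///c:/logs/path
    spark.history.fs.logDirectory   file:///c:/logs/path

Then start the history server (its web UI listens on port 18080 by default):

    $SPARK_HOME/sbin/start-history-server.sh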
