
Apache Spark Architecture

Spark follows a master-slave architecture in which the master is called the “Driver” and the slaves are called “Workers”. When you run a Spark application, the Spark Driver creates a context that serves as the entry point to your application; all operations (transformations and actions) are executed on the worker nodes, and the resources are managed by the Cluster Manager.
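
The context the Driver creates is typically obtained through a SparkSession, which also exposes the SparkContext used in the examples below. Here is a minimal sketch of creating it; the master URL and application name are illustrative assumptions, not values from this article.

import org.apache.spark.sql.SparkSession

//Minimal sketch: create the SparkSession entry point on the Driver.
//The master URL and appName below are illustrative assumptions.
val spark: SparkSession = SparkSession.builder()
  .master("local[1]")      //local mode; on a real cluster this points at the Cluster Manager
  .appName("SparkExample") //hypothetical application name
  .getOrCreate()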

//Create RDD from parallelize
val dataSeq = Seq(("Java", 20000), ("Python", 100000), ("Scala", 3000))
val rdd = spark.sparkContext.parallelize(dataSeq)

//Create RDD from an external data source
val rdd2 = spark.sparkContext.textFile("/path/textFile.txt")
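
To see how the pieces described above interact, the short sketch below applies a transformation and then an action to the rdd created earlier; the uppercasing logic is an illustrative assumption, not part of the original example.

//Transformations such as map are recorded lazily on the Driver.
val upper = rdd.map { case (lang, count) => (lang.toUpperCase, count) }
//Actions such as collect trigger a job that executes on the Workers
//and return the results to the Driver.
upper.collect().foreach(println)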
