RDD Spark

Post author By nidisoft_vishain
Post date July 9, 2024

RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark and the primary data abstraction of Spark Core. RDDs are fault-tolerant, immutable, distributed collections of objects: once you create an RDD you cannot change it, and transformations return a new RDD instead. Each RDD is divided into logical partitions, which can be computed on different nodes of the cluster.
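The partitioning and immutability described above can be seen in a small local program. This is a minimal sketch, assuming Spark 3.x on the classpath and a local master; the object name RddBasics, the numbers 1 to 8, and the choice of 4 partitions are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object RddBasics {
  // Builds an RDD, transforms it, and returns (partition count, original data, transformed data)
  def run(): (Int, Seq[Int], Seq[Int]) = {
    // local[2] runs Spark inside this JVM with two worker threads (assumed setup for the sketch)
    val spark = SparkSession.builder()
      .appName("rdd-basics")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      // Distribute the numbers 1..8 across 4 logical partitions
      val numbers = sc.parallelize(1 to 8, numSlices = 4)
      // map does not modify `numbers`; it returns a new RDD
      val doubled = numbers.map(_ * 2)
      (numbers.getNumPartitions, numbers.collect().toSeq, doubled.collect().toSeq)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (parts, original, doubled) = run()
    println(s"partitions = $parts")
    println(s"original   = $original")
    println(s"doubled    = $doubled")
  }
}
```

Collecting numbers after the map still yields 1 to 8, because the transformation produced a separate RDD rather than changing the existing one.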

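Once the data is available as a DataFrame df, it can be registered as a temporary view and queried with Spark SQL:

// Register the existing DataFrame df (created earlier, not shown) as a temporary view
df.createOrReplaceTempView("PERSON_DATA")
// spark is the active SparkSession; the query result is a new DataFrame
val df2 = spark.sql("SELECT * FROM PERSON_DATA")
df2.printSchema()
df2.show()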

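The view can also be aggregated, for example to count people by gender:

// GROUP BY gender returns one row per distinct gender together with its row count
val groupDF = spark.sql("SELECT gender, count(*) FROM PERSON_DATA GROUP BY gender")
groupDF.show()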
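The queries above assume an existing DataFrame df. The sketch below builds one from an RDD so the whole flow (RDD, then DataFrame, then temporary view, then SQL) can be run locally. It assumes Spark 3.x and a local master; the Person case class, its name field, and the three sample rows are hypothetical, since only the gender column appears in the original query.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type: only `gender` comes from the original query; `name` is assumed
case class Person(name: String, gender: String)

object PersonSql {
  // Returns the per-gender row counts produced by the GROUP BY query
  def genderCounts(): Map[String, Long] = {
    val spark = SparkSession.builder()
      .appName("person-sql")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._ // enables rdd.toDF()
    try {
      // Hypothetical sample rows, distributed as an RDD
      val peopleRdd = spark.sparkContext.parallelize(Seq(
        Person("Ann", "F"), Person("Bob", "M"), Person("Cara", "F")
      ))
      // Convert the RDD to a DataFrame so it can be queried with SQL
      val df = peopleRdd.toDF()
      df.createOrReplaceTempView("PERSON_DATA")
      val groupDF = spark.sql(
        "SELECT gender, count(*) AS cnt FROM PERSON_DATA GROUP BY gender")
      groupDF.show()
      // The result is small, so collecting it to the driver as a Map is safe here
      groupDF.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(genderCounts())
}
```

With the sample rows above, the query groups two rows under F and one under M.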
