
Spark Streaming

Spark Streaming is a scalable, high-throughput, fault-tolerant stream processing system that supports both batch and streaming workloads. It processes real-time data from sources such as file system directories, TCP sockets, Amazon S3, Kafka, Flume, Twitter, and Kinesis, to name a few. The processed data can be pushed to databases, Kafka, live dashboards, and more.
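To give a feel for the API, below is a minimal sketch of the classic DStream-based word count over a TCP socket; the host (localhost), port (9999), application name, and 5-second batch interval are illustrative assumptions, not values from this article.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Read lines from a TCP socket and count words in 5-second micro-batches
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(5))

val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

wordCounts.print()      // print the first results of each batch to the console
ssc.start()             // start receiving and processing data
ssc.awaitTermination()  // block until the streaming context is stopped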
				
					
At its core, Spark represents distributed data as RDDs; for example, an RDD can be created by parallelizing a local Scala collection:

// 'spark' is the SparkSession; in spark-shell it is provided automatically
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local[1]").appName("ParallelizeExample").getOrCreate()
// Create an RDD by parallelizing a local Scala collection
val dataSeq = Seq(("Java", 20000), ("Python", 100000), ("Scala", 3000))
val rdd = spark.sparkContext.parallelize(dataSeq)
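Once created, the RDD behaves like any other distributed collection; as a quick sanity check (a sketch, assuming the variables above), its contents can be collected back to the driver and printed:

rdd.collect().foreach(println)  // prints (Java,20000), (Python,100000), (Scala,3000)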
