Spark GraphX and GraphFrames

1. Overview

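Apache Spark offers two libraries for graph processing over distributed data: GraphX, which ships with Spark, and GraphFrames, a separate package that must be added to the job. Both model data as vertices (nodes) and edges (relationships). They differ in their APIs and capabilities: GraphX is built on RDDs, while GraphFrames is built on DataFrames and adds Python support and motif queries.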

Figure 1: GraphX Vertex & Edge Tables. GraphX represents graphs via RDD-backed vertex and edge tables.
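Both libraries store a graph as two tables: one row per vertex, keyed by an id, and one row per directed edge, given as a source and destination. The plain-Python sketch below involves no Spark at all. It reuses the four-person graph and the id/src/dst layout from the example in Section 3, and derives the adjacency list and out-degrees that graph algorithms typically work from. The helper names adjacency and out_degrees are hypothetical.

```python
from collections import defaultdict

# Vertex table: one row per vertex, (id, name), as in the GraphFrames example.
vertices = [("A", "Alice"), ("B", "Bob"), ("C", "Cathy"), ("D", "David")]

# Edge table: one row per directed edge, (src, dst).
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]


def adjacency(edges):
    """Group edges by source vertex: src -> list of dst (hypothetical helper)."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return dict(adj)


def out_degrees(vertex_ids, edges):
    """Count outgoing edges per vertex, keeping vertices that have none."""
    deg = {vid: 0 for vid in vertex_ids}
    for src, _ in edges:
        deg[src] += 1
    return deg


ids = [vid for vid, _ in vertices]
adj = adjacency(edges)
deg = out_degrees(ids, edges)
print(adj)  # {'A': ['B', 'D'], 'B': ['C'], 'C': ['A']}
print(deg)  # {'A': 2, 'B': 1, 'C': 1, 'D': 0}
```

In GraphFrames the comparable counts come from the outDegrees property, which returns a DataFrame with id and outDegree columns. Because it is computed by grouping edges, vertices with no outgoing edges (D here) do not appear in it.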

2. Usage & Use Cases

2.1 GraphX

API: Scala/Java only (no Python API); built on RDDs and exposed through the Graph class.
Common Use Cases:
- PageRank on web graphs
- Connected components to find groups of vertices that are linked to one another
- Shortest-path algorithms in transportation networks
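GraphX's connectedComponents labels every vertex with the lowest vertex ID in its component. It gets there by repeatedly passing labels along edges in both directions. The plain-Python sketch below imitates that min-label propagation on a small hypothetical graph. It illustrates the idea only and is not GraphX's Pregel-based implementation.

```python
def connected_components(vertex_ids, edges):
    """Label each vertex with the smallest vertex ID in its component,
    ignoring edge direction (min-label propagation)."""
    label = {v: v for v in vertex_ids}
    changed = True
    while changed:  # stop once a full pass over the edges changes nothing
        changed = False
        for src, dst in edges:
            smaller = min(label[src], label[dst])
            # Push the smaller label across the edge in both directions.
            for v in (src, dst):
                if label[v] != smaller:
                    label[v] = smaller
                    changed = True
    return label


# Hypothetical graph: components {1, 2, 3} and {4, 5}; vertex 6 is isolated.
vertex_ids = [1, 2, 3, 4, 5, 6]
edges = [(2, 1), (3, 2), (5, 4)]
components = connected_components(vertex_ids, edges)
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

Labels only ever decrease, so the loop terminates. At the fixed point every edge joins two equal labels, and the component's minimum vertex keeps its own ID as the shared label.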

2.2 GraphFrames

API: Python, Scala, and Java, built on DataFrames, so vertex and edge tables can also be queried with Spark SQL.
Common Use Cases:
- Motif finding to detect fraud patterns
- Label propagation for clustering social networks
- Shortest-path and breadth-first-search queries, with BFS start and end vertices selected by SQL expressions
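In GraphFrames, a motif is written as a pattern string passed to GraphFrame.find. For example, g.find("(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)") matches directed triangles, and GraphFrames evaluates such patterns as DataFrame joins. The plain-Python sketch below performs the same search by brute force over the edge list from Section 3 to show what the pattern matches. The function name find_directed_triangles is hypothetical.

```python
def find_directed_triangles(edges):
    """Return every (a, b, c) such that a->b, b->c and c->a are all edges.

    Mirrors the pattern "(a)-[e1]->(b); (b)-[e2]->(c); (c)-[e3]->(a)":
    each triangle is reported once per rotation, since a, b and c are
    separate bindings.
    """
    edge_set = set(edges)
    out = {}
    for src, dst in edges:
        out.setdefault(src, []).append(dst)
    matches = []
    for a, b in edges:              # bind (a)-[e1]->(b)
        for c in out.get(b, []):    # bind (b)-[e2]->(c)
            if (c, a) in edge_set:  # require (c)-[e3]->(a)
                matches.append((a, b, c))
    return matches


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(find_directed_triangles(edges))
# [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')]
```

For fraud detection, the same idea applies to patterns such as money flowing in a cycle between accounts. The matched rows are then filtered on vertex or edge attributes.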

Figure 2: GraphFrames Network Diagram. GraphFrames builds on DataFrames for SQL-like graph queries.

3. Example: Simple PageRank with GraphFrames

Below is a PySpark example that constructs a small graph, runs PageRank, and shows the top-ranked vertices.


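```python
# Submit with the GraphFrames package; pick the build that matches your
# Spark and Scala versions, e.g.:
# spark-submit --packages graphframes:graphframes:0.8.1-spark3.3-s_2.12 your_script.py

from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Initialize Spark
spark = SparkSession.builder \
    .appName("GraphFramesPageRank") \
    .getOrCreate()

# Define vertices (GraphFrames requires an "id" column)
v = spark.createDataFrame([
    ("A", "Alice"),
    ("B", "Bob"),
    ("C", "Cathy"),
    ("D", "David")
], ["id", "name"])

# Define directed edges (GraphFrames requires "src" and "dst" columns)
e = spark.createDataFrame([
    ("A", "B"),
    ("B", "C"),
    ("C", "A"),
    ("A", "D")
], ["src", "dst"])

# Create the GraphFrame and run PageRank for a fixed 10 iterations;
# resetProbability is the chance of jumping to a random vertex
g = GraphFrame(v, e)
results = g.pageRank(resetProbability=0.15, maxIter=10)

# results.vertices gains a "pagerank" column; show vertices by descending score
results.vertices.orderBy("pagerank", ascending=False).show()

spark.stop()
```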
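To see what pageRank computes, here is a textbook power-iteration PageRank in plain Python. It runs on the same four-vertex graph with the same reset probability (0.15) and 10 iterations. This is a sketch, not GraphFrames' implementation. It normalizes scores to sum to 1 and spreads the rank of the dangling vertex D (which has no outgoing edges) evenly over all vertices. GraphFrames may use a different scale and dangling-vertex treatment, so compare the ordering rather than the absolute values.

```python
def pagerank(vertex_ids, edges, reset_probability=0.15, max_iter=10):
    """Textbook power-iteration PageRank whose scores sum to 1.

    Dangling vertices (no outgoing edges) spread their rank evenly over
    all vertices. An illustration only, not GraphFrames' implementation.
    """
    n = len(vertex_ids)
    damping = 1.0 - reset_probability
    out_degree = {v: 0 for v in vertex_ids}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {v: 1.0 / n for v in vertex_ids}
    for _ in range(max_iter):
        # Rank held by dangling vertices is redistributed uniformly.
        dangling = sum(rank[v] for v in vertex_ids if out_degree[v] == 0)
        # Each vertex splits its rank equally among its outgoing edges.
        incoming = {v: 0.0 for v in vertex_ids}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            v: reset_probability / n + damping * (incoming[v] + dangling / n)
            for v in vertex_ids
        }
    return rank


vertex_ids = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
scores = pagerank(vertex_ids, edges, reset_probability=0.15, max_iter=10)
for vid, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(vid, round(score, 4))
```

In this sketch A ranks highest because it receives all of C's rank. B and D tie because each receives exactly half of A's rank.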
