Spark GraphX and GraphFrames

1. Overview

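Two libraries are commonly used for graph processing on Apache Spark: GraphX, which ships with Spark, and GraphFrames, which is distributed as a separate package. Both model data as vertices (nodes) and edges (relationships), but they differ in their underlying data abstraction and APIs: GraphX is built on RDDs, while GraphFrames is built on DataFrames.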

Figure 1: GraphX represents graphs via RDD-backed vertex and edge tables.
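
Both libraries ultimately store a graph as two tables: one row per vertex and one row per directed edge. The plain-Python sketch below mimics that layout, assuming GraphFrames' column convention (an "id" column for vertices and "src"/"dst" columns for edges, as used in the PageRank example in this article). The "relationship" column and the degrees helper are hypothetical; the helper only shows how degree counts, similar in spirit to GraphFrames' inDegrees and outDegrees, fall straight out of the edge table.

```python
# Vertex table: one row per vertex; "id" is the key, other fields are properties.
vertices = [
    {"id": "A", "name": "Alice"},
    {"id": "B", "name": "Bob"},
    {"id": "C", "name": "Cathy"},
    {"id": "D", "name": "David"},
]

# Edge table: one row per directed edge, pointing at vertex ids via "src" and "dst".
# "relationship" is a hypothetical edge property used only for illustration.
edges = [
    {"src": "A", "dst": "B", "relationship": "follows"},
    {"src": "B", "dst": "C", "relationship": "follows"},
    {"src": "C", "dst": "A", "relationship": "follows"},
    {"src": "A", "dst": "D", "relationship": "follows"},
]


def degrees(vertices, edges):
    """Hypothetical helper: count incoming and outgoing edges per vertex id."""
    in_deg = {v["id"]: 0 for v in vertices}
    out_deg = {v["id"]: 0 for v in vertices}
    for e in edges:
        out_deg[e["src"]] += 1
        in_deg[e["dst"]] += 1
    return in_deg, out_deg


in_deg, out_deg = degrees(vertices, edges)
print(out_deg)  # {'A': 2, 'B': 1, 'C': 1, 'D': 0}
```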

2. APIs & Use Cases

2.1 GraphX

  • API: Scala (Java code has to call the Scala API directly; there is no Python API), built on RDDs through the Graph class.
  • Common Use Cases:
    • PageRank on web graphs
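    • Connected components to find groups of linked vertices (each component is labeled with its lowest vertex ID)
    • Shortest paths in transportation networks (the built-in ShortestPaths counts hops to a set of landmark vertices; weighted distances can be computed with the Pregel API)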
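
Two of the algorithms above can be sketched in a few lines of plain Python. GraphX's connected components labels every vertex with the lowest vertex ID in its component, treating edges as undirected; the first function reaches the same result by repeatedly pulling the minimum label from neighbors until nothing changes. GraphX's ShortestPaths counts hops (edge weights are ignored) from each vertex to a set of landmarks; the second function approximates this with a breadth-first search over reversed edges from each landmark, assuming distances follow edge direction toward the landmark. Both are single-machine illustrations with hypothetical function names, not GraphX's Pregel-based implementations.

```python
from collections import deque


def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex ID in its component (edges undirected)."""
    neighbors = {v: set() for v in vertices}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    labels = {v: v for v in vertices}  # each vertex starts labeled with its own ID
    changed = True
    while changed:  # keep propagating minimum labels until a full pass changes nothing
        changed = False
        for v in vertices:
            best = min([labels[v]] + [labels[u] for u in neighbors[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels


def hop_counts_to_landmarks(vertices, edges, landmarks):
    """Map each vertex to {landmark: hops}, following edge direction toward the landmark."""
    incoming = {v: [] for v in vertices}
    for src, dst in edges:
        incoming[dst].append(src)
    result = {v: {} for v in vertices}
    for landmark in landmarks:
        dist = {landmark: 0}
        queue = deque([landmark])
        while queue:  # BFS over reversed edges: which vertices reach the landmark, in how many hops?
            cur = queue.popleft()
            for prev in incoming[cur]:
                if prev not in dist:
                    dist[prev] = dist[cur] + 1
                    queue.append(prev)
        for v, d in dist.items():
            result[v][landmark] = d
    return result


# Example: two separate groups, {1, 2, 3} and {4, 5}.
vertices = [1, 2, 3, 4, 5]
edges = [(2, 1), (3, 2), (5, 4)]
print(connected_components(vertices, edges))          # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
print(hop_counts_to_landmarks(vertices, edges, [1]))  # {1: {1: 0}, 2: {1: 1}, 3: {1: 2}, 4: {}, 5: {}}
```

Vertices 4 and 5 get empty maps because no directed path leads from them to landmark 1.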

2.2 GraphFrames

  • API: Python and Scala, built on DataFrames, so vertices, edges, and algorithm results can be queried with DataFrame operations or Spark SQL.
  • Common Use Cases:
    • Motif finding to detect fraud patterns
    • Label propagation for clustering social networks
    • Breadth-first search (bfs) for shortest paths between vertex sets selected by SQL expressions
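
Label propagation, which GraphFrames exposes as labelPropagation(maxIter=...), can be sketched as follows: every vertex starts with its own label, and in each synchronous round it adopts the label most common among its neighbors (this sketch treats edges as undirected). Breaking ties by choosing the smallest label is an assumption made here to keep the output deterministic, not a documented GraphFrames rule. Synchronous label propagation is not guaranteed to converge, which is why the loop is capped at max_iter rounds.

```python
from collections import Counter


def label_propagation(vertices, edges, max_iter=5):
    """Synchronous label propagation; returns {vertex: community label}."""
    neighbors = {v: [] for v in vertices}
    for src, dst in edges:  # treat every edge as undirected
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    labels = {v: v for v in vertices}  # every vertex starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for v in vertices:
            if not neighbors[v]:
                new_labels[v] = labels[v]  # isolated vertices keep their label
                continue
            counts = Counter(labels[u] for u in neighbors[v])
            top = max(counts.values())
            # Assumption: break ties by the smallest label, for determinism.
            new_labels[v] = min(label for label, c in counts.items() if c == top)
        if new_labels == labels:  # stop early once no label changes
            break
        labels = new_labels
    return labels


# Example: two separate triangles form two communities.
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]
print(label_propagation(vertices, edges))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}
```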

Figure 2: GraphFrames builds on DataFrames for SQL-like graph queries.
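
GraphFrames' bfs finds shortest paths from vertices matching one SQL expression (fromExpr) to vertices matching another (toExpr), up to maxPathLength edges. The plain-Python sketch below mimics that search on the article's four-vertex graph, with Python predicates standing in for the SQL expressions. It is an illustration of the search only: the function name is hypothetical, it returns a single path as a list of ids, and GraphFrames instead returns its results as a DataFrame.

```python
from collections import deque


def bfs_path(vertices, edges, from_pred, to_pred, max_path_length=10):
    """Return one shortest directed path (list of ids) from a vertex matching
    from_pred to a vertex matching to_pred, or None if none exists within
    max_path_length edges."""
    by_id = {v["id"]: v for v in vertices}
    out_links = {v["id"]: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    starts = [v["id"] for v in vertices if from_pred(v)]
    parent = {s: None for s in starts}  # also serves as the visited set
    queue = deque((s, 0) for s in starts)
    while queue:
        cur, depth = queue.popleft()
        if to_pred(by_id[cur]):
            path = []
            while cur is not None:  # walk parent pointers back to the start vertex
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if depth == max_path_length:
            continue  # do not expand paths that are already at the length limit
        for nxt in out_links[cur]:
            if nxt not in parent:
                parent[nxt] = cur
                queue.append((nxt, depth + 1))
    return None


vertices = [{"id": "A", "name": "Alice"}, {"id": "B", "name": "Bob"},
            {"id": "C", "name": "Cathy"}, {"id": "D", "name": "David"}]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(bfs_path(vertices, edges,
               from_pred=lambda v: v["name"] == "Alice",
               to_pred=lambda v: v["name"] == "Cathy"))  # ['A', 'B', 'C']
```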

3. Example: Simple PageRank with GraphFrames

Below is a PySpark example that constructs a small graph, runs PageRank, and shows the top-ranked vertices.


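```python
# Submit with the GraphFrames package on the classpath, for example:
#   spark-submit --packages graphframes:graphframes:0.8.1-spark3.3-s_2.12 <your_script>.py
# Choose the package coordinate that matches your Spark and Scala versions.

from pyspark.sql import SparkSession
from graphframes import GraphFrame

# Initialize Spark
spark = SparkSession.builder \
    .appName("GraphFramesPageRank") \
    .getOrCreate()

# Vertices: GraphFrames requires an "id" column; "name" is an extra property
v = spark.createDataFrame([
    ("A", "Alice"),
    ("B", "Bob"),
    ("C", "Cathy"),
    ("D", "David")
], ["id", "name"])

# Edges: GraphFrames requires "src" and "dst" columns holding vertex ids
e = spark.createDataFrame([
    ("A", "B"),
    ("B", "C"),
    ("C", "A"),
    ("A", "D")
], ["src", "dst"])

# Create the GraphFrame and run 10 iterations of PageRank
g = GraphFrame(v, e)
results = g.pageRank(resetProbability=0.15, maxIter=10)

# Show all vertices with their "pagerank" column, highest score first
results.vertices.orderBy("pagerank", ascending=False).show()

spark.stop()
```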
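
To see roughly what pageRank computes, here is a single-machine power-iteration sketch on the same four-vertex graph, using the same reset probability (0.15) and iteration count (10). In each round, a vertex's new score is a reset share of 0.15 / N plus 0.85 times the scores flowing in from its in-neighbors, each neighbor splitting its score evenly across its out-edges. Spreading the score of dangling vertices (here D, which has no out-edges) evenly over all vertices is an assumption of this sketch, and GraphFrames may scale scores differently, so compare orderings rather than absolute values.

```python
def pagerank(vertices, edges, reset_probability=0.15, max_iter=10):
    """Power-iteration PageRank sketch; scores always sum to 1."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    damping = 1.0 - reset_probability
    ranks = {v: 1.0 / n for v in vertices}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: score held by dangling vertices (no out-edges) is spread evenly.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        new_ranks = {v: reset_probability / n + damping * dangling / n for v in vertices}
        for v in vertices:
            if out_links[v]:
                share = damping * ranks[v] / len(out_links[v])
                for dst in out_links[v]:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
ranks = pagerank(vertices, edges)
for vid, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(vid, round(score, 4))
```

Each round preserves a total score of 1: the reset terms add up to 0.15, and the link and dangling contributions together add up to 0.85.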
    

Figure 3: Example motif query detecting “A follows B and B follows C.”
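
In GraphFrames' motif syntax, the pattern in Figure 3 can be written as g.find("(a)-[e1]->(b); (b)-[e2]->(c)"), which returns a DataFrame with one column per named vertex and edge in the pattern. Conceptually this is a self-join of the edge table on b. The plain-Python sketch below performs that join over the article's example edges; the function name is hypothetical. Like a plain join, it does not force a, b, and c to be distinct, so a 2-cycle x->y->x would match with a equal to c; filter the results if that matters.

```python
def find_two_hop_chains(edges):
    """Return every (a, b, c) with edges a->b and b->c: a self-join of the edge table on b."""
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
    matches = []
    for a, b in edges:  # bind (a)-[e1]->(b)
        for c in out_links.get(b, []):  # extend with (b)-[e2]->(c)
            matches.append((a, b, c))
    return matches


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(find_two_hop_chains(edges))
# [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'A', 'D')]
```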
