Apache Spark Installation Guide for Windows & macOS
A. Prerequisites (Both Windows & macOS)
- Java JDK 11+
Download and install OpenJDK 11 (or later) or Oracle JDK, then verify:
java -version
- Python 3.8+ (optional, needed for PySpark)
Install from python.org or via your package manager, then verify:
python3 --version
- (Windows only) Hadoop winutils.exe
Download the winutils.exe matching your Hadoop version (e.g. Hadoop 3.3.1) from GitHub and place it in C:\hadoop\bin.
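To double-check everything from one place, here is a minimal Python sketch (a convenience, not part of the official setup) that shells out to each prerequisite and prints its version; on Windows the interpreter may be named python rather than python3, so adjust the tool list if needed:
```python
import shutil
import subprocess

# Tools this guide requires, paired with the flag that prints each version.
# Note: `java -version` writes to stderr, so both streams are captured.
for tool, flag in (("java", "-version"), ("python3", "--version")):
    path = shutil.which(tool)
    if path is None:
        print(f"{tool}: not found on PATH")
        continue
    result = subprocess.run([tool, flag], capture_output=True, text=True)
    print(f"{tool}: {(result.stdout or result.stderr).strip()}")
```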
B. Installation on Windows
- Create Folders
C:\spark
C:\hadoop\bin   ← place winutils.exe here
- Download & Unpack Spark
1. Go to spark.apache.org/downloads.html
2. Select “Spark 3.5.0 pre-built for Apache Hadoop 3.3 and later” and unzip it into C:\spark
(e.g. C:\spark\spark-3.5.0-bin-hadoop3).
- Configure Environment Variables
In **System → Advanced → Environment Variables** add:
HADOOP_HOME = C:\hadoop
SPARK_HOME = C:\spark\spark-3.5.0-bin-hadoop3
JAVA_HOME = C:\Program Files\Java\jdk-11.x.x
Then prepend to **Path**:
%HADOOP_HOME%\bin
%SPARK_HOME%\bin
(A quick way to confirm the variables took effect is the Python sketch after this list.)
- Verify Spark Shell
Open a new PowerShell or CMD window and run:
spark-shell
- Optional: PySpark
If Python is installed, confirm that PySpark starts:
pyspark
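As a sanity check after setting the variables above, this small Python sketch (assuming the example paths used in this guide; adjust if yours differ) prints each variable and warns if the folder it points to, or winutils.exe, is missing:
```python
import os
from pathlib import Path

# The three variables Spark depends on in this guide's Windows setup.
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    value = os.environ.get(var)
    print(f"{var} = {value}")
    if value and not Path(value).is_dir():
        print(f"  warning: {value} is not an existing folder")

# winutils.exe must sit in %HADOOP_HOME%\bin for Spark to work on Windows.
hadoop_home = os.environ.get("HADOOP_HOME")
if hadoop_home:
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    print("winutils.exe:", "found" if winutils.is_file() else "MISSING")
```
Remember to open a new terminal first; environment-variable changes only apply to freshly started processes.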
C. Installation on macOS
- Install Java
brew install openjdk@11
echo 'export JAVA_HOME="/usr/local/opt/openjdk@11"' >> ~/.zshrc
echo 'export PATH="$JAVA_HOME/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
(On Apple Silicon Macs, Homebrew installs under /opt/homebrew, so use /opt/homebrew/opt/openjdk@11 for JAVA_HOME.)
- Install Scala (Optional)
brew install scala
- Download & Unpack Spark
curl -O https://downloads.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
tar xzf spark-3.5.0-bin-hadoop3.tgz
mv spark-3.5.0-bin-hadoop3 ~/spark
- Configure Environment Variables
Add to ~/.zshrc:
export SPARK_HOME=~/spark
export PATH="$SPARK_HOME/bin:$PATH"
Then run:
source ~/.zshrc
- Verify Spark Shell & PySpark
Run each of the following; both should start an interactive shell:
spark-shell
pyspark
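Once pyspark is running, a few one-liners confirm the session is healthy before you try a real job. This is a sketch that relies only on the spark and sc objects the PySpark REPL creates for you automatically (on either OS):
```python
# Type these inside the pyspark REPL; `spark` and `sc` already exist there.
print(spark.version)   # Spark version, e.g. 3.5.0
print(sc.master)       # local[*] for a default single-machine install
print(sc.uiWebUrl)     # address of the Spark web UI, typically http://localhost:4040
```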
D. Quick Smoke Test
On either OS, run one of these in the shell to confirm that jobs execute:
// Scala (spark-shell)
spark.range(1, 1000000).selectExpr("sum(id)").show()
# Python (pyspark)
df = spark.range(1, 1000000)
df.selectExpr("sum(id)").show()
If you see the sum (499999500000) without errors, your Spark setup is complete!
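As a next step beyond the REPL, the same smoke test can run as a standalone job. Below is a minimal sketch (the file name smoke_test.py is just an example) that builds its own SparkSession, which spark-submit requires since no shell pre-creates one:
```python
# smoke_test.py: run with `spark-submit smoke_test.py`
from pyspark.sql import SparkSession

def main():
    # Standalone scripts must create their own SparkSession;
    # spark-shell and pyspark do this for you automatically.
    spark = SparkSession.builder.appName("SmokeTest").getOrCreate()

    # Same computation as the smoke test above: sum the integers 1..999999.
    df = spark.range(1, 1000000)
    df.selectExpr("sum(id)").show()

    spark.stop()

if __name__ == "__main__":
    main()
```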