Explanation
Spark Cluster Overview
This Spark documentation page gives an overview of how Spark applications run on a cluster: the driver program, the cluster manager, and the executors.
Cluster Mode
sparkctl always configures Spark clusters in standalone mode, which uses Spark's built-in cluster manager. Because sparkctl expects clusters to be ephemeral, the additional capabilities of the YARN and Kubernetes cluster managers are not needed.
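In standalone mode, a client reaches the cluster through the master's `spark://HOST:PORT` URL, and 7077 is the standalone master's default port. The following is a minimal sketch, assuming a classic (non-Connect) PySpark installation and a master running on a host you supply. The helper names and the host `node1` are hypothetical, and how sparkctl reports the master address is not shown here.

```python
def standalone_master_url(host: str, port: int = 7077) -> str:
    """Build a Spark standalone master URL (7077 is the default master port)."""
    return f"spark://{host}:{port}"


def create_session(master_host: str, app_name: str = "sparkctl-example"):
    """Start a classic PySpark session against a standalone master.

    Hypothetical helper: assumes `pyspark` is installed and that a standalone
    master is listening on `master_host`. The import is deferred so this
    module can be loaded without pyspark.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.master(standalone_master_url(master_host))
        .appName(app_name)
        .getOrCreate()
    )
```

With a running cluster, `spark = create_session("node1")` followed by `spark.range(10).count()` runs a trivial job on the executors; call `spark.stop()` when finished.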
Submitting Applications
Please refer to this documentation page for Spark’s guidance on submitting applications.
To get all submission tools in a Python environment, install pyspark as follows:
$ pip install pyspark
Clients for other languages are available on the main Spark downloads page.
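Installing `pyspark` also places the `spark-submit` script on the environment's `PATH`. As a hedged sketch, the hypothetical helper below assembles a `spark-submit` invocation that targets a standalone master, using only the `--master` and `--conf` options. The application path, master URL, and configuration key used here are placeholders.

```python
import shlex
import subprocess
from typing import Mapping, Optional, Sequence


def spark_submit_command(
    app: str,
    master: str,
    app_args: Sequence[str] = (),
    conf: Optional[Mapping[str, str]] = None,
) -> list:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    cmd = ["spark-submit", "--master", master]
    # Each Spark property is passed as its own --conf KEY=VALUE pair;
    # sorting the keys keeps the command deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application file follows all spark-submit options,
    # and the application's own arguments come last.
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


def submit(app: str, master: str, **kwargs) -> None:
    """Run spark-submit; raises CalledProcessError on a non-zero exit."""
    cmd = spark_submit_command(app, master, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `submit("app.py", "spark://node1:7077", conf={"spark.executor.memory": "4g"})` would run `spark-submit --master spark://node1:7077 --conf spark.executor.memory=4g app.py`. Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.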
Spark Connect
Spark Connect, introduced in Spark 3.4, decouples the client from the Spark cluster, which simplifies client installation and configuration. Please refer to Spark’s documentation for details. It is the recommended workflow if you want to configure and start a Spark cluster and then connect to it, all within one Python session.
Note that Spark Connect has some caveats, listed here.
Enable the Spark Connect server with this sparkctl command:
$ sparkctl configure --connect-server
To install only the Python client:
$ pip install pyspark-client
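Once the Connect server is running, a Python client needs only an `sc://HOST:PORT` URL, and 15002 is the Connect server's default port. The minimal sketch below assumes `pyspark-client` (or the full `pyspark` package) is installed and that the server is reachable at a host you supply. The helper names and the host `node1` are hypothetical.

```python
def connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect URL (15002 is the Connect server's default port)."""
    return f"sc://{host}:{port}"


def connect(host: str, port: int = 15002):
    """Open a Spark Connect session (hypothetical helper).

    The import is deferred: pyspark-client or pyspark is only needed
    when this function is called.
    """
    from pyspark.sql import SparkSession

    # builder.remote() targets a Spark Connect server instead of a master.
    return SparkSession.builder.remote(connect_url(host, port)).getOrCreate()
```

After `spark = connect("node1")`, DataFrame operations such as `spark.range(5).count()` are sent to the server for execution, so the client machine needs no local Spark installation or JVM.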