(explanation)=
# Explanation

## Spark Cluster Overview

This Spark [documentation page](https://spark.apache.org/docs/latest/cluster-overview.html) gives an overview of how Spark operates.

## Cluster Mode

sparkctl always configures Spark clusters in [standalone mode](https://spark.apache.org/docs/latest/spark-standalone.html). Given that sparkctl expects clusters to be ephemeral, the greater sophistication of the YARN and Kubernetes cluster managers is not required.

## Submitting Applications

Please refer to this [documentation page](https://spark.apache.org/docs/latest/submitting-applications.html) for Spark's guidance on submitting applications.

To get all submission tools in a Python environment, install pyspark as follows:

```console
$ pip install pyspark
```

Clients for other languages are available at the main Spark [downloads page](https://spark.apache.org/downloads.html).

## Spark Connect

Spark Connect is a relatively new feature that simplifies client installation and configuration. Please refer to Spark's [documentation](https://spark.apache.org/docs/latest/spark-connect-overview.html) for details. If you want to configure and start a Spark cluster, and then connect to it, all within one Python session, this is the recommended workflow. Note that there are some caveats, listed [here](https://spark.apache.org/docs/latest/spark-connect-overview.html#how-spark-connect-client-applications-differ-from-classic-spark-applications).

Enable the Spark Connect server with this sparkctl command:

```console
$ sparkctl configure --connect-server
```

To install only the client for Python:

```console
$ pip install pyspark-client
```
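Once the Spark Connect server is running, a few lines of Python are enough to verify the setup. The sketch below assumes the server is reachable at `localhost:15002` (Spark Connect's default port); substitute whatever address your cluster actually reports.

```python
# A minimal sketch of connecting to a running Spark Connect server from
# Python. The host and port are assumptions (15002 is Spark Connect's
# default port); use the address your cluster reports.
from pyspark.sql import SparkSession

# Build a session against the remote Spark Connect endpoint instead of a
# local JVM. Requires the pyspark-client (or full pyspark) package.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Run a trivial query to confirm the connection works.
df = spark.range(10)
print(df.count())  # prints 10

spark.stop()
```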