sparkctl API¶
- sparkctl.config.make_default_spark_config() SparkConfig ¶
Return a SparkConfig created from the user’s config file.
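Examples
A minimal sketch; it assumes a user config file has already been created (e.g., with the sparkctl CLI):
>>> from sparkctl import make_default_spark_config
>>> config = make_default_spark_config()
>>> config.runtime.start_history_server = True  # adjust settings before use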
- class sparkctl.cluster_manager.ClusterManager(config: SparkConfig, status: StatusTracker | None = None)¶
Manages operation of the Spark cluster.
- classmethod from_config(config: SparkConfig) Self ¶
Create a ClusterManager from a config instance.
Examples
>>> from sparkctl import ClusterManager, make_default_spark_config
>>> config = make_default_spark_config()
>>> config.runtime.start_connect_server = True
>>> mgr = ClusterManager.from_config(config)
- classmethod from_config_file(config_file: Path | str | None = None) Self ¶
Create a ClusterManager from a config file. If config_file is None, use the default config file (e.g., ~/.sparkctl.toml).
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.from_config_file(config_file="config.json")
- classmethod load(directory: Path | str | None = None) Self ¶
Load an active cluster manager from a directory containing a previously-created sparkctl config.
- Parameters:
directory – Directory containing the sparkctl configuration files. Defaults to the current directory.
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.load()
>>> mgr = ClusterManager.load(directory="path/to/sparkctl/config")
- clean() None ¶
Delete all Spark runtime files in the directory.
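Examples
A minimal sketch; it assumes a cluster was previously created in the current directory:
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.load()  # load the existing sparkctl config
>>> mgr.stop()
>>> mgr.clean()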
- configure() None ¶
Configure a Spark cluster based on the input parameters.
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.from_config_file("config.json")
>>> mgr.configure()
- get_spark_session() SparkSession ¶
Return a SparkSession for the current cluster.
Examples
>>> spark = mgr.get_spark_session()
>>> spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"]).show()
- set_workers(workers: list[str]) None ¶
Set the workers for the cluster. Must be called after configure() and before start().
- Parameters:
workers – Worker node names or IP addresses; these will be used as SSH targets.
Examples
>>> from sparkctl import ClusterManager, make_default_spark_config
>>> mgr = ClusterManager.from_config(make_default_spark_config())
>>> mgr.configure()
>>> mgr.set_workers(["worker1", "worker2"])
>>> mgr.start()
- get_workers() list[str] ¶
Return the current worker node names.
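Examples
A sketch continuing the set_workers() example above; the worker names are placeholders:
>>> mgr.set_workers(["worker1", "worker2"])  # placeholder hostnames
>>> workers = mgr.get_workers()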
- start(print_env_paths: bool = True) None ¶
Start the Spark cluster. The caller must have called configure() beforehand.
The environment variables SPARK_CONF_DIR and JAVA_HOME are set to the correct values for the current process.
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.from_config_file("config.json")
>>> mgr.configure()
>>> mgr.start()
- managed_cluster() Generator[SparkSession, None, None] ¶
Configure and start the Spark cluster, yield a SparkSession from a context manager, and stop the cluster on exit.
The environment variables SPARK_CONF_DIR and JAVA_HOME are set to the correct values for the current process while the context is active and are cleared when the context exits.
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.from_config_file("config.json")
>>> with mgr.managed_cluster() as spark:
...     df = spark.createDataFrame([(1, 2), (3, 4)], ["a", "b"])
...     df.show()
- stop() None ¶
Stop all Spark processes.
Examples
>>> from sparkctl import ClusterManager
>>> mgr = ClusterManager.from_config_file("config.json")
>>> mgr.configure()
>>> mgr.start()
>>> mgr.stop()
- pydantic model sparkctl.models.SparkConfig¶
Contains all Spark configuration parameters.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
JSON schema:
{ "title": "SparkConfig", "description": "Contains all Spark configuration parameters.", "type": "object", "properties": { "binaries": { "$ref": "#/$defs/BinaryLocations" }, "runtime": { "$ref": "#/$defs/SparkRuntimeParams", "default": { "executor_cores": 5, "executor_memory_gb": null, "driver_memory_gb": 10, "node_memory_overhead_gb": 10, "use_local_storage": false, "start_connect_server": false, "start_history_server": false, "start_thrift_server": false, "spark_log_level": null, "enable_dynamic_allocation": false, "shuffle_partition_multiplier": 1, "enable_hive_metastore": false, "enable_postgres_hive_metastore": false, "postgres_password": "b3812f74-0d88-4465-a581-be14d5354753", "python_path": null, "spark_defaults_template_file": null } }, "directories": { "$ref": "#/$defs/RuntimeDirectories", "default": { "base": "/home/runner/work/sparkctl/sparkctl/docs", "spark_scratch": "/home/runner/work/sparkctl/sparkctl/docs/spark_scratch", "metastore_dir": "/home/runner/work/sparkctl/sparkctl/docs" } }, "compute": { "$ref": "#/$defs/ComputeParams", "default": { "environment": "slurm", "postgres": { "setup_metastore": "postgres/setup_metastore.sh", "start_container": "postgres/start_container.sh", "stop_container": "postgres/stop_container.sh" } } }, "resource_monitor": { "$ref": "#/$defs/ResourceMonitorConfig", "default": { "cpu": true, "disk": true, "memory": true, "network": true, "interval": 5, "enabled": false } }, "app": { "$ref": "#/$defs/AppParams", "default": { "console_level": "INFO", "file_level": "DEBUG", "reraise_exceptions": false } } }, "$defs": { "AppParams": { "additionalProperties": false, "properties": { "console_level": { "default": "INFO", "description": "Console log level", "title": "Console Level", "type": "string" }, "file_level": { "default": "DEBUG", "description": "File log level", "title": "File Level", "type": "string" }, "reraise_exceptions": { "default": false, "description": "Reraise sparkctl exceptions in the CLI handler. Not recommended for users. Useful for developers when debugging issues.", "title": "Reraise Exceptions", "type": "boolean" } }, "title": "AppParams", "type": "object" }, "BinaryLocations": { "additionalProperties": false, "description": "Locations to the Spark and dependent software. 
Hadoop, Hive, and the PostgreSQL jar file\nare only required if the user wants to enable a Postgres-based Hive metastore.", "properties": { "spark_path": { "description": "Path to the Spark binaries.", "format": "path", "title": "Spark Path", "type": "string" }, "java_path": { "description": "Path to the Java binaries.", "format": "path", "title": "Java Path", "type": "string" }, "hadoop_path": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the Hadoop binaries.", "title": "Hadoop Path" }, "hive_tarball": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the Hive binaries.", "title": "Hive Tarball" }, "postgresql_jar_file": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the PostgreSQL jar file.", "title": "Postgresql Jar File" } }, "required": [ "spark_path", "java_path" ], "title": "BinaryLocations", "type": "object" }, "ComputeEnvironment": { "description": "Defines the supported compute environments.", "enum": [ "native", "slurm" ], "title": "ComputeEnvironment", "type": "string" }, "ComputeParams": { "additionalProperties": false, "properties": { "environment": { "$ref": "#/$defs/ComputeEnvironment", "default": "slurm" }, "postgres": { "$ref": "#/$defs/PostgresScripts", "default": { "start_container": "postgres/start_container.sh", "stop_container": "postgres/stop_container.sh", "setup_metastore": "postgres/setup_metastore.sh" } } }, "title": "ComputeParams", "type": "object" }, "PostgresScripts": { "additionalProperties": false, "description": "Scripts that setup a PostgreSQL database for use in a Hive metastore.\nRelative paths are assumed to be based on the root path of the sparkctl package.\nAbsolute paths can be anywhere on the filesystem.", "properties": { "start_container": { "default": "postgres/start_container.sh", "title": "Start Container", "type": "string" }, "stop_container": { "default": "postgres/stop_container.sh", "title": "Stop Container", "type": "string" }, "setup_metastore": { "default": "postgres/setup_metastore.sh", "title": "Setup Metastore", "type": "string" } }, "title": "PostgresScripts", "type": "object" }, "ResourceMonitorConfig": { "additionalProperties": false, "description": "Defines the resource stats to monitor.", "properties": { "cpu": { "default": true, "description": "Monitor CPU utilization", "title": "Cpu", "type": "boolean" }, "disk": { "default": true, "description": "Monitor disk/storage utilization", "title": "Disk", "type": "boolean" }, "memory": { "default": true, "description": "Monitor memory utilization", "title": "Memory", "type": "boolean" }, "network": { "default": true, "description": "Monitor network utilization", "title": "Network", "type": "boolean" }, "interval": { "default": 5, "description": "Interval in seconds on which to collect stats", "title": "Interval", "type": "integer" }, "enabled": { "default": false, "description": "Enable resource monitoring.", "title": "Enabled", "type": "boolean" } }, "title": "ResourceMonitorConfig", "type": "object" }, "RuntimeDirectories": { "additionalProperties": false, "description": "Defines the directories to be used by a Spark cluster.", "properties": { "base": { "default": ".", "description": "Base directory for the cluster configuration", "format": "path", "title": "Base", "type": "string" }, "spark_scratch": { "default": "spark_scratch", "description": "Directory to use for shuffle 
data.", "format": "path", "title": "Spark Scratch", "type": "string" }, "metastore_dir": { "default": ".", "description": "Set a custom directory for the metastore and warehouse.", "format": "path", "title": "Metastore Dir", "type": "string" } }, "title": "RuntimeDirectories", "type": "object" }, "SparkRuntimeParams": { "additionalProperties": false, "description": "Controls Spark runtime parameters.", "properties": { "executor_cores": { "default": 5, "description": "Number of cores per executor", "title": "Executor Cores", "type": "integer" }, "executor_memory_gb": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Memory per executor in GB. By default, auto-determine by using what is available. This can also be set implicitly by increasing executor_cores.", "title": "Executor Memory Gb" }, "driver_memory_gb": { "default": 10, "description": "Driver memory in GB. This is the maximum amount of data that can be pulled into the application.", "title": "Driver Memory Gb", "type": "integer" }, "node_memory_overhead_gb": { "default": 10, "description": "Memory to reserve for system processes.", "title": "Node Memory Overhead Gb", "type": "integer" }, "use_local_storage": { "default": false, "description": "Use compute node local storage for shuffle data.", "title": "Use Local Storage", "type": "boolean" }, "start_connect_server": { "default": false, "description": "Enable the Spark connect server.", "title": "Start Connect Server", "type": "boolean" }, "start_history_server": { "default": false, "description": "Enable the Spark history server.", "title": "Start History Server", "type": "boolean" }, "start_thrift_server": { "default": false, "description": "Enable the Thrift server to connect a SQL client.", "title": "Start Thrift Server", "type": "boolean" }, "spark_log_level": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Set the root log level for all Spark processes. Defaults to Spark's defaults.", "title": "Spark Log Level" }, "enable_dynamic_allocation": { "default": false, "description": "Enable Spark dynamic resource allocation.", "title": "Enable Dynamic Allocation", "type": "boolean" }, "shuffle_partition_multiplier": { "default": 1, "description": "Spark SQL shuffle partition multiplier (multipy by the number of worker CPUs)", "title": "Shuffle Partition Multiplier", "type": "integer" }, "enable_hive_metastore": { "default": false, "description": "Create a Hive metastore with Spark defaults (Apache Derby). Supports only one Spark session.", "title": "Enable Hive Metastore", "type": "boolean" }, "enable_postgres_hive_metastore": { "default": false, "description": "Create a metastore with PostgreSQL. Supports multiple Spark sessions.", "title": "Enable Postgres Hive Metastore", "type": "boolean" }, "postgres_password": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Password for PostgreSQL.", "title": "Postgres Password" }, "python_path": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Python path to set for Spark workers. Use the Python inside the Spark distribution by default.", "title": "Python Path" }, "spark_defaults_template_file": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to a custom spark-defaults.conf template file. 
If not set, use the sparkctl defaults.", "title": "Spark Defaults Template File" } }, "title": "SparkRuntimeParams", "type": "object" } }, "additionalProperties": false, "required": [ "binaries" ] }
- Config:
str_strip_whitespace: bool = True
validate_assignment: bool = True
validate_default: bool = True
extra: str = forbid
use_enum_values: bool = False
arbitrary_types_allowed: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- field app: AppParams = AppParams(console_level='INFO', file_level='DEBUG', reraise_exceptions=False)¶
- field binaries: BinaryLocations [Required]¶
- field compute: ComputeParams = ComputeParams(environment=<ComputeEnvironment.SLURM: 'slurm'>, postgres=PostgresScripts(start_container='postgres/start_container.sh', stop_container='postgres/stop_container.sh', setup_metastore='postgres/setup_metastore.sh'))¶
- field directories: RuntimeDirectories = RuntimeDirectories(base=PosixPath('/home/runner/work/sparkctl/sparkctl/docs'), spark_scratch=PosixPath('/home/runner/work/sparkctl/sparkctl/docs/spark_scratch'), metastore_dir=PosixPath('/home/runner/work/sparkctl/sparkctl/docs'))¶
- field resource_monitor: ResourceMonitorConfig = ResourceMonitorConfig(cpu=True, disk=True, memory=True, network=True, interval=5, enabled=False)¶
- field runtime: SparkRuntimeParams = SparkRuntimeParams(executor_cores=5, executor_memory_gb=None, driver_memory_gb=10, node_memory_overhead_gb=10, use_local_storage=False, start_connect_server=False, start_history_server=False, start_thrift_server=False, spark_log_level=None, enable_dynamic_allocation=False, shuffle_partition_multiplier=1, enable_hive_metastore=False, enable_postgres_hive_metastore=False, postgres_password='b3812f74-0d88-4465-a581-be14d5354753', python_path=None, spark_defaults_template_file=None)¶
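For illustration, a SparkConfig can also be constructed directly; the binary paths below are placeholders, not real installation locations:
>>> from sparkctl.models import BinaryLocations, SparkConfig
>>> config = SparkConfig(
...     binaries=BinaryLocations(
...         spark_path="/opt/spark",  # placeholder paths
...         java_path="/opt/java",
...     )
... )
>>> config.runtime.start_connect_server = True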
- pydantic model sparkctl.models.BinaryLocations¶
Locations to the Spark and dependent software. Hadoop, Hive, and the PostgreSQL jar file are only required if the user wants to enable a Postgres-based Hive metastore.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
JSON schema:
{ "title": "BinaryLocations", "description": "Locations to the Spark and dependent software. Hadoop, Hive, and the PostgreSQL jar file\nare only required if the user wants to enable a Postgres-based Hive metastore.", "type": "object", "properties": { "spark_path": { "description": "Path to the Spark binaries.", "format": "path", "title": "Spark Path", "type": "string" }, "java_path": { "description": "Path to the Java binaries.", "format": "path", "title": "Java Path", "type": "string" }, "hadoop_path": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the Hadoop binaries.", "title": "Hadoop Path" }, "hive_tarball": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the Hive binaries.", "title": "Hive Tarball" }, "postgresql_jar_file": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to the PostgreSQL jar file.", "title": "Postgresql Jar File" } }, "additionalProperties": false, "required": [ "spark_path", "java_path" ] }
- Config:
str_strip_whitespace: bool = True
validate_assignment: bool = True
validate_default: bool = True
extra: str = forbid
use_enum_values: bool = False
arbitrary_types_allowed: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
- field hadoop_path: Path | None = None¶
Path to the Hadoop binaries.
- Validated by: make_absolute
- field hive_tarball: Path | None = None¶
Path to the Hive binaries.
- Validated by: make_absolute
- field java_path: Path [Required]¶
Path to the Java binaries.
- Validated by: make_absolute
- field postgresql_jar_file: Path | None = None¶
Path to the PostgreSQL jar file.
- Validated by: make_absolute
- field spark_path: Path [Required]¶
Path to the Spark binaries.
- Validated by: make_absolute
- validator make_absolute » java_path, hadoop_path, postgresql_jar_file, spark_path, hive_tarball¶
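An illustrative instance with the optional metastore dependencies supplied; all paths are placeholders:
>>> from sparkctl.models import BinaryLocations
>>> binaries = BinaryLocations(
...     spark_path="/opt/spark",  # placeholder paths
...     java_path="/opt/java",
...     hadoop_path="/opt/hadoop",
...     hive_tarball="/opt/apache-hive.tar.gz",
...     postgresql_jar_file="/opt/postgresql.jar",
... )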
- pydantic model sparkctl.models.ComputeParams¶
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
JSON schema:
{ "title": "ComputeParams", "type": "object", "properties": { "environment": { "$ref": "#/$defs/ComputeEnvironment", "default": "slurm" }, "postgres": { "$ref": "#/$defs/PostgresScripts", "default": { "start_container": "postgres/start_container.sh", "stop_container": "postgres/stop_container.sh", "setup_metastore": "postgres/setup_metastore.sh" } } }, "$defs": { "ComputeEnvironment": { "description": "Defines the supported compute environments.", "enum": [ "native", "slurm" ], "title": "ComputeEnvironment", "type": "string" }, "PostgresScripts": { "additionalProperties": false, "description": "Scripts that setup a PostgreSQL database for use in a Hive metastore.\nRelative paths are assumed to be based on the root path of the sparkctl package.\nAbsolute paths can be anywhere on the filesystem.", "properties": { "start_container": { "default": "postgres/start_container.sh", "title": "Start Container", "type": "string" }, "stop_container": { "default": "postgres/stop_container.sh", "title": "Stop Container", "type": "string" }, "setup_metastore": { "default": "postgres/setup_metastore.sh", "title": "Setup Metastore", "type": "string" } }, "title": "PostgresScripts", "type": "object" } }, "additionalProperties": false }
- Config:
str_strip_whitespace: bool = True
validate_assignment: bool = True
validate_default: bool = True
extra: str = forbid
use_enum_values: bool = False
arbitrary_types_allowed: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- field environment: ComputeEnvironment = ComputeEnvironment.SLURM¶
- field postgres: PostgresScripts = PostgresScripts(start_container='postgres/start_container.sh', stop_container='postgres/stop_container.sh', setup_metastore='postgres/setup_metastore.sh')¶
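An illustrative override of the compute environment (the default is slurm); whether native mode fits your deployment depends on how the cluster is launched:
>>> from sparkctl.models import ComputeParams
>>> compute = ComputeParams(environment="native")  # validated against ComputeEnvironment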
- pydantic model sparkctl.models.SparkRuntimeParams¶
Controls Spark runtime parameters.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
JSON schema:
{ "title": "SparkRuntimeParams", "description": "Controls Spark runtime parameters.", "type": "object", "properties": { "executor_cores": { "default": 5, "description": "Number of cores per executor", "title": "Executor Cores", "type": "integer" }, "executor_memory_gb": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Memory per executor in GB. By default, auto-determine by using what is available. This can also be set implicitly by increasing executor_cores.", "title": "Executor Memory Gb" }, "driver_memory_gb": { "default": 10, "description": "Driver memory in GB. This is the maximum amount of data that can be pulled into the application.", "title": "Driver Memory Gb", "type": "integer" }, "node_memory_overhead_gb": { "default": 10, "description": "Memory to reserve for system processes.", "title": "Node Memory Overhead Gb", "type": "integer" }, "use_local_storage": { "default": false, "description": "Use compute node local storage for shuffle data.", "title": "Use Local Storage", "type": "boolean" }, "start_connect_server": { "default": false, "description": "Enable the Spark connect server.", "title": "Start Connect Server", "type": "boolean" }, "start_history_server": { "default": false, "description": "Enable the Spark history server.", "title": "Start History Server", "type": "boolean" }, "start_thrift_server": { "default": false, "description": "Enable the Thrift server to connect a SQL client.", "title": "Start Thrift Server", "type": "boolean" }, "spark_log_level": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Set the root log level for all Spark processes. Defaults to Spark's defaults.", "title": "Spark Log Level" }, "enable_dynamic_allocation": { "default": false, "description": "Enable Spark dynamic resource allocation.", "title": "Enable Dynamic Allocation", "type": "boolean" }, "shuffle_partition_multiplier": { "default": 1, "description": "Spark SQL shuffle partition multiplier (multipy by the number of worker CPUs)", "title": "Shuffle Partition Multiplier", "type": "integer" }, "enable_hive_metastore": { "default": false, "description": "Create a Hive metastore with Spark defaults (Apache Derby). Supports only one Spark session.", "title": "Enable Hive Metastore", "type": "boolean" }, "enable_postgres_hive_metastore": { "default": false, "description": "Create a metastore with PostgreSQL. Supports multiple Spark sessions.", "title": "Enable Postgres Hive Metastore", "type": "boolean" }, "postgres_password": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Password for PostgreSQL.", "title": "Postgres Password" }, "python_path": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Python path to set for Spark workers. Use the Python inside the Spark distribution by default.", "title": "Python Path" }, "spark_defaults_template_file": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to a custom spark-defaults.conf template file. If not set, use the sparkctl defaults.", "title": "Spark Defaults Template File" } }, "additionalProperties": false }
- Config:
str_strip_whitespace: bool = True
validate_assignment: bool = True
validate_default: bool = True
extra: str = forbid
use_enum_values: bool = False
arbitrary_types_allowed: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
- field driver_memory_gb: int = 10¶
Driver memory in GB. This is the maximum amount of data that can be pulled into the application.
- field enable_dynamic_allocation: bool = False¶
Enable Spark dynamic resource allocation.
- field enable_hive_metastore: bool = False¶
Create a Hive metastore with Spark defaults (Apache Derby). Supports only one Spark session.
- field enable_postgres_hive_metastore: bool = False¶
Create a metastore with PostgreSQL. Supports multiple Spark sessions.
- field executor_cores: int = 5¶
Number of cores per executor
- field executor_memory_gb: int | None = None¶
Memory per executor in GB. By default, auto-determine by using what is available. This can also be set implicitly by increasing executor_cores.
- field node_memory_overhead_gb: int = 10¶
Memory to reserve for system processes.
- field postgres_password: str | None = None¶
Password for PostgreSQL.
- Validated by: set_postgres_password
- field python_path: str | None = None¶
Python path to set for Spark workers. Use the Python inside the Spark distribution by default.
- field shuffle_partition_multiplier: int = 1¶
Spark SQL shuffle partition multiplier (multiplied by the number of worker CPUs)
- field spark_defaults_template_file: Path | None = None¶
Path to a custom spark-defaults.conf template file. If not set, use the sparkctl defaults.
- field spark_log_level: str | None = None¶
Set the root log level for all Spark processes. Defaults to Spark’s defaults.
- field start_connect_server: bool = False¶
Enable the Spark connect server.
- field start_history_server: bool = False¶
Enable the Spark history server.
- field start_thrift_server: bool = False¶
Enable the Thrift server to connect a SQL client.
- field use_local_storage: bool = False¶
Use compute node local storage for shuffle data.
- validator set_postgres_password » postgres_password¶
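An illustrative set of runtime overrides; the values are examples, not tuning recommendations:
>>> from sparkctl.models import SparkRuntimeParams
>>> runtime = SparkRuntimeParams(
...     executor_cores=8,  # example values only
...     driver_memory_gb=20,
...     start_connect_server=True,
...     shuffle_partition_multiplier=2,
... )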
- pydantic model sparkctl.models.RuntimeDirectories¶
Defines the directories to be used by a Spark cluster.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
JSON schema:
{ "title": "RuntimeDirectories", "description": "Defines the directories to be used by a Spark cluster.", "type": "object", "properties": { "base": { "default": ".", "description": "Base directory for the cluster configuration", "format": "path", "title": "Base", "type": "string" }, "spark_scratch": { "default": "spark_scratch", "description": "Directory to use for shuffle data.", "format": "path", "title": "Spark Scratch", "type": "string" }, "metastore_dir": { "default": ".", "description": "Set a custom directory for the metastore and warehouse.", "format": "path", "title": "Metastore Dir", "type": "string" } }, "additionalProperties": false }
- Config:
str_strip_whitespace: bool = True
validate_assignment: bool = True
validate_default: bool = True
extra: str = forbid
use_enum_values: bool = False
arbitrary_types_allowed: bool = True
populate_by_name: bool = True
validate_by_alias: bool = True
validate_by_name: bool = True
- Fields:
- Validators:
- field base: Path = PosixPath('.')¶
Base directory for the cluster configuration
- Validated by: make_absolute
- field metastore_dir: Path = PosixPath('.')¶
Set a custom directory for the metastore and warehouse.
- Validated by: make_absolute
- field spark_scratch: Path = PosixPath('spark_scratch')¶
Directory to use for shuffle data.
- Validated by: make_absolute
- validator make_absolute » spark_scratch, metastore_dir, base¶
- clean_spark_conf_dir() Path ¶
Ensure that the Spark conf dir exists and is clean.
- get_events_dir() Path ¶
Return the Spark events directory.
- get_hive_site_file() Path ¶
Return the file path to hive-site.xml.
- get_spark_conf_dir() Path ¶
Return the Spark conf directory.
- get_spark_defaults_file() Path ¶
Return the file path to spark-defaults.conf.
- get_spark_env_file() Path ¶
Return the file path to spark-env.sh.
- get_spark_log_file() Path ¶
Return the file path to the log properties file.
- get_workers_file() Path ¶
Return the file path to the workers file.
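An illustrative RuntimeDirectories instance that keeps shuffle data on a separate scratch filesystem; the paths are placeholders:
>>> from sparkctl.models import RuntimeDirectories
>>> dirs = RuntimeDirectories(base="/projects/my_run", spark_scratch="/scratch/spark")  # placeholder paths
>>> conf_dir = dirs.get_spark_conf_dir()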
- class sparkctl.models.ComputeEnvironment(*values)¶
Defines the supported compute environments.
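For illustration, members can be constructed from their string values ("native" or "slurm"):
>>> from sparkctl.models import ComputeEnvironment
>>> ComputeEnvironment("slurm")
<ComputeEnvironment.SLURM: 'slurm'>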