# How to configure a Hive metastore

Spark supports reading and writing data from [Apache Hive](https://hive.apache.org) as described [here](https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html). This is useful if you want to access data in SQL tables through a JDBC/ODBC client instead of Parquet files through Python/R or `spark-sql`. It also provides access to Spark with [SQLAlchemy](https://www.sqlalchemy.org) through [PyHive](https://kyuubi.readthedocs.io/en/v1.8.0/client/python/pyhive.html); a connection example is shown at the end of this section.

You can configure Spark with a Hive metastore by running this command:

```console
$ sparkctl configure --thrift-server --hive-metastore --metastore-dir /path/to/my/metastore
```

By default Spark uses [Apache Derby](https://db.apache.org/derby/) as the database for the metastore. This has a limitation: only one client can be connected to the metastore at a time. If you need multiple simultaneous connections to the metastore, you can use [PostgreSQL](https://www.postgresql.org) as the backend instead by running the following command:

```console
$ sparkctl configure --thrift-server --hive-metastore --postgres-hive-metastore --metastore-dir /path/to/my/metastore
```

This takes a few extra minutes to start the first time because it has to download a container and start the server. Apptainer caches the container image, and you can reuse the database data across Slurm allocations.

**Note**: The metadata about your tables is stored in Derby or Postgres. The tables themselves are stored on the filesystem (Parquet files by default) in a directory called `spark_warehouse`, which is created in the directory passed to `--metastore-dir` (the current directory by default). Postgres data, if enabled, is stored in the same directory (`pg_data`).
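
For example, a Spark application with Hive support enabled writes tables managed by the metastore under `spark_warehouse`. The sketch below is illustrative only and assumes you run it with `pyspark` or `spark-submit` against the cluster configured above so that the metastore settings are picked up; the application name, table name, and sample data are placeholders.

```python
from pyspark.sql import SparkSession

# Attach to the configured Hive metastore; tables created through this
# session are tracked in Derby/Postgres and written under spark_warehouse.
spark = (
    SparkSession.builder
    .appName("hive-metastore-example")  # placeholder name
    .enableHiveSupport()
    .getOrCreate()
)

# Write a small DataFrame as a managed table (Parquet by default).
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").saveAsTable("my_table")

# The table is now visible to any client that uses the same metastore.
spark.sql("SELECT * FROM my_table").show()
spark.stop()
```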
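
Once the Thrift server is running, you can query those tables from a JDBC/ODBC client. Below is a minimal sketch using PyHive's DB-API interface; the host and port are assumptions (a Spark Thrift server commonly listens on port 10000), so adjust them to match your cluster, and `my_table` is the placeholder table from the previous example.

```python
from pyhive import hive

# Connect to the Spark Thrift server (host/port are assumptions).
conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())
cursor.close()
conn.close()
```

The same connection can be made through SQLAlchemy with the `hive://` dialect that PyHive registers, depending on your PyHive and SQLAlchemy versions:

```python
from sqlalchemy import create_engine, text

# hive://<host>:<port>/<database>; values here are assumptions.
engine = create_engine("hive://localhost:10000/default")
with engine.connect() as connection:
    print(connection.execute(text("SELECT * FROM my_table LIMIT 5")).fetchall())
```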