[#3001] docs(spark-connector): add spark connector document #3018

Merged 3 commits on Apr 22, 2024
53 changes: 53 additions & 0 deletions docs/spark-connector/spark-catalog-hive.md
---
title: "Spark connector hive catalog"
slug: /spark-connector/spark-catalog-hive
keyword: spark connector hive catalog
license: "Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

With the Gravitino Spark connector, accessing data or managing metadata in Hive catalogs becomes straightforward, enabling seamless federation queries across different Hive catalogs.
**Reviewer (Contributor):** Do you have some Hive or Iceberg specific configurations that should be listed here?

**Author:** No, the configurations are retrieved from the Gravitino server.


## Capabilities

Supports most DDL and DML operations in SparkSQL, except for the following operations:

- Function operations
- Partition operations
- View operations
- Querying UDF
- `LOAD` clause
- `CREATE TABLE LIKE` clause
- `TRUNCATE TABLE` clause

## Requirements

* Hive metastore 2.x
* HDFS 2.x or 3.x

## SQL example


```sql
-- Suppose hive_a is the Hive catalog name managed by Gravitino
USE hive_a;

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

-- Create the table
CREATE TABLE IF NOT EXISTS employees (
id INT,
name STRING,
age INT
)
PARTITIONED BY (department STRING)
STORED AS PARQUET;
DESC TABLE EXTENDED employees;

INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);

SELECT * FROM employees WHERE department = 'Engineering';
```
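The two `INSERT OVERWRITE ... PARTITION` statements above replace rows only within the named partition. A minimal plain-Python sketch of that semantics (illustrative only, not the connector's code):

```python
# Illustrative sketch of INSERT OVERWRITE ... PARTITION semantics:
# overwriting one partition replaces only that partition's rows,
# leaving the other partitions untouched.
table = {}  # partition value -> list of rows

def insert_overwrite(partition: str, rows: list) -> None:
    # Replace the whole partition with the new rows.
    table[partition] = list(rows)

insert_overwrite("Engineering", [(1, "John Doe", 30), (2, "Jane Smith", 28)])
insert_overwrite("Marketing", [(3, "Mike Brown", 32)])
insert_overwrite("Engineering", [(4, "New Hire", 25)])  # replaces Engineering only

print(len(table["Engineering"]), len(table["Marketing"]))  # 1 1
```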
60 changes: 60 additions & 0 deletions docs/spark-connector/spark-catalog-iceberg.md
---
title: "Spark connector Iceberg catalog"
slug: /spark-connector/spark-catalog-iceberg
keyword: spark connector iceberg catalog
license: "Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

## Capabilities

#### Supported basic DDL and DML operations

- `CREATE TABLE`

Supports the basic `CREATE TABLE` clause, including table schema, properties, and partitioning; does not support distribution and sort orders.

- `DROP TABLE`
- `ALTER TABLE`
- `INSERT INTO` & `INSERT OVERWRITE`
- `SELECT`
- `DELETE`

Supports file-level deletes only.

#### Unsupported operations

- Row-level operations, like `MERGE INTO`, `DELETE FROM`, `UPDATE`.
- View operations.
- Branching and tagging operations.
- Spark procedures.
- Other Iceberg extension SQL, like:
- `ALTER TABLE prod.db.sample ADD PARTITION FIELD xx`
- `ALTER TABLE ... WRITE ORDERED BY`

## SQL example

```sql
-- Suppose iceberg_a is the Iceberg catalog name managed by Gravitino
USE iceberg_a;

CREATE DATABASE IF NOT EXISTS mydatabase;
USE mydatabase;

CREATE TABLE IF NOT EXISTS employee (
id bigint,
name string,
department string,
hire_date timestamp
) USING iceberg
PARTITIONED BY (days(hire_date));
DESC TABLE EXTENDED employee;

INSERT INTO employee
VALUES
(1, 'Alice', 'Engineering', TIMESTAMP '2021-01-01 09:00:00'),
(2, 'Bob', 'Marketing', TIMESTAMP '2021-02-01 10:30:00'),
(3, 'Charlie', 'Sales', TIMESTAMP '2021-03-01 08:45:00');

SELECT * FROM employee WHERE date(hire_date) = '2021-01-01';
```
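The `days(hire_date)` partition transform used above groups rows by calendar day, so the final query scans only the matching day's partition. A rough plain-Python illustration of the idea (not Iceberg's actual implementation, which works on epoch-day ordinals):

```python
from datetime import datetime

def days_transform(ts: datetime) -> str:
    """Illustrative stand-in for Iceberg's days() partition transform:
    rows that share a calendar day land in the same partition."""
    return ts.strftime("%Y-%m-%d")

rows = [
    (1, "Alice", datetime(2021, 1, 1, 9, 0)),
    (2, "Bob", datetime(2021, 2, 1, 10, 30)),
    (3, "Charlie", datetime(2021, 3, 1, 8, 45)),
]
partitions = {days_transform(ts) for _, _, ts in rows}
print(sorted(partitions))  # three distinct day partitions
```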
93 changes: 93 additions & 0 deletions docs/spark-connector/spark-connector.md
---
title: "Gravitino Spark connector"
slug: /spark-connector/spark-connector
keyword: spark connector federation query
license: "Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

## Overview

The Gravitino Spark connector leverages the Spark DataSourceV2 interface to facilitate the management of diverse catalogs under Gravitino. This capability allows users to perform federation queries, accessing data from various catalogs through a unified interface and consistent access control.

## Capabilities

1. Supports [Hive catalog](spark-catalog-hive.md) and [Iceberg catalog](spark-catalog-iceberg.md).
2. Supports federation query.
3. Supports most DDL and DML SQL statements.

## Requirements

* Spark 3.4
* Scala 2.12
* JDK 8, 11, or 17

## How to use it

1. [Build](../how-to-build.md) or download the Gravitino Spark connector JAR, and place it in the classpath of Spark.
2. Configure the Spark session to use the Gravitino spark connector.

| Property | Type | Default Value | Description | Required | Since Version |
|------------------------------|--------|---------------|-----------------------------------------------------------------------------------------------------|----------|---------------|
| spark.plugins | string | (none) | Gravitino spark plugin name, `com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin` | Yes | 0.5.0 |
| spark.sql.gravitino.metalake | string | (none) | The metalake name that the Spark connector uses to request metadata from Gravitino. | Yes | 0.5.0 |
| spark.sql.gravitino.uri | string | (none) | The URI of the Gravitino server. | Yes | 0.5.0 |

```shell
./bin/spark-sql -v \
--conf spark.plugins="com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin" \
--conf spark.sql.gravitino.uri=http://127.0.0.1:8090 \
--conf spark.sql.gravitino.metalake=test \
--conf spark.sql.warehouse.dir=hdfs://127.0.0.1:9000/user/hive/warehouse-hive
```
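The three required properties from the table above can also be assembled programmatically before launching a session. A minimal plain-Python sketch (the URI and metalake values are placeholders, mirroring the `spark-sql` invocation above):

```python
# Sketch: the three required Gravitino connector properties from the table
# above, assembled as a plain dict. URI and metalake values are placeholders.
gravitino_conf = {
    "spark.plugins": "com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin",
    "spark.sql.gravitino.uri": "http://127.0.0.1:8090",
    "spark.sql.gravitino.metalake": "test",
}

def missing_required(conf: dict) -> list:
    """Return the required Gravitino properties absent from a Spark conf."""
    required = [
        "spark.plugins",
        "spark.sql.gravitino.uri",
        "spark.sql.gravitino.metalake",
    ]
    return [key for key in required if key not in conf]

print(missing_required(gravitino_conf))  # []
```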

3. Execute the Spark SQL query.

Suppose there are two catalogs in the metalake `test`, `hive` for Hive catalog and `iceberg` for Iceberg catalog.

```sql
-- use the Hive catalog
USE hive;
CREATE DATABASE db;
USE db;
CREATE TABLE hive_students (id INT, name STRING);
INSERT INTO hive_students VALUES (1, 'Alice'), (2, 'Bob');

-- use the Iceberg catalog
USE iceberg;
USE db;
CREATE TABLE IF NOT EXISTS iceberg_scores (id INT, score INT) USING iceberg;
INSERT INTO iceberg_scores VALUES (1, 95), (2, 88);

-- execute a federation query joining the Hive table and the Iceberg table
SELECT hs.name, isc.score FROM hive.db.hive_students hs JOIN iceberg_scores isc ON hs.id = isc.id;
```
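The federated query above is an ordinary inner join across two catalogs; Spark plans and executes it like any other join. A plain-Python sketch of the same join logic over the inserted rows (illustrative only, the connector delegates this to Spark):

```python
# The rows inserted into the two catalogs above.
hive_students = [(1, "Alice"), (2, "Bob")]
iceberg_scores = [(1, 95), (2, 88)]

# Inner join on id, mirroring the federated SQL query.
scores = dict(iceberg_scores)
result = [(name, scores[sid]) for sid, name in hive_students if sid in scores]
print(result)  # [('Alice', 95), ('Bob', 88)]
```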

:::info

Due to limitations in the Spark catalog manager, the `SHOW CATALOGS` command initially displays only the Spark default catalog, named `spark_catalog`; it does not list the catalogs in the metalake. However, after you explicitly select a catalog with the `USE` command, that catalog name becomes visible in the output of `SHOW CATALOGS`.
:::

## Datatype mapping

The Gravitino Spark connector supports the following data type mapping between Spark and Gravitino.

| Spark Data Type | Gravitino Data Type | Since Version |
|-----------------|---------------------|---------------|
| `BooleanType` | `boolean` | 0.5.0 |
| `ByteType` | `byte` | 0.5.0 |
| `ShortType` | `short` | 0.5.0 |
| `IntegerType` | `integer` | 0.5.0 |
| `LongType` | `long` | 0.5.0 |
| `FloatType` | `float` | 0.5.0 |
| `DoubleType` | `double` | 0.5.0 |
| `DecimalType` | `decimal` | 0.5.0 |
| `StringType` | `string` | 0.5.0 |
| `CharType` | `char` | 0.5.0 |
| `VarcharType` | `varchar` | 0.5.0 |
| `TimestampType` | `timestamp` | 0.5.0 |
| `DateType` | `date` | 0.5.0 |
| `BinaryType` | `binary` | 0.5.0 |
| `ArrayType` | `array` | 0.5.0 |
| `MapType` | `map` | 0.5.0 |
| `StructType` | `struct` | 0.5.0 |
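The table can be read as a simple name-to-name mapping; a sketch for quick lookup (note this is an illustration derived from the table, and parameterized types such as `DecimalType`, `CharType`, and `VarcharType` also carry precision/scale or length in practice):

```python
# Scalar Spark -> Gravitino type-name mapping from the table above.
# Parameterized types (DecimalType, CharType, VarcharType) also carry
# precision/scale or length, omitted in this name-only sketch.
SPARK_TO_GRAVITINO = {
    "BooleanType": "boolean",
    "ByteType": "byte",
    "ShortType": "short",
    "IntegerType": "integer",
    "LongType": "long",
    "FloatType": "float",
    "DoubleType": "double",
    "DecimalType": "decimal",
    "StringType": "string",
    "CharType": "char",
    "VarcharType": "varchar",
    "TimestampType": "timestamp",
    "DateType": "date",
    "BinaryType": "binary",
    "ArrayType": "array",
    "MapType": "map",
    "StructType": "struct",
}
print(SPARK_TO_GRAVITINO["IntegerType"])  # integer
```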