docs: design doc and better example code (#33)
murfffi authored Jan 17, 2025
1 parent 657aa1d commit 0264232
Showing 5 changed files with 218 additions and 78 deletions.
73 changes: 34 additions & 39 deletions README.md
@@ -1,5 +1,7 @@
# Golang Apache Impala Driver

<img src="./docs/logo.svg" width="64" alt="project logo - gopher with impala horns" align="right">

**The actively supported Apache Impala driver for Go's [database/sql](https://golang.org/pkg/database/sql) package**

This driver started as a fork of [github.com/bippio/go-impala](https://github.com/bippio/go-impala),
@@ -29,7 +31,7 @@ The connection string uses a URL format: impala://username:password@host:port?pa

* `auth` - string. Authentication mode. Supported values: "noauth", "ldap"
* `tls` - boolean. Enable TLS
* `ca-cert` - The file that contains the public key certificate of the CA that signed the impala certificate
* `ca-cert` - The file that contains the public key certificate of the CA that signed the Impala certificate
* `batch-size` - integer value (default: 1024). Maximum number of rows fetched per request
* `buffer-size`- in bytes (default: 4096); Buffer size for the Thrift transport
* `mem-limit` - string value (example: 3m); Memory limit for query
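
As a rough illustration of the URL format and parameters listed above, the sketch below assembles a connection string with Go's standard `net/url` package. The helper name `buildDSN` and the sample credentials are assumptions made for this example; they are not part of the driver.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// buildDSN is a hypothetical helper (not part of the driver) that assembles
// a connection string of the form impala://username:password@host:port?params.
func buildDSN(user, pass, host, port string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u := url.URL{
		Scheme:   "impala",
		User:     url.UserPassword(user, pass), // escapes special characters
		Host:     net.JoinHostPort(host, port),
		RawQuery: q.Encode(), // Encode sorts keys, so the result is stable
	}
	return u.String()
}

func main() {
	// Sample values only; "auth" and "batch-size" are parameters from the list above.
	dsn := buildDSN("alice", "s3cret", "localhost", "21050", map[string]string{
		"auth":       "ldap",
		"batch-size": "2048",
	})
	fmt.Println(dsn)
	// prints: impala://alice:s3cret@localhost:21050?auth=ldap&batch-size=2048
}
```

Building the URL through `url.URL` rather than string concatenation means that passwords containing characters such as `@` or `/` are percent-encoded instead of corrupting the connection string.
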
@@ -106,24 +108,25 @@ import (
)

func main() {

opts := impala.DefaultOptions

opts.Host = "<impala host>"
opts.Host = "localhost" // impala host
opts.Port = "21050"

// enable LDAP authentication:
opts.UseLDAP = true
opts.Username = "<ldap username>"
opts.Password = "<ldap password>"

//opts.UseLDAP = true
//opts.Username = "<ldap username>"
//opts.Password = "<ldap password>"
//
// enable TLS
opts.UseTLS = true
opts.CACertPath = "/path/to/cacert"
//opts.UseTLS = true
//opts.CACertPath = "/path/to/cacert"

connector := impala.NewConnector(&opts)
db := sql.OpenDB(connector)
defer db.Close()
defer func() {
_ = db.Close()
}()

ctx := context.Background()

@@ -132,49 +135,41 @@ func main() {
log.Fatal(err)
}

r := struct {
name string
comment string
}{}

var name, comment string
databases := make([]string, 0) // databases will contain all the DBs to enumerate later
for rows.Next() {
if err := rows.Scan(&r.name, &r.comment); err != nil {
if err := rows.Scan(&name, &comment); err != nil {
log.Fatal(err)
}
databases = append(databases, r.name)
databases = append(databases, name)
}
if err := rows.Err(); err != nil {
log.Fatal(err)
}
log.Println("List of Databases", databases)

stmt, err := db.PrepareContext(ctx, "SHOW TABLES IN ?")
tables, err := impala.NewMetadata(db).GetTables(ctx, "%", "%")
if err != nil {
log.Fatal(err)
}
log.Println("List of Tables", tables)
}
```

tbl := struct {
name string
}{}
## Support

for _, d := range databases {
rows, err := stmt.QueryContext(ctx, d)
if err != nil {
log.Printf("error in querying database %s: %s", d, err.Error())
continue
}
The library is actively tested with Impala 4.1 and 3.4.
All 3.x and 4.x minor versions should work well. 2.x is also supported
on a best-effort basis.

tables := make([]string, 0)
for rows.Next() {
if err := rows.Scan(&tbl.name); err != nil {
log.Println(err)
continue
}
tables = append(tables, tbl.name)
}
log.Printf("List of Tables in Database %s: %v\n", d, tables)
}
}
File any issues that you encounter as GitHub issues.

```
## Copyright and acknowledgements

This library started as a fork of [github.com/bippio/go-impala](https://github.com/bippio/go-impala),
under [the MIT license](https://github.com/bippio/go-impala/blob/ebab2bf/LICENSE). This library retains the same
license.

The [project logo](/docs/logo.svg) combines the Golang Gopher from
[github.com/golang-samples/gopher-vector](https://github.com/golang-samples/gopher-vector)
with the [Apache Impala logo](https://impala.apache.org/img/impala-logo.png), licensed under the Apache 2 license.
52 changes: 52 additions & 0 deletions docs/DESIGN.md
@@ -0,0 +1,52 @@
# Design notes

## Non-goals

The goal of this library is to provide a database/sql driver and metadata/schema API for Apache Impala.
The following related capabilities are non-goals for now. This may change based on demand, so feel free to open
an issue requesting one of them. A request would need to gather enough votes to become a goal, though.

- **provide a query API to Impala beyond `database/sql`**
While the `database/sql` API does not expose the full capabilities of Impala (for example, async queries),
Go users of these features are better off calling the Impala API directly (generating their own Thrift bindings),
than using this library. If some of the code in `/internal` is valuable,
then copy it because this library maintains API stability and follows semantic versioning only for the public code.
["A little copying is better than a little dependency."](https://go-proverbs.github.io/)
- **fully support Hive, in addition to Impala**
As discussed below, this library is somewhat compatible with Hive, not just Impala. Nevertheless,
testing this library against Hive and resolving issues that occur only with Hive is not a goal.
Hive users should consider [sqlflow.org/gohive](https://sqlflow.org/gohive) instead.

## Impala driver or Hive driver

Impala [implements](https://impala.apache.org/docs/build/asf-site-html/topics/impala_client.html) the Hive remote API,
called [Hive Server 2](https://cwiki.apache.org/confluence/display/hive/hiveserver2+overview).
As a result, all Hive clients are compatible with Impala to some extent and vice versa.
For example, the Apache Impala OSS documentation even recommends that Java applications
[use the Hive JDBC driver](https://impala.apache.org/docs/build/asf-site-html/topics/impala_jdbc.html).

There is a Go database/sql client for Hive - https://sqlflow.org/gohive. However, in my testing,
it is a poor client for Impala. For example,

- **The Hive driver does not show errors reported by Impala.**
The root cause is that the driver prints only `Response.Status.InfoMessages[]`,
while Impala populates either `Response.Status.ErrorMessage` or `Response.State.ErrorMessage`
[depending on the error](https://github.com/sclgo/impala-go/blob/657aa1d/internal/isql/connection_test.go#L139).
- **Non-trivial Impala DML statements don't work with the Hive driver.**
The driver does not support async statement execution, while the Impala HS2 server
[does not support](https://github.com/cloudera/impyla/issues/157#issuecomment-164090890)
sync execution. Impala ignores the RunAsync field in `TExecuteStatementReq` and assumes it is `true`,
even though the default is `false`. As a result, the Hive driver doesn't wait for
DML statements to complete and closes them immediately, cancelling them in the process.
In contrast, the Hive JDBC driver always uses async execution
[by default](https://issues.apache.org/jira/browse/HIVE-5232) so it works with Impala.

Outside Go, even though the Apache Impala OSS documentation recommends using Hive JDBC or ODBC drivers,
the commercial Cloudera Impala includes
[dedicated drivers](https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/2-6-35/Cloudera-JDBC-Connector-for-Apache-Impala-Install-Guide.pdf),
which are free to use, but not OSS. In Python, there is a first-party OSS driver for Impala and Hive -
[impyla](https://github.com/cloudera/impyla) (cool name, btw). To support both engines, that driver
includes Impala-specific [hacks](https://github.com/cloudera/impyla/blob/ab1398a/impala/hiveserver2.py#L108).

Authors of PRs in this library are encouraged to also submit them to [sqlflow.org/gohive](https://sqlflow.org/gohive),
if the code can help the Hive driver too.
Binary file added docs/logo.png
119 changes: 119 additions & 0 deletions docs/logo.svg