database/sql: allow driver to entirely override Scan #67546

jackc · 2024-05-21T15:29:58Z

Proposal Details

I am the creator of the https://github.com/jackc/pgx/ PostgreSQL driver.

pgx implements a very flexible and convenient type system for its native interface. Most types "just work" without needing to add any pgx or database/sql interfaces. Slices like []int64 can be used directly as query arguments and scan targets. A map[string]string can be used as a PostgreSQL json or hstore value even though the encoding formats are different. This can only work because the encoding and decoding logic has access to connection specific information (e.g. registered Codecs for each type), the PostgreSQL type (OID), the value format (text or binary), and the Go type.

I've been trying with very limited success to expose the this functionality and the relevant types to the database/sql interface. The problem is that an implementer of the sql.Scanner or driver.Valuer interfaces does not have any way of accessing the connection information, the PostgreSQL value type, or the value format.

For query arguments, this can be overridden by implementing NamedValueChecker and simply passing all values through unmodified. So at this point, all of pgx's advanced type handling is available for query arguments.

However, there does not seem to be the same for Scan. I propose adding a driver.RowsColumnScanner interface.

type RowsColumnScanner interface {
	ScanColumn(i int, dest any) error
}

If this interface is implemented it could completely replace the existing database/sql scan logic. Drivers could implement arbitrary type support and it could significantly improve performance. ErrSkip could be returned to fallback to the existing database/sql logic.

See #22544 for a related proposal. However, I prefer this proposal as the scanning interface is on the row not the connection or driver. This allows the decoding logic to know what the underlying database type and value format are.

As far as I know this would be an entirely backwards compatible change. The only subtlety I am aware of is a RowsColumnScanner would have to support ScanColumn being called multiple times on the same column to preserve support for calling rows.Scan multiple times on a single row.

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2024-05-21T18:16:59Z

CC @kardianos

kardianos · 2024-05-23T12:49:55Z

Given the design of database/sql, I see no issue with this interface.
@jackc Do you want to submit the Go CL? If not, I could send it up.

Implementing RowsColumnScanner allows the driver to completely control how values are scanned. Fixes golang#67546

gopherbot · 2024-05-25T15:50:39Z

Change https://go.dev/cl/588435 mentions this issue: database/sql: allow drivers to override Scan behavior

jackc · 2024-05-25T16:15:19Z

@kardianos I implemented the change in #67648. I also made a PR for pgx that implements this interface (jackc/pgx#2029).

With both of those PRs applied, the queries like the following work in pgx through database/sql:

var names []string
err := db.QueryRow("select array['John', 'Jane']::text[]").Scan(&names)

I observed a few things while making the change:

It's possible for a driver to inadvertently change the way some of the scanning edge cases work. For example, database/sql defines the behavior for scanning a time into a string as formatting the time as time.RFC3339Nano. However, when scanning to a string, pgx will simply copy any text formatted values directly. So if a timestamptz in the text format was received from PostgreSQL scanning to a string would have different behavior. This gives significantly better performance then parsing a string into a time.Time and then formatting it. But it's not strictly in spec.

There is a fair amount of work involved in creating the slice of driver.Valuer for Next(). If all columns are handled with RowsColumnScanner then that work is unnecessary. A driver could "cheat" and stub out the work. Not sure if that is a good idea or not, but I suspect it could provide a performance boost.

rsc · 2024-06-12T20:42:52Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

kardianos · 2024-06-15T12:35:10Z

@jackc This is pending the proposal review group. But a reply to your observations:

@kardianos I implemented the change in #67648. I also made a PR for pgx that implements this interface (jackc/pgx#2029).

With both of those PRs applied, the queries like the following work in pgx through database/sql:
var names []string
err := db.QueryRow("select array['John', 'Jane']::text[]").Scan(&names)

Nice. Being able scan arrays directly is useful.

I observed a few things while making the change:

It's possible for a driver to inadvertently change the way some of the scanning edge cases work. For example, database/sql defines the behavior for scanning a time into a string as formatting the time as time.RFC3339Nano. However, when scanning to a string, pgx will simply copy any text formatted values directly. So if a timestamptz in the text format was received from PostgreSQL scanning to a string would have different behavior. This gives significantly better performance then parsing a string into a time.Time and then formatting it. But it's not strictly in spec.

I'm personally okay with that.

There is a fair amount of work involved in creating the slice of driver.Valuer for Next(). If all columns are handled with RowsColumnScanner then that work is unnecessary. A driver could "cheat" and stub out the work. Not sure if that is a good idea or not, but I suspect it could provide a performance boost.

So long as the new driver interface can return an ErrSkip, because Next is called first, and thus the driver value array is created per returned data set, I don't see a reasonable way to work around that. You could work around that, but the API change would be much larger. I'm not terribly interested in single row benchmarks on a local machine; in most real situations, if you need more performance, you will coalesce many single row queries into a one many row query. For example, yesterday I was annoyed at a slowly performing function, inspected it, and found the cause to be a single row query in a loop. I pulled it out, allocated an array, then encoded a JSON parameter and did a single query at the end and I went from multiple seconds to a single millisecond operation. I'm not concerned about saving a single allocation in Next per result set. If the driver "knows" it will call the new interface on Scan, then on Next, it can just ignore that work (stub it out, "cheat"), which I think is fine, but that would be up to the driver to do.

jackc · 2024-06-15T13:21:18Z

@kardianos

I'm not concerned about saving a single allocation in Next per result set.

It could potentially remove significantly more overhead than just the allocation of a []driver.Valuer. The real win is supporting the PostgreSQL binary format for types that database/sql doesn't support. pgx uses the binary format for integers because their driver.Valuer representation is int64. But many useful types like uuid, arrays, and ranges do not have a driver.Valuer representation. In these cases, pgx uses the text format as it assumes that the caller either wants a string or is using a sql.Scanner that expects a string. These types tend to be far more expensive to parse in the text format than the binary format.

For example, I created a quick benchmark of a query that selects 1,000 rows, each of which has a single 10 element integer array. This is through pgx directly, not database/sql but the relative difference should be similar.

BenchmarkSelectRowsArray/1000_rows/text-16          	    1143	   1075135 ns/op	 1401515 B/op	   32014 allocs/op
BenchmarkSelectRowsArray/1000_rows/binary-16        	    2964	    381924 ns/op	   72817 B/op	    3024 allocs/op

If pgx knew that it would never have to produce a valid driver.Valuer it could use binary format in more cases.

If the driver "knows" it will call the new interface on Scan, then on Next, it can just ignore that work (stub it out, "cheat"), which I think is fine, but that would be up to the driver to do.

Sounds good.

rsc · 2024-06-24T18:12:09Z

Can someone post the proposal in the form of a complete definition and doc comment of driver.RowsColumnScanner? I looked at the CL but the doc comment there is quite minimal. It seems like more needs to be said.

jackc · 2024-06-25T01:19:43Z

I just pushed a commit that expands on the docs for driver.RowsColumnScanner. But I'm not really sure what else to say beyond what's there and my original post above. In short, this allows a driver to completely override the scanning of query result values.

kardianos · 2024-06-25T11:29:34Z

Proposal: allow specific driver, not `database/sql` to control scan

Forcing all results to go through database/sql to scan comes at a cost:

Data must be first pulled into []any from the driver,
The client then scans the data from database/sql using generic logic for base types.
This works well for simple, common values. However arrays or more complex custom values must either marshal to a string first, then also rely on the client setting some type of converter each time.

If the driver was able to control the actual scan process, both costs could be removed. Types could be scanned directly from the driver to the client values.

The specific changes are, in database/sql/driver, add a new optional interface:

// RowsColumnScanner may be implemented by [Rows]. It allows the driver to completely
// take responsibility for how values are scanned and replace the normal [database/sql].
// scanning path. This allows drivers to directly support types that do not implement
// [database/sql.Scanner].
type RowsColumnScanner interface {
	Rows

	// ScanColumn copies the column in the current row into the value pointed at by
	// dest. It returns [ErrSkip] to fall back to the normal [database/sql] scanning path.
	ScanColumn(dest any, index int) error
}

Then in database/sql, in Rows.Scan, ensure that if supported, call the driver ScanColumn rather then convertAssignRows. If ScanColumn returns driver.ErrSkip, then convertAssignRows still.

Drawbacks

While this proposal makes it easy for a driver to support additional types without going through intermediate types or secondary encoding, it doesn't support getting a row cursor from a field value. One solution to that would be to change the definition of ScanColumn to ScanColumn(index int, dest any, rows *Rows) error. (This would prevent the interface from being declared in the driver package), then export convertAssignRows which has been requested in the past (though I'm not immediately finding references for those requests). If this was done, it would be possible to skip the driver.ErrSkip opt out method entirely, as the driver could call sql.ConvertAssignRows.

rsc · 2024-06-27T13:12:54Z

Have all remaining concerns about this proposal been addressed?

The proposal is:

// RowsColumnScanner may be implemented by [Rows]. It allows the driver to completely
// take responsibility for how values are scanned and replace the normal [database/sql].
// scanning path. This allows drivers to directly support types that do not implement
// [database/sql.Scanner].
type RowsColumnScanner interface {
	Rows

	// ScanColumn copies the column in the current row into the value pointed at by
	// dest. It returns [ErrSkip] to fall back to the normal [database/sql] scanning path.
	ScanColumn(dest any, index int) error
}

rsc · 2024-07-25T09:26:24Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is:

// RowsColumnScanner may be implemented by [Rows]. It allows the driver to completely
// take responsibility for how values are scanned and replace the normal [database/sql].
// scanning path. This allows drivers to directly support types that do not implement
// [database/sql.Scanner].
type RowsColumnScanner interface {
	Rows

	// ScanColumn copies the column in the current row into the value pointed at by
	// dest. It returns [ErrSkip] to fall back to the normal [database/sql] scanning path.
	ScanColumn(dest any, index int) error
}

rsc · 2024-07-31T19:00:15Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is:

// RowsColumnScanner may be implemented by [Rows]. It allows the driver to completely
// take responsibility for how values are scanned and replace the normal [database/sql].
// scanning path. This allows drivers to directly support types that do not implement
// [database/sql.Scanner].
type RowsColumnScanner interface {
	Rows

	// ScanColumn copies the column in the current row into the value pointed at by
	// dest. It returns [ErrSkip] to fall back to the normal [database/sql] scanning path.
	ScanColumn(dest any, index int) error
}

bitpossum · 2024-10-19T23:00:38Z

Hi @jackc .
Your CL is on hold for the proposal, that was accepted. Can you get it going again, so we can use this feature in go1.24?

CC @ianlancetaylor

jackc · 2024-10-23T00:16:13Z

@bitpossum Apparently it needed some additional doc files in api/next and doc/next. I think it should be resolved now.

jackc · 2024-11-02T15:22:56Z

@kardianos @rsc

Is the order of the arguments to ScanColumn significant? @bitpossum just let me know that my original implementation had the order one way, but the accepted CL reversed it. I can change the code but I wanted to first check that it was intentional.

ianlancetaylor · 2024-11-04T19:04:21Z

In the Go standard library the destination usually comes first, as with copy or append or io.Copy. That is the order accepted in this proposal. Is there a reason to use the other order? Thanks.

jackc · 2024-11-05T01:54:00Z

@ianlancetaylor No reason to use the other order. I can change it. The original implementation predated the accepted proposal and I had not noticed that it had been changed in the proposal. Just wanted to make sure it was intentional before I made the change.

Implementing RowsColumnScanner allows the driver to completely control how values are scanned. Fixes golang#67546

jackc added the Proposal label May 21, 2024

gopherbot added this to the Proposal milestone May 21, 2024

jackc mentioned this issue May 21, 2024

Add Scan and Value to pgtype.FlatArray, pgtype.Array, pgtype.Range, and pgtype.Multirange jackc/pgx#2020

Closed

ianlancetaylor added this to Proposals May 21, 2024

ianlancetaylor moved this to Incoming in Proposals May 21, 2024

jackc added a commit to jackc/go that referenced this issue May 25, 2024

database/sql: Add RowsColumnScanner interface

2ca99ce

Implementing RowsColumnScanner allows the driver to completely control how values are scanned. Fixes golang#67546

jackc linked a pull request May 25, 2024 that will close this issue

database/sql: allow drivers to override Scan behavior #67648

Open

rsc changed the title ~~proposal: database/sql: Allow driver to entirely override Scan~~ proposal: database/sql: allow driver to entirely override Scan Jun 12, 2024

rsc moved this from Incoming to Active in Proposals Jun 12, 2024

rsc added the Proposal-FinalCommentPeriod label Jul 25, 2024

rsc moved this from Active to Likely Accept in Proposals Jul 25, 2024

rsc moved this from Likely Accept to Accepted in Proposals Jul 31, 2024

rsc changed the title ~~proposal: database/sql: allow driver to entirely override Scan~~ database/sql: allow driver to entirely override Scan Jul 31, 2024

rsc modified the milestones: Proposal, Backlog Jul 31, 2024

rsc added Proposal-Accepted and removed Proposal-FinalCommentPeriod labels Jul 31, 2024

hiendaovinh mentioned this issue Aug 20, 2024

Bob is not working with pgx array types stephenafamo/bob#90

Open

kblomster mentioned this issue Sep 30, 2024

Please add a PGX adapter go-jet/jet#381

Open

This was referenced Nov 6, 2024

Unable to scan JSON data into []byte or json.RawMessage googleapis/go-sql-spanner#284

Closed

Unable to scan into an array of struct googleapis/go-sql-spanner#309

Open

jackc added a commit to jackc/go that referenced this issue Nov 9, 2024

database/sql: Add RowsColumnScanner interface

1d0c44a

Implementing RowsColumnScanner allows the driver to completely control how values are scanned. Fixes golang#67546

seankhliao mentioned this issue Nov 14, 2024

database/sql: value converter interface for rows.Scan #22544

Closed

rubiagatra mentioned this issue Jan 13, 2025

Add Scan to pgtype.FlatArray jackc/pgx#1786

Closed

doranych mentioned this issue Jan 14, 2025

JSONB issue with default_query_exec_mode=simple_protocol jackc/pgx#2231

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

database/sql: allow driver to entirely override Scan #67546

database/sql: allow driver to entirely override Scan #67546

jackc commented May 21, 2024

ianlancetaylor commented May 21, 2024

kardianos commented May 23, 2024

gopherbot commented May 25, 2024

jackc commented May 25, 2024

rsc commented Jun 12, 2024

kardianos commented Jun 15, 2024

jackc commented Jun 15, 2024

rsc commented Jun 24, 2024

jackc commented Jun 25, 2024

kardianos commented Jun 25, 2024

rsc commented Jun 27, 2024

rsc commented Jul 25, 2024

rsc commented Jul 31, 2024

bitpossum commented Oct 19, 2024

jackc commented Oct 23, 2024

jackc commented Nov 2, 2024

ianlancetaylor commented Nov 4, 2024

jackc commented Nov 5, 2024

database/sql: allow driver to entirely override Scan #67546

database/sql: allow driver to entirely override Scan #67546

Comments

jackc commented May 21, 2024

Proposal Details

ianlancetaylor commented May 21, 2024

kardianos commented May 23, 2024

gopherbot commented May 25, 2024

jackc commented May 25, 2024

rsc commented Jun 12, 2024

kardianos commented Jun 15, 2024

jackc commented Jun 15, 2024

rsc commented Jun 24, 2024

jackc commented Jun 25, 2024

kardianos commented Jun 25, 2024

Proposal: allow specific driver, not database/sql to control scan

Drawbacks

rsc commented Jun 27, 2024

rsc commented Jul 25, 2024

rsc commented Jul 31, 2024

bitpossum commented Oct 19, 2024

jackc commented Oct 23, 2024

jackc commented Nov 2, 2024

ianlancetaylor commented Nov 4, 2024

jackc commented Nov 5, 2024

Proposal: allow specific driver, not `database/sql` to control scan