Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Priorities #2

Open
13 of 18 tasks
felipenoris opened this issue Feb 5, 2019 · 51 comments
Open
13 of 18 tasks

Priorities #2

felipenoris opened this issue Feb 5, 2019 · 51 comments

Comments

@felipenoris
Copy link
Owner

felipenoris commented Feb 5, 2019

  • Try fix GC errors on Julia v0.6
  • Fix LOB reference handling with addRef
  • Fix TIMESTAMP nanosecond precision data loss when converting to DateTime
  • Support TIMESTAMP with time zones
  • Utility function to execute a script file
  • ResultSet API
  • Make Variables and OracleValues 1-index-based
  • remove bangs ! from API methods.
  • preserve precision for numeric values
  • Support ROWID data type
  • add execute_many helper method that operates directly on a statement
  • Docstrings
  • Doc website
  • Incremental write Lob API
  • Try fix handling of RAW fields
  • Base.show methods for data structures
  • Use BinaryBuilder to build deps
  • Tables.jl integration
@aviks
Copy link

aviks commented Feb 12, 2019

This looks great. Would you consider doing a BinaryBuilder build for ODPI-C, so that user do not have to compile it themselves?

@aviks
Copy link

aviks commented Feb 12, 2019

Some integration with Tables.jl would be nice. @quinnj might be able to assit.

@felipenoris
Copy link
Owner Author

Yes, these are all good ideas.

@felipenoris
Copy link
Owner Author

Yes, Tables.jl integration is something I should have done before, as in with XLSX.jl.

@quinnj
Copy link

quinnj commented Feb 12, 2019

Yep, happy to help however I can. If someone wants to take a stab at Tables.jl, I'm happy to review, or if you give me some pointers around the code, I could take a stab at doing the integration myself.

@felipenoris felipenoris added this to the v0.1.0 milestone Feb 16, 2019
@ScottPJones
Copy link

ScottPJones commented Feb 18, 2019

What about support for DECIMAL and NUMERIC types? (up to a point, IEEE decimal floating point formats could be used, using DecFP.jl could support up to 34 digits (with Dec128), even though numbers with 35-38 digits could not be represented exactly.
(For those, I do wish we had a package like Java's BigDecimal, even if it would be much slower than the IEEE formats [which in my testing of DecFP.jl, were faster in Julia than BigFloat types, because they are immutable types, and no finalizers etc. need to be set up for every number allocated, as is the case for BigFloat)

Note: I've found code that describes how to convert to/from Oracle's internal type to a string, that could be used with DecFP.jl to create the Dec128 values or create one of Oracle's numeric type to set values.

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 18, 2019

Hi @ScottPJones !

This is something I've been thinking about. Current implementation gets values as Float64, converted from C double type.

Oracle documentation describes NUMBER columns with a precision and scale, up to 38 digits of precision. I can even get this information while querying data from the database.

One possibility I had in mind was to return values as fixed decimals, using FixedPointDecimals.jl. The common use case will be a monetary value, with 2 decimals. I think if I use DecFP.jl, the precision of the value (I mean, the information that a monetary value has exactly 2 decimals) will be lost during the process, I guess.

What do you think about it?

@cjbj
Copy link

cjbj commented Feb 18, 2019

@felipenoris ping @anthony-tuininga (I can forward email, if that's easier - see my profile) since he has done some work in this area.

@felipenoris
Copy link
Owner Author

@cjbj , @anthony-tuininga, thanks for the input!
Yes, whatever the solution is, we need to understand how the numeric internal representation is defined. I could spend some time to infer it based on my understanding of floating number representation, but it is helpful if there is some code already to show how to parse the bytes to a number.

@cjbj
Copy link

cjbj commented Feb 19, 2019

For Oracle NUMBER, look at https://github.com/oracle/odpi/blob/master/src/dpiData.c#L114 There are pros & cons to doing this on the client side.

@felipenoris
Copy link
Owner Author

@anthony-tuininga, how does cx_Oracle deal with numeric type precision?

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 20, 2019

@cjbj , @anthony-tuininga, based on this code, whenever I ask a numeric value as bytes, the byte representation is actually a text:
https://github.com/oracle/odpi/blob/e95f30cc3180e86d23d343348be80b1a781c58cc/src/dpiVar.c#L728

Is there a way to get the raw byte representation?

Digging in the code, it looks like I have to create a patch to create my own version for dpiUtils__parseOracleNumber. This will avoid the overhead of converting to->from string on the client.

Also, maybe a new native type enum value could be defined to ask the driver to get numeric values in a raw representation (maybe a struct with the result of parseOracleNumber).

@ScottPJones
Copy link

Oracle internal number types encode the value so that they can be compared numerically, simply by doing string (byte) comparisons (this is a common technique, that I've used in the past also). The values themselves do not store the precision, that information is only in the metadata you get back from Oracle about the column types.
They are a variable length, from 1 - 21 bytes (stored radix 100 with offsets, a leading exponent indicator, and an optional trailing negative indicator.
That code you referenced does get the internal raw byte representation, and just converts it to text, because it doesn't expect anyone to directly use their internal raw format for NUMBER, even though that would be much more efficient.

@ScottPJones
Copy link

You could make a Julia type that directly stores the value (maybe as a 24 or 32 byte bitstype, which would probably be the most compact way of storing the full value in Julia), and then have methods that convert to/from strings (with no loss), or for performing operations, into a Dec128 (which might have to round off the last 1-4 digits of a 35-38 digit number) (since Julia doesn't have a BigDecimal type like Java that could handle these numbers directly.

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 20, 2019

@ScottPJones Yes! That's exactly the idea! This is very much like the timestamp case, where there's a special struct for it dpiTimestamp, and I made a julia type OraTimestamp that gets converted to a Timestamp which is defined using julia types Date and Time, to preserve nanosecond precision.

In the case of NUMBER, I could have a struct OraNumber with exponent, mantissa information, and that could have an algebra defined on it, or could be converted to a julia type that preserves its data.

I'm just not sure if there is a julia type for it. I mean, Dec128 will cause information loss. That's why I've been looking at FixedPointDecimals, but I don't have experience with it. But the idea that the precision of the value gets encoded in the julia type is just a good fit for this use case. The type will have a variable size, and will be defined at runtime. No need to define it with a fixed size of 21 bytes.

@anthony-tuininga
Copy link

@anthony-tuininga, how does cx_Oracle deal with numeric type precision?

In cx_Oracle and node-oracledb there is no attempt to use the Oracle Number format directly. Instead, it is converted to double, integer or string depending on the needs of the driver. That is because the Oracle Number format is not directly usable and conversion to/from that format is not simple. If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly. If that is the case, please add an enhancement request to ODPI-C for that purpose.

@ScottPJones
Copy link

The reason a fixed size might be better, is that using something like Vector will use a lot more memory than simply using a fixed size of 24 bytes (you need one byte to encode the "length", followed by the exponent byte, and 1-20 extra bytes (and 2 padding bytes to make it 3 64-bit words). Having a really immutable bits type means that it can be stored in registers, on the stack, etc. (which is one reason that DecFP.jl is faster than using BigFloat, and takes a lot less memory)

FixedPointDecimals don't allow for exponents, so you could easily have information loss there as well.

@ScottPJones
Copy link

If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly.

That would be great to have in ODPI-C, IMO.

@felipenoris
Copy link
Owner Author

@anthony-tuininga , nice!!! Let me study oracle's format, then I'll open an issue with detailed information.

@felipenoris
Copy link
Owner Author

@ScottPJones , I see. So there is a tradeoff between memory consumption and performance here.

@ScottPJones
Copy link

Actually, both memory consumption and performance are better, if you can use a fixed size immutable or bitstype.
Julians often seem to forget the huge effect memory consumption can have on overall performance of large systems, doing small benchmarks that don't create enough items to use GC really doesn't show that impact.

@ScottPJones
Copy link

Think of a vector of these things, the vector itself might be n * pointer size, but then you will need around 64 bytes for each element in the vector (and will be doing indirections through the pointers every time you access one of them).

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 20, 2019

@ScottPJones , I see. But say I have an immutable parametric type like struct OraNumber{T} <: Real. The parameter of the type can affect the size of the struct.

So, a vector of values of type OraNumber{x} , for a fixed value x , will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?

I might be wrong in my understanding. You're much more experienced in this area! :)
I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.

@ScottPJones
Copy link

You could probably define an 8, 16, and 24 byte sizes, depending on whether the column is declared with just NUMBER (or DECIMAL), in which case it would need the larger size to support arbitrary Oracle numbers,
8 bytes would be able to support 8 - 1 len - 1 scale - 1 neg => 10 decimal digits max, 16 would support 23 digits, however, I think for cases where the precision is <= 7 digits, or <= 15 digits, that using Dec32 and Dec64 would be better, as well as using Dec128 <= 34 digits. I also just noticed that DECIMAL on Oracle is not the same as the NUMBER type, as on most DBMS that I am aware of, the DECIMAL type seems to correspond to the IEEE 64-bit type (i.e. either Dec64, which is in a binary form, or the other packed decimal form (used on IBM hardware), and the SHORTDECIMAL type corresponds to the 32-bit type, so you'd probably need to add support for DecFP.jl anyway.
For columns with numbers with > 34 digits, the only option would be an OraNumber type, but if it's only used for those very large numbers, it could always be a 24 byte bits type.

@ScottPJones
Copy link

Note: if we did want the (hardware supported on IBM platforms, nice since Julia runs on the LE Power), I could restart my attempt to get the decimal C library ported or wrapped to Julia, which supports the packed decimal format (it uses 10-bit triplets to efficiently pack 3 digits at a time, and make it easy for hardware to deal with).

@ScottPJones
Copy link

So, a vector of values of type OraNumber{x} , for a fixed value x , will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?

No, if OraNumber{x} were implemented as OraNumber{UInt64}, OraNumber{UInt128}, and OraNumber{UInt192} (where UInt192 is a 24-byte bitstype, just for this purpose), it would have much less memory performance issues than trying to use variable length fields.

I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.

Precisely, you'd simply need to look at the precision value that Oracle gives you, to see which type to use for that particular column.

@ScottPJones
Copy link

Oh, I just thought of another thing. Maybe this numeric support should not be put into this package, but rather, a package that can be used with ODBC.jl, and another other database wrapper that needs to correctly support database DECIMAL and NUMERIC types.

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 20, 2019

@ScottPJones I think we agree on the concepts. I just need to study a bit more to choose a solution.

About DECIMAL/NUMERIC, does Oracle DB has them? I mean, this page lists NUMBER and regular float and double. I don't see DECIMAL there. Isn't that for JavaDB? (I never heard of it, just googled).

@ScottPJones
Copy link

Ah, I see where I got confused: Oracle OLAP DML has DECIMAL and SHORTDECIMAL that match IEEE,
Oracle DBMS has DECIMAL and NUMBER (which is basically the same as NUMERIC in the standard)

@ScottPJones
Copy link

Here is a good description of the ANSI SQL numeric types:
https://academy.vertabelo.com/blog/understanding-numerical-data-types-sql/
In the standard there is a small difference between DECIMAL and NUMERIC, which has to do whether the precision is exact or not (DECIMAL numbers can have >= the specified precision)

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 22, 2019

@anthony-tuininga, I'm making progress getting the internal representation of OCINumber (https://github.com/felipenoris/odpi/tree/fn/numeric).

But for math operations on OCINumbers, maybe I should use the functions already implemented in OCI.

What I tried to do was to load libclntsh.so and call OCINumberAdd for instance.
The function call worked, which means the lib is beign loaded dynamically. But I'm getting errors in the result, and I don't know yet how to get error information.

Should this be the way to go (call libclntsh functions directly), or for this to work these functions must be wrapped in odpi?

I'm confused if I should setup an environment to create an error handle before calling OCINumberAdd. For now I'm just passing a null pointer for the error handle, and maybe that's the reason I'm getting an error code as the result of the function OCINumberAdd.

@ScottPJones
Copy link

I'd been looking at https://github.com/JuliaMath/Decimals.jl, found some bugs/performance issues, and have been playing around with creating my own arbitrary precision decimal floating point package, that could handle the Oracle decimal numbers (and those of any of the other databases), in a fairly simple fashion, that should give it decent performance and memory utilization compared to other decimal floating point packages.
My idea is to have an immutable struct, that can either have an IEEE decimal64 floating point value, or a sign, decimal exponent, and a BigInt value. Instead of using a Union type (which requires extra storage for the selector), the code would dispatch on a constant value in the BigInt field, if it is === that constant (i.e. points to the same exact object), then it would really use the Dec64 value (from DecFP.jl), otherwise it would get the sign and exponent from the Dec64 (using the bottom 32 bits to store the signed exponent), and the mantissa would be stored in the BigInt. I feel that it would rarely be necessary to allocate the BigInt, since a Dec64 supports up to 16 decimal digits (±0.000000000000000e−383 to ±9.999999999999999e384).

Let me know if you feel this might be useful for dealing with the Oracle numbers in Julia.

@felipenoris
Copy link
Owner Author

For now I'm trying to use OCI directly, which has a set of math functions defined for OCINumber data type. As far as I can tell, all operations are done using full 22 byte representation.

I just tried to add a new public method to odpi, but all I'm getting are segfaults... :(

// [PUBLIC]
int dpiNumber_add(const dpiNumber *number1, const dpiNumber *number2, dpiNumber *result)
{
	dpiError error;
	return dpiOci__numberAdd(&error, number1, number2, result);
}

@ScottPJones
Copy link

I'll investigate those functions, but I can't debug anything, I don't have any Oracle database to test with.

@felipenoris
Copy link
Owner Author

Ref oracle/odpi#100

@felipenoris
Copy link
Owner Author

felipenoris commented Feb 27, 2019

@ScottPJones I've been taking a look at your code.
I think that, even if I could get those Oracle arithmetic functions to work, people that use this package would benefit from using numbers in a Julia native format from outside this package.

I could read native OCINumbers from the database, and this could be converted to your number format.

On this subject, I would have a few questions for you:

  1. Oracle lists these ranges for NUMBER data type. Does Decs.jl support that in full range?
  • Positive numbers in the range 1 x 10-130 to 9.99...9 x 10125 with up to 38 significant digits

  • Negative numbers from -1 x 10-130 to 9.99...99 x 10125 with up to 38 significant digits

  • Zero

  • Positive and negative infinity (generated only by importing from an Oracle Database, Version 5)

  1. How well would be the coverage of math functions on Decs.jl, given this list?

  2. How would Decs.jl compare to a C library like libmpdec regarding performance?

@ScottPJones
Copy link

ScottPJones commented Mar 2, 2019

Oracle lists these ranges for NUMBER data type. Does Decs.jl support that in full range?

Yes, it supports an arbitrary number of digits, up to the memory of the machine, and an exponent that is an Int32 (whereas Oracle has a single byte UInt8 exponent, offset so that it is 10^-130 to 10^125 instead of 10^-128 to 10^127)
It already handles -0.0, and once I switch to using DecFP for smaller numbers, it will also support Inf, -Inf, and NaN.

How well would be the coverage of math functions on Decs.jl, given this list?

As far as the math functions, it would directly support at unlimited precision +, -, *, div, rem. I also plan on directly supporting /, both with exact precision (give an error if not exact), or a specified precision (probably default to 16 digits, i.e. like an IEEE decimal64).
For the other functions (sin, arcsin, etc.), I'm not a mathematician, and I'd probably punt those to using the DecFP functions, converting the argument(s) to decimal128, and then using the DecFP library.
For those sorts of functions, I have a feeling that 34 digits of precision will be sufficient.

How would Decs.jl compare to a C library like libmpdec regarding performance?

I wasn't aware of libmpdec, it looks great, I don't know what license it uses though, hopefully BSD or MIT.
I'd been using Cowlishaw's decnum library from IBM for "densely packed decimal" format IEEE 32-, 64-, and 128- bit decimal floats.

If it's easy to interface and fast, I'd probably move Decs.jl to use libmpdec instead of BigInt for numbers that can't fit in an IEEE decimal64, or if I can make it so that only libmpdec is needed (while still conforming to IEEE 2008 decimal behavior), I'll do that.
I have no clue about performance with libmpdec.

@felipenoris
Copy link
Owner Author

felipenoris commented May 22, 2019

@JeffreySarnoff, I was wondering if you have any opinions on how to go about supporting oracle numbers as fixed decimals (not floating point). This could be the first driver to have this feature AFAIK.

I'm able to read and write them from the database in Oracle's native format, but I currently do not support the math around it, as I was discussing with Scott in this thread.

@JeffreySarnoff
Copy link

JeffreySarnoff commented May 22, 2019

give me a working definition of an oracle number -- or a reference please (the discussion is unclear)
are you talking about an SQL number type or something specific to Oracle?

@felipenoris
Copy link
Owner Author

@JeffreySarnoff , thanks for the fast reply!

The best definition I found was this one from Oracle docs:

Oracle stores values of the NUMBER datatype in a variable-length format. The first byte is the exponent and is followed by 1 to 20 mantissa bytes. The high-order bit of the exponent byte is the sign bit; it is set for positive numbers and it is cleared for negative numbers. The lower 7 bits represent the exponent, which is a base-100 digit with an offset of 65.

To calculate the decimal exponent, add 65 to the base-100 exponent and add another 128 if the number is positive. If the number is negative, you do the same, but subsequently the bits are inverted. For example, -5 has a base-100 exponent = 62 (0x3e). The decimal exponent is thus (~0x3e) -128 - 65 = 0xc1 -128 -65 = 193 -128 -65 = 0.

Each mantissa byte is a base-100 digit, in the range 1..100. For positive numbers, the digit has 1 added to it. So, the mantissa digit for the value 5 is 6. For negative numbers, instead of adding 1, the digit is subtracted from 101. So, the mantissa digit for the number -5 is 96 (101 - 5). Negative numbers have a byte containing 102 appended to the data bytes. However, negative numbers that have 20 mantissa bytes do not have the trailing 102 byte. Because the mantissa digits are stored in base 100, each byte can represent 2 decimal digits. The mantissa is normalized; leading zeroes are not stored.

Up to 20 data bytes can represent the mantissa. However, only 19 are guaranteed to be accurate. The 19 data bytes, each representing a base-100 digit, yield a maximum precision of 38 digits for an Oracle NUMBER.

The Julia struct for that is this:

const MAX_SIZEOF_MANTISSA = 20

struct OraNumber <: Real
    len::UInt8
    ex::UInt8
    mantissa::NTuple{MAX_SIZEOF_MANTISSA, UInt8}
end

I sketched a few functions to handle it in https://github.com/felipenoris/Oracle.jl/tree/master/src/oranumbers.

@felipenoris
Copy link
Owner Author

felipenoris commented May 22, 2019

... when you define a column as NUMBER(p,q) in Oracle it is represented with this OraNumber structure internally. I can fetch it in this native representation to Julia, using struct OraNumber above. The idea is to preserve its precision when loading to Julia, instead of just converting it to a Float64. This idea is very useful for financial systems.

@JeffreySarnoff
Copy link

Oh, so is your question: "given an OraNumber, an instance of the struct above, how to bring the value into Julia in a way that facilitates its use and maintains its precision and its decimal nature"?

@JeffreySarnoff
Copy link

Is it OK to convert the OraNumber to a EasierToDealWithNumber struct and then unconvert that for writing back to the database?

@felipenoris
Copy link
Owner Author

Exactly! Few options arised: Decs.jl, libmpdec, Decimals.jl, or implement support for OraNumber arithmetic from scratch.

Yes, it is ok to convert to an EasierToDealWithNumber representation, and use OraNumber just as a serialization format.

@JeffreySarnoff
Copy link

JeffreySarnoff commented May 22, 2019

are you looking to support these math functions?
https://docs.oracle.com/cd/E11882_01/win.112/e17727/serobjch017.htm#OOFOL393

(once you have the numbers, what do you need to do with them?)

@felipenoris
Copy link
Owner Author

@JeffreySarnoff well, the basic idea is that I want OraNumber <: Real, so that I can use it in Julia the same way I use a Float64. That function listing that you referenced is what Oracle supports for an OraNumber, which is a good list to pursue when adding support for it in Julia.

@felipenoris
Copy link
Owner Author

... but keep in mind that it might be EasierToDealWithNumber <: Real since OraNumber could be converted to an easier format as soon as I read the value from the database.

@JeffreySarnoff
Copy link

Decimal(precision, scale) "the precision can range from 1 to 38 [decimal digits]" [some Oracle doc]
Your life gets much easier if you can live with a maximum of 34 decimal digits.
(any values larger than that either trigger an Overflow error, or engage a second set of routines)

@JeffreySarnoff
Copy link

https://github.com/JuliaMath/DecFP.jl exports Dec128 and that gets you 34 decimal digits with very good math support. Dec128(string) works, so the conversion from OraNumber would be converting to a string. Going the other way is more of a hassle, but that will be true for any conversion into OraNumber.

@felipenoris
Copy link
Owner Author

@JeffreySarnoff Thanks! I'll check out if I can convert back and forth directly, without intermediate string representation.

@JeffreySarnoff
Copy link

ldexp10 may help

@felipenoris felipenoris removed this from the v0.1.0 milestone Feb 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants