Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataframe: add row appending methods and sql scan helpers. #50

Merged
merged 19 commits into from
Feb 14, 2020

Conversation

kylebrandt
Copy link
Contributor

wip for #49

@kylebrandt
Copy link
Contributor Author

kylebrandt commented Jan 29, 2020

Some notes:

  • Should the dataframe sql stuff be in a different package? Must not be dataframe/sql though because would be annoying on imports to always have to alias (since it would be used with database/sql). They also would no longer be methods on dataframe. Although something like a sqlframe could just be a type of frame.
  • Try two phase converter that replaces a string field with the desired output type of the conversion func with sql. In particular for use with alternate numeric types.
  • Need to check for valid types before attempting to create dataframe. Add some sort of validator method. Needed at newForSQLRows
  • Not sure if row ops should be a method on Fields instead of dataframe.

@kylebrandt
Copy link
Contributor Author

I would more time with but think it would be helpful to @sunker to get this merged soon then later if even means we change / fix some things in master.

@kylebrandt kylebrandt marked this pull request as ready for review February 13, 2020 17:24
@kylebrandt kylebrandt requested a review from papagian February 13, 2020 17:24
@kylebrandt kylebrandt changed the title dataframe: (wip) explore appending scenarios dataframe: add row appending methods and sql scan helpers. Feb 13, 2020
Copy link
Contributor

@papagian papagian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work; I enjoyed reviewing it a lot!
I have left some comments/thoughts but with my limited knowledge on the subject I think it's good to get merged.

}
}
// second loop that modifies
f.AppendRow(vals...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling AppendRow and looping over vals twice you could do the appending inside the above loop using:
f.Fields[i].Vector.Append(v)

func (v *nullablegenVector) Len() int {
return len((*v))
}

func (v *nullablegenVector) PrimitiveType() VectorPType {
return vectorPType(v)
}

func (v *nullablegenVector) Extend(i int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the use case but maybe it would be useful to accept also an arbitrary number of values to initialise the new slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly it already does. Extend grows the vector be argument i.

dataframe/sql.go Outdated
)

// NewFromSQLRows returns a new dataframe populated with the data from rows. The Vector types
// will be []*T if nullable or will be []*T if it is unknown if the column is nullable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typod - will fix

dataframe/sql.go Outdated
seen := map[string]struct{}{}
for _, name := range colNames {
if _, ok := seen[name]; ok {
return nil, nil, fmt.Errorf(`duplicate column names no allowed, found identical name: "%v"`, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index of the identical column could be included in the error

Alternatively, instead of having this restriction and returning an error I wonder if the new field name could be named after the column name and the column index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be solved at a deeper level. I think perhaps something in MarshalArrow takes care of it. I meant to make an issue and then link it in the comment and forgot, will do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#57 (added to code comments as well).

// Nullabe types get passed to scan as a pointer to a pointer
vec = reflect.MakeSlice(reflect.SliceOf(ptrType), 0, 0).Interface()
}
if !ValidVectorType(vec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder it this condition is enough; is it necessary for all the columns have a mapper? if yes then this condition should check also if a mapping exists? otherwise maybe here there should be an additional check for this?
As it is now I think it's possible to end up with a nil mapper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would only be a nil mapper if you passed a nil mapper to NewFromSQLRows. If a mapper isn't provided, and it isn't a type natively supported by both dataframe and SQL scanner, it becomes a *string (via auto-generated mapper) which anything can be scanned into.

dataframe/sql.go Outdated

// scannableRow adds a row to the dataframe, and returns a slice of references
// that can be passed to rows.Scan() in the in sql package.
func (f *Frame) scannableRow() []interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the name newScannableRow that highlights the fact the a new row is created.

}
}

for fieldIdx, mapper := range mappers {
Copy link
Contributor

@papagian papagian Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be feasible to apply conversions and replacements inside the above loop.
I see there is an on-going discussion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had seen that discussion while researching this - but only at proposal stage :-(

@kylebrandt kylebrandt requested a review from papagian February 14, 2020 14:48
Copy link
Contributor

@papagian papagian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// The dataframe's Fields and the Fields' Vectors must be initalized or AppendRow will panic.
// The number of arguments must match the number of Fields in the Frame and each type must coorespond
// to the Field type or AppendRow will panic.
func (f *Frame) AppendRow(vals ...interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this?
Why someone would prefer this over the AppendRowSafe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speed. It should be faster to use if you already know that both your data and your frame will meet what is checked for, bypassing all the checks.

@kylebrandt kylebrandt merged commit 4127415 into master Feb 14, 2020
@kylebrandt kylebrandt deleted the frame_append branch February 14, 2020 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants