-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(vector): vector operation #1
base: main
Are you sure you want to change the base?
Conversation
In this commit the empty interfaces are replaces with Element interfaces, and instead of relying on type inference, we ask the vector to provide a concrete value. This has greatly increased the throughput of the program.
return &Vector{ | ||
s: make([]Element, 0, size), | ||
NewElement: func(value string) Element { | ||
num := strings.Split(value, sep)[pos] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a regular expression to capture everything from the start of the line to the first \t. It will be constant time O(1). Because I think strings.Split is O(n)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great point. I will make changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came up with this regexp, but it made the operation slower.
str := fmt.Sprintf(`^([^%s]+%s){%d}([^%s]+)%s?`, sep, sep, pos, sep, sep)
re := regexp.MustCompile(str)
return func(size int) *Vector {
return &Vector{
s: make([]Element, 0, size),
NewElement: func(value string) Element {
num := re.FindStringSubmatch(value)[2]
i, err := strconv.Atoi(num)
if err != nil {
panic(errors.Wrap(err, "converting value from string"))
}
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! It also gave me ideas about the next steps
} | ||
|
||
// Sort Perform a binary search to find where to put a value in a vector. Ascending order. | ||
func Sort(v *Vector, line, sep string, pos int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the reason of the separator and why we split.
But what if the person just want to sort a file that not a CSV or a TSV. I don't know like a text file with a sentence per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AllocateStringVector
should do it.
// Vector holds a slice of Elements for sorting. The NewElement is called each | ||
// time a new item from the file is read or inserted into the slice. The Less | ||
// function should return true if the first element is lower than the second. | ||
type Vector struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this new Vector regarding performance and clarity. It would be even better if the underlying structure does not have to be a slice. This is not top priority, but maybe someone would like to have a linked list/other data structure.
The only thing I have in mind would require another interface for the field s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with you. Let's keep slimming to the minimum requirements and then we have to refactor. As mentioned I'm not happy about the Element
and I think we can come up with a better solution that would satisfy this as well.
Thank you! I'm still experiencing with this and I need to use your other commit in these. I will rebase after that because clearly there are some hard-coded stuff that you've mentioned that needs fixing. |
This PR changes the way the Vector is managed. A new TableVector is added that can sort based on a column and a separator.
Todo