Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(vector): vector operation #1

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

arsham
Copy link

@arsham arsham commented Feb 4, 2022

This PR changes the way the Vector is managed. A new TableVector is added that can sort based on a column and a separator.

Todo

  • Proper name for the Vector
  • Use sync.Pool for the Elements
  • Refactor NewElement and Less property functions
  • Error management on NewElement

In this commit the empty interfaces are replaces with Element
interfaces, and instead of relying on type inference, we ask the vector
to provide a concrete value. This has greatly increased the throughput
of the program.
return &Vector{
s: make([]Element, 0, size),
NewElement: func(value string) Element {
num := strings.Split(value, sep)[pos]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a regular expression to capture everything from the start of the line to the first \t. It will be constant time O(1). Because I think strings.Split is O(n)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. I will make changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with this regexp, but it made the operation slower.

	str := fmt.Sprintf(`^([^%s]+%s){%d}([^%s]+)%s?`, sep, sep, pos, sep, sep)
	re := regexp.MustCompile(str)

	return func(size int) *Vector {
		return &Vector{
			s: make([]Element, 0, size),
			NewElement: func(value string) Element {
				num := re.FindStringSubmatch(value)[2]
				i, err := strconv.Atoi(num)
				if err != nil {
					panic(errors.Wrap(err, "converting value from string"))
				}
            }
		}
	}
}

Copy link
Owner

@askiada askiada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff! It also gave me ideas about the next steps

}

// Sort Perform a binary search to find where to put a value in a vector. Ascending order.
func Sort(v *Vector, line, sep string, pos int) error {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reason of the separator and why we split.

But what if the person just want to sort a file that not a CSV or a TSV. I don't know like a text file with a sentence per line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AllocateStringVector should do it.

// Vector holds a slice of Elements for sorting. The NewElement is called each
// time a new item from the file is read or inserted into the slice. The Less
// function should return true if the first element is lower than the second.
type Vector struct {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this new Vector regarding performance and clarity. It would be even better if the underlying structure does not have to be a slice. This is not top priority, but maybe someone would like to have a linked list/other data structure.

The only thing I have in mind would require another interface for the field s

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you. Let's keep slimming to the minimum requirements and then we have to refactor. As mentioned I'm not happy about the Element and I think we can come up with a better solution that would satisfy this as well.

@arsham
Copy link
Author

arsham commented Feb 4, 2022

Thank you!

I'm still experiencing with this and I need to use your other commit in these. I will rebase after that because clearly there are some hard-coded stuff that you've mentioned that needs fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants