Implement a trim feature for string and string_view #51

Open
bretbrownjr opened this issue Oct 24, 2024 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@bretbrownjr
Contributor

I'm personally flexible on the details of the implementation, but trimming is an extremely common operation, whitespace is already modeled by APIs like std::isspace, and it's surprising the standard doesn't provide it yet.

Probably the proposal should provide ltrim and rtrim as well.

StackOverflow with advice on implementing trim from scratch for reference:
https://stackoverflow.com/questions/216823/how-to-trim-a-stdstring
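For reference, a minimal sketch of free functions over std::string_view, assuming "whitespace" means the six characters std::isspace treats as space in the default "C" locale. The names ltrim, rtrim and trim are illustrative only, not an existing std or Beman API. A std::string argument converts to std::string_view, so the result is a view into the caller's buffer, which must stay alive.

```cpp
#include <string_view>

// Assumption: "whitespace" is the set std::isspace accepts in the "C" locale.
inline constexpr std::string_view whitespace = " \t\n\v\f\r";

// Hypothetical free functions; each returns a view into the same buffer.
constexpr std::string_view ltrim(std::string_view s) noexcept {
    const auto first = s.find_first_not_of(whitespace);
    // An all-whitespace input becomes an empty view.
    s.remove_prefix(first == std::string_view::npos ? s.size() : first);
    return s;
}

constexpr std::string_view rtrim(std::string_view s) noexcept {
    const auto last = s.find_last_not_of(whitespace);
    s.remove_suffix(last == std::string_view::npos ? s.size() : s.size() - last - 1);
    return s;
}

constexpr std::string_view trim(std::string_view s) noexcept {
    return ltrim(rtrim(s));
}

// Everything is constexpr, so it can be checked at compile time.
static_assert(trim("  hello world \n") == "hello world");
```

Returning a view keeps these allocation-free and proportional to the number of trimmed characters; a caller who needs an owning result can construct a std::string from it.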

@sandordargo

Do you still need help? This needs a new repo in beman-project, right?

@JeffGarland
Member

Hi @sandordargo -- yes, we're still looking for help here. @bretbrownjr I assume there's no proposal yet? Right off, one might ask whether this should be implemented as free functions (like Boost String Algo) instead of as part of string_view.

@bretbrownjr
Contributor Author

I don't have specific design notes or plans. I do have observations that this feature is conspicuously missing from the standard.

If I were picking up this task, I would consider what existing C++ libraries like Boost already provide for this feature.

@sandordargo

Boost provides many different trims. Apart from the left/right/all versions, it provides copy and in-place versions, and even versions suffixed with _if, where the caller can define what should be considered a space.
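To illustrate the _if flavor, a hedged sketch in standard C++ (not Boost's implementation) where the caller supplies the predicate. The names trim_left_if, trim_right_if and trim_if mirror Boost's naming but are defined here only for illustration, over std::string_view rather than Boost's generic sequences.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical Boost-style "_if" variants: the predicate decides which
// characters count as "space". Each returns a view into `s`.
template <class Pred>
constexpr std::string_view trim_left_if(std::string_view s, Pred is_space) {
    const auto it = std::find_if_not(s.begin(), s.end(), is_space);
    s.remove_prefix(static_cast<std::size_t>(it - s.begin()));
    return s;
}

template <class Pred>
constexpr std::string_view trim_right_if(std::string_view s, Pred is_space) {
    // Search from the back; the distance from rbegin is the suffix length.
    const auto rit = std::find_if_not(s.rbegin(), s.rend(), is_space);
    s.remove_suffix(static_cast<std::size_t>(rit - s.rbegin()));
    return s;
}

template <class Pred>
constexpr std::string_view trim_if(std::string_view s, Pred is_space) {
    return trim_left_if(trim_right_if(s, is_space), is_space);
}

// Classification by std::isspace in the current C locale; the cast avoids
// undefined behavior for negative char values.
inline bool is_c_space(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}
```

For example, trim_if("--abc--", [](char c) { return c == '-'; }) yields "abc".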

But maybe it would fit better with the design decisions of string_view if it were a member function, like remove_prefix or remove_suffix (and several others).

What do you think?

@wusatosi
Member

Hey, I am also interested in this, and in putting out more libraries in general. I think I will add this to the agenda of our next weekly sync.

I don't think we have a standard approach to adding member functions to std types for demonstration purposes. We will probably have to do this as helper functions, e.g. beman::string_utils::trim.

I would suggest adding a general-purpose string utils library, as I think there are tons of improvements we could propose, e.g. split.

@alexcohn

alexcohn commented Jan 11, 2025

You can add member functions to a class, call it beman::string_view, which extends std::string_view and is constructible from it. I prefer class names that name the feature, e.g. beman::string_view_with_trim. Typical usage would be:

string_view_with_trim sv { "this example will be trimmed     " };
sv.trim();
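A minimal sketch of that approach, assuming string_view_with_trim publicly derives from std::string_view and adds in-place trim members built on the inherited remove_prefix and remove_suffix. Only the class name comes from the comment above; the ltrim/rtrim members and the whitespace set are assumptions.

```cpp
#include <string_view>

// Hypothetical wrapper: a std::string_view plus trimming members.
class string_view_with_trim : public std::string_view {
public:
    using std::string_view::string_view;  // inherit the base constructors
    constexpr string_view_with_trim(std::string_view sv) noexcept
        : std::string_view(sv) {}

    // Each member only adjusts the view's start and length.
    constexpr void ltrim() noexcept {
        const auto n = find_first_not_of(ws);
        remove_prefix(n == npos ? size() : n);
    }
    constexpr void rtrim() noexcept {
        const auto n = find_last_not_of(ws);
        remove_suffix(n == npos ? size() : size() - n - 1);
    }
    constexpr void trim() noexcept {
        rtrim();
        ltrim();
    }

private:
    // Assumption: the characters std::isspace accepts in the "C" locale.
    static constexpr std::string_view ws = " \t\n\v\f\r";
};
```

No characters are moved or copied, and the object still converts to a plain std::string_view wherever one is expected.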

@LapNik
Copy link

LapNik commented Feb 14, 2025

Could this be implemented with range adaptors instead of member functions? That way trims could be used on any type that implements the interface for a range of characters, not just string and string_view.

Sure, you could convert a range into a string_view, but only with contiguous ranges, right? I'm not sure if there's any need for the trim algorithm to require contiguous ranges.

@alexcohn

@LapNik the same logic applies to remove_prefix(). IMHO the two approaches aren't mutually exclusive. But a std::string::ltrim() member would be special, because only a member function can perform it in O(length_of_prefix), i.e. without malloc and memcpy.

@LapNik

LapNik commented Feb 24, 2025

@alexcohn I'm sorry, I can't figure out what you mean by std::string::ltrim being able to perform it in O(length_of_prefix). If we are considering a modifier function, wouldn't it have to copy the suffix over the erased section like erase does? Or should string start tracking a starting position?
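To make the cost question concrete, a minimal sketch of in-place trimming with today's std::string API (function names hypothetical). Trimming the back just erases the tail and moves no characters; trimming the front with erase has to shift every remaining character down, so it costs O(size) rather than O(length_of_prefix).

```cpp
#include <string>
#include <string_view>

// Assumption: the characters std::isspace accepts in the "C" locale.
inline constexpr std::string_view space_chars = " \t\n\v\f\r";

// Hypothetical in-place rtrim: only the size changes; nothing is moved.
inline void rtrim_in_place(std::string& s) {
    const auto last = s.find_last_not_of(space_chars);
    s.erase(last == std::string::npos ? 0 : last + 1);
}

// Hypothetical in-place ltrim: erase() shifts the remaining characters to
// the front of the buffer, so this is O(s.size()), not O(prefix).
// If s is all whitespace, find_first_not_of returns npos and s is cleared.
inline void ltrim_in_place(std::string& s) {
    s.erase(0, s.find_first_not_of(space_chars));
}
```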

If the member function didn't modify the string and instead returned a string_view, then I can see how it could be performed in O(length_of_prefix), but we wouldn't need direct access to string's representation for that; a free function could do the same with something like

return std::basic_string_view{ std::views::drop_while( s, is_white_space ) };

Returning a string_view would make more sense to me, since the caller would have more options for what to do with the result:

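using namespace std::string_literals;
std::string s = " a"s;
auto trimmed = std::ranges::ltrim( s );   // proposed API: a std::string_view into s
std::string copy{ trimmed };              // make an owning copy if needed...
s.erase( 0, trimmed.data() - s.data() );  // ...or drop the prefix from s in place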

Would something like this work? https://gist.github.com/LapNik/076909e6b704e99221b6e86e243b4c94

Closure objects not included.

@LapNik

LapNik commented Feb 24, 2025

Could we use something like trim_front and trim_back instead of ltrim and rtrim, since text is not guaranteed to run left-to-right (e.g. right-to-left scripts)?

@alexcohn

@LapNik sure, you are right: returning a trimmed string_view is straightforward and does not require a member function. But if you want ltrim() to modify an existing string, this does require tracking the start position, which changes not only the class API but also the internal representation. This is not as far-fetched as it sounds; modern implementations of string already include the small-string optimization. Can one tweak the structure further to keep a start offset? Sure! This is all about performance, so the tweak does not have to work in all cases, only in those where it makes a big difference, e.g. strings longer than a certain threshold.
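A toy illustration of the "track the start position" idea, assuming a hypothetical wrapper (offset_string is not a proposed name) that owns a std::string plus a start offset. ltrim then only advances the offset, which is O(length_of_prefix), and the dead prefix is reclaimed later by an explicit compact(). A real std::string change would have to fit this into the existing layout, including the small-string buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical: a string that remembers where its logical contents start.
class offset_string {
public:
    explicit offset_string(std::string s) : buf_(std::move(s)) {}

    // Logical contents: the buffer minus the skipped prefix.
    std::string_view view() const noexcept {
        return std::string_view(buf_.data() + start_, buf_.size() - start_);
    }

    // O(length_of_prefix): only the start offset moves; no characters do.
    void ltrim() noexcept {
        const auto n = view().find_first_not_of(" \t\n\v\f\r");
        start_ = (n == std::string_view::npos) ? buf_.size() : start_ + n;
    }

    // Pays the O(size) shift once, at a time the caller chooses.
    void compact() {
        buf_.erase(0, start_);
        start_ = 0;
    }

    // Size of the underlying buffer, including any skipped prefix.
    std::size_t buffer_size() const noexcept { return buf_.size(); }

private:
    std::string buf_;
    std::size_t start_ = 0;
};
```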

@JeffGarland
Member

Just going to drop in (haha) and ask this -- what about drop_while? The cppreference example for drop_while_view has this for trim_left:

#include <ranges>
#include <string_view>

[[nodiscard]]
constexpr bool is_space(char p) noexcept
{
    // p is whitespace if dropping every character that differs from p
    // leaves something behind. The array includes the literal's
    // terminating '\0', so is_space('\0') is true as well.
    auto ne = [p](auto q) { return p != q; };
    return !!(" \t\n\v\r\f" | std::views::drop_while(ne));
}

[[nodiscard("trims the output")]]
constexpr std::string_view trim_left(std::string_view const in) noexcept
{
    // Skip leading whitespace and view the rest of the same buffer.
    auto view = in | std::views::drop_while(is_space);
    return {view.begin(), view.end()};
}

https://en.cppreference.com/w/cpp/ranges/drop_while_view

The original plan for C++26 ranges also included a views::drop_last_while, which would make the backwards trim just as simple. Of course, these adaptors aren't dedicated to string processing and hence would likely be less efficient than a dedicated algorithm.

Projects
None yet
Development

No branches or pull requests

6 participants