Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generate, modifyM & folds for mutable vectors #338

Merged
merged 20 commits into from
Apr 1, 2021

Conversation

Shimuuar
Copy link
Contributor

See #334 and linked issues for discussion. Draft since functions were only added to generic module

  • generate{,M}
  • modifyM
  • mapM_,imapM_, all variants of left folds

These I think completely uncontroversial function

@Bodigrim
Copy link
Contributor

Bodigrim commented Oct 28, 2020

I have no issues with

foldl :: (PrimMonad m, MVector v a) => (b -> a -> b) -> b -> v (PrimState m) a -> m b

But I'm unsure about monadic folds like

foldM :: (PrimMonad m, MVector v a) => (b -> a -> m b) -> b -> v (PrimState m) a -> m b

Do we really need them? What useful monadic effect could happen in b -> a -> m b? It feels dangerous that this action can modify v (PrimState m) a, for example. I could probably contrive some examples, but not convinced that they embody enough justification to enshrine such idiom.

@lehins
Copy link
Contributor

lehins commented Oct 28, 2020

foldlM is most useful of them all, because you can implement all other mutable iterators with it. Also note that it is not just foldM, because unlike lists, iterating monadically from the right is also useful and efficient.

ifoldlM is essentially a for loop from imperative languages on steroids. It gives you index, value of the element, accumulator and ability to affect the world. It also gives you freedom to mutate the vector in-place as you go through it. I think it is the most underappreciated concept in Haskell because everyone tries really hard to live in the pure immutable world.

@Bodigrim You should not feel any danger about it, because there nothing unsafe about monadic folds.

On the other hand my opinion about foldl is quite the opposite, I think it sends the wrong message of false purity:

foldl :: (PrimMonad m, MVector v a) => (b -> a -> b) -> b -> v (PrimState m) a -> m b
foldl f acc vec = ..

It gives you this feeling that the function f is pure and thus the whole operation will have no effects, while in reality this vector can be mutated concurrently by another thread while this "foldl" is going through this mutable vector.

So I would recommend is adding all four:

foldlM :: (PrimMonad m, MVector v a) => (b -> a -> m b) -> b -> v (PrimState m) a -> m b
ifoldlM :: (PrimMonad m, MVector v a) => (b -> Int -> a -> m b) -> b -> v (PrimState m) a -> m b
foldrM :: (PrimMonad m, MVector v a) => (a -> b -> m b) -> b -> v (PrimState m) a -> m b
ifoldrM :: (PrimMonad m, MVector v a) => (Int -> a -> b -> m b) -> b -> v (PrimState m) a -> m b

@Shimuuar
Copy link
Contributor Author

I don't think possibility that callback function will modify vector is a problem. mapM_ has same problem as any function taking monadic callback. Similarly imperative programmer can mutate vector in for loop. All these functions are basically for loop specialized in various ways

As for pure folds. I think about them as syntactic convenience. To avoid wrapping pure function into monads if it happens to be pure. Compare:

foldl (+) 0 vec
foldM (\x y -> pure (x +y)) 0 vec

Also I don't see how they could be interpreted as pure. Result is wrapped in monad after all. And vectors are being modified concurrently all bets are off in any case.

P.S. Agree about right folds

@lehins
Copy link
Contributor

lehins commented Oct 28, 2020

I think about them as syntactic convenience

100%

I am not against adding folds with pure functions. They definitely can't be interpreted as pure, so it is no surprise you don't see it ;) . All I am saying is that they can be mistakenly interpreted as pure by a novice user, because by itself a foldl can't do any mutation. That doesn't make them less useful as syntactic convenience as you very well put it.

-- Folds
-- -----

forI_ :: (Monad m, MVector v a) => v (PrimState m) a -> (Int -> m b) -> m ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be PrimMonad m?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. This function doesn't use anything from PrimMonad, Callback that passed as parameter usually does

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't PrimMonad m required to use PrimState m? PrimState is defined inside the PrimMonad class, so I thought that constraint would be required to be able to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late answer. Not quite. It's true that in order to define PrimState type family instance one have to define PrimMonad instance. But when it comes to use associated type work just like stand alone type families. Since we don't use any methods of PrimMonad we don't need to bring its dictionary here.

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Mar 6, 2021

@lehins, @Bodigrim any onjection about functions being added or about naming? If not I'll finish this PR. Added functions:

  • generate/generateM
  • modifyM/unsafeModifyM
  • mapM_, imapM_
  • foldl, foldl', foldM, foldM'
  • ifoldl, ifoldl', ifoldM, ifoldM'

@lehins
Copy link
Contributor

lehins commented Mar 6, 2021

This sounds great. Will you add right folds in this PR too or postpone it to a later one?

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Mar 6, 2021

I'll add them here. Thanks for reminding about them

@konsumlamm
Copy link
Contributor

If you're also adding right fold (I assume including monadic right folds), wouldn't it make more sense to use foldlM instead of foldM?

Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
@Shimuuar
Copy link
Contributor Author

@konsumlamm Thanks!

@Shimuuar
Copy link
Contributor Author

Only unresolved point is naming: foldM vs foldlM. I think that we should go with foldM in order to keep naming within library consistent. Immutable vectors use foldM (and don't provide foldrM) so does Control.Monad as well.

@konsumlamm
Copy link
Contributor

Only unresolved point is naming: foldM vs foldlM. I think that we should go with foldM in order to keep naming within library consistent. Immutable vectors use foldM (and don't provide foldrM) so does Control.Monad as well.

Perhaps we could do both and make foldM and alias for foldlM (that's what Data.Vector.Fusion.Bundle.Monadic and Data.Vector.Fusion.Stream.Monadic do fwiw)? One point in favor of using foldlM is that Data.Foldable provides foldlM and foldrM (and no foldM).

@lehins
Copy link
Contributor

lehins commented Mar 11, 2021

I actually was gonna suggest the same thing of naming foldlM instead of foldM as well, but then double checked how those functions are named for immutable vectors and decided not to suggest this in the end. I believe consistency is way more important then personal preference.

I am also not in favor of adding foldlM as synonyms because there is too much overhead for very little benefit. That would mean adding foldlM and ifoldlM in 16 different modules (we'd have to add them for immutable for consistency too). This would mean extra 32 functions that have no other purpose but act as synonyms. I really don't think its worth it, so IMHO foldM and ifoldM will do just fine.

@Shimuuar Shimuuar marked this pull request as ready for review March 12, 2021 16:47
@Shimuuar
Copy link
Contributor Author

I've added reexports for (hopefully) everything. Also as it turns out we didn't have reexports for exchange

Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
@konsumlamm thanks, for noticing this
And fix API discrepancies for left folds
@Shimuuar
Copy link
Contributor Author

Not ready for merge. I started writing tests and found out that I used wrong order of parameters for indexed folds. And now GHC produces tests suite that crashes. I'll start investigating this tomorrow

Note test suite crashes for me at this commit
Also fix case for negative vector size
@Shimuuar
Copy link
Contributor Author

Turns out writing tests is pretty neat idea. They caught both discrepancies in API between pure and mutable versions and violations of memory safety. Now I think PR is ready for review.

Copy link
Contributor

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor suggestions. Haven't found any correctness issues, so it looks great.

-- and fill it with the results of applying the function to each index.
generate :: (PrimMonad m, MVector v a) => Int -> (Int -> a) -> m (v (PrimState m) a)
{-# INLINE generate #-}
generate n f = stToPrim $ generateM n (return . f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does stToPrim help the optimizer or improves performance somehow? It seems redundant to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows GHC to work in the ST monad instead of whatever tower of monad transformers is thrown at it. GHC may or may not be able to optimize those binds. Hopefully this will reduce amount of work optimizer has to do and may even result in better code. Same for other function that take pure function as parameters

Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
-- | /O(n)/ Pure left fold (function applied to each element and its index).
ifoldl :: (PrimMonad m, MVector v a) => (b -> Int -> a -> b) -> b -> v (PrimState m) a -> m b
{-# INLINE ifoldl #-}
ifoldl f b0 v = stToPrim $ loop 0 b0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again and in all other folds, it doesn't look like we need stToPrim. unsafeRead is restricted to PrimMonad, not ST.

Also as with other suggestion we can use ifoldM to define ifoldl

Data/Vector/Generic/Mutable.hs Outdated Show resolved Hide resolved
prop_mut_foldrM' :: P ((a -> a -> Identity a) -> a -> v a -> Identity a)
= (\f z v -> Identity $ runST $ MV.foldrM' (\a b -> pure $ runIdentity $ f a b) z =<< V.thaw v)
`eq`
foldrM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, look like we don't have a corresponding pure strict right monadic fold: foldrM' for immutable vectors.
Just mentioning this in case you are in a mood of adding it for completeness. Feel free to ignore this comment if you don't feel like adding it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have lazy foldrM either. I'll add them in separate PR. This PR is large enough as it is

tests/Utilities.hs Outdated Show resolved Hide resolved
@Shimuuar
Copy link
Contributor Author

@lehins I finally addressed your suggestions. I also added changelog

Co-authored-by: konsumlamm <44230978+konsumlamm@users.noreply.github.com>
@lehins
Copy link
Contributor

lehins commented Apr 1, 2021

@Shimuuar This looks great. Changes are ready to be merged, but it would be nice to cleanup the commit history. If you aren't up to it I can just squash all of the changes into one and merge it.

Once it is merged I'll backport changes in this PR to vector-0.12.3.0 since we need to make another non-breaking release anyways that includes #372

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Apr 1, 2021

Let just squash it. it not that big and changes are sometimes interwoven and couldn't be cleaned up easily

@lehins lehins merged commit 1d30b67 into haskell:master Apr 1, 2021
lehins pushed a commit that referenced this pull request Apr 1, 2021
Add generate, modifyM & folds for mutable vectors:

* Added `generate`, `generateM` for mutable vectors
* Added `modifyM` and `unsafeModifyM` for mutable vectors
* Add all variants of folds for mutable vectors:

 *  `mapM_`, `imapM_`, `forM_`, `iforM_`
 * `foldl`, `foldl'`, `foldM`, `foldM'`,
 * `foldr, `foldr'`, `foldrM`, `foldrM'`,
 * `ifoldl`, `ifoldl'`, `ifoldM`, `ifoldM'`,
 * `ifoldr`, `ifoldr'`, `ifoldrM, `ifoldrM'`

* Add tests for all new functions

* Update changelog
@Shimuuar Shimuuar deleted the mutable-api branch April 7, 2021 17:29
@lehins lehins mentioned this pull request Jul 17, 2021
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants