kraaij_pohlmann.sbl: Remove conversion of y to Y
The Snowball implementation tries to identify cases where `y` is a
consonant and temporarily changes these to `Y`, which is then treated as
a consonant during stemming (`Y` is changed back to `y` before
returning).  However, the original C Kraaij-Pohlmann implementation does
not do this (the `y` handling is taken from the Porter stemmers for
English, French, German and Dutch).
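
For illustration, the marking that this commit removes can be sketched in
Python roughly as follows. This is not the Snowball code itself, only an
approximation of what the removed lines appear to do: a word-initial `y`,
and any `y` directly after a vowel, becomes `Y`, and every `Y` is turned
back into `y` at the end. The function names are hypothetical, and the
vowel set `aeiouy` is an assumption that should be checked against the
`v` grouping in `kraaij_pohlmann.sbl`.

```python
# Assumed vowel grouping; verify against `v` in kraaij_pohlmann.sbl.
VOWELS = set("aeiouy")


def mark_consonant_y(word: str) -> str:
    """Replace a consonant-like 'y' with 'Y' (hypothetical helper).

    A 'y' counts as consonant-like here if it starts the word or directly
    follows a vowel. The scan runs left to right over the partially
    rewritten word, so a 'Y' that has already been marked no longer
    counts as a vowel for the character after it.
    """
    chars = list(word)
    for i, c in enumerate(chars):
        if c != "y":
            continue
        if i == 0 or chars[i - 1] in VOWELS:
            chars[i] = "Y"
    return "".join(chars)


def unmark_consonant_y(word: str) -> str:
    """Turn every 'Y' back into 'y' before returning the stem."""
    return word.replace("Y", "y")


if __name__ == "__main__":
    for w in ("royale", "yoga", "systeem"):
        marked = mark_consonant_y(w)
        print(w, marked, unmark_consonant_y(marked))
```

With this sketch `royale` is marked as `roYale`, while the `y` in
`systeem` follows a consonant and is left alone.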

A quick scan of the stemming differences resulting from this change
suggests that this extra handling only helps by conflating `royale`
with `royaal`, but possibly there are additional cases where this extra
tweak is useful.  However, it's getting in the way of resolving the
differences between the C and Snowball implementations, so remove it,
at least for now, and review later.

This reduces the number of words which stem differently from 65 to 45.

See #1
ojwb committed Feb 18, 2025
1 parent 4d764b4 commit a198ee5
Showing 1 changed file with 1 addition and 5 deletions.
6 changes: 1 addition & 5 deletions algorithms/kraaij_pohlmann.sbl
@@ -1,6 +1,6 @@
 strings ( ch )
 integers ( p1 p2 )
-booleans ( Y_found stemmed GE_removed )
+booleans ( stemmed GE_removed )

 routines (

@@ -250,10 +250,7 @@ define measure as (
 )
 define stem as (

-unset Y_found
 unset stemmed
-do ( ['y'] <-'Y' set Y_found )
-do repeat(goto (v ['y'])<-'Y' set Y_found )

 measure

@@ -277,6 +274,5 @@ define stem as (
 do (Step_7 set stemmed )
 do (stemmed or GE_removed Step_6)
 )
-do(Y_found repeat(goto (['Y']) <-'y'))
 )
