[% header %]
<div class="row">
<div class="col-md-8">
<p>
<strong>Snowball</strong> is a small string processing language for creating
stemming algorithms for use in Information Retrieval, plus a collection of
stemming algorithms implemented using it.
</p>
<p>
It was originally designed
and built by <a href="https://en.wikipedia.org/wiki/Martin_Porter">Martin
Porter</a>. Martin retired from development in 2014 and Snowball is now
maintained as a community project. Martin originally chose the name Snowball as
a tribute to <a href="https://en.wikipedia.org/wiki/SNOBOL">SNOBOL</a>, the
excellent string handling language from the 1960s. It now also serves as a
metaphor for how the project grows by gathering contributions over time.
</p>
<p>
The Snowball compiler translates a Snowball program into source code in another
language. Currently supported target languages are Ada, ISO C, C#, Go, Java,
JavaScript, Object Pascal, Python and Rust.
</p>
<h2>What is Stemming?</h2>
<p>
Stemming maps different forms of the same word to a common "stem". For
example, the English stemmer maps <i>connection</i>, <i>connections</i>,
<i>connective</i>, <i>connected</i>, and <i>connecting</i> to <i>connect</i>.
So a search for <i>connected</i> would also find documents which contain only
the other forms.
</p>
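<p>
The Python sketch below illustrates the idea. It is a toy, not Snowball's English stemmer.
The suffix list <code>TOY_SUFFIXES</code> and the helpers <code>toy_stem</code>,
<code>build_index</code> and <code>search</code> are hypothetical. They were chosen only so
that the example words above share the stem <i>connect</i>. Documents and queries are
stemmed the same way, so a query for <i>connected</i> matches documents that contain only
<i>connection</i> or <i>connecting</i>.
</p>

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; this is NOT Snowball's
# English stemming algorithm, just enough to conflate the example words.
TOY_SUFFIXES = ("ions", "ion", "ive", "ed", "ing", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    w = word.lower()
    for suffix in TOY_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stem to the ids of the documents containing a word with that stem."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Stem the query the same way as the documents, then look it up."""
    return set(index.get(toy_stem(query), set()))


docs = {
    "d1": "the connection was lost",
    "d2": "we are connecting now",
    "d3": "nothing relevant here",
}
index = build_index(docs)
print(sorted(search(index, "connected")))  # ['d1', 'd2']
```

<p>
In a real system, one of the generated Snowball stemmers would take the place of
<code>toy_stem</code>. The indexing and querying logic would stay the same.
</p>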
<p>
The stem is often a word itself, but not always, because text search systems
(the intended field of use) do not require it. We also aim to conflate words
with the same meaning, rather than all words with a common linguistic root
(so <i>awe</i> and <i>awful</i> don't have the same stem). Over-stemming
(conflating words which should be kept apart) is more problematic than
under-stemming (failing to conflate words which belong together), so we tend
not to stem in cases that are hard to resolve. If you want to always reduce
words to a root form, or to get a root form which is itself a word, then
Snowball's stemming algorithms likely aren't the right answer.
</p>
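<p>
This trade-off can be sketched with two hypothetical toy rule sets, <code>AGGRESSIVE</code>
and <code>CONSERVATIVE</code>. Neither is a Snowball algorithm. The aggressive rule set
strips <i>-ful</i> and <i>-e</i>, so it reduces both <i>awe</i> and <i>awful</i> to
<i>aw</i>. It over-stems, conflating words with unrelated meanings, and its stem is not
itself a word. The conservative rule set only strips a trailing <i>-s</i>. It keeps the
two words apart at the cost of missing some variants.
</p>

```python
# Hypothetical toy rule sets for illustration only; not Snowball algorithms.
AGGRESSIVE = ("ful", "e", "s")
CONSERVATIVE = ("s",)


def strip_suffix(word: str, suffixes: tuple[str, ...], min_stem: int = 2) -> str:
    """Remove the first listed suffix that matches, leaving at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def conflated(a: str, b: str, suffixes: tuple[str, ...]) -> bool:
    """True if both words get the same stem under the given rule set."""
    return strip_suffix(a, suffixes) == strip_suffix(b, suffixes)


print(strip_suffix("awful", AGGRESSIVE))        # aw
print(conflated("awe", "awful", AGGRESSIVE))    # True: over-stemming
print(conflated("awe", "awful", CONSERVATIVE))  # False
print(conflated("awe", "awes", CONSERVATIVE))   # True
```

<p>
With the aggressive rules, every search for <i>awe</i> would also return documents about
<i>awful</i> things. With the conservative rules, the worst case here is a missed variant.
This is one way to see why erring towards under-stemming is preferred.
</p>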
<!--
FIXME: Incorporate this text from quickintro here:
Discuss somewhere appropriate: thread-safety
<BR><BR>
- You can look at the stemming algorithm definitions themselves, and use
them as templates for coding your own versions of stemmers in the computer
language of your choice.
<BR><BR>
- You can use the various ANSI C
and Java stemmers in programs of your own,
without bothering yourself
with the Snowball system that generated them. To do that,
download either the
<A HREF="../dist/libstemmer_c-[% version %].tar.gz">C</A>
or the
<A HREF="../dist/libstemmer_java-[% version %].tar.gz">Java</A>
version of the libstemmer library, and follow the instructions
contained in the <code>README</code> files within these tarballs.
The tarballs also contain simple example
programs which allow you to run the stemmers from the command line.
<BR><BR>
- You can get involved in Snowball itself. This is particularly worthwhile
if you want to adjust the stemmers or develop new stemmers. A typical reason
for adjusting the stemmers is that you are working with a different encoding
of accented letters from the ISO Latin I encoding assumed in most of the scripts
here. Then you need to make your own version of the Snowball compiler and
work with the Snowball scripts.
Add new backends to the Snowball compiler...
<DL><DD>
The language has a full
<A HREF="../compiler/snowman.html"> manual</A>,
and the various stemming scripts act as example programs.
</DL>
- You can get deeply interested in stemming. If you do, read the
<A HREF="../texts/introduction.html"> introductory paper</A>
about Snowball. It is a bit heavyweight, but provides essential background.
And look at the
<A HREF="../texts/howtohelp.html"> notes</A>
on how you can help.
-->
<p>
Please address all Snowball-related mail to the <a href="lists.html">snowball-discuss mailing list</a>.
</p>
<p>
Any such mail sent directly to individual developers may be answered more
slowly, and in any case they reserve the right to post their answers on snowball-discuss.
</p>
<h2>Major events</h2>
<ul>
<li>
<strong>Sep 2023</strong> - Estonian stemming algorithm contributed by Linda Freienthal.
</li>
<li>
<strong>Nov 2021</strong> - <a href="download.html">Snowball 2.2.0 released!</a>
</li>
<li>
<strong>Jan 2021</strong> - Snowball 2.1.0 released.
</li>
<li>
<strong>Jan 2021</strong> - Armenian stemmer from Astghik Mkrtchyan merged into the distribution.
</li>
<li>
<strong>Jan 2021</strong> - Ada backend contributed by Stephane Carrez.
</li>
<li>
<strong>Nov 2020</strong> - Yiddish stemming algorithm contributed by Assaf Urieli.
</li>
<li>
<strong>Oct 2019</strong> - Serbian stemming algorithm contributed by Stefan Petkovic and Dragan Ivanovic.
</li>
<li>
<strong>Oct 2019</strong> - Snowball 2.0.0 released.
</li>
<li>
<strong>Aug 2019</strong> - Hindi stemming algorithm contributed by Olly Betts.
</li>
<li>
<strong>Aug 2019</strong> - Basque and Catalan merged into the distribution.
</li>
<li>
<strong>Oct 2018</strong> - Greek stemming algorithm contributed by Oleg Smirnov.
</li>
<li>
<strong>Jun 2018</strong> - Object Pascal backend from Wout van Wezel merged.
</li>
<li>
<strong>May 2018</strong> - Lithuanian stemming algorithm contributed by Dainius Jocas.
</li>
<li>
<strong>May 2018</strong> - Indonesian stemming algorithm contributed by Olly Betts.
</li>
<li>
<strong>Mar 2018</strong> - C# backend contributed by Cesar Souza.
</li>
<li>
<strong>Mar 2018</strong> - JavaScript backend merged.
</li>
<li>
<strong>Jun 2017</strong> - Go backend contributed by Marty Schoch.
</li>
<li>
<strong>Mar 2017</strong> - Rust backend contributed by Jakob Demler.
</li>
<li>
<strong>Jan 2016</strong> - Arabic stemming algorithm contributed by Assem Chelli.
</li>
<li>
<strong>Oct 2015</strong> - Tamil stemming algorithm contributed by Damodharan Rajalingam.
</li>
<li>
<strong>Sep 2015</strong> - New home for Snowball at snowballstem.org.
</li>
<li>
<strong>Sep 2014</strong> - Martin Porter <a href="http://article.gmane.org/gmane.comp.search.snowball/1531">retires from Snowball development</a>.
</li>
<li>
<strong>May 2012</strong> - Contributed stemmers for Irish and Czech.
</li>
<li>
<strong>Jul 2010</strong> - Contributed stemmers for Armenian, Basque and Catalan.
</li>
<li>
<strong>Mar 2007</strong> - Romanian stemmer.
</li>
<li>
<strong>Jan 2007</strong> - Turkish stemmer. Contributed by Evren (Kapusuz) Cilden.
</li>
<li>
<strong>Sep 2006</strong> - Hungarian stemmer. Contributed by Anna Tordai.
</li>
<li>
<strong>Jun 2006</strong> - Supported and updated Python bindings.
</li>
<li>
<strong>May 2005</strong> - UTF-8 Unicode support.
</li>
<li>
<strong>Sep 2002</strong> - Finnish stemmer.
</li>
<li>
<strong>Jul 2002</strong> - ISO Latin I as default. The use of MS-DOS Latin I
is now history, but the old versions of the Snowball stemmers are still
accessible on the site.
</li>
<li>
<strong>May 2002</strong> - Unicode support.
</li>
<li>
<strong>Feb 2002</strong> - Java support. Richard modified the Snowball code
generator to produce Java output as well as ANSI C output, so pure Java
systems can now use the Snowball stemmers.
</li>
</ul>
</div>
<div class="col-md-4">
<h2>Links to resources</h2>
<ul class="list-unstyled">
<li><a href="texts/introduction.html">An account of Snowball</a></li>
<li><a href="texts/howtohelp.html">How You Can Help</a></li>
</ul>
<h3>Snowball compiler</h3>
<ul class="list-unstyled">
<li><a href="compiler/snowman.html">The Manual</a></li>
<li><a href="runtime/use.html">How To Run It</a></li>
<li><a href="codesets/guide.html">Character codes</a></li>
</ul>
</div>
</div>
[% footer %]