Boolean searching capabilities
were not part of the original WAIS release, but were soon
introduced into WAIS indexing packages due to user demand. All
WAIS packages but the earliest versions support AND, OR,
NOT, right-hand truncation using the asterisk
(*), and the use of parentheses in search statements. How
these are implemented differs among the various WAIS versions.
Here we report on freeWAIS-sf. More information about
this package can be obtained by connecting to the University of
Dortmund where the package
was developed.
This document is freely adapted
from four different documents prepared by:
Natalie Oakes Sturr
Systems Librarian, Penfield Library,
SUNY Oswego, Oswego, NY 13126
E-mail: sturr@oswego.edu
The four original documents are available on the Web.
The use of upper or lowercase for either search terms or Boolean operators can affect the results of WAIS searches.
freeWAIS-sf processes lowercase
search terms correctly. However, it returns inconsistent search
results when search statements include uppercase search
terms.
In fact, freeWAIS-sf returns 0 (zero) documents
when uppercase search terms are combined with either the
AND or NOT operators.
A major challenge when searching a WAIS database is deciding whether to use uppercase or lowercase Boolean operators. freeWAIS-sf correctly handles both uppercase and lowercase operators.
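Given the behaviour described above, one possible client-side workaround is to lowercase a search statement before submitting it. The sketch below is only an illustration: the helper name `normalize_query` is hypothetical, and it assumes that lowercasing every term is acceptable for the database being searched.

```python
def normalize_query(statement: str) -> str:
    """Lowercase a search statement and collapse runs of whitespace.

    Hypothetical helper. Lowercasing the operators as well is harmless
    because, per the text above, freeWAIS-sf accepts them in either case.
    """
    return " ".join(statement.lower().split())


print(normalize_query("Physics AND  (Letters OR Families)"))
```

The normalized string can then be passed to whatever search client is in use.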
Boolean operators and the use of parentheses are supported in freeWAIS-sf 2.x. Since operations in parentheses are evaluated first, it is advisable to use parentheses in search statements to ensure consistent results.
In freeWAIS-sf, the hierarchy of Boolean operators is:
NOT is evaluated before AND, which is evaluated before OR. Operations in parentheses are evaluated first of all.
Examples:
| Search Statement | Executed as |
|---|---|
| A and B or C | (A and B) or C |
| A or B and C | A or (B and C) |
| (A or B) and C | (A or B) and C |
| A or B and C not D | (A or (B and (C not D))) |
| C not D and A or B | (((C not D) and A) or B) |
| ((A or B) and C) not D | ((A or B) and C) not D |
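The hierarchy above can be illustrated with a small precedence-climbing parser that adds the implied parentheses to a search statement. This is a sketch of the stated rules, not freeWAIS-sf's own parser: the function names are hypothetical, `not` is treated as a binary operator because that is how it appears in the examples ("C not D"), and operators of equal rank are assumed to group left to right.

```python
import re

# Higher number binds tighter: NOT before AND before OR, as stated above.
PRECEDENCE = {"or": 1, "and": 2, "not": 3}


def tokenize(statement):
    """Split a statement into parentheses and words."""
    return re.findall(r"\(|\)|[^\s()]+", statement)


def parenthesize(statement):
    """Return the statement with every binary operation explicitly grouped."""
    tokens = tokenize(statement)
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of statement")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if token == ")" or token.lower() in PRECEDENCE:
            raise ValueError(f"unexpected token: {token}")
        return token

    def parse_expression(min_prec):
        nonlocal pos
        left = parse_operand()
        while pos < len(tokens):
            op = tokens[pos].lower()
            prec = PRECEDENCE.get(op)
            if prec is None or prec < min_prec:
                break
            pos += 1
            # prec + 1 makes equal-rank operators group left to right (assumption).
            right = parse_expression(prec + 1)
            left = f"({left} {op} {right})"
        return left

    result = parse_expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


print(parenthesize("A or B and C not D"))   # (A or (B and (C not D)))
print(parenthesize("C not D and A or B"))   # (((C not D) and A) or B)
```

Unlike the table, the sketch also wraps the outermost operation in parentheses, which does not change the meaning.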
Right-hand truncation, denoted
with an asterisk (*), is listed as a feature of freeWAIS-sf.
This package also provides stemming, similar to automatic
right-hand truncation, as an option (see later).
In freeWAIS-sf, right-hand truncation produces correct results as long as stemming is not turned on.
Two types of stemming are
available in various versions of WAIS: Porter and plural.
freeWAIS-sf offers only Porter stemming.
Porter stemming attempts to identify and index the word stem. If a word and its stem are different, only the word stem is indexed. Thus, it appears to the user that search terms are automatically truncated. For example, both physics and physical stem to physic.
| Search Term | Retrieves |
|---|---|
| physics | physics or physical |
| physical | physics or physical |
Plural stemming attempts to identify and index the singular form of a search term. When searching either the singular or plural form of a term, both are retrieved:
| Search Term | Retrieves |
|---|---|
| letter | letter OR letters |
| letters | letter OR letters |
| family | family OR families |
| families | family OR families |
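The idea behind plural stemming can be sketched as an index keyed on singular forms. The example below is a toy model under stated assumptions, not the algorithm used by any WAIS package: the `singular` function is hypothetical and handles only the "-ies" and plain "-s" endings needed for the terms in the table.

```python
from collections import defaultdict


def singular(word):
    """Toy plural stemmer: handles only '-ies' and a plain trailing '-s'."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(words):
    """Map each singular form to the set of original words that share it."""
    index = defaultdict(set)
    for word in words:
        index[singular(word)].add(word)
    return index


def search(index, term):
    """Stem the search term the same way as the index, then look it up."""
    return sorted(index.get(singular(term), ()))


index = build_index(["letter", "letters", "family", "families"])
print(search(index, "letters"))   # ['letter', 'letters']
print(search(index, "family"))    # ['families', 'family']
```

Because the search term and the indexed words pass through the same function, either form of a term retrieves both.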
Although Porter stemming is based on a computer algorithm, the English language is not! This can cause search statements to return unexpected results. For example, play is stemmed to plai. These inconsistencies are compounded when combined with right-hand truncation.
| Search Term | Retrieves |
|---|---|
| play | play, plays |
| play* | player, playful, playground |
| plai | play, plays |
| plai* | play, plays, plain, plains, plainly, plaintiff, plait |
The data used to determine how
truncation and stemming are implemented in freeWAIS-sf are
presented below. They refer to freeWAIS-sf 1.1, which is the most
recent version tested.
All searches were performed on a small database constructed for
the purpose of testing truncation. The database has 10 records
(one word per record) and was indexed using the paragraph format
(-t para). waissearch was used to search the database.
The words listed are those retrieved with each search statement.
The database contains the following words: play, plays, player, playful, playground, plain, plainly, plains, plaintiff, plait.
| Search Statement | freeWAIS-sf-1.1, No Stemming | freeWAIS-sf-1.1, Stemming | Expected Results |
|---|---|---|---|
| play | play | play plays | play |
| plays | plays | play plays | plays |
| play* | play plays player playful playground | player playful playground | play plays player playful playground |
| plai | | play plays | |
| plai* | plain plainly plains plaintiff plait | play plays plain plains plainly plaintiff plait | plain plainly plains plaintiff plait |
| plain | plain | plain plains | plain |
| plain* | plain plainly plains plaintiff | plain plainly plains plaintiff | plain plainly plains plaintiff |
| pla* | play plays player playful playground plain plainly plains plaintiff plait | play plays player playful playground plain plainly plains plaintiff plait | play plays player playful playground plain plainly plains plaintiff plait |
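The pattern in these results can be reproduced with a small model in which words are indexed either as they are or under a stem, and a truncated term is matched as a prefix of the index keys. This is a sketch under several assumptions and does not describe freeWAIS-sf internals: `toy_stem` applies only two Porter-like rules (drop a plural "s", then turn a final "y" into "i"), truncated terms are assumed not to be stemmed, and the word list is inferred from the results reported for pla*.

```python
from collections import defaultdict

# The ten test words, inferred from the results reported for "pla*".
WORDS = ["play", "plays", "player", "playful", "playground",
         "plain", "plainly", "plains", "plaintiff", "plait"]

VOWELS = set("aeiou")


def toy_stem(word):
    """Two Porter-like rules only: drop a plural 's', then final 'y' -> 'i'."""
    if word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    if word.endswith("y") and VOWELS & set(word[:-1]):
        word = word[:-1] + "i"
    return word


def build_index(words, stemming):
    """Map each index key (stem or raw word) to the original words."""
    index = defaultdict(set)
    for word in words:
        index[toy_stem(word) if stemming else word].add(word)
    return index


def search(index, term, stemming):
    """Exact lookup, or prefix match on index keys when the term ends in '*'."""
    if term.endswith("*"):
        prefix = term[:-1]  # assumption: truncated terms are not stemmed
        hits = [w for key, ws in index.items() if key.startswith(prefix) for w in ws]
        return sorted(hits)
    key = toy_stem(term) if stemming else term
    return sorted(index.get(key, ()))


for stemming in (False, True):
    index = build_index(WORDS, stemming)
    for term in ("play", "play*", "plai", "plai*"):
        print(stemming, term, search(index, term, stemming))
```

In this model, play* misses play and plays when stemming is on because both words are stored under the key plai, which does not begin with play; plai* picks them up for the same reason.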
© 1996-2003 BioPD -
University of Padova (Italy) - Author: Leopoldo
Saggin - Last version: January
28, 2003
Best efforts were made to provide correct information; however,
this document may contain technical inaccuracies and/or
typographical errors.
The author declares that this material is provided "as
is" without any warranty, including any implied warranty of
merchantability or fitness for a particular purpose.
All trademarks cited in this document are the property of their
respective owners.