SQL Server Full Text Search

SQL Server Full Text Search' is an inexact string matching technology for SQL Server. It is a powerful and fast way of referencing the contents of almost any character-based column on SQL Server 2000.

Full text indexes must be populated and are stored inside full-text catalogue. Full text searching is installed as a separate service called MSSearch. Full-text searching is an optional component of SQL Server 7.0/2000/2005

Types of Searches



Proximity based searches – cf. Section
Inflectional based searches – cf. Section
Weight based searches – cf. Section
Prefix term searches – cf. Section
Binary document search – cf. Section

A Common Example

The following article contains SQL code and simple worked examples. These examples refer to an imaginary table called "Athletes". This table contains information about various athletes their events and results in the Olympic Games. Inside this table you will find the following column names:
id
name
surname
country
notes

There exists a prior full-text index on the column "notes". The primary key of the table is on the column "id"

Full Text Query Syntax



There are 4 main keywords that can be invoked when full text searching is enabled: They are: "CONTAINS", "CONTAINSTABLE", "FREETEXT" and "FREETEXTTABLE". Their functionality and difference are consolidated in the following Keyword 's Meaning and their Returns
CONTAINS
Supports complex syntax to search for a word or phrase using boolean operators,prefix terms,proximity operators.
Return:
A simple boolean

CONTAINSTABLE

Supports complex syntax to search for a word or phrase using boolean operators,prefix terms,proximity operators.
Return:
A table with rankings of how well the match was.

FREETEXT
Intuitive syntax to match the meaning and not the exact wording of the words in the search condition.. Has the ability to match plurals and other inflectional forms of words. Multiple words are considered with "OR".

Return:
A simple boolean

FREETEXTTABLE
Intuitive syntax to match the meaning and not the exact wording of the words in the search condition.. Has the ability to match plurals and other inflectional forms of words.Multiple words are considered with "OR".

Return:
A table with rankings of how well the match was


The rank is just a number from 0 to 1000 that indicates the accuracy of the search result. The higher the value the more accurate the match.

Proximity Searches

Full-text searching has the ability to find words that are located physically near each other. This is achieved by using the keyword "NEAR" between the 2 search targets.

Microsoft does not explicitly state what exactly it means by the term "NEAR". However it is considered to be around 8 to 10 words in most circumstances. The keyword alludes to the possibility that the algorithm used to calculate proximity may be a Nearest neighbor search (NNS) or possibly a Levenshtein distance algorithm, both of which are distance-based similarity functions. Given a set S of n objects lying in some space U with a similarity so that given a query q ∈ U, one can locate the point that is most similar to q among the points in S.

In T-SQL syntax, the keyword "NEAR" can also be substituted with the "~" operator to achieve the same results.

Proximity Search Simple Example #1 using the "NEAR" keyword

SELECT name, surname
FROM CONTAINSTABLE(Athletes, *, '("Gold" NEAR "Finalists")') As ct
JOIN Athletes As e
ON ct.[KEY] = e.id

Proximity Search Simple Example #2 using the "~" operator

SELECT name, surname
FROM CONTAINSTABLE(Athletes, *, '("Gold" ~ "First") OR ("Silver" ~ "Second") OR ("Bronze" ~ "Third")') As ct
JOIN Athletes As e
ON ct.[KEY] = e.id

Inflectional Searches

Full-text searching has the ability to search for word inflections, that is, for differing forms of the same word that have the same general meaning. For example: "swim", "swam" and "swimming"

Both singular and plural forms of a word can be returned - It can also handle gender and neutral form of words, verbs and adjectives. This is achieved by using the keywords "FORMS OF" and "INFLECTIONAL" before the word of interest.

 Inflectional Searches Simple Example #1

SELECT name, surname
FROM Athletes
WHERE CONTAINS(Notes, 'FORMSOF (INFLECTIONAL, swim)')

Prefix Term Searches



Full-text searching has the ability to search for word beginning with a certain term. This is achieved by using a clever combination of single and double quotes followed by an * symbol.

Prefix Term Simple Example #1

SELECT name, surname
FROM Athletes a
WHERE CONTAINS(*, '"swim*"')

Binary Document Search

Full-text searching has the ability to search through binary data. The supported types are as follows:
MS Word (*.doc)
MS Excel (*.xls)
MS PowerPoint (*.ppt)
HTML (*.htm or *.html)

Weighted Searches


Full-text searching has the ability to associate a higher preference for certain terms over others. This is achieved by using the function: “ISABOUT” and the “WEIGHT” keyword. The keyword is associated with any decimal value between 0.0 and 1.0. The higher the weight the greater preference the match will be given

Weighted Searches Simple Example #1


SELECT name, surname
FROM Athletes a
JOIN CONTAINSTABLE(Athletes, notes, 'ISABOUT(Bronze WEIGHT(.2), Silver WEIGHT(.4), Gold WEIGHT(.8))') ct
ON a.id = ct.[KEY]
ORDER BY Rank DESC

Comments

Popular Posts