Spell correction, which also goes by several other names, is a software feature that suggests alternatives to the text you have typed or corrects it automatically. The concept of correcting typed text dates back to the 1960s, when computer scientist Warren Teitelman, who also invented the “undo” command, introduced a philosophy of computing called D.W.I.M., or “Do What I Mean.” Instead of programming computers to accept only perfectly formatted instructions, Teitelman argued that they should be programmed to recognize obvious mistakes.
The first well-known product to provide spell correction functionality was Microsoft Word 6.0, released in 1993.
There are a few ways spell correction can be done, but it’s important to note that there is no purely programmatic way to convert your mistyped “ipone” into “iphone” with decent quality. Mostly, there has to be a dataset the system is based on.
Manticore provides the commands CALL QSUGGEST and CALL SUGGEST, which can be used for automatic spell correction.
Both commands are available via SQL only, and the general syntax is:
CALL QSUGGEST(word, table [,options])
CALL SUGGEST(word, table [,options])
options: N as option_name[, M as another_option, ...]
These commands provide all suggestions from the dictionary for a given word. They work only on tables with infixing enabled and dict=keywords. They return the suggested keywords, Levenshtein distance between the suggested and original keywords, and the document statistics of the suggested keyword.
If the first parameter contains multiple words, then:
* CALL QSUGGEST will return suggestions only for the last word, ignoring the rest.
* CALL SUGGEST will return suggestions only for the first word.
That’s the only difference between them. Several options are supported for customization:
Option | Description | Default |
---|---|---|
limit | Returns N top matches | 5 |
max_edits | Keeps only dictionary words with a Levenshtein distance less than or equal to N | 4 |
result_stats | Provides Levenshtein distance and document count of the found words | 1 (enabled) |
delta_len | Keeps only dictionary words with a length difference less than N | 3 |
max_matches | Number of matches to keep | 25 |
reject | Rejected words are matches that are no better than those already in the match queue. They are put in a rejected queue, which is reset whenever a candidate does make it into the match queue. This parameter defines the size of the rejected queue (as reject*max(max_matches,limit)). If the rejected queue fills up, the engine stops looking for potential matches | 4 |
result_line | Alternate display mode that returns all suggestions, all distances, and all docs in one row each | 0 |
non_char | Do not skip dictionary words containing non-alphabetic symbols | 0 (skip such words) |
sentence | Returns the original sentence along with the last word replaced by the matched one. | 0 (do not return the full sentence) |
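To make the interplay of max_edits, delta_len, and limit more concrete, here is a minimal Python sketch of the kind of filtering these options describe. It is an illustration under stated assumptions, not Manticore's actual implementation: the function names (levenshtein, suggest), the shape of the dictionary, and the ranking by distance and then document count are all hypothetical, and the reject queue and max_matches are left out entirely.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary, limit=5, max_edits=4, delta_len=3):
    """Return up to `limit` (keyword, distance, docs) tuples.

    `dictionary` maps keyword -> number of documents containing it
    (a hypothetical stand-in for the table's keyword dictionary).
    """
    candidates = []
    for keyword, docs in dictionary.items():
        # delta_len: keep only words whose length difference is less than N
        if abs(len(keyword) - len(word)) >= delta_len:
            continue
        distance = levenshtein(word, keyword)
        # max_edits: keep only words with distance less than or equal to N
        if distance > max_edits:
            continue
        candidates.append((keyword, distance, docs))
    # Assumed ranking: closest first, then most frequent
    candidates.sort(key=lambda c: (c[1], -c[2]))
    return candidates[:limit]


dictionary = {"crossbody": 1, "bag": 1, "with": 1, "tassel": 1, "microfiber": 1}
print(suggest("crossbudy", dictionary, max_edits=2))
# -> [('crossbody', 1, 1)]
```

Checking the cheap length difference before computing the edit distance is a design choice in this sketch: it lets obviously unsuitable words be discarded without running the more expensive comparison.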
To show how it works, let’s create a table and add a few documents to it.
create table products(title text) min_infix_len='2';
insert into products values (0,'Crossbody Bag with Tassel'), (0,'microfiber sheet set'), (0,'Pet Hair Remover Glove');
As you can see in the example below, the mistyped word “crossbUdy” gets corrected to “crossbody”. By default, CALL SUGGEST/QSUGGEST return:
* distance - the Levenshtein distance, that is, how many edits were needed to convert the given word into the suggestion
* docs - the number of documents containing the suggested word

To disable the display of these statistics, you can use the option 0 as result_stats.
call suggest('crossbudy', 'products');
+-----------+----------+------+
| suggest   | distance | docs |
+-----------+----------+------+
| crossbody | 1        | 1    |
+-----------+----------+------+
If the first parameter is not a single word but several, CALL SUGGEST will return suggestions only for the first word.
call suggest('bagg with tasel', 'products');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| bag     | 1        | 1    |
+---------+----------+------+
If the first parameter is not a single word but several, CALL QSUGGEST will return suggestions only for the last word.
CALL QSUGGEST('bagg with tasel', 'products');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| tassel  | 1        | 1    |
+---------+----------+------+
Adding 1 as sentence makes CALL QSUGGEST return the entire sentence with the last word corrected.
CALL QSUGGEST('bag with tasel', 'products', 1 as sentence);
+-------------------+----------+------+
| suggest           | distance | docs |
+-------------------+----------+------+
| bag with tassel   | 1        | 1    |
+-------------------+----------+------+
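The effect of the sentence option can be pictured as a simple string operation: keep everything before the last word and swap in the suggestion. The helper below is a hypothetical Python sketch of that idea. apply_suggestion is not a Manticore function, and splitting on a single space is an assumption made for illustration; the engine itself tokenizes text according to the table settings.

```python
def apply_suggestion(sentence: str, suggestion: str) -> str:
    """Replace the last space-separated word of `sentence` with `suggestion`."""
    # rpartition splits at the last space: (everything before, " ", last word)
    head, _, _last = sentence.rstrip().rpartition(" ")
    # A single-word input has no head, so the suggestion stands alone
    return f"{head} {suggestion}" if head else suggestion


print(apply_suggestion("bag with tasel", "tassel"))  # bag with tassel
```

Only the last word is touched, which mirrors the behaviour described for CALL QSUGGEST: earlier words are passed through unchanged, even if they are misspelled.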
The 1 as result_line option changes the way the suggestions are displayed in the output. Instead of showing each suggestion in a separate row, it displays all suggestions, all distances, and all docs in one row each. Here’s an example to demonstrate this:
CALL QSUGGEST('bagg with tasel', 'products', 1 as result_line);
+----------+--------+
| name     | value  |
+----------+--------+
| suggests | tassel |
| distance | 1      |
| docs     | 1      |
+----------+--------+
This interactive course demonstrates how the spell correction feature works on a web page and lets you experiment with different examples.