Windows 7 “Indexing” vs. DocuLex WebSearch Document Management System

December 2, 2011 | admin | Technology Update

The term “indexing” is ambiguous: Windows 7™ uses it to mean something different from what it means in DocuLex’s document management system, ‘WebSearch’.
What Windows™ calls ‘indexing’ a document is more commonly called a full-text search: every word in every document, along with the words in the document names, is made searchable. When a user types a word they believe is in the document, ALL of the documents that contain that word are presented. That could be a few, or it could be thousands. Users can narrow the search by restricting it to a particular document type, such as *.doc or *.pdf files, but they still have to look through numerous files to find the exact document they want, and they are left wondering whether that document was found at all. In addition, Windows 7™ indexes only the documents on a particular PC, while a business usually has several desktops and servers.
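To make the contrast concrete, here is a minimal Python sketch of a word-level (full-text) index of the kind described above. It assumes the documents are held in memory as a dictionary of file name to text; the file names and contents are hypothetical, and this illustrates the general technique rather than how Windows 7 Search is actually implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document names that contain it.

    Words in the document name are indexed as well as words in the body.
    """
    index = defaultdict(set)
    for name, body in docs.items():
        for word in tokenize(name) + tokenize(body):
            index[word].add(name)
    return index


def text_search(index, word, extension=None):
    """Return every document containing `word`, optionally limited to one file type."""
    hits = index.get(word.lower(), set())
    if extension:
        hits = {name for name in hits if name.lower().endswith(extension.lower())}
    return sorted(hits)


# Hypothetical sample files on a single PC.
docs = {
    "acme_invoice_1001.pdf": "Invoice 1001 from Acme Supply, due 2011-11-15.",
    "acme_invoice_1002.pdf": "Invoice 1002 from Acme Supply, due 2011-12-01.",
    "meeting_notes.doc": "Discussed the Acme invoice backlog.",
}
index = build_index(docs)
print(text_search(index, "acme"))          # all three files
print(text_search(index, "acme", ".pdf"))  # only the two PDFs
```

Every document containing the word comes back, including the meeting notes that merely mention Acme. Restricting the results to *.pdf files shortens the list, but the user still has to open each hit to find the right invoice.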
This weakness is well understood in the world of business document management, where indexing means something different: assigning key ‘index fields’ (search fields) to a particular type of document and then assigning specific ‘index words’ to those fields for each document. The user can then search on one or more fields and get exactly the document they are looking for. This approach supports a number of additional ways to find documents, including:
1. Indexing documents with ‘index fields’ and specific ‘index words’ makes it possible to index documents stored in a database on a server, rather than only the documents on a particular PC.
2. Ability to use key ‘index fields’ to find a specific document. (E.g., all ‘Invoices’ might be indexed by ‘Invoice Number’, ‘Date’, ‘Vendor Name’, etc. Searching Invoices by ‘Invoice Number’ will find that document, and only that document, if the invoice number is unique.)
3. Ability to fine-tune a search using multiple index fields. (E.g., searching by ‘Vendor Name’ would find all invoices from that vendor, which might be exactly what the user wants. Searching by ‘Vendor Name’ and ‘Date’ would narrow the results to that vendor’s invoice for that date.)
4. Ability to search by a ‘date range’ combined with multiple index fields.
5. Ability to search by ‘automated information’ (e.g., upload date, system date, or time).
6. Ability to search by ‘Google Style’ search parameters such as:
a. Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for ‘apple’ will find ‘aple’, ‘appl’, etc.
i. Fuzzy searching is useful when the text being searched may contain typographical errors.
ii. Fuzzy searching is also good for finding text in scanned documents that have been processed with optical character recognition (OCR), where, for example, an ‘O’ may be misread as a ‘0’.
b. Stemming extends a search to cover grammatical variations on a word. For example, a search for ‘fish’ would also find ‘fishing’, and a search for ‘applied’ would also find ‘applying’, ‘applies’, and ‘apply’.
c. Phonic searching looks for words that sound like the search word and begin with the same letter. For example, a phonic search for ‘Smith’ will also find ‘Smithe’ and ‘Smythe’.
d. Natural language search accepts “any words”, meaning any sequence of text, such as a sentence or a question, and finds documents that contain those words.
e. Synonym search finds words that are synonymous with the search words, which helps when the user remembers only roughly what a document said. For example, if the user thinks “the article I am looking for is about cars,” a synonym search would also look for documents containing ‘automobile’, ‘motor vehicle’, etc. After all, an article about cars may never use the word ‘car’, but that is how the user may remember it.
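The index-field searches in items 1 through 5 can be sketched as a lookup over structured records. The following minimal Python illustration assumes each stored document carries a type, a dictionary of index fields, and a system-supplied ‘Upload Date’; the record layout, IDs, and values are hypothetical and do not reflect WebSearch’s actual schema or API.

```python
from datetime import date

# Hypothetical repository: each record has a document type, its index fields,
# and an automatically captured "Upload Date".
repository = [
    {"id": "INV-1001", "type": "Invoice",
     "fields": {"Invoice Number": "1001", "Vendor Name": "Acme Supply",
                "Date": date(2011, 11, 15), "Upload Date": date(2011, 11, 16)}},
    {"id": "INV-1002", "type": "Invoice",
     "fields": {"Invoice Number": "1002", "Vendor Name": "Acme Supply",
                "Date": date(2011, 12, 1), "Upload Date": date(2011, 12, 2)}},
    {"id": "INV-2001", "type": "Invoice",
     "fields": {"Invoice Number": "2001", "Vendor Name": "Bolt Hardware",
                "Date": date(2011, 11, 20), "Upload Date": date(2011, 11, 21)}},
]


def matches(value, wanted):
    """Compare an index value with a search value (strings case-insensitively)."""
    if isinstance(value, str) and isinstance(wanted, str):
        return value.casefold() == wanted.casefold()
    return value == wanted


def field_search(docs, doc_type, criteria=None, date_field=None,
                 date_from=None, date_to=None):
    """Return IDs of documents of doc_type that match every criterion.

    criteria maps index-field names to required values; date_field with
    date_from/date_to (both inclusive) adds an optional date-range filter.
    """
    criteria = criteria or {}
    hits = []
    for doc in docs:
        if doc["type"] != doc_type:
            continue
        if not all(matches(doc["fields"].get(name), wanted)
                   for name, wanted in criteria.items()):
            continue
        if date_field is not None:
            when = doc["fields"].get(date_field)
            if when is None:
                continue
            if date_from is not None and when < date_from:
                continue
            if date_to is not None and when > date_to:
                continue
        hits.append(doc["id"])
    return hits


print(field_search(repository, "Invoice", {"Invoice Number": "1001"}))  # ['INV-1001']
print(field_search(repository, "Invoice", {"Vendor Name": "acme supply"}))  # ['INV-1001', 'INV-1002']
print(field_search(repository, "Invoice",
                   {"Vendor Name": "Acme Supply", "Date": date(2011, 12, 1)}))  # ['INV-1002']
print(field_search(repository, "Invoice", date_field="Date",
                   date_from=date(2011, 11, 1), date_to=date(2011, 11, 30)))  # ['INV-1001', 'INV-2001']
print(field_search(repository, "Invoice", date_field="Upload Date",
                   date_from=date(2011, 12, 1)))  # ['INV-1002']
```

Unlike the word search, a unique field value such as an invoice number returns exactly one document, and each extra field or date range narrows the result rather than widening it.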
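Fuzzy searching (item 6a) is often built on edit distance: a word matches if it can be turned into the search term with only a few single-character changes. The sketch below uses the Levenshtein distance with a hypothetical threshold of one edit; it shows the general technique only, and WebSearch’s own fuzzy algorithm may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(words, term, max_edits=1):
    """Return the indexed words within max_edits edits of the search term."""
    term = term.lower()
    return [w for w in words if edit_distance(w.lower(), term) <= max_edits]


# Hypothetical indexed words, including an OCR error ("T0TAL" with a zero).
indexed_words = ["apple", "aple", "appl", "apricot", "T0TAL", "TOTAL", "TITLE"]
print(fuzzy_search(indexed_words, "apple"))  # ['apple', 'aple', 'appl']
print(fuzzy_search(indexed_words, "total"))  # ['T0TAL', 'TOTAL']
```

The same one-edit tolerance that catches a typing error also catches the OCR case where a letter ‘O’ was read as a zero.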
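Stemming (item 6b) reduces each word to a common root before comparing. The sketch below uses a deliberately tiny, hypothetical suffix-stripping rule set that happens to handle the article’s examples; it is not the Porter or Snowball stemmer commonly used in search engines, and it will mishandle many English words.

```python
def naive_stem(word):
    """Strip a few common English suffixes (a toy rule set, not Porter or Snowball)."""
    w = word.lower()
    if w.endswith(("ies", "ied")) and len(w) > 4:
        return w[:-3] + "y"        # applies, applied -> apply
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]              # fishing -> fish, applying -> apply
    if w.endswith("ed") and len(w) > 4:
        return w[:-2]              # fished -> fish
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]              # invoices -> invoice
    return w


def stem_search(words, term):
    """Return the indexed words that share the search term's stem."""
    target = naive_stem(term)
    return [w for w in words if naive_stem(w) == target]


words = ["fish", "fishing", "fisher", "apply", "applied", "applies", "applying", "appliance"]
print(stem_search(words, "fish"))     # ['fish', 'fishing']
print(stem_search(words, "applied"))  # ['apply', 'applied', 'applies', 'applying']
```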
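Phonic searching (item 6c) can be approximated with a phonetic code. The sketch below swaps in American Soundex, a well-known public algorithm that keeps the first letter and encodes the following consonants as digits, so words that sound alike and start with the same letter share a code. Whether WebSearch uses Soundex or another phonetic method is not stated in this article; treat this as an illustration of the idea.

```python
# Soundex digit for each consonant; vowels and y have no code, h and w are skipped.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y give "" and reset the run
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]


def phonic_search(words, term):
    """Return the indexed words whose Soundex code matches the search term's."""
    key = soundex(term)
    return [w for w in words if soundex(w) == key]


names = ["Smith", "Smithe", "Smythe", "Schmidt", "Jones", "Snyder"]
print(phonic_search(names, "Smith"))  # ['Smith', 'Smithe', 'Smythe', 'Schmidt']
```

Note that ‘Schmidt’ also matches: phonetic codes cast a deliberately wide net, which is useful for names whose spelling the user is unsure of.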
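Synonym search (item 6e) can be sketched as query expansion: the search term is replaced by the term plus its synonyms, and a document matches if it contains any of them as whole words. The thesaurus and documents below are hypothetical and tiny; a real system would rely on a much larger synonym list.

```python
import re

# Hypothetical, hand-built thesaurus.
THESAURUS = {
    "car": {"automobile", "motor vehicle", "auto"},
}


def expand_with_synonyms(term):
    """Return the search term plus its synonyms, treating the thesaurus as symmetric."""
    term = term.lower()
    expanded = {term} | THESAURUS.get(term, set())
    for head, synonyms in THESAURUS.items():
        if term in synonyms:
            expanded |= {head} | synonyms
    return expanded


def synonym_search(docs, term):
    """Return names of documents containing the term or any synonym as whole words."""
    patterns = [re.compile(r"\b" + re.escape(p) + r"\b")
                for p in expand_with_synonyms(term)]
    return sorted(name for name, text in docs.items()
                  if any(p.search(text.lower()) for p in patterns))


docs = {
    "review.txt": "A road test of the new automobile.",
    "policy.txt": "Insurance rules for any motor vehicle.",
    "recipe.txt": "Wear a scarf while baking.",
}
print(synonym_search(docs, "car"))         # ['policy.txt', 'review.txt']
print(synonym_search(docs, "automobile"))  # ['policy.txt', 'review.txt']
```

Neither matching document contains the word ‘car’, and whole-word matching keeps ‘scarf’ from being counted as a hit.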

In Windows 7™, a search across all the words in all the documents on a PC can require the user to go through many, if not all, of those documents to find the one they want. Plus, they can search only the documents on their own PC, not documents on servers or on other users’ PCs.