To our valued Customers and Dealers
DocuLex, Inc. is pleased to announce that we have expanded our product and service offerings by partnering with three other companies under the corporate banner of Protected Trust, LLC.
The new entity, Protected Trust, brings decades of experience in managed services and infrastructure that provide solutions to keep digital information assets secure, private, and compliant. Protected Trust believes trust and privacy are essential for building meaningful relationships.
Because of that, we want you to feel comfortable with the changes, and we are happy to answer whatever questions you may have. Contact us or visit our announcement page for additional information.
Expanded products and services for Protected Trust include email services, email encryption, document management, and email archiving and discovery, as well as data center services such as server colocation, virtual dedicated servers, private data suites, and much more. Protected Trust provides security and privacy throughout the data lifecycle – from creation to destruction.
Thank you for your trust. Let us know how else we can help your business manage its risk and accomplish its mission.
You can download the Protected Trust Press Release here.
David Bailey, VP Business Development
(formerly CEO of DocuLex, LLC)
Metadata for Document Management
Part Two: The three roles of metadata
Metadata can be considered to have three applications within any document management system:
• Search – by searching on specific metadata fields rather than the full text content of documents, results are almost always delivered much faster and with better accuracy, thus saving valuable time.
• Display – it is often very helpful, when viewing a list of documents retrieved through a search, to view the metadata fields for each document along with the original file. With WebSearch the user is able to customize the metadata fields that are displayed with search results.
• Organization – It is often helpful to organize the display of documents based on their metadata. For example, if a collection of documents has metadata for client name and contract id, then a document management system can present the documents in a multi-level tree that lists clients at the top level, with the contract ids for each client branching off underneath that client. WebSearch allows an organization to define any number of such trees (known in WebSearch as File Rooms), organized into any hierarchy desired by the client (for more information on the WebSearch File Rooms, see next month’s Part Three, “Searching and Beyond – WebSearch’s File Rooms”).
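The multi-level tree described under Organization can be sketched in a few lines of Python. This is only an illustration of grouping by metadata, not how WebSearch builds its File Rooms; the records, field names and the `build_tree` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical document records; the field names are illustrative only.
documents = [
    {"file": "a.pdf", "client": "Acme", "contract_id": "C-100"},
    {"file": "b.pdf", "client": "Acme", "contract_id": "C-200"},
    {"file": "c.pdf", "client": "Birch", "contract_id": "C-300"},
    {"file": "d.pdf", "client": "Acme", "contract_id": "C-100"},
]

def build_tree(docs, levels):
    """Group documents into nested dicts, one level per metadata field."""
    if not levels:
        # Leaf level: just list the file names.
        return [d["file"] for d in docs]
    groups = defaultdict(list)
    for d in docs:
        groups[d[levels[0]]].append(d)
    return {key: build_tree(members, levels[1:]) for key, members in groups.items()}

# Clients at the top level, contract ids underneath each client.
tree = build_tree(documents, ["client", "contract_id"])
```

Changing the order of the field names in the second argument produces a different hierarchy from the same metadata, which is the idea behind defining more than one tree over a single collection.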
Additionally, WebSearch allows clients to specify how documents are physically stored in its directory structure by metadata, which can make it practical to access managed content through Windows Explorer if needed (generally not a recommended practice).
Once you start to see the world from a document management perspective you’ll start to see potential metadata everywhere. Imagine, for example, importing data from a spreadsheet, in which every column has a header. The column headers become metadata names, and the data underneath them become just more metadata. The rows of data in a database table, coupled with the names of the fields in the database, once again can suddenly start to be viewed as metadata.
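As a rough sketch of the spreadsheet case, Python's standard `csv` module can turn each row of an exported sheet into a set of name/value pairs, with the column headers acting as the metadata names. The sample data below is made up for illustration.

```python
import csv
import io

# A made-up spreadsheet export; the first row holds the column headers.
sheet = io.StringIO(
    "client,contract_id,amount\n"
    "Acme,C-100,2500\n"
    "Birch,C-300,900\n"
)

# DictReader uses the header row as the keys for every following row,
# so each row becomes a dictionary of metadata name -> value.
rows = list(csv.DictReader(sheet))
```

Each dictionary in `rows` could then be attached to the document that the row describes.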
Perhaps the greatest example of hidden metadata is the information in the heading of an email. The ‘from’ and ‘to’ addresses, as well as the ‘subject’ and ‘date received’ of email messages, would all make very fine metadata for search and retrieval.
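To make the email case concrete, here is a minimal sketch using Python's standard `email` package to pull header fields out of a raw message. The message is invented, and the `Date` header shown is the date the sender's system stamped on the message; a true 'date received' would normally come from the receiving server.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# An invented message, for illustration only.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Contract C-100 renewal\n"
    "Date: Mon, 03 Mar 2014 09:30:00 -0500\n"
    "\n"
    "Please find the renewal terms attached.\n"
)

msg = message_from_string(raw)

# Header values become metadata fields that can be indexed and searched.
metadata = {
    "from": msg["From"],
    "to": msg["To"],
    "subject": msg["Subject"],
    "date": parsedate_to_datetime(msg["Date"]).isoformat(),
}
```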
A good document management system will not only recognize and index all the metadata contained in files, it will also have the ability to generate some of its own. For example, it could add a field indicating the date and time the document was entered into the system (which could then be used by document retention algorithms), or the size of the file. The “document type” is a common field used by EDMS products that is not a common native file metadata type. This field is particularly useful in a document management system because the system becomes an organization’s primary electronic repository for all company documents. Because of this role, the document type description is important to users when seeking out a particular type of document, such as “invoice”, “application”, “purchase order” or “contract”.
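System-generated metadata of this kind is simple to picture in code. The following is a minimal sketch; the `system_metadata` function and its field names are hypothetical and do not reflect any particular product.

```python
from datetime import datetime, timezone

def system_metadata(content: bytes, document_type: str, now=None):
    """Return metadata the system can generate itself at ingest time."""
    # Allow the caller to pass a fixed time (useful for testing);
    # otherwise stamp the document with the current UTC time.
    now = now or datetime.now(timezone.utc)
    return {
        "entered_at": now.isoformat(),   # could drive retention rules later
        "size_bytes": len(content),      # size of the stored file
        "document_type": document_type,  # e.g. "invoice" or "contract"
    }

record = system_metadata(b"%PDF-1.4 example bytes", "invoice")
```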
Imagine that email messages are being indexed into a document management system – maybe by some automated process that catalogs every email message passing through a company’s email domain. The algorithm could be configured to flag, via metadata, any emails that contain certain designated ‘watch words’ or the names of competitors or competing products (or virtually any other criteria).
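A watch-word flag of the kind described above might look like the sketch below. The word list, the `flag_message` function and the output field names are assumptions made for illustration; a real system would likely use more robust matching.

```python
import re

# Hypothetical watch list; a real one would be configured by the organization.
WATCH_WORDS = {"confidential", "lawsuit"}

def flag_message(body, watch_words=WATCH_WORDS):
    """Return calculated metadata indicating which watch words appear."""
    # Lower-case the text and split it into simple word tokens.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    hits = sorted(words & watch_words)
    return {"flagged": bool(hits), "watch_hits": hits}

result = flag_message("This is CONFIDENTIAL. A lawsuit may follow.")
```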
When a user electronically signs a document, that signature is almost certainly calculated and stored as metadata as part of the document.
The addition of calculated metadata greatly increases the value an organization realizes from incorporating an electronic document management system (EDMS) into its infrastructure.
Associating Metadata with Files at Import Time
Metadata does not have to be associated with content at the time the content is created, or even through the software that created it. An EDMS typically provides tools and/or supports protocols for adding new content to the system, and these interfaces commonly allow metadata to be associated with content as it is being added. The means provided can range from a user interface for manual data entry to the parsing of XML and/or other file formats that associate metadata with the content to be imported. Additionally, the EDMS may automatically associate calculated metadata (such as the current date and time) as the content is being added.
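As an illustration of import-time metadata carried in XML, the sketch below parses a small companion file with Python's standard `xml.etree.ElementTree`. The XML layout shown is invented for this example and is not the import format of any specific EDMS.

```python
import xml.etree.ElementTree as ET

# An invented companion file describing one document to be imported.
sidecar = """<document file="invoice-0042.pdf">
  <field name="client">Acme</field>
  <field name="document_type">invoice</field>
</document>"""

root = ET.fromstring(sidecar)

# Start with the file reference, then copy each <field> into the record.
record = {"file": root.get("file")}
for field in root.findall("field"):
    record[field.get("name")] = field.text
```

The resulting `record` pairs the file to be imported with the metadata values that should be stored alongside it.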
Upcoming Part Three: Searching and Beyond
Metadata for Document Management
Part One: Introduction
The concept of metadata is one of the most important and fundamental concepts in document management. The term literally means “data about data”. If you’ve been working in technology for any time you’ve undoubtedly seen many examples of metadata – however it is so widespread that it’s easy to take for granted. For example, when editing a document in Microsoft WORD™ and clicking on File and Properties you are presented with a dialog box that allows you to enter information about your document which may not appear explicitly in the document itself. In this case, you are allowed to enter (among other fields) the title of the document, the name of the author, the date published and associated keywords (which may someday be used to search for and retrieve the document).
If you inspect the source of virtually any commercial web page you will see normally-invisible lines near the top that contain metadata about that page.
<title>About Us</title>
<meta name="robots" content="index,follow" />
<meta name="revisit-after" content="7 days" />
<meta name="keywords" content="EDMS, electronic document management" />
<meta name="description" content="The world's top EDMS solution." />
LISTING 1 – An example of metadata embedded in a web page.
In this case the value of the embedded metadata is quite apparent; it instructs Internet search engines how to navigate the site for indexing, and provides a recommended revisit interval. Additionally, it contains keywords that are intended to bring Internet visitors to the site when they conduct searches on those search engines.
Some image file formats, such as GIF, have provisions for embedded metadata that might identify the owner of the intellectual property. Like MS-Word documents, Adobe PDF™ documents provide for the embedding of metadata. In fact, with the exception of plain text files, you may be challenged to find a modern file format that does not make provisions for embedding some form of metadata. Take it to the next level and imagine having the ability to customize the metadata field names so that they relate directly to your business and to each particular document.
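Reading such tags programmatically is straightforward. The sketch below uses Python's standard `html.parser` module to collect the name/content pairs from a fragment like the one in Listing 1; the `MetaCollector` class name is simply chosen for this example.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs:
                self.meta[attrs["name"]] = attrs.get("content", "")

page = """<title>About Us</title>
<meta name="robots" content="index,follow" />
<meta name="revisit-after" content="7 days" />"""

collector = MetaCollector()
collector.feed(page)
```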
Custom metadata allows computer users to better manage their files by keeping them more aware of the contents and attributes of those files. Imagine how nice it would be to have your computer run a program that scans millions of files and quickly establishes a visual queue of all relevant documents.
Upcoming Part Two: The Three Roles of Metadata
Storage considerations for a document management solution
Once you have selected your document management software and specified hardware, the next important consideration is mass storage. Organizations should provide a secure system and environment for operating the document management software, the database server and the mass storage server. Security and continuity are top of mind when making your selection for hardware and the environment in which it lives.
A mass storage hardware solution for documents should be built only after the storage capacity requirements have been defined. In order to do so, the following points should be noted:
- Type of digital documents to be stored: This will help in getting an approximate file size. For instance, scanned PDF documents take up more space than Microsoft .docx documents. Color JPG documents generally take more space than color TIF documents. A scanned document might take up 50 KB per page, whereas an entire multipage .docx file might use the same amount of space.
- File compression can play an important role when considering the storage requirement. Group IV compression is the industry’s most common and effective use of valuable storage. JPG compression is commonly used for color, but for black & white business documents JPG discards archive-quality pixel detail, rendering the file lossy and not a legal representation of the original.
- If scanned documents are OCR’d for purposes of searching full text content, storage requirements for each page will increase by approximately 10%.
- When a document is archived, all the parts of the document should be stored. For instance, email messages that are archived may contain attachments of text-based documents and images. The proper storage protocol is referred to as “enveloping”, which preserves all original dates and hidden metadata and encloses all attachments.
- Documents in the repository should be stored in an ideal format. For instance, all documents could be converted to a format that saves storage space. For legal reasons, however, and to maintain archival quality, each document should be stored in its native format.
Capacity planning is a must; without it the electronic document management system will not work appropriately, users will voice their disappointment, and the system will not demonstrate peak performance and productivity. Proper storage planning helps determine the right hardware configuration for the solution. It is also best to estimate storage needs for the next 3 to 5 years before procuring the hardware.
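A back-of-the-envelope capacity estimate can be sketched as follows. The 50 KB per page and the roughly 10% OCR overhead come from the points above; the daily page volume and the 250 working days per year are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def estimate_storage_gb(pages_per_day, years, kb_per_page=50,
                        ocr_overhead=0.10, working_days=250):
    """Rough storage estimate in gigabytes for scanned, OCR'd pages."""
    total_pages = pages_per_day * working_days * years
    # Add the OCR text overhead on top of the raw image size.
    total_kb = total_pages * kb_per_page * (1 + ocr_overhead)
    return total_kb / (1024 * 1024)  # KB -> GB

# Assumed volume: 2,000 scanned pages per working day over 5 years.
five_year_gb = estimate_storage_gb(pages_per_day=2000, years=5)
```

Under these assumptions the estimate comes to roughly 131 GB, before allowing for backups, growth in volume, or other file types.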
A final consideration is whether your organization might be better served by assigning your document management and mass storage hardware to a managed service provider. Managed service providers, such as DocuLex, host and manage your hardware and software in a SAS 70 / SSAE 16 standard certified environment. The entire solution is professionally managed in a turn-key secure server hosting facility which is transparent to users of the software.
Implementation plan for your Document Management or Email Archiving / Discovery Solution
To successfully execute and implement an electronic document management system (EDMS) project, a number of important documents need to be created. These documents state the scope, constraints and vision of the project to be achieved by implementing a document management or email archiving and discovery system. Below are some of the documents that can be useful for planning the project execution:
• Project Execution Plan – This is a project management plan for the implementation of the document management / email archiving solution. It covers how many libraries are to be created, the user permissions per library, specific field information for the various document types, and the methods of capturing searchable metadata, such as key from image, barcode and zonal OCR.
• Scope of Execution Plan – The functionality that is to be included should be identified, for example, the extent to which the document management system is to be interfaced with the line-of-business solution.
• Project Schedule – Supporting the Project Execution Plan, this document provides a list of activities, tasks, milestones, target dates, resource utilization and costing details of the project.
• Risk Management Plan – If the project execution runs into issues, there should be a contingency plan. This document describes how issues that arise during project execution are managed.
• System Changeover Plan – If there is an existing EDMS, this document contains the plan for migrating the legacy system. Details such as whether the legacy system will be decommissioned are included as part of this document. Some legacy document management systems have archaic storage methodology and are quite often designed to keep the data hostage. There may be a loss of some legacy data after the changeover is complete.
• Data Migration Plan – This document defines the scope, methodology, and deliverables for addressing the requirements of migrating data from an existing system. Migrating data from one system to another may compromise some of the data, and the migration could also take months to complete.
• Training Plan – This document will define the scope, methodology, and deliverables necessary to meet the requirements of the Training Needs Analysis/Specification. Train the trainer is the preferred practice when DocuLex performs the training sessions.
• System Integration Test Plan – This document defines the testing requirements for different systems, backup and recovery, and any associated technical infrastructure. ODBC connectivity testing between the line-of-business database and Archive Studio is common prior to going live, as is testing of DocuLex add-ons and plug-ins for Microsoft Office.
• System Acceptance Test Plan – This document defines the scope, methodology, and deliverables to meet the requirements. A complete rollout into production can take from two weeks to six months, depending on the complexity of the requirements, the number of training locations and the number of users. This may not include the complete migration of legacy data.
• Post-Implementation Review Document – Once the document management / email archiving system is implemented, this document lists the criteria used to determine whether the solution met the objectives identified for the implementation.
Apart from the various documents, it is also important to identify the objectives of the system and document them. The following apply:
o Improve efficiency through reduction of business processes and systems.
o Provide consistent data to customer queries through well formatted reports.
o Define business rules and automate them with business process workflow.
o Define ownership of data and regulate permissions.
o Improve document control and security.
o Provide architecture that is:
a. Scalable and extensible for growth and wider deployment.
b. Flexible to cater to diverse business needs and litigations.
o Rationalize different data repositories and, wherever needed, integrate different databases.
o Decommission systems that are redundant and those not important to business.
Though documenting can be a tedious process, the benefits of documentation are important. These documents can act as a design and implementation roadmap that is easily understood by all parties involved in the implementation of the document management / email archiving solution. This documentation depicts the boundaries of interoperability by all parties and products.
Document Management and the extensive use of XML
A common buzzword across the Internet and among developers is Extensible Markup Language (XML). To an everyday user who is simply browsing the web it may not mean much, but from a technology perspective XML is a powerful tool for describing data. XML can be used to transmit data from one place to another or to store it in web applications. It is used in non-web applications as well, to move data from one system to another. XML is widely used in non-proprietary document management and content management environments.
XML is composed of numerous technologies and specifications. Some of the important pieces are:
• XML 1.0 is the base specification upon which XML is built. It covers the basic syntax for XML documents, the rules that XML parsers need to follow, and the other specifications needed for creating an XML document.
• DTD (Document Type Definition) is a schema language for XML that defines the structure of a document. The successor to DTD, XSD (XML Schema Definition), is recommended by the W3C.
• Namespaces help to distinguish XML vocabularies from one another.
• XPath helps in addressing different parts of an XML document. For instance, XPath can be used to get “all the names” from an XML document.
• To display simple XML on the web, we can use Cascading Style Sheets (CSS). For more complex cases, Extensible Stylesheet Language (XSL) can be used to transform documents from one type to another and display formatted data on the web.
• XQuery provides a means of querying data directly from XML documents.
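The “all the names” example mentioned for XPath can be tried with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# An invented XML document with two client records.
xml_text = """<clients>
  <client><name>Acme</name><city>Tampa</city></client>
  <client><name>Birch</name><city>Orlando</city></client>
</clients>"""

root = ET.fromstring(xml_text)

# ".//name" selects every <name> element anywhere below the root.
names = [element.text for element in root.findall(".//name")]
```

Full XPath and XQuery support generally requires a dedicated library or an XML database rather than the standard library.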
If you are a developer who wants to explore XML, there is a wide variety of tools and resources available on the web. Because XML is platform and language independent, it can be used on any operating system. XML is an ideal way for web applications to communicate. Below are a few examples of where XML can be used:
• The load on the web servers can be reduced by keeping all information on the client for as long as possible, and then sending the information to those servers in one XML format.
• XML is used with the DocuLex WebSearch API when communicating between applications.
• XML can be used instead of HTML to display a web page. This can be achieved by transforming XML into HTML via XSLT, or by displaying it directly in browsers via CSS.
• XML can also be used as a means of sending data for Remote Procedure Calls (RPC). RPC is a protocol that allows objects on one computer to call objects on another computer to do work, allowing distributed computing.
• XML is used extensively for sending data across e-commerce websites.
• XML is used when DocuLex document management software is receiving files from 3rd party applications with meta data values.
Much of the XML functionality is achieved by writing data in the format described in the XML specifications. Because XML is a text-based technology, you can create XML documents in Notepad or any text editor, and you can view them in any current browser.
Document management systems (DMS), such as DocuLex Archive Studio, were focused on managing documents effectively long before XML was born. But by using the power of XML, a DMS can better manage documents at a granular level and interact with other systems easily. Another major use of XML in a DMS is that information can be easily shared without storing it in multiple places. The benefits are plentiful, but they depend on how effectively XML is used.
Document management software limits liability with automated document control
Every business process in an organization creates some form of document. Such documents are prone to mismanagement due to their volume, type and format. The mismanagement can, in turn, impact the organization’s ability to minimize its legal liability.
In a business environment characterized by the accelerating creation of business information, it is important not only to be able to store, find, and retrieve that information quickly, but also to be able to easily delete that information from your system when its usable legal shelf life has expired.
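An automated deletion rule of this kind usually comes down to a date comparison. The sketch below is a simplified illustration only: the retention periods, the `is_expired` function and the legal-hold flag are hypothetical, and real retention schedules should come from legal counsel.

```python
from datetime import date

# Hypothetical retention periods, in years, per document type.
RETENTION_YEARS = {"invoice": 7, "contract": 10}

def is_expired(entered_on, document_type, today, on_legal_hold=False):
    """Return True when a document has outlived its retention period."""
    if on_legal_hold:
        # Documents under a legal hold are never eligible for deletion.
        return False
    years = RETENTION_YEARS[document_type]
    try:
        expires = entered_on.replace(year=entered_on.year + years)
    except ValueError:
        # Entered on Feb 29 and the target year is not a leap year.
        expires = entered_on.replace(year=entered_on.year + years, day=28)
    return today >= expires

eligible = is_expired(date(2005, 1, 15), "invoice", today=date(2012, 1, 15))
```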
The continuing increase in government regulation, from Sarbanes-Oxley to HIPAA, as well as the increase in frivolous litigation, further punctuates the need for companies of all sizes to create and execute appropriate records retention, legal hold and document deletion policies. Consider the following FREE webinars to learn more about records policy management and implementing a productive Document Management solution for your organization.
Conserving resources and reducing environmental impact by using available technology.
This is what you get with DocuLex WebSearch document management software and cloud solution. Sharing documents with others in your organization or with outside parties generally comes with a price and a certain degree of risk. Hard costs include the expenses associated with copying or printing materials, plus the labor for reproducing and re-filing the original documents. Depending on the routing method for in-organization or third-party delivery, the documents could pass through several hands and several modes of transportation before arriving at their destination. Reducing the energy consumption associated with this exercise supports the “green” standard for responsible members of our society.
Learn how WebSearch’s WorkSpace SharePortal will address these concerns and save organizations time, money and environmental resources!
Learn more about conserving resources.
Discover the many different ways of Searching with DocuLex Archive Studio’s WebSearch and leverage this knowledge by adding more value to your organization
Your documents’ content typically contains 80% of an organization’s decision-making data. Tapping into this information with your document management solution is like accessing corporate memory on demand. Being able to find this data in a timely and predictable manner will minimize discovery costs and provide the much-needed historic information to make accurate present-day decisions. The search options available for discovering your information are predetermined by the decisions made when filing the data. When filing documents electronically, metadata values (searchable keywords) are important when it comes time to search for and retrieve the file or files that match those values. In addition, making all of the words in the document searchable is an added benefit for executing a strategic search.
Documents may have been filed using custom metadata values that allow the user to execute a targeted search by one or many categories, such as “customer name”, “date” and “document type”. The user can also peruse through electronic folders that are categorized by name and document type. This is a common method of finding files, as you would with a paper filing system. When searching for electronic documents with Archive Studio’s WebSearch, a more tactical approach can get you to your requested documents quicker, and exactly to the page and section of the page where your decision-making information is located. In addition, strategic searching ensures that you have located “every” instance of the information you are interested in.
Email archiving with WebSearch is another benefit to your organization if your corporate policy is to save every relevant message and attachment, within your retention period, for future discovery. Every email message is archived into WebSearch in real time, even before the recipient receives the message. Staff members have the ability to search their messages even after they are removed from Microsoft Exchange. Super-users, the Admin or the security officer “may” be permitted to search across all email messages. Everything that is text is searchable: metadata such as From, To, Subject and Date, as well as the Body of the message. In addition, all text contained in the attachments is searchable. WebSearch will instantly display all qualifying messages for your review. Even the specific page of an attachment will display with the targeted content highlighted for easy discovery.
The cost of creating information assets on paper, MS-Word, Excel, MS-Exchange or other data containers is monumental. Leverage that investment to your benefit, today and in the future with a solution for tapping into that valuable data.