Community Publishing
Turning the High Cost of Hidden Data into High Value Knowledge
By Dave Bernard on Monday, February 19, 2007
Introduction
"The greatest crisis facing us is not Russia, not the Atom Bomb, not corruption in government, not encroaching hunger, nor the morals of the young. It is a crisis in the organization and accessibility of human knowledge. We own an enormous 'encyclopedia' - which isn't even arranged alphabetically. Our 'file cards' are spilled on the floor, nor were they ever in order. The answers we want may be buried somewhere in the heap, but it might take a lifetime to locate two already known facts, place them side by side and derive a third fact, the one we urgently need. Call it the crisis of the Librarian. We need a new 'specialist' who is not a specialist, but a synthesist. We need a new science to be a perfect secretary to all other sciences." (Robert A. Heinlein in 1950, in a piece originally called "Where To?")
At a recent Microsoft-sponsored database conference, Paul Flessner, Microsoft Senior Vice President of Server Applications, discussed the growth of data capacity requirements in layman’s terms:
· 1 MB (one million characters or so) = two novels (about 500 pages)
· 1 GB (1,000 MB) = about 1900 novels
· 1 TB (1,000 GB) = about 1.9 million books (requiring 15 miles of bookshelves and 50,000 trees)
· 10 TB = about the size of the Library of Congress (LOC), or 19M books
· 1 PB (1,000 TB) = 100 LOCs; in dollars, more money than in all the world's banks
· 12 EB (1 EB = 1,000 PB) = total of human knowledge through 1999 (about 1.2 million LOCs)
· The next 12 EB were created by 2002; 7 EB were created in 2003 alone
Clearly, information is being created at a rate that no human could possibly keep up with and we are drowning in it! Incredibly, the good news is that the cost and capacities of data storage continue to improve at rates faster than the data is growing. A gigabyte of disk space that cost $40K in 1980 costs just 38 cents today and continues to slide. The storage of data is fast becoming a commodity, even in the face of rapid expansion of information, shifting the value proposition of information systems away from accumulation and toward where and how the data is used.
The bad news is that our ability to mine this data effectively is shrinking rapidly; studies have shown that “information overloaded” business people spend up to 30% of their time searching for what they need in difficult-to-use databases, using complex tools such as reporting engines and query languages. Many IT teams themselves are struggling to meet the reporting demands of their organizations. End-user expectations have increased dramatically, as has the strategic importance of providing useful information to decision-makers.
A company needs to maximize its most important assets—data and people. Information disasters are caused not by lack of information, but rather by not connecting the right information to the right people at the right time. The ultimate goal is to unlock control of our data from applications, databases and file systems.
A New Approach
What we need is an entirely new approach to information retrieval, one that is more people-centric and lets people use information within the context of what they are doing. They need access to the right information, but only when they need it, and they need to be assured that access is guaranteed, easy, fast and reliable.
In everyday human living, we usually obtain information by asking questions in a natural language, such as English. When a human being considers a problem in the corporate world, their view of it may be only distantly related to the physical organization of the data involved. This is why there exists a myriad of complex query languages and reporting tools attempting to bridge the gap between human questions and computer data storage.
This is not a software or hardware problem; it’s a human-computer interaction problem. Framing the problem this way points us to a new approach to solving it: what if knowledge workers could use plain English to get answers to their questions? Wouldn’t that be the ultimate in usability?
The convergence of natural language processing (NLP), database and web technology is now making it possible to create state-of-the-art products and services that provide fast, easy and reliable ways to get immediate answers to ad hoc questions out of databases using everyday written or spoken English. The natural language engine alone supports semantic modeling of data that enables a great deal of ad hoc capability with little or no IT intervention. The real problems NLP systems attack are related to accessibility: getting to the desired answer (or information) immediately, without the intervention (and cost) of other people or projects. Their strength is in easily and immediately handling the unpredictable question without any interface complexities.
The goal of an NLP-based database query capability is not to replace existing reporting and querying systems, but to augment them. This new capability easily integrates within an existing IT infrastructure and ensures long-term viability through open architecture standards, common platforms and standard operating systems.
NLP solutions are best deployed against two particular problems: (1) the inability of C-level executives to easily and quickly get ad-hoc information from their systems and (2) the high cost of training new hires in high-turnover positions like the help desk. It has not yet been proven that NLP would be effective in day-to-day operational reporting tasks; if you’re doing a lot of ad-hoc day-to-day reporting, then you probably don’t have a good handle on day-to-day operations, which is the real problem in that case.
Responding to ad-hoc or even recurring queries of data warehouse information usually requires the intervention and considerable resources of an overworked IT department. For all practical purposes, the data is there but cannot be retrieved in a timely manner. A common complaint of C-level executives is the inability to get particular ad-hoc information out of their systems without getting in a long line to wait for IT to get it for them, either in the form of a one-off spreadsheet pull or a new custom report. This is no knock on IT, which is typically overworked and has long lead times (not to mention the fact that its funding levels are set by that same CxO!). NLP can solve this problem by directly converting English requests into SQL queries. A specialized piece of middleware then executes those queries and returns the results to the user in a form they can use. The current crop of tools attempting to address this problem, such as those from MicroStrategy, have too long an installation and setup process and too steep a learning curve for many C-level executives.
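The request-to-result path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `translate` and `answer` functions, the `orders` table and the single question pattern are all hypothetical, and a real NLP engine would replace the regular expression with actual language analysis. The standard-library `sqlite3` module stands in for the corporate database.

```python
import re
import sqlite3

# Hypothetical pattern: "how many <table> ..." becomes a COUNT query.
# A real NLP engine would do far more than match one regular expression.
COUNT_PATTERN = re.compile(r"^how many (\w+)\b", re.IGNORECASE)


def translate(question: str, known_tables: set) -> str:
    """Turn a plain-English question into SQL (toy version)."""
    match = COUNT_PATTERN.match(question.strip())
    if not match or match.group(1).lower() not in known_tables:
        raise ValueError(f"Cannot translate: {question!r}")
    # The table name is checked against known tables before interpolation.
    return f"SELECT COUNT(*) FROM {match.group(1).lower()}"


def answer(conn: sqlite3.Connection, question: str) -> str:
    """Middleware step: translate, execute, and format the result."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    sql = translate(question, tables)
    (count,) = conn.execute(sql).fetchone()
    return f"{count} (from: {sql})"


# Assumed sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany("INSERT INTO orders (customer) VALUES (?)",
                 [("Acme",), ("Globex",), ("Acme",)])

print(answer(conn, "How many orders were placed?"))
```

Returning the generated SQL alongside the answer is a design choice in this sketch, not something the article prescribes; it lets a skeptical user or an IT reviewer see how the question was interpreted.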
Training costs are significant in most help desk departments, due primarily to high turnover, less-skilled workers and complex systems. A system you can talk to (either via keyboard or voice) in plain English gets users up to speed much more quickly than has otherwise been the case. In this particular area, it makes sense because you can often just ask the system the same question your customer is asking you, without having to translate that into a bunch of pull-down-driven criteria. Query results are transformed into viewable output (a report, graph, etc.), just like with any other system, and sent back to the user, either in real time or via some other transport mechanism (FTP, email, etc.). Often, query results can be fed into existing processes and systems in keeping with existing uniform report presentation standards.
Getting It Done
The key implementation piece of deploying an NLP-based system is the creation of a semantic model that overlays the database organization. The semantic model defines, semantically, how the various pieces of existing data are related to each other. For example, “Customers purchase products,” “Sales people sell products” and “Products have prices” are all semantic rules. Once a robust set of semantic rules has been defined, then an NLP system can resolve questions like “How many customers purchased XYZ product last year?” and “Which salespeople have sold the most of our highest priced products?”
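As a rough illustration of how semantic rules like these might sit on top of a database, the sketch below stores each rule as a small record and uses it to build a query. Everything named here is an assumption made for the example: the `SemanticRule` class, the table names, the `<table>_id` column convention and the keyword matching are hypothetical, and the question is simplified (no "last year" filter) to keep the sketch short.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticRule:
    subject: str       # table holding the subject entity
    verb_forms: tuple  # surface forms of the verb seen in questions
    obj: str           # table holding the object entity
    link: str          # table recording the relationship


# "Customers purchase products" and "Sales people sell products" as rules.
RULES = [
    SemanticRule("customers", ("purchase", "purchased"), "products", "purchases"),
    SemanticRule("salespeople", ("sell", "sold"), "products", "sales"),
]


def find_rule(question: str) -> SemanticRule:
    """Pick the rule whose subject and verb both appear in the question."""
    words = set(question.lower().replace("?", "").split())
    for rule in RULES:
        if rule.subject in words and words & set(rule.verb_forms):
            return rule
    raise LookupError("No semantic rule covers this question")


def count_subjects_sql(rule: SemanticRule) -> str:
    """Build 'how many <subject> <verb> <named object>' as parameterized SQL."""
    return (
        f"SELECT COUNT(DISTINCT l.{rule.subject}_id) "
        f"FROM {rule.link} AS l "
        f"JOIN {rule.obj} AS o ON o.id = l.{rule.obj}_id "
        "WHERE o.name = ?"
    )


# Assumed sample schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (customers_id INTEGER, products_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO products  VALUES (1, 'XYZ'), (2, 'ABC');
    INSERT INTO purchases VALUES (1, 1), (1, 1), (2, 1), (3, 2);
""")

rule = find_rule("How many customers purchased XYZ product?")
(count,) = conn.execute(count_subjects_sql(rule), ("XYZ",)).fetchone()
print(count)  # distinct customers who bought XYZ
```

The point of the sketch is that the user's question never mentions the `purchases` table or its join columns; the rule supplies that knowledge, which is what lets the question stay in plain English.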
NLP systems can be created that are smart enough to ask questions of the user themselves when the user poses a question that they cannot readily answer. In that way, the system automatically learns from real-world usage how better to “tune” the semantic model, and it does so on the fly.
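One possible reading of this "ask, then learn" behavior is sketched below: when a word in the question is unknown, the system asks a clarifying question and remembers the answer as a synonym. The `Vocabulary` class, its method names and the scripted reply are hypothetical and exist only for this example; a production system would tune much more than a synonym table.

```python
class Vocabulary:
    """Maps words users type to entities the semantic model knows."""

    def __init__(self, entities):
        self.entities = set(entities)
        self.synonyms = {}  # learned on the fly: user word -> entity

    def resolve(self, word, ask):
        """Return the entity for `word`, asking the user if it is unknown."""
        word = word.lower()
        if word in self.entities:
            return word
        if word in self.synonyms:
            return self.synonyms[word]
        # Unknown term: ask a clarifying question instead of failing.
        reply = ask(f"By '{word}', do you mean one of {sorted(self.entities)}?")
        reply = reply.strip().lower()
        if reply not in self.entities:
            raise LookupError(f"Still cannot interpret '{word}'")
        self.synonyms[word] = reply  # remember the answer for next time
        return reply


vocab = Vocabulary(["customers", "products", "salespeople"])

# A canned reply stands in for a real user; interactively, pass `input`.
questions_asked = []


def scripted_user(prompt):
    questions_asked.append(prompt)
    return "customers"


first = vocab.resolve("clients", scripted_user)   # triggers a question
second = vocab.resolve("clients", scripted_user)  # answered from memory
print(first, second, len(questions_asked))
```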
Natural language-based querying systems promise to be one of the most disruptive new technology tools we’ve ever seen, delivering instant, low-cost information retrieval to users who are armed only with a question and the ability to use the English language. These systems can be non-invasively deployed over existing data stores and easily integrated into existing reporting and information delivery mechanisms. No longer do knowledge workers have to spend up to a third of their time searching for answers that they need right now.
Dave Bernard The Intellection Group, Inc.
DBernard@IntellectionGroup.com
Comments
By John Todor @ Tuesday, February 20, 2007 9:55 AM
You are touching on a very critical topic. Information overload is clearly a problem. People want answers, not more information. However, if all they get is a bombardment of more information, it becomes stressful. To cope, they engage in avoidance or become indifferent or adversarial. The business challenge is to find ways to communicate effectively. To quote the playwright George Bernard Shaw, “The problem with communication is the illusion that it has taken place.” Businesses cannot simply push more information at customers and expect better results.
I believe that Dan Pink (A Whole New Mind) is on the right track by arguing that we are in transition from an information age to a conceptual age. People want help in making sense of a world where information not only comes at them at an accelerating rate but also goes out of date just as quickly. In her book, The New Culture of Desire, Melinda Davis says that businesses will serve the needs of their customers if they act as a Yoda and help them reduce the uncertainty that comes with change and innovation. By doing so, they will increase their customer equity and profitability.
John I. Todor, Ph.D., author of Addicted Customers: How to Get Them Hooked on Your Company (www.AddcitedCustomers.com)
By Dave Bernard @ Tuesday, February 20, 2007 12:03 PM
Thanks for the thoughtful comment, John. Looks like I will have to somewhat expand my bookshelf! I agree with Dan that we are transitioning to more of a conceptual age, but I think the transition is going to be long and painful. I am hopeful that systems will be created that can make sense of chaotic masses of data, but I don't yet see systems "smart" enough to figure those things out beyond relatively simple BI-oriented key performance indicators (KPI) and such. Our company is helping to push the transition along by allowing humans to naturally interact with their databases, knowing little or nothing about the inherent physical data organization. Systems can be built today that can take these questions and, in real time, "learn" more about the real world relationships between the various data entities, which ultimately gets us much closer to being able to create useful conceptual data (and semantic) models.
By Maximilian Immo Orm Gorissen @ Thursday, February 22, 2007 1:58 PM
I remember testing a financial software package in the early 1980s (I don’t remember its name; I was running it on Digital minicomputers) that allegedly let you ask natural language questions to search or gather financial data and analysis from the company databases. It was simple; I think it used a preset list of questions like “What’s this year’s profit?” and it gave you an answer. I tried it for a couple of days and it worked fine for simple questions but, when you combined questions to get real information, it never delivered what it promised. Remember, at that time, COBOL and Pascal for mini, super-mini and mainframe computers, and DOS plus the first Windows for PCs, were still called high tech. For its time, this was an amazing piece of software. Reading your article, I understand we are still in the initial development phase of this technology, but I look forward to being able to really extract what we need from databases without any hassle.
By Dave Bernard @ Thursday, February 22, 2007 7:03 PM
Great story, Maximilian. Like many "leading edge" technologies, natural language actually has quite a long (40+ years) research history, but a poor commercialization history. My research into the reasons for this leads me to believe that two particular issues have been barriers in the past. First, we've just not had adequate horsepower, until now, to make a viable run at the problem. The other, more interesting issue is that it's a topic often confused with voice recognition and text retrieval paradigms. I believe that these "sexier" technologies have siphoned significant funds and interest away from implementing true semantic NLP interfaces that work against structured corporate data.
By Georges Valade @ Tuesday, March 06, 2007 4:26 PM
Good article; it explains why NLP is the future. However, I must say the exposé on the principles of NLP is, probably purposely, simplistic. We at Delphes have been developing NLP for 10 years now. Before you can apply semantics, you must first develop a solid morphological and syntactic foundation; otherwise it is "garbage in, garbage out," as the saying goes. See www.Delphes.com if NLP interests you.
Introduction
Sir Winston Churchill once said, "If you have knowledge, let others light their candles with it."
TechLINKS Community Publishers share their knowledge with the Georgia technology industry in order to help illuminate the many top-of-mind issues important to your business. Their community participation demonstrates the significant expertise and generosity contained within the Georgia technology industry.
Knowledge has no value when it is stored - it only has value when it is shared and applied. We know that businesses, educational institutions and government agencies sit upon rich veins of untapped information.
Community Publishing's mission is to tap into and release that knowledge to Georgia's technology community.
We encourage you to read the following articles on our website. To create and submit your own article and join this growing, distinguished group, click on "Create Article."