NLP – natural language processing – has been around for decades. NLP has promised to "soften" the edges of the computer by humanizing it. Much research has been done, many start-ups have been founded, and many PhDs have been granted on the topic of NLP.
But for all of the hype and all of the research projects dedicated to NLP, the real-world applications for NLP have been scant. The most notable NLP application has to be Siri.
So what have been the drawbacks to NLP? Why hasn’t NLP fulfilled its potential and promise? NLP certainly has been given the chance to succeed. Why has it consistently fallen short of its promise?
There are a whole host of reasons for the limited success of NLP.
NLP makes the assumption that there is an IT department or a consultant available to make the implementation successful. The problem is that most IT departments are not motivated to make NLP successful and do not have the knowledge to do so. Nor are there many consultants around with a sterling track record in NLP. Assuming that a motivated and knowledgeable person is available to implement NLP is a very dicey assumption. It has not proven to be true in most cases.
NLP has a lot of moving parts, and there are many ways to become sidetracked in an NLP project. For example, one of those parts is a taxonomy/ontology. Taxonomies/ontologies are necessary in the implementation of NLP, and it is easy to get lost in the building and/or management of the taxonomy. Because there are so many parts to an NLP implementation, it is easy to lose focus. NLP is in many ways like an erector set: there are so many possibilities and so many variations that it is easy to get lost in the implementation process.
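To make the taxonomy "moving part" concrete, here is a minimal sketch of taxonomy-based tagging. The taxonomy entries, the sample sentence, and the function name are all invented for illustration; a production taxonomy would run to thousands of terms.

```python
# Hypothetical taxonomy: term -> broader category.
# Every entry below is invented for illustration.
taxonomy = {
    "sedan": "vehicle",
    "truck": "vehicle",
    "mortgage": "financial product",
    "savings account": "financial product",
}

def tag_terms(text):
    """Return the taxonomy categories found in a piece of text."""
    found = {}
    lowered = text.lower()
    for term, category in taxonomy.items():
        if term in lowered:
            found[term] = category
    return found

print(tag_terms("The customer traded in a sedan and opened a savings account."))
# -> {'sedan': 'vehicle', 'savings account': 'financial product'}
```

Even this toy version hints at the maintenance burden: every new synonym, plural form, and misspelling must be added and managed by hand, which is exactly where projects get lost.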
NLP focuses on content. Content refers to the attention that is paid to the text that is being processed. It certainly is necessary to focus on content, but content is only one aspect of managing text. Another aspect, usually an afterthought, is context. In order to do a proper analysis of text, it is necessary to focus on both content AND context. The problem is that content is relatively easy to manage, while context is much harder.
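A toy example can show why content alone is not enough: the very same sentence means different things in different contexts. The context labels and the resolution rules below are invented for illustration.

```python
def interpret(sentence, context):
    """Resolve the ambiguous word 'court' using context supplied
    from outside the text itself."""
    if "court" not in sentence.lower():
        return "no ambiguity"
    if context == "sports section":
        return "court = tennis/basketball court"
    if context == "legal filing":
        return "court = court of law"
    return "unresolved: context required"

s = "She spent the morning at the court."
print(interpret(s, "sports section"))  # identical content, one meaning
print(interpret(s, "legal filing"))    # identical content, another meaning
print(interpret(s, None))              # content alone cannot decide
```

The content (the sentence) never changes; only the externally supplied context does, and without it the processor is stuck.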
NLP is academically oriented. NLP – for the most part – has been built as an academic exercise by academics, and there are many problems associated with that. Academics focus on theory and are sidetracked by theoretical discussions and issues. Academics have never built a real-world application. Academics have never learned how to manage a budget. Academics do not know what it means to be "business value" driven. Academics would rather study something than build a working, useful product. And these are just a few of the issues that arise when academics are allowed to lead the charge.
NLP has limited functionality when it comes time to actually process the text. For the most part, NLP focuses on taxonomy/ontology processing. Taxonomy/ontology processing is absolutely necessary, but it is not all that needs to be done with text – not by a long shot. There are MANY other aspects of text that need to be encompassed by NLP in order for NLP to be considered a full package. Unfortunately, the developers of NLP seem to have a blind spot when it comes to the full range of processing that is necessary.
NLP often confuses the activity of gathering and preparing text with the activity of producing analytical results from that text. This is unfortunate, because these are two very distinct activities. Reading and preparing text for analysis is a full-time job; doing analytical processing on the prepared text is another. By not understanding the difference, the analyst has sown great seeds of confusion. To use an analogy, there is a big difference between sowing, growing, and harvesting a crop and then making and baking bread from that crop. We have farmers who do one job well, and we have bakers and cooks who do another job well. But NLP analysts like to combine the two activities, and the mixture simply is a big mess.
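The farmer-and-baker separation can be sketched as two distinct stages with a clean hand-off between them. The function names and the toy normalization and counting logic are invented for illustration.

```python
def prepare(raw_documents):
    """Stage 1 (the 'farming'): read and normalize raw text."""
    prepared = []
    for doc in raw_documents:
        tokens = doc.lower().split()  # toy normalization: lowercase + split
        prepared.append(tokens)
    return prepared

def analyze(prepared_documents):
    """Stage 2 (the 'baking'): produce analytical results
    from text that has already been prepared."""
    counts = {}
    for tokens in prepared_documents:
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    return counts

docs = ["The crop grew", "The bread baked"]
results = analyze(prepare(docs))
print(results)
# -> {'the': 2, 'crop': 1, 'grew': 1, 'bread': 1, 'baked': 1}
```

Keeping the stages separate means each can be staffed, tested, and improved on its own, which is the article's point: mixing them produces a mess.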
NLP focuses on language, and language is one of the most complex subjects there is. There are many nuances of language that are VERY difficult for an NLP processor to discern.
NLP makes the assumption that the context of text can be discerned by looking at the text itself. In some cases this is true. But in many other cases, context must be derived from outside the text itself. Unfortunately, the developers of NLP have not advanced to this understanding.
These, then, are just a few of the reasons why NLP has never delivered on the promises made. Put another way: if you were to have a heart operation, would you want Siri telling the surgeons what to do? Of course you wouldn't. There simply is limited functionality behind even the most sophisticated of NLP interfaces.
These very fundamental reasons explain why – despite all the investment, despite all the hype, despite all the promises – NLP has not delivered much commercial success.
Now you can hear and see Bill Inmon on the Internet. Take a look at his new video education series on safaribooksonline.com. Bill covers IT topics from A to Z.
William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first magazine column, and was the first to offer classes in data warehousing. Bill Inmon created the accepted definition of what a data warehouse is: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions.