Text Mining: Classification, Clustering, and Applications

Author: Ashok Srivastava & Mehran Sahami
Publisher: Chapman and Hall/CRC, 2009
Pages: 328
ISBN: 978-1420059403
Aimed at: Academic
Rating: 4
Pros: Tackles an important topic
Cons: Dry and dense style; does not put text mining into context
Reviewed by: Mike James

Text Mining is a hot topic. Its importance has been growing as the size of the text mountain online has grown. Today we are in a position to analyse large amounts of text in statistical ways that provide sensible results. Not only has the subject become possible because of the Internet, and the web in particular, but it has also become important because of it. The ability to make "sense" out of the text mountain promises all sorts of money-making opportunities.



This book is about the statistical treatment of text. It's a typical academic book, created by putting together chapters written in the style of formal papers. It deals with the statistical techniques needed to apply classification and clustering to source material that at first sight doesn't seem to have a statistical structure. Here we aren't interested in grammar or meaning, just in how often words occur, and co-occur, within particular types of text.
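To make that concrete, here is a minimal Python sketch, not taken from the book, of the raw counts such methods start from: how often each word occurs across a small collection, and how many documents each pair of words shares. The three example documents and the crude regular-expression tokenizer are assumptions made for illustration; a practical system would usually add stemming, stop-word removal and a weighting scheme such as tf-idf.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(docs):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def cooccurrences(docs):
    """Count how many documents each unordered pair of distinct words appears in together."""
    pairs = Counter()
    for doc in docs:
        words = sorted(set(tokenize(doc)))  # sorting gives each pair one canonical order
        pairs.update(combinations(words, 2))
    return pairs

# Hypothetical example documents: two about finance, one about sport.
docs = [
    "the market rallied as stocks rose",
    "stocks fell and the market slid",
    "the team won the match",
]
tf = term_frequencies(docs)
co = cooccurrences(docs)
print(tf["the"])                 # 4
print(co[("market", "stocks")])  # 2
```

Counts like these are all that the clustering and classification methods described in the book get to see; the statistics lie in how they are weighted and compared.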

The chapters are all very dry and all very mathematical. If you are looking for a simple explanation of the algorithms involved, be prepared to work hard to extract the practical from the theoretical. Even the examples and case studies are presented in a way that maximises the difficulty of understanding what is going on.

Some of the chapters also suffer from poor English. This is not to criticise the authors but to point out that the sub-editors could have done a better job of correcting the misuse of phrases.

The topics discussed include kernel methods; detecting bias in news reports using a range of methods; relaxation labeling; topic models; non-negative tensor factorisation; text clustering; adaptive filtering and text search. All these are presented in full mathematical glory and without much attempt at making anything seem easy. However, if you are prepared to put in the work and look up alternative explanations of the basic statistical methods (or if you know them already), then you can begin to see how methods you might not have thought of as applicable to text analysis actually do good work.
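As an illustration of how a kernel method can be brought to bear on text, the following Python sketch computes about the simplest text kernel there is, the cosine similarity between bag-of-words count vectors, and builds the matrix of pairwise values that a kernelised classifier such as an SVM would consume. This is a hypothetical example of the general idea, not one of the book's specific kernels; the documents and function names are invented for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Map a document to a sparse term-count vector, held in a Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_kernel(a, b):
    """Normalised linear kernel: dot product of two count vectors divided by their norms."""
    dot = sum(count * b[word] for word, count in a.items())  # words missing from b count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

def gram_matrix(docs):
    """All pairwise kernel values; a kernelised classifier can be trained from these alone."""
    vectors = [bag_of_words(d) for d in docs]
    return [[cosine_kernel(u, v) for v in vectors] for u in vectors]

# Hypothetical example documents: two about finance, one about sport.
docs = [
    "the market rallied as stocks rose",
    "stocks fell and the market slid",
    "the team won the match",
]
K = gram_matrix(docs)
for row in K:
    print(" ".join(f"{value:.2f}" for value in row))
```

The two finance sentences score 0.5 against each other but only about 0.31 against the sports sentence, and it is this kind of structure, with no grammar or meaning involved, that a kernel classifier or a clustering algorithm exploits.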

What you will not find in this volume is much discussion of standard text mining tools or statistical packages that might help with the task. This is a book mainly on theory, with some examples of application. Obviously the book doesn't cover more "semantic" approaches to text mining, but it also doesn't really mention them as adjuncts or use them to put the entire field into context.

This most certainly isn't a book for everyone interested in natural language processing, and it will leave many practically minded programmers completely baffled. However, if you have the math and the time, there are ideas that can be turned to good advantage.

My best guess, however, is that because of the style adopted, this will be another academic book that sits on academic shelves.

Last Updated ( Thursday, 26 August 2010 )
Copyright © 2015 i-programmer.info. All Rights Reserved.