Big Data: A Very Short Introduction
Chapters 5 - 8; Conclusion

Author: Dawn E. Holmes
Publisher: Oxford University Press
Pages: 125
ISBN: 978-0198779575
Print: 0198779577
Kindle: B076645GRH
Audience: Everyone
Rating: 4.5
Reviewer: Ian Stirk

Chapter 5 Big data and medicine

Having set the background as to what Big Data is, and some of its analytic techniques, the book now moves towards industry-specific usage. The healthcare industry can use Big Data to identify patterns to reduce costs and optimise profits (e.g. by mining social media for reports of unpleasant side-effects of drugs).

The chapter next looks at the techniques underlying Google’s Flu Trends project, where the aim was to quickly identify the spread of flu (from Google searches) to predict, target, and reduce its subsequent impact. Some useful techniques are discussed, before concluding the project was ultimately a failure, largely due to the assumption that people behave the same during an epidemic as in its initial stages. However, lessons were learned: after the 2015 Nepal earthquake, mobile phone records were used to track and predict the movement of people, and targeted assistance was supplied faster than previously.

Next, the use of Big Data with smart medicine is examined. Although we may each take similar medicines, we each have an individual response. Big Data can be used to identify patterns that show what medicines work optimally for certain individuals (i.e. targeted smart medicines).

The chapter has some interesting facts. For example, the average US hospital in 2015 stored over 600 TB of data, much of it unstructured. Similarly, IBM estimates that by 2020 medical data will double every 73 days. In 2007 IBM produced a supercomputer that won a quiz contest against two former winners of Jeopardy, the US game show that covers a vast array of human knowledge. This supercomputer was later adapted to be used in medical diagnosis, to much critical acclaim.
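To put the 73-day doubling estimate in perspective, the short sketch below works out the growth it implies over longer periods. The doubling period is the figure quoted above; the function name and the sample time spans are assumptions chosen purely for illustration.

```python
# Illustrative arithmetic only. The 73-day doubling period is the estimate
# quoted in the review; growth_factor and the sample spans are hypothetical.

def growth_factor(days: float, doubling_period_days: float = 73.0) -> float:
    """Return how many times larger the data volume is after `days`."""
    return 2 ** (days / doubling_period_days)


if __name__ == "__main__":
    for label, days in [("73 days", 73), ("1 year", 365), ("2 years", 730)]:
        print(f"After {label}: about {growth_factor(days):,.0f} times the data")
```

Because 365 / 73 = 5, a 73-day doubling period means five doublings a year, i.e. roughly a 32-fold increase annually, if the estimate holds.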

Throughout the book, reference has repeatedly been made to the importance of security and privacy – it’s discussed in greater detail here, in relation to patient data. Techniques such as encryption and anonymising data are explained, together with some high-profile hacking attacks.

Chapter 6 Big data, big business

Perhaps the industry with the greatest use for Big Data, from a profit perspective, is business itself. Some history of computing and business is given, including the adoption of PCs, email, word processing, spreadsheets, and the phenomenal growth of eCommerce. The use of social media as a means of sentiment analysis and its potential impact on business is highlighted.

Various giants of the online world are examined. Amazon, Google, Facebook, eBay, and Netflix all produce vast quantities of data that require Big Data analytical techniques. eBay alone generates 50 TB of data daily. An example is provided of how a Recommender System, which examines products purchased, can be used for subsequent targeted recommendations.
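The idea behind a purchase-based recommender can be sketched in a few lines. The following is not the book's example or any retailer's actual method; it is a minimal co-purchase counter, and the function names and product names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each ordered pair of products shares a basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicate items within a single basket
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, purchased, top_n=3):
    """Suggest products most often bought alongside those already purchased."""
    scores = Counter()
    for item in purchased:
        for other, n in counts.get(item, {}).items():
            if other not in purchased:
                scores[other] += n
    # Highest score first; ties broken alphabetically so results are stable
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up purchase history, for illustration only
    baskets = [
        ["laptop", "mouse", "laptop bag"],
        ["laptop", "mouse"],
        ["phone", "phone case"],
        ["laptop", "laptop bag", "usb hub"],
        ["mouse", "usb hub"],
    ]
    counts = build_co_purchase_counts(baskets)
    print(recommend(counts, {"laptop"}))
```

Real systems work at a vastly larger scale and typically weight or normalise these counts, but the principle of "customers who bought this also bought that" is the same.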

The chapter ends with a look at the relatively new field of Data Science. This rapidly growing and innovative area is experiencing a skills shortage, and consequently rewards can be high.

Chapter 7 Big data security and the Snowden case

This section of the book looks at security and uses the Snowden case largely as its example. Various elements of security are discussed, including digital signatures, firewalls, and encryption. This is followed by details of various hacking attacks (e.g. Target, Home Depot, and Yahoo). It’s notable that Yahoo’s 2016 hack involved more than 1 billion users.

Edward Snowden stole many classified documents from the US National Security Agency (NSA). Of the 1.5 million documents, about 200,000 have been released, causing damage and dismay. Additionally, some of these documents showed that the NSA had been illegally spying on US citizens. The case highlights the astonishingly bad security measures that existed at this security agency – many of these are briefly discussed. Snowden has been both praised and rebuked for what he did.

Similarly, WikiLeaks, the site led by Julian Assange, is examined. Bradley Manning stole classified documents relating to both the Iraq and Afghan wars and passed them to the WikiLeaks site.

The general reaction in the case of Snowden has been both praise and disapproval. The reaction is probably tempered because documents relating to identifiable individuals were limited. In the case of WikiLeaks, this filtering seems not to have been done, attracting a more negative response. In both cases, awards have either been proposed or given (e.g. Nobel Peace Prize nominations).

The chapter ends with a brief overview of TOR and the Dark Web, which, while offering anonymity, is also home to many cyber criminals and their nefarious dealings.

While this chapter is fascinating and contentious, I wonder if it strays off topic. The documents are not Big Data, but their content, and how it was gathered, would have involved Big Data techniques.

Chapter 8 Big data and society

This chapter opens with a look at the impact of technology on job loss. In the 1930s Keynes thought machines would do the laborious work in the future, and the problem would be how to use leisure time (after a working week of 15 hours). I guess it highlights how difficult it is to predict the future.

The expected impact of the Internet of Things (IoT) is briefly examined, together with smart vehicles, smart homes, and smart cities. The remote control of these items also highlights their susceptibility to hacking attacks. Accordingly, various attacks are described, including taking over a moving car.

The chapter ends by highlighting that Big Data is here to stay and that it offers many potential benefits, if we can prevent its abuse.

Appendices

Appendices are included for Data Sizes (Bit to Yottabyte) and an ASCII table for lower-case letters. Additionally, there’s a very useful chapter-based Further Reading list – it might have been useful to annotate this, saying why each item is important.
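For readers without the book to hand, the kind of information these two appendices cover can be regenerated with a few lines of code. This is a sketch rather than a reproduction of the book's tables: it assumes decimal (SI) units, where each step up is a factor of 1,000 (the book may present the sizes differently), and the function names are hypothetical.

```python
import string

# Assumption: decimal (SI) prefixes, each 1,000 times the previous unit.
UNITS = [
    "kilobyte", "megabyte", "gigabyte", "terabyte",
    "petabyte", "exabyte", "zettabyte", "yottabyte",
]


def data_sizes():
    """Return (unit name, size in bytes) pairs from byte to yottabyte."""
    sizes = [("byte", 1)]  # one byte is 8 bits
    sizes += [(name, 1000 ** power) for power, name in enumerate(UNITS, start=1)]
    return sizes


def lowercase_ascii_table():
    """Return (letter, decimal code, 8-bit binary) rows for a to z."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in string.ascii_lowercase]


if __name__ == "__main__":
    for name, size in data_sizes():
        print(f"1 {name:<10} = {size:.0e} bytes")
    for letter, code, bits in lowercase_ascii_table():
        print(letter, code, bits)
```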

Conclusion

This book aims to introduce the general audience to Big Data, and succeeds. The book is easy to read, interesting, detailed, wide-ranging, and not a single word is wasted. Hopefully it will give you some ideas on how Big Data might be useful for your own work.

If you come from a programming background, you might be disappointed in the content; however, this is not a Big Data programming book (there are many of these, please see Reading Your Way Into Big Data for further info). On my initial read (several months ago) I was a little disappointed, as I was expecting a more technical book, with an emphasis on distributed processing, Hadoop and Spark. However, the fault lies with me – this book is for the general reader.

I have a small confession… I absolutely love this series of books. They are aimed at taking you from no real knowledge of the topic to perhaps first year undergraduate level. There are currently around 625 books in the series (a few more are added each month). I have around 100 of them, with about 70 still to read. They cover a diverse range of topics, including:

  • Arts and Humanities

  • Dictionaries and Reference

  • Law

  • Medicine and Health

  • Science and Mathematics

  • Social Sciences

I would encourage you to explore the series website.

There is, of course, a lot more to say about Big Data, but this book covers a lot of ground in only 112 packed pages. If you would like a general and interesting introduction to Big Data, what it is, how it’s used, together with some real social issues, I highly recommend this small book.

 


 



Last Updated ( Tuesday, 16 October 2018 )