Is the hype around data science / big data over?
Every innovation has a hype cycle. The curve of interest rises long before product development is complete. It goes through a hype peak, when it seems it can do no wrong, then through a trough of despond, when all the people who hate it have open season, and eventually it settles down to practical uses.
Data science went through all of this a decade or two ago. For a while it was the answer to everything, then the haters started blaming it for everything that ever went wrong. It came through and has been at the core of business for a decade now. Some CEOs who applied data science like a magic potion to fix their broken companies still went bust, but many more have profited from it: they built better businesses, opened up new opportunities and did a better job for their users and customers.
The test of this is how the big term has broken into several pieces, each with its own hype cycle and future. AI is currently in the trough of despond, with academics telling us how it will put everyone out of work and cause the end of the world. Remove the word "artificial", however, and it makes real sense: empowering everything we do with an intelligent layer. Machine learning has moved through and broken into applications - computer vision etc. - which are making big positive differences. Others - quantum, nano etc. - are still on their way up the hype curve, lauded by many for things they will probably never achieve while the things they will achieve are ignored.
The hype has gone - data science can stand on its own feet now.
I don't know. It's a tricky thing: data sets are growing very quickly and require different tools to make sense of them, but that doesn't automatically translate into immediate commercial value, and I am sure that there will be some disappointments ahead.
The real benefit of technology often comes once people have adapted the way they do business to what's newly possible, but that takes time, and the payoff is much higher in some fields than others.
I am convinced by the chief economist of GE that the industrial Internet of Things is something real that often has quite a short payback time (and obviously the way it's useful involves being able to make sense of larger data sets). In some other fields it is still very early.
In my own area of finance, outside of high-frequency data and text data, we really don't have big data. But we do have data sets that are bigger than those we are set up to make sense of easily, and that requires a different approach. Is it worth doing? I think it will be, if one is careful not to spend too much on things before they are proven, but it's no panacea.
People talk about analysis. That's the last thing we need to emphasise. Analysis means breaking things into pieces to better understand them (see the Stanford Encyclopedia of Philosophy). But you can never reach an understanding of the whole by adding up the pieces; insight transcends mere statistical results. I think the new tools are most useful for understanding and exploring problems, rather than as a substitute for having to think and make judgments.
Finally, it's the web guys who have the glamour and influence, and because their edge comes from data more than from techniques and code, they speak more publicly about things. One should remember that one's own field may be quite different, with distinct considerations, challenges, and appropriate approaches. It's easy to forget that, but anything useful needs to be adapted to the problem domain.
No, unless you consider data science to be something like analyzing data in Excel and then giving insights about statistical distributions. (Of course, drawing inferences from data is one of the prerequisites for data science.) This is not hype. I deal with billions of rows every day when building models. We are beginning to see the dawn of a data explosion. It's not hype anymore when big hospitals in the U.S. rely on sophisticated next-generation sequencing machines to sequence genomes for hundreds of thousands of patients. For example, it's now possible to sequence an individual's genome for less than $1,000. Each genome takes around 200GB to store. If I can sequence the genomes of millions of patients and store them in the cloud, I can build a predictive model on top of that data that can be used to identify mutations responsible for cancer at an early stage. There is so much that big data and data science have to offer the healthcare industry. Please read more on how genetic testing for BRCA1 mutations helps save women's lives from breast cancer.
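To give a rough sense of the scale involved, here is a back-of-the-envelope sketch. It takes the roughly 200GB per genome quoted above at face value and assumes a hypothetical cohort of one million patients, decimal units, and no compression or replication. The function name and constants are only for illustration.

```python
# Back-of-the-envelope storage estimate for a genome archive.
# Assumptions (illustrative only): ~200 GB per genome as quoted above,
# a hypothetical cohort of one million patients, decimal units
# (1 PB = 1,000,000 GB), no compression and no replication.

GB_PER_PB = 1_000_000


def archive_size_pb(patients: int, gb_per_genome: float = 200.0) -> float:
    """Return the raw storage needed for the cohort, in petabytes."""
    return patients * gb_per_genome / GB_PER_PB


if __name__ == "__main__":
    # One million genomes at 200 GB each
    print(f"{archive_size_pb(1_000_000):.0f} PB")
```

Under those assumptions the raw data alone comes to about 200 petabytes, which is why this kind of work only becomes practical with cloud-scale storage.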
Big data was, frankly, a term with a very ambiguous meaning. Data science, though, has a more clear-cut definition. To be frank, data science is just a combination of computer science and statistics. But both of these terms seem to be used less and less in actual business these days. What is gaining more significance is AI and ML.
AI takes a fundamentally different approach from data science. AI focuses on solving a problem by making the machine more intelligent. It may work with a large amount of data, a small amount of data or no data at all. ML is a subset of AI where the machine learns from data over time. Because of this definition, ML necessarily relies on data. Thus there is an intersection between machine learning and data science. The difference is that data science relies too much on human analysis, whereas ML eliminates this need.
TL;DR: The hype around data science/big data is definitely going down and is partly being replaced by hype around AI and ML.
As long as we have all this data, Big Data and Data Science will always be a pertinent topic. As we progress into the future, there are three huge intrinsic problems we have to deal with:
- Security: How do we secure all of our data and keep it out of the wrong hands? There are some cutting-edge startups working on this; one of these is Barricade - www.barricade.io
- Analysis: How do we analyse all this data and extract the most we can out of it? How do we make it searchable, and what is the best way to organise it? Google is a leader in this direction.
- Storage: How can we keep so much data? Where can we store it, how can we compress it, and what are some sustainable energy processes for storing it?
Developing and building businesses around these three areas is going to be the biggest foreseeable phase yet to come in the technological world, and they all have one thing in common: they are all intrinsic to Big Data & Data Science.
Big Data is in no way a 'fad'.
Hope this helps.
I'm not sure this is a proper question, as it's like asking whether the hype surrounding applied statistics is over. Data science and the resulting big data methods are simply a recent evolution of technologies that have been around for decades; they just have different names now. Personally, I like telling people I am a "Data Scientist" rather than what I had to say 10 years ago - a computer scientist/applied mathematician/statistician. It is easier to understand and describe. I do understand that the industry hype cycle has commandeered these terms, so that many wonder if they will go the way of the dodo. Possibly so, but it would be just another renaming or repurposing. Machine learning isn't going to disappear once the so-called hype diminishes.