The following post is adapted from something that I shared in my workplace. We are a medium-sized company that provides a ‘data-analysis’ service to our customers around issues of website quality and digital governance. We don’t do ‘proper data science’. The document that I wrote up and shared around contained my thoughts on how we could profitably begin doing data science, based on my rudimentary knowledge of the field.
What is data science?
Data science is, to quote the author and data scientist John Foreman, “The transformation of data using mathematics and statistics into valuable insights, decisions and products.”
Today the hot topic in data science is Big Data, and the two terms are often conflated. However, Big Data is a subset of data science. While a precise definition of Big Data is hard to come by, I believe it is safe to think of it as the application of data science techniques to complete data sets. Until about ten years ago, due to technological constraints, it was impossible to deal with complete data sets, and data scientists used random sampling to garner representative sets of a manageable size from the entire data set.
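To make the sampling idea concrete, here is a minimal sketch in Python. The data is made up (simulated page load times, purely an assumption for illustration), and the point is only to show how a random sample stands in for the complete data set when the latter is too large to handle.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical "complete" data set: 100,000 simulated page load times (ms).
# These numbers are invented for illustration only.
population = [random.gauss(800, 200) for _ in range(100_000)]

# The traditional approach: draw a manageable random sample...
sample = random.sample(population, 1_000)

# ...and use it to estimate a property of the whole data set.
sample_mean = statistics.fmean(sample)

# The 'Big Data' approach: compute on everything.
full_mean = statistics.fmean(population)

print(f"Estimate from sample of {len(sample)}: {sample_mean:.1f} ms")
print(f"Value from all {len(population)} records: {full_mean:.1f} ms")
```

The two figures should land close together, which is why sampling served data scientists well for so long. What the complete data set adds is the ability to look at rare cases and small subgroups that a sample might miss entirely.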
Why have I chosen to blog about this?
We don’t do data science in my workplace, so why did I choose to write about it? I wanted to write a blog post, and in terms of things that are directly relevant to our work, I didn’t feel there was much that I could write about that others in our dev function didn’t already know. I’m still only a junior developer 😦 Data science is something that I find very interesting personally, and I spend a bit of my spare time learning about it. We are moving to a new architecture that has a bit of a ‘Big Data’ flavour to it, and as we grow our client-base we are managing more and more data. Therefore I thought an introduction to data science may be of interest to my colleagues.
Data science that we’re currently doing
We currently do a lot of data analysis: that is, we take in data about our clients’ web estates and run it through our proprietary analysis engine. It could fairly be argued that we are doing Big Data, since we use more or less all of our clients’ data as opposed to a small sample of it. However, we are not doing data science as it is generally understood. We are not using techniques such as regression or clustering, for example, to derive insight from the data. We are merely analysing and commenting on the quality of the client’s web estate.
Now, we are not a data science company, and our clients do not pay us to provide them with insight gathered from the website data. However, I argued that we can gain internal value from doing data science, which will help us both to serve our customers better and to run a tighter and more efficient ship.
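For anyone who hasn’t met regression before, here is a rough sketch of the simplest kind (fitting a straight line by least squares) in plain Python. The figures are entirely hypothetical: I have invented a handful of websites with a page count and a number of broken links each, and `fit_line` is just a helper name I made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: pages per site and broken links found on each site.
pages = [100, 250, 400, 800, 1200]
broken_links = [3, 8, 11, 25, 36]

slope, intercept = fit_line(pages, broken_links)

# Use the fitted line to estimate broken links for a 600-page site.
predicted = slope * 600 + intercept
print(f"Roughly {predicted:.0f} broken links expected on a 600-page site")
```

The difference from what we do today is that the fitted line says something about the relationship in the data (here, how broken links grow with site size) rather than just reporting on each site in isolation.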
Broader data science environment and opportunities
Big Data and data science are big news at the moment, and have been for a while. How did this interest come about? Essentially, computing power and storage have become cheap, giving even small organisations the ability to store and access enormous data sets. This new access to such data sets led to the creation of new open-source technologies which are themselves hot topics in the tech community.
Many firms have used this new frontier in data science (enabled by cheap computing and storage) to gain competitive advantage. Amazon are probably the best known, having found ways to use their vast troves of data to provide their customers with targeted product recommendations based on the customers’ past purchases, as well as the customers’ similarity to other customers in Amazon’s data stores. However, as mentioned above, it is not only large organisations that can take advantage of the new frontier. Firms of all sizes, in all sorts of industries, are profiting. This is supported by empirical research, with MIT being just one institution that has evaluated the performance of data-driven companies against others. Its researchers found that these companies perform better than their competitors, and as I understand it the result holds across company sizes and industries. In short, data science is good news, and today there are more opportunities in data science than ever before.
You may be wondering, do all firms have enough data to make an investment in data science worthwhile? Perhaps there is a certain threshold of ‘data-wealth’ that a firm must lie above? Well, certainly it is true that you must have data in order to do data science. However, we are all more data rich than we probably realise (speaking of both organisations and individuals). There are many, many open APIs out there. Private firms such as Google and Twitter have made much of their data open and free to the rest of the world. Similarly, the US government has made vast amounts of its data open to the world. The UK government has followed suit. Theoretically (and realistically!), an organisation could profitably employ data science without any data of its own.
However, at my company we do have data, and lots of it. We have the data that we pull from our clients’ web properties, and we have data generated by our own systems. This data is tiny compared to the data held by the Twitters, Facebooks and Googles of this world, but it is significant. And this data may be more valuable than it seems at first. Information generated for one purpose can be reused for another purpose. Old datasets that seem obsolete on the surface can be merged with other datasets, old or new, to create new value.
What exactly can data science do for us?
In my next post I will explain some specific data science opportunities that I see for my company. For now, I will try to outline in broad terms what the field can bring to an organisation. Simply put, it can help us to solve business problems. Think of any of the problems and challenges that the average company faces. How can they make more sales? How can they retain more of their customers? How can they convince new and existing customers to pay more for their service? How can they have less system downtime? How can they provide a faster service to their customers? All of these problems, and many more, can be expressed as data science problems.
But that’s not all! Data science can also provide us with valuable answers to questions that we didn’t even ask, and might never have thought to ask. Cool, eh?