Dale Russell, CTO, Talksum
As our understanding of data science problems evolves, we find that effective solutions apply a systematic approach to testing, measuring, and building knowledge of the whole data system. To create this holistic view of data effectively and efficiently, first consider the entirety of the data landscape, from infrastructure to Layer 7. A comprehensive data science solution should not favor data from any one layer over another. When architecting a solution, keep in mind that business requirements will change, message types and objects will change, and the volume of data from various OSI layers will change, especially as the Internet of Things (IoT) becomes more of a reality.
To best deal with an ever-changing data landscape, follow this important principle: Never leave work for a downstream process. Datasets will continue to grow in volume and diversity, and solutions will be expected to take less time to process data or make it actionable. Store-and-sort is a costly strategy regardless of who owns the infrastructure. We found the best approach is to sort first, then store.
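The sort-first principle can be illustrated with a minimal Python sketch. It is not drawn from any particular product: the `classify` rule, the `layer` field, and the in-memory `stores` dictionary are hypothetical stand-ins for whatever classification logic and storage backends a real system would use. Each record is categorized as it arrives, so no downstream job has to re-scan one undifferentiated pile.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def classify(record: Record) -> str:
    """Hypothetical rule: route a record by its 'layer' field."""
    return str(record.get("layer", "unknown"))


def sort_then_store(
    records: Iterable[Record],
    key: Callable[[Record], str] = classify,
) -> Dict[str, List[Record]]:
    """Classify each record on arrival and append it to the matching store."""
    stores: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        stores[key(record)].append(record)
    return dict(stores)


# Illustrative input only; field names and values are made up.
incoming = [
    {"layer": "network", "bytes": 512},
    {"layer": "application", "event": "login"},
    {"layer": "network", "bytes": 2048},
    {"event": "heartbeat"},
]
stores = sort_then_store(incoming)
```

In this sketch the classification cost is paid once, at ingest, and a consumer interested only in network records reads `stores["network"]` directly.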
Over the last 15 years, exceptional and innovative storage solutions have been developed using distributed software and socket libraries as well as advanced cloud services. These bring substantial performance increases, benefiting data center environments where concerns arise about latency, growing storage, or increased demand for analytics on datasets. As innovations in this sector bring more data into your landscape, you can enable great data science by taking a broader approach.
While some solutions focus on a subset of problems, a great data science solution deals with the entirety of information across the data landscape. In working with our customers and partners, we found that any acceptable solution must not only accommodate changing data requirements but also do so in a manner that maintains the highest level of data fidelity. If new analytical processes are created, the solution should easily direct the correct data streams to them without a lot of work for your team.
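One way to picture directing existing streams to a new analytical process is a small publish/subscribe router, sketched below in Python. This is an assumption about how such routing might look, not a description of any specific system: `StreamRouter`, the `auth` stream name, and the record fields are all hypothetical. The point is that adding a consumer is a single `subscribe` call and leaves the producer and the existing consumers untouched.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class StreamRouter:
    """Hypothetical router that fans each record out to a stream's subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream: str, handler: Handler) -> None:
        """Register a process that should receive records from a stream."""
        self._subscribers[stream].append(handler)

    def publish(self, stream: str, record: dict) -> int:
        """Deliver a record to every subscriber; return how many received it."""
        handlers = self._subscribers.get(stream, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


router = StreamRouter()
archive: List[dict] = []
alerts: List[dict] = []

# Existing process: archive every authentication record.
router.subscribe("auth", archive.append)
delivered_before = router.publish("auth", {"user": "a", "ok": True})

# New analytical process added later: collect failed logins.
router.subscribe("auth", lambda r: alerts.append(r) if not r["ok"] else None)
delivered_after = router.publish("auth", {"user": "b", "ok": False})
```

Records published before the new subscription are not replayed in this sketch; a real system would need to decide whether new processes also see historical data.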
A proper data science solution empowers the organization to focus on asking forward-looking questions of its data, rather than constantly investing time in searching for new data solutions every time the data landscape changes (as it will continue to do).