Data science needs data...and most data scientists can't get enough of it! However, we'd rather not do the work to land and process it :) Data ingestion and pipelines are becoming increasingly important as the use of data science and business intelligence permeates business. After all, without robust and consistent delivery of data these business functions wouldn't exist. But how do you land and process terabytes of data per day? And how do you architect a platform that can elevate said data to products? Geoff Holmes and Robert Burcham will talk about how they have approached the problem of processing ginormous data.
High Volume Near-RT Data Processing in Telecom
Geoff Holmes
At Pinsight Media+, we ingest and process scores of terabytes of data every day through our Data Monetization Platform, creating value by generating datasets for internal analytics and by generating data insights products. A few months ago, we accepted the challenge of taking a real-time stream of data - over 1M records/second - that needed to be cleaned, enriched, transformed, and packaged into (1) a near real-time insights data feed product for our customers and (2) multiple feeds to internal batch analytics systems. Our talk will share what we've learned as we successfully implemented a system that met that challenge. Specific areas to be addressed are:
- Architectures to consider when building a real-time insights workflow, from ingest to delivery
- So many dials to turn - tuning streaming data systems is way harder than batch systems
- Latency or throughput - which is most important?
Hadoop and Service Delivery in the Enterprise
Robert Burcham
Pinsight Media+ provides data monetization for carriers in the fourth wave and beyond. Using a federated Service Delivery Platform comprised of inter-operating elements both in the cloud and its own on-premises data center, Pinsight operates systems that:
- collect and groom huge volumes and varieties of data
- intelligently broker ad requests from any supply to any demand
- participate in RTB relationships as both bidder and exchange
- intelligently promote and distribute content on millions of mobile devices
- empower end-users to manage their privacy and preferences
The delivery of each of these services shares a common dependency on secure, policy-compliant access to behavioral and profile data of various types and usage policies. At Pinsight, Hadoop forms the basis of the system architecture that fulfills this need.
Rob will describe the architecture of Pinsight's Service Delivery Platform, the roles of the elements within it, and the role of the Hadoop-based Data Management Platform at its center. Using Hadoop services including MR, Tez, Spark, Hive, HBase, Kafka and Storm, Pinsight executes the daily business of data ingest, processing and promotion/delivery on the multi-terabyte scale.
0 Response to "August 23rd: Data Science KC Meeting"