As data grows exponentially within organizations, there is an increasing need to consolidate silos of information into a single source of truth: a data lake that feeds analytics and machine learning engines able to gather insight at scale. In this workshop, we will detail how to architect data infrastructure services using Red Hat OpenShift, Ceph Storage, Spark, Kafka, Prometheus, and Seldon.
As part of the workshop, we will deploy the Open Data Hub and cover the entire end-to-end infrastructure. Some of the things we will demonstrate are:
- Deployment of a Ceph S3 instance for data storage
- Use of Jupyter notebooks and Spark to interact with data sets through the S3A filesystem client on different clouds (a minimal PySpark sketch follows this list).
- Use of Spark schema inference and SparkSQL to query and transform stored data.
- Deployment of a simple model, along with the supporting messaging tools for interacting with it (a sample prediction request is sketched after this list).
- Monitoring of the entire infrastructure, as well as the deployed models, with Prometheus and Grafana (a sample Prometheus query follows below).
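
To make the Spark and S3A steps concrete, here is a minimal PySpark sketch of the kind of notebook cell we walk through: it connects to a Ceph RGW S3 endpoint over S3A, reads a CSV data set with schema inference, and runs a SparkSQL query against it. The endpoint, credentials, bucket, file, and column names are hypothetical placeholders, not values from the workshop environment.

```python
from pyspark.sql import SparkSession

# Minimal sketch: the endpoint, bucket, and credential values below are
# hypothetical placeholders; substitute the values from your own Ceph RGW /
# Open Data Hub deployment.
spark = (
    SparkSession.builder
    .appName("odh-workshop-s3a-example")
    .config("spark.hadoop.fs.s3a.endpoint", "http://ceph-rgw.example.svc:8080")
    .config("spark.hadoop.fs.s3a.access.key", "REPLACE_WITH_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "REPLACE_WITH_SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .getOrCreate()
)

# Read a CSV data set from the bucket, letting Spark infer the schema.
# The bucket and column names (trips, vendor_id) are illustrative only.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3a://workshop-data/trips.csv")
)

# Register the DataFrame as a temporary view and query it with SparkSQL.
df.createOrReplaceTempView("trips")
summary = spark.sql(
    "SELECT vendor_id, COUNT(*) AS trip_count FROM trips GROUP BY vendor_id"
)
summary.show()
```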
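For the model-serving step, the sketch below shows one way a client might send a prediction request to a Seldon-served model over REST, using the Seldon Core v1 prediction payload shape. The route URL and the feature values are hypothetical placeholders.

```python
import requests

# Minimal sketch of calling a Seldon-served model over REST. The route URL and
# the feature values are hypothetical placeholders; the payload shape follows
# the Seldon Core v1 prediction protocol.
SELDON_URL = "http://model-route.example.com/api/v1.0/predictions"

payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

response = requests.post(SELDON_URL, json=payload, timeout=10)
response.raise_for_status()

# The response contains a "data" block holding the model's prediction.
print(response.json())
```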
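And for the monitoring step, a short example of pulling a metric out of Prometheus through its standard HTTP query API. The Prometheus route is a hypothetical placeholder, and the query uses the built-in `up` metric purely to illustrate the mechanics; in the workshop you would substitute the model-serving metrics scraped from the deployed model and infrastructure.

```python
import requests

# Minimal sketch of querying Prometheus over its HTTP API. The route below is
# a hypothetical placeholder; "up" is a standard metric used here only to show
# the query mechanics.
PROMETHEUS_URL = "http://prometheus.example.com/api/v1/query"

resp = requests.get(PROMETHEUS_URL, params={"query": "up"}, timeout=10)
resp.raise_for_status()

# Each result entry carries the metric's labels and its latest sampled value.
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```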