Wednesday, May 8, 2019

Voxxed Days Milan 2019 review

I finally found a few minutes to share my impressions after attending the first Voxxed Days event in Italy, which took place in Milan on April 13th, 2019.



I was one of the speakers there: my talk was about Deep Learning on Apache Spark with DeepLearning4J (a follow-up of some topics from my book). There were 3 sessions in parallel. The level of the talks was really high, and it was hard for me and any other participant to choose which one to follow in a given time slot. The good news is that all of the sessions were recorded, and yesterday the first videos (those from the main session) were published on YouTube. Once they are all online, I suggest you watch as many of the videos as you can, but here are some recommendations among the talks I had a chance to attend in person at the event. I have kept my comments to a minimum to avoid spoilers ;)

Opening keynote by Mario Fusco: Mario was the main organizer of the event, and in his opening keynote he presented the agenda. He recently wrote a book for Manning; in the late afternoon he signed and gave away some free copies and was available for attendees' questions.

Keynote by Holly Cummins, The importance of fun in the workplace: the title says it all, and the content was brilliant. Highly recommended.

Boosting your applications with distributed caches/datagrids by Katia Aresti: I really enjoyed this talk even though I am definitely not a fan of the Harry Potter saga (all of Katia's examples referred to characters and/or situations from those books). But if someone mentions reactive microservices and Vert.x, I can bear Harry Potter stuff too :)))

Performance tuning Twitter services with Graal and Machine Learning by Chris Thalinger: just in case you're among those people who still don't believe that Machine Learning can help you, from a DevOps perspective, to improve the tuning and performance of your applications/services. A real-world use case from Twitter.

Concurrent Garbage Collectors: ZGC & Shenandoah by Simone Bordet: a detailed overview of the new Java 11 and 12 garbage collectors. Simone goes very deep into this topic. If you, like me, haven't yet had a chance to play with the two latest major Java releases, you will find this talk very informative.

Interaction Protocols: It's all about good manners by Martin Thompson: an interesting history of distributed systems protocols and their quality attributes. A talk that's more philosophical than technical, but absolutely enjoyable.

The talks weren't the only fantastic part of the event. I really enjoyed networking with the organizers, the other speakers and the participants, and I was also positively impressed by the very high level of the questions raised by attendees during the Q&A sessions. Definitely an ultra positive experience overall.

Saturday, May 4, 2019

The Kubernetes Spark operator in OpenShift Origin (Part 1)

This series is about the Kubernetes Spark operator by Radanalytics.io on OpenShift Origin. It is an Open Source operator to manage Apache Spark clusters and applications.
To deploy the operator on OpenShift Origin for the first time, you need to clone its GitHub repository:

git clone https://github.com/radanalyticsio/spark-operator.git
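
and move into the cloned directory, as the paths in the commands below are relative to it:

cd spark-operator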

Then login to the cluster using the OpenShift command-line oc:

oc login -u <username> -p <password>

Assuming that, as in the OpenShift Origin environments my teams and I used to work in, developers don't have permission to create CRDs (Custom Resource Definitions), you need to use ConfigMaps, so you have to create the operator using the operator-cm.yaml file provided in the cloned repo:

oc apply -f manifest/operator-cm.yaml

The output of the command above should look like the following:

serviceaccount/spark-operator created
role.rbac.authorization.k8s.io/edit-resources created
rolebinding.rbac.authorization.k8s.io/spark-operator-edit-resources created
deployment.apps/spark-operator created
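
You can also check that the spark-operator deployment (created above) has rolled out successfully:

oc rollout status deployment/spark-operator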


Once the operator has been successfully created, you can try to create your first cluster. Select the specific project you want to use:

oc project <project_name>

and then create a small Spark cluster (1 master and 2 workers) using the example file for ConfigMaps available in the cloned repo:

oc apply -f examples/cluster-cm.yaml

Here's the content of that file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-spark-cluster
  labels:
    radanalytics.io/kind: SparkCluster
data:
  config: |-
    worker:
      instances: "2"
    master:
      instances: "1"


The output of the above command is:

configmap/my-spark-cluster created
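
You can also verify from the command line that the master and worker pods are being created:

oc get pods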

After the cluster has been successfully created, the situation in the OpenShift web UI should look like this:



To access the Spark web UI, you need to create a route for it. You can do this through the OpenShift Origin UI by selecting the Spark service and then clicking on the route link. Once the route has been created, the Spark web UI for the master (see figure below) and the workers becomes accessible from outside OpenShift.
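As an alternative to the web UI, the route can also be created with the oc client. The exact service name may vary (check it with oc get services); assuming the UI service for the cluster created above is called my-spark-cluster-ui, the command would be:

oc expose service my-spark-cluster-ui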



You can now use the Spark cluster. You could start testing it by entering the master pod console, starting a Scala Spark shell there and executing some code:
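For example, a trivial job like the following (a minimal sketch; any simple RDD operation would do) can be run in the shell to verify that the cluster executes jobs, since sc is predefined there:

val rdd = sc.parallelize(1 to 100000)
val evenCount = rdd.filter(_ % 2 == 0).count()
println(s"Count of even numbers: $evenCount")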



In the second part of this series, we are going to explore the implementation and configuration details of the Spark operator before moving on to Spark application management.

Wednesday, May 1, 2019

Installing Minishift on Windows 10 Home

Minishift is a tool to run OpenShift Origin locally as a single-node cluster inside a virtual machine. It is a good choice for development or for doing PoCs locally before deploying things to a real OpenShift cluster. In this post I am going to explain how to install and run it on a Windows 10 Home machine, where no Hyper-V support is available.
The only available alternative to Hyper-V is Oracle VirtualBox. You need to install it before proceeding with the Minishift installation: follow the instructions on the official website or use the Chocolatey package manager.
If you don't have Chocolatey on the destination machine, you can install it by opening an Admin PowerShell and first checking which execution policy is set by running the Get-ExecutionPolicy command. If it returns Restricted, then install Chocolatey by executing:

Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))

The fastest and most secure way to install Minishift is through Chocolatey as well. From an Admin PowerShell, execute:

choco install -y minishift

At the end of the installation process, double-check that everything went fine by executing:

minishift version

This command should print the version of the installed Minishift.
You now need to install kubectl, the Kubernetes command-line tool, again using Chocolatey:

choco install -y kubernetes-cli

Finally, you need to install oc, the OpenShift command-line tool (using Chocolatey of course):

choco install -y openshift-cli

Before starting the cluster, set VirtualBox as the default driver for Minishift:

minishift config set vm-driver virtualbox

You can now start Minishift:

minishift start

The first start will be slower, as the ISO image needs to be downloaded.
Once the server is up and running, you can access the web UI at the following URL:

https://<minishift_ip>:8443/console
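
The value of <minishift_ip> can be obtained by running:

minishift ip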



Log in using developer as the username and any value as the password. I suggest using Firefox or Chrome for the web UI.
You can also log in using the oc client.
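For example, using the same credentials as for the web UI:

oc login https://<minishift_ip>:8443 -u developer -p developer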
Minishift is now ready to be used. Enjoy it!

Thursday, April 11, 2019

See you tonight at the ODSC Dublin Meetup!

I hope to see you tonight at the April ODSC Dublin Meetup @ Jet.com at 40 Molesworth St. I am going to be the second speaker of the night. I am going to talk about importing pre-trained Keras and TensorFlow models into DL4J and the possibility of re-training them on Apache Spark.
The first speaker will be John Kane from Cogito.




Monday, February 4, 2019

The book is finally available on Packt!

My book "Hands-on Deep Learning with Apache Spark" is finally available on Packt. Here's the final cover:



This is the book's table of contents:
1: THE APACHE SPARK ECOSYSTEM
2: DEEP LEARNING BASICS
3: EXTRACT, TRANSFORM, LOAD
4: STREAMING
5: CONVOLUTIONAL NEURAL NETWORKS
6: RECURRENT NEURAL NETWORKS
7: TRAINING NEURAL NETWORKS WITH SPARK
8: MONITORING AND DEBUGGING NEURAL NETWORK TRAINING
9: INTERPRETING NEURAL NETWORK OUTPUT
10: DEPLOYING ON A DISTRIBUTED SYSTEM
11: NLP BASICS
12: TEXTUAL ANALYSIS AND DEEP LEARNING
13: CONVOLUTION
14: IMAGE CLASSIFICATION
15: WHAT'S NEXT FOR DEEP LEARNING?

DeepLearning4J (Scala) is the reference framework, along with Keras and TensorFlow (Python).
More topics on Deep Learning on the JVM and Spark will be covered in this blog in the coming months.