Notes on running spark-notebook
These days Docker makes it extremely easy to get started with virtually any application you like. At first I was a bit skeptical, but over the last couple of months I have changed my mind. Now I strongly believe it is a game changer, even more so on Windows. Anyway, Kitematic (a GUI to manage Docker images) lets you simply pick the spark-notebook image by Andy Petrella.
When running your Docker host in VirtualBox, you still need to set up port forwarding for port 9000 (the notebook) and ports 4040 to 4050 (spark-ui). Assuming your Docker host VM is named default, the rule for port 9000 looks like this:
VBoxManage modifyvm "default" --natpf1 "tcp-port9000,tcp,,9000,,9000"
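The command above only covers port 9000. For the spark-ui range (4040 to 4050) a small loop saves some typing. This is a minimal sketch: print_forward_cmds is a hypothetical helper name, and it only prints the commands so you can review them first. The rule names follow the same tcp-portNNNN pattern as above.

```shell
# Hypothetical helper: prints (does not run) one VBoxManage command
# per port in the range first..last, for the given VM name.
print_forward_cmds() {
  vm_name="$1"
  first="$2"
  last="$3"
  port="$first"
  while [ "$port" -le "$last" ]; do
    # Rule format: name,protocol,host-ip,host-port,guest-ip,guest-port
    printf 'VBoxManage modifyvm "%s" --natpf1 "tcp-port%s,tcp,,%s,,%s"\n' \
      "$vm_name" "$port" "$port" "$port"
    port=$((port + 1))
  done
}

# Spark UI ports, assuming the VM is named "default" as above.
print_forward_cmds "default" 4040 4050
```

Once the output looks right, pipe it into sh to apply it. As far as I know, modifyvm only works while the VM is powered off, so stop the Docker host first.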
Now you can browse to http://localhost:9000 and start using your new notebook:
You may want to copy the default set of notebooks to a local directory:
docker cp $containerName:/opt/docker/notebooks /Users/timvw/notebooks
Using that local copy is just a few clicks away with Kitematic:
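If you prefer the command line over Kitematic, a volume mount should do roughly the same thing. This is a sketch under assumptions: print_run_cmd is a hypothetical helper, the image tag is a placeholder you need to replace with whatever Kitematic pulled for you, and the container path is the one used in the docker cp command above. It only prints the command.

```shell
# Hypothetical helper: prints (does not run) a docker command that mounts
# a local notebooks directory over the container's default notebooks folder.
print_run_cmd() {
  image="$1"      # image name and tag (assumption: check your local images)
  local_dir="$2"  # local copy created earlier with docker cp
  printf 'docker run -p 9000:9000 -v %s:/opt/docker/notebooks %s\n' \
    "$local_dir" "$image"
}

# "<tag>" is a placeholder, not a real tag.
print_run_cmd "andypetrella/spark-notebook:<tag>" "/Users/timvw/notebooks"
```

The -p flag publishes the notebook port and -v maps the local directory into the container, so edits made in the notebook UI end up in your local copy.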
Of course you want to use additional packages such as spark-csv. This can be achieved by editing your notebook metadata:
You simply need to add an entry to customDeps:
When your container did not shut down correctly, you may end up in the awkward situation where the container believes it is still running. The following commands fix that by starting the container and removing the stale RUNNING_PID file:
docker start $containerName && docker exec -t -i $containerName /bin/rm /opt/docker/RUNNING_PID