Viz and Math in Apache Zeppelin

For some years, I have been going back and forth between Spark-Notebook and Apache Zeppelin for different use cases. Two years ago, I already made a small comparison of the two technologies. I used Spark-Notebook in order to develop the...

Data Intuitive at Spark Summit 2016

Short version: No, I’m not at the Spark Summit in Brussels. Time is not on my side; there are too many things to do. But being there is not necessary when other people make sure they do the publicity for you. Thank...

Bullet Proof Data Science in Scala

In this post, we go over some aspects and challenges that occur in typical data science projects in order to extract some requirements for data analysis in the broad sense of the word. We then illustrate how we tackle...

A Practical Approach to Model Error

In this post, I want to get a better sense of the effects of model error by simulating a very simple model for the spreading of a virus. It’s based on an analysis performed by Nassim Taleb. I used the...

Thoughts on the Spark Summit Europe 2015

In this post, I summarize some of the things I picked up at the Spark Summit. Some of these require more thought or research, but at least I’ll have a to-do list of things to look back at. In...

New site after two weeks

I’ve been running on Jekyll for about 2 weeks now. I spent about 2 hours during that time cleaning out some glitches and doing some more tuning. Permalinks: The permalinks were not as I wanted them to be, and they...

New website location and layout

I’ve long been tempted to recreate my website using Jekyll. I had already been experimenting with it for a course website. But now I finally made the switch for my homepage. For the migration of the content from WordPress to...

Code Snippet Repository

I’m jumping between Scala/Spark coding, some JavaScript in between, Python/PySpark and then some R every now and then. This in itself is already a challenge, but the worst thing is that I frequently encounter situations where I think: I’ve encountered this...

Transposing a Spark RDD

I have been using Spark quite a lot for the last year. At first using the Scala interface, but lately more using the Python one. In one of my recent projects, I received a dataset that contains expression profiles of...

Writing Workflow and Reproducible Data Analysis

I’ve written about my writing workflow before. Since some aspects of it are related to reproducible research and especially reproducible data analysis, I have collected some material and tips in a presentation I gave last week on my GitHub:...