Mining data, extracting knowledge, driving improvement...

postgres + python + jdbc = magic

I just ran into a really crazy bug that deserves a small post.

After changing something completely unrelated in a Farm Management ERP system, my colleague Christine said that suddenly the test for the economic report wasn't working anymore.

The economic report is a function living on postgres, written in plpythonu (= PL/Python, i.e. python running inside postgres), and I came to be the owner of that part of the code.

Since it basically aggregates all the data in the DB, it is kind of sensitive to changes, so I thought, well - let's take a look.

It turns out that a line in a query within the python code was suddenly different.

Something like
  JOIN {type_info} ON ... 
became
  JOIN TIME ype_info ON ... 
for no obvious reason.

It was still sound and correct in git, the IDE, ... and it was only different in the DB (and no, it was not wrong in the source SQL either).

Moreover, the code was already a few weeks old, so it could have become "altered" at any point during that time, yet it only did so on that day.

After lots of confusion, I could "fix" the error by renaming the variable {type_info} to {xtype_info} (and another one, {type_column}, to {xtype_column}).

If you know anything about this other than that something seems to replace "{t(.*)}" by "TIME $1", please let me know!

PS: After thinking a little, I suppose it's neither the fault of postgres nor of python because the problem only occurred in our tests, where all the SQL was transmitted via postgres driver + JDBC. Via pgadmin I could not reproduce the problem.
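A possible lead, not verified against the setup described above: JDBC defines an escape syntax in curly braces, for example {t '12:30:00'} for a time literal, and a driver that only looks at the leading "{t" would produce exactly this kind of rewrite. The snippet below merely simulates the observed replacement in clojure; the function name simulate-escape is made up and this is not the driver's actual code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for whatever rewrote the query on its way to the DB:
;; a "{t...}" group loses its braces and the leading "{t" becomes "TIME ".
(defn simulate-escape
  [sql]
  (str/replace sql #"\{t([^}]*)\}" "TIME $1"))

;; The original variable name is mangled ...
(simulate-escape "JOIN {type_info} ON ...")

;; ... while the renamed one no longer starts with "{t" and passes through untouched.
(simulate-escape "JOIN {xtype_info} ON ...")
```

If escape processing really is the cause, switching it off on the JDBC statement via Statement.setEscapeProcessing(false) might be worth a try instead of renaming variables.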

Some thoughts on Bitcoin

When it comes to the topic of Bitcoin there are lots of opinions and even more opinions that claim to be the right ones.

I will take a more humble approach myself and try to explain what I think, why I think it, and how this led me to my current state of affairs with respect to bitcoin.

The short story is: I ignored it, I studied it, I bought, I sold, I won money, I lost money.

The long story goes like this:

I ignored


I knew of the existence of Bitcoin, mostly from HackerNews, but I didn't really take it seriously. There's nothing more to say other than that I only became interested when the price was well above a few dollars (more like ~$100).
This was not so much because I wanted to gamble or I felt left out but rather because I tend to think "If people bet money on it, there could be something to it".
This is not to say that there is always something to it; it depends a great deal on who is smarter: you or those people? In any case, I could not afford an opinion at that time because I basically knew nothing. Since bitcoin is quite a libertarian project, I thought that I should know about it and, while I was at it, know enough to either bet for or against it. (As a side note, I believe this is the only way of knowing whether an opinion is serious: ask the issuer of the opinion to bet on it.)
Furthermore, IT is one of my strengths, so I supposed I would have a clearer view of things and an advantage, at least when compared to mainstream (i.e., dumb) money.

I studied


So, equipped with some mathematical understanding and a good knowledge of game theory (which I think is the fundamental theory for understanding economics, politics and evolution), I took the white paper of Satoshi Nakamoto and read it a few times. I remember that it took me approximately a week from first reading the paper to actually making a decision.

My main concerns were:
  • Cui bono? (Why should anyone invest? Are there other entities who benefit other than holders of BTC?)
  • What are the IT risks? (What if the software has a bug or a weakness? What if the network is partitioned? What if it turns out that you need to change an aspect of the protocol?)
  • Who would dislike bitcoin? (governments, banks, ISIS, North Korea, the NSA, ... you get the picture)
  • How resilient is bitcoin? (What can dislikers do against it?)
The short answers are:
  • You invest because the number of bitcoins is fixed. Only holders of bitcoin benefit from your investing.
  • Whatever is necessary can be decided by entities either holding bitcoin or investing big money in order to obtain bitcoin.
  • Well, I am not yet 100% sure of this answer. I suppose the biggest dislikers are those who feel they have something to lose. Initially, these were entities that issue money (Central Banks and indirectly governments). Hackers actually don't dislike Bitcoin because after, e.g., stealing a few coins, they want to keep the system in place in order to sell them. Think of it this way: It has never been so easy to directly steal money via the internetz.
  • It turns out Bitcoin is quite resilient because it spans the globe, and even if it were forbidden in every single country of the world, it would live on just as crime and drugs do. So unless one switches off the internet forever, it's kind of hard to effectively get rid of it.
Of course, there's much more to say, but I will go into detail only later, after I have finished the complete story.

Tools of the trade and obtaining data

Before venturing into the quest of value, it is crucial to have some data with which we can work.

There are often example datasets for various purposes but they are not comparable to real world data because they are clean and they contain no errors.
Instead, we'll settle for some simple statistics on top of time series obtained from yahoo finance. My tools of choice here are clojure and incanter, and by the end of this post you will understand why (hopefully, cause otherwise I did a bad job).

You can find the complete repository here

I'll create a new project called finstats which will allow for a quick analysis of some stock available on yahoo finance 
 lein new finstats
First, we need to fetch the data from yahoo. For this, create the file src/finstats/yahoo.clj (namespace finstats.yahoo) with a function create-http-request that builds the download URL.

Let's make finstats.yahoo accessible by modifying the first two lines of core.clj to

(ns finstats.core
  (:require [finstats.yahoo :as y]))
In this way, we can ask the REPL:

finstats.core> (y/create-http-request "SIE.DE" [2012 1 1] [2013 1 1])
"http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv&s=SIE.DE&c=2012&b=1&a=1&f=2013&e=1&d=1"
Looks like a valid URL to me! Since we're in clojure, we can easily request the data and save it via

finstats.core> (def raw-data (slurp (y/create-http-request "SIE.DE" [2012 1 1] [2013 1 1])))
Now raw-data contains a big string with all the data we need (Siemens EOD data for 2012 - well, almost)
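The listing of yahoo.clj itself is not reproduced in this post. As a minimal sketch, and purely a reconstruction from the URL printed in the REPL session above (not the original file), create-http-request could look like this. The names base-url and ticker are made up, and the meaning of the query keys is an assumption: a/b/c as start month/day/year, d/e/f as end month/day/year, g=d for daily data.

```clojure
(ns finstats.yahoo
  (:require [clojure.string :as str]))

;; Endpoint taken from the URL shown in the REPL output.
(def base-url "http://ichart.finance.yahoo.com/table.csv")

(defn create-http-request
  "Builds the EOD download URL for ticker between two [year month day] vectors."
  [ticker [from-year from-month from-day] [to-year to-month to-day]]
  {:pre [(string? ticker)]}
  ;; A vector of pairs keeps the parameters in the same order as in the post.
  (let [params [["g" "d"] ["ignore" ".csv"] ["s" ticker]
                ["c" from-year] ["b" from-day] ["a" from-month]
                ["f" to-year] ["e" to-day] ["d" to-month]]]
    (str base-url "?"
         (str/join "&" (map (fn [[k v]] (str k "=" v)) params)))))
```

The str alias for clojure.string declared here is also the one that parse-string further down relies on.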

Of course, we can't do a whole lot with a big string, so we need to parse it.

There are nice libraries but it's so easy, we might as well just do it on our own: Add the following function to finstats.yahoo

(defn parse-string
  [s]
  {:pre [(string? s)]}
  (->> (str/split-lines s) ;; splits by newline
       rest ;; drops the first line (the header)
       (map #(str/split % #",")) ;; splits each line by comma
       (map #(cons (first %) (map read-string (rest %)))) ;; parses each element of the row except the first
       (mapv #(zipmap [:date :open :high :low :close :volume :adjusted-close] %))))
Again, we can test the function by calling

finstats.core> (y/parse-string raw-data)
[{:adjusted-close 72.5961, :volume 2979000, :close 77.4255, :low 77.1348, :high 78.5883, :open 78.4042, :date "2013-02-01"} {:adjusted-close 73.4501, :volume 233990 ...
Excellent!
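To try the parsing step without a network connection, here is a small self-contained variant of the same idea. It is only a sketch: parse-row, parse-csv and sample-csv are made-up names, the header line of the sample is an assumption, and the two data rows reuse values printed in this post. It also swaps read-string for clojure.edn/read-string, which only reads data and is therefore the safer choice for text coming from the internet.

```clojure
(require '[clojure.string :as str]
         '[clojure.edn :as edn])

(def columns [:date :open :high :low :close :volume :adjusted-close])

(defn parse-row
  "Turns one CSV line into a map; the date stays a string, the rest become numbers."
  [line]
  (let [[date & numbers] (str/split line #",")]
    (zipmap columns (cons date (map edn/read-string numbers)))))

(defn parse-csv
  "Parses the whole CSV text, skipping the header line and blank lines."
  [s]
  (->> (str/split-lines s)
       rest
       (remove str/blank?)
       (mapv parse-row)))

;; Assumed header line plus two rows built from values shown in this post.
(def sample-csv
  (str "Date,Open,High,Low,Close,Volume,Adj Close\n"
       "2013-02-01,78.4042,78.5883,77.1348,77.4255,2979000,72.5961\n"
       "2012-02-01,69.6054,71.5725,69.4504,70.8652,4775100,63.9867\n"))

(parse-csv sample-csv)
```

Keeping the row parsing in its own function makes it easy to test a single line at the REPL before running it over the whole download.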

Let's save the result in a variable ohlcs (open-high-low-close with a plural s at the end) with

finstats.core> (def ohlcs (y/parse-string raw-data))
#'finstats.core/ohlcs
... and test obtaining data via

finstats.core> (first ohlcs)
{:adjusted-close 63.9867, :volume 4775100, :close 70.8652, :low 69.4504, :high 71.5725, :open 69.6054, :date "2012-02-01"}
... and

finstats.core> (get-in ohlcs [100 :volume])
2163600
to get the volume of the row at index 100 (that is the 101st day, since indices start at 0).

Note that at this point we have full access to the EOD API of yahoo with a mere 23 lines of code.
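With the rows in this shape, the simple statistics mentioned at the beginning are only a few lines away. The following is a sketch on made-up sample rows (sample-ohlcs, mean and daily-returns are hypothetical names, and the prices are invented for illustration); it assumes the rows are ordered oldest first.

```clojure
;; Invented sample rows, only the keys needed here.
(def sample-ohlcs
  [{:date "2012-02-01" :close 70.0}
   {:date "2012-02-02" :close 71.4}
   {:date "2012-02-03" :close 69.3}])

(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn daily-returns
  "Relative change of the close from one row to the next (oldest row first)."
  [rows]
  (->> (map :close rows)
       (partition 2 1) ;; sliding pairs: (day1 day2), (day2 day3), ...
       (map (fn [[prev cur]] (/ (- cur prev) prev)))))

(mean (map :close sample-ohlcs))
(daily-returns sample-ohlcs)
```

The same two functions work unchanged on the real ohlcs vector once it is in chronological order.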

To finish the post, I just want to have a nice bar chart that I can use to marvel at the price development of Siemens. For this purpose, include the line

[incanter/incanter "1.5.5"] 
in the project.clj dependencies and modify the ns declaration of finstats.core to

(ns finstats.core
  (:require [finstats.yahoo :as y]
            [incanter.charts :as charts]
            [incanter.core :as ic]))

Now you can run

finstats.core> (ic/view (charts/bar-chart (map :date ohlcs) (map :close ohlcs)))
and a nice chart should pop up.

Note that yahoo returned the data the other way round (newest first), so this chart must be read from right to left. A (def ohlcs (vec (rseq ohlcs))) will help here.
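A quick way to convince yourself of the reordering, sketched on three sample rows (newest-first, oldest-first and by-date are made-up names, and the middle row is invented). Since the dates are ISO strings, sorting by :date gives the same chronological order without relying on how the source happened to order the rows.

```clojure
;; Sample rows in the order yahoo delivered them: newest first.
(def newest-first
  [{:date "2013-02-01" :close 77.4255}
   {:date "2013-01-31" :close 78.0}      ;; invented row for illustration
   {:date "2012-02-01" :close 70.8652}])

;; rseq needs a reversible collection such as a vector (mapv returns one).
(def oldest-first (vec (rseq newest-first)))

;; ISO dates sort correctly as plain strings.
(def by-date (vec (sort-by :date newest-first)))

(= oldest-first by-date)
```

For anything that is not a vector, plain reverse does the same job as rseq, just without the constant-time guarantee.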

The end result is a bar chart of the closing prices in chronological order.

However, bar charts are usually lame for this type of data, so we should rather opt for candle-sticks. Add the following function to finstats.core:

(defn- create-dataset
  "creates an incanter dataset from ohlc data"
  [rows]
  {:pre [(every? map? rows)]}
  (let [ks (keys (first rows))]
    (ic/dataset ks
                (map (apply juxt ks) rows))))
and run

(ic/view (charts/candle-stick-plot :data (create-dataset ohlcs)))
which should give you a nice candle-stick chart.
Later we'll see how we can actually obtain real value instead of just some visual bling bling.
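A short aside on the (map (apply juxt ks) rows) part of create-dataset, since it is the least obvious line: juxt turns a list of keys into one function that returns the values for all of them as a vector. The sketch below shows this on two sample rows without needing incanter; rows, ks and row-values are names picked for the example, and the key order is fixed by hand here, whereas create-dataset takes it from the first row.

```clojure
;; Two sample rows with a reduced set of keys.
(def rows
  [{:date "2012-02-01" :open 69.6054 :close 70.8652}
   {:date "2013-02-01" :open 78.4042 :close 77.4255}])

(def ks [:date :open :close])

;; (apply juxt ks) is the same as (juxt :date :open :close):
;; a function that maps one row to a vector of its values in ks order.
(def row-values (map (apply juxt ks) rows))

row-values
```

Because every row goes through the same juxt function, the columns line up with ks no matter in which order each map stores its keys.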

Value in Data

The information age is now at least one decade old but businesses have not yet adapted.
Well, those in the Silicon Valley may have but not those in Toulouse, Utrecht, Luxembourg, Florence or Frankfurt.

While most companies are sitting on treasures of data, few know there is additional value in it. And even fewer know how to obtain that value.
Since I have been doing something like this (transforming data into value) for a few years now, I thought I might as well write about it in the hope of helping those interested.

The tools to unearth this value are
  • a business man's mind: knowing where there could be value, who would benefit and how, how big the impact is and how big the value of the new information is in relation to the cost of making it accessible. Ultimately, it's about the question: Is it worth it?
  • a researcher's mind: the treasures are not all hidden in finance or car manufacturing. Instead, you must adapt to each new domain, find out where there are opportunities and find the right way to build on these opportunities. Curiosity and life-long learning are indispensable traits on this quest, and mathematics and statistics are steady companions.
  • a developer's mind: you will rarely find a diamond that you can use immediately. Instead, you will have to transform the data, combine it with data from somewhere else, often in a live system, and produce your analysis or predictions continuously, using web services, cloud technology, databases and the like.
In the coming weeks, I will start to write a little bit more about each of these topics.