Wednesday, September 22, 2010

Experiences from my first published project

Yesterday I reached a point where I felt comfortable with my implementation of a Double Array Trie, so I uploaded it to GitHub. It was an instructive experience, one I plan to repeat ASAP with some other ideas I have about useful libraries. Here is a superficial report of the things I learned during the last four days.

The project
The code started its life as a component of a larger project, while I was working at the Department of Informatics at the Athens University of Economics and Business. We implemented an algorithm known as Context Tree Weighting, a very effective solution to the problem of finding the most probable model that describes a data stream. The details are not important; however, one of the problems of the implementation was that we needed to store information for every prefix of a string, for a large number of very large strings (the performance of the algorithm is more or less proportional to the length of the strings stored). Since the initial implementation was in Matlab (which has no concept of pointers), we needed to implement a trie structure using arrays. I was directed towards the double array trie, found an explanation of the algorithm at Theppitak Karoonboonyanan's site, and got down to coding. However, memory utilization and performance suffered (whether it was Matlab's fault or mine is still unclear to me) so, with C being out of the question for portability reasons, I chose Java to re-implement the solution. The end result was functional and relatively stable but, this being an academic project and all, it bore no relation to proper engineering. It was blazingly fast though and, what is more, extremely fun to code.
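
For readers who have not met the structure before, here is a minimal sketch of the idea, not the code from the project. It assumes lowercase ASCII words and builds the arrays statically from an ordinary pointer-based trie, instead of using the dynamic insertion with relocation that a full implementation needs. All names (DoubleArrayTrieSketch, place, fits, step) are made up for this illustration. The transition rule is the usual one: from state s on character code c, the candidate next state is t = base[s] + c, and the move is valid only if check[t] == s.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a static double-array trie for words over 'a'..'z'.
final class DoubleArrayTrieSketch {
    private static final int END = 1;   // code of the end-of-word marker
    private static final int ROOT = 1;  // index of the root state

    private int[] base = new int[64];   // base[s] + code = candidate next state
    private int[] check = new int[64];  // check[t] = owning parent state, 0 = free slot

    // Temporary pointer-based trie, used only while building the arrays.
    private static final class Node {
        final TreeMap<Integer, Node> children = new TreeMap<>();
    }

    DoubleArrayTrieSketch(String... words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                int code = code(c);
                if (code < 0) throw new IllegalArgumentException("only a-z supported: " + word);
                node = node.children.computeIfAbsent(code, k -> new Node());
            }
            node.children.computeIfAbsent(END, k -> new Node()); // mark end of word
        }
        check[ROOT] = -1; // the root slot is occupied but has no parent
        place(root, ROOT);
    }

    // Maps 'a'..'z' to 2..27 (1 is reserved for END); returns -1 for anything else.
    private static int code(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' + 2 : -1;
    }

    // Finds the smallest base at which all children of this node land on free slots.
    private void place(Node node, int state) {
        if (node.children.isEmpty()) return;
        int b = 1;
        while (!fits(node, b)) b++;
        base[state] = b;
        for (int c : node.children.keySet()) check[b + c] = state; // claim all slots first
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            place(e.getValue(), b + e.getKey());
        }
    }

    private boolean fits(Node node, int b) {
        ensureCapacity(b + node.children.lastKey());
        for (int c : node.children.keySet()) {
            if (check[b + c] != 0) return false;
        }
        return true;
    }

    private void ensureCapacity(int index) {
        if (index >= check.length) {
            int size = Math.max(index + 1, check.length * 2);
            base = Arrays.copyOf(base, size);
            check = Arrays.copyOf(check, size);
        }
    }

    // One transition: returns the next state, or -1 if there is no such edge.
    private int step(int state, int code) {
        int next = base[state] + code;
        return (next < check.length && check[next] == state) ? next : -1;
    }

    boolean contains(String word) {
        int state = ROOT;
        for (char c : word.toCharArray()) {
            int code = code(c);
            if (code < 0) return false;
            state = step(state, code);
            if (state < 0) return false;
        }
        return step(state, END) >= 0; // a stored word must be followed by the END marker
    }
}
```

With a trie built as new DoubleArrayTrieSketch("tea", "ten", "to"), contains("tea") is true while contains("te") is false, because "te" is only a prefix and has no END transition. Every lookup step is two array reads and one comparison, which is what makes the structure usable in a language without pointers.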

After school
Ever since this project ended, I have wanted to release this code to the public, mostly for bragging rights, since it was a difficult piece of coding. However, working in a software house does not leave much time off, so it was postponed indefinitely. The opportunity came disguised as the bankruptcy of my employer, leaving me all the time in the world to work on anything I wanted. So, after some time spent doing nothing at all, I dug this code out of a heap of bits known as the "old code archive" and got down to some coding.

Experience applied
I have looked at and worked on a lot of public code, studying, measuring and solving (or even introducing) bugs. I have some expectations when it comes to quality, both of the code and of the documentation, but I also understand what over-engineering is (and the tendency I have to indulge in it). I reached the conclusion that the mess of code I had at the time did not meet any standard of quality: although it was not wrong, it was un-unravelable. Also, it was so tightly bound to the CTW algorithm that it was useless for anything else.

Lessons learned
The fact that I knew the code proved the most difficult thing to overcome. I knew it was a mess but I could not see it. I finally managed to prevail by making myself use the code for something it was not built for but that users would expect to be available. This led to implementing interfaces, providing assertions and test cases, separating concerns and adding a degree of modularity that allows for simple extensions as well as core rewrites. I learned to implement what I expect to see in other people's code, a task easier said than done. And to write loads of comments.
Another point was finding a hosting solution. I ended up using GitHub, which meant I needed to learn a new VCS and fit it into Eclipse, my workflow and my mindset. Nothing difficult, but it was a new experience.
Finally, I now have a constant nagging feeling that comes from the things I haven't done correctly. I have an empty wiki, I know I have to provide usage examples, I know I must do some performance comparison with another implementation, and I know the test coverage is not complete. In other words, I know I will never reach perfection. I also know, however, that if I had waited to reach what I perceive as satisfactory, I would never have done what I have now done. I guess this is the biggest lesson I have learned from this small endeavour: to be realistic.

On to something bigger now...
