Final stretch


We have made quite a lot of progress since our last post. The crawler is complete, and all the functionality for our proof-of-concept is in place. The crawler can now crawl PubMed in its entirety and compare the words in each article's abstract against the search words in our database. If there are matches, the metadata of the potentially interesting article is saved into our database.
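A minimal sketch of the matching step described above, for readers who want to see the idea in code. It is an illustration under assumptions, not the project's actual code: the function names, the `articles` table layout, the sample search words and the in-memory SQLite database are all hypothetical.

```python
import re
import sqlite3

# Hypothetical search words; in the real project these come from the database.
SEARCH_WORDS = {"melanoma", "immunotherapy"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_words(abstract, search_words):
    """Return the search words that occur in the abstract."""
    return set(tokenize(abstract)) & search_words


def make_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (pmid TEXT PRIMARY KEY, title TEXT, matched TEXT)"
    )
    return conn


def save_if_interesting(conn, article, search_words):
    """Store the article's metadata only if its abstract has a match."""
    hits = matching_words(article["abstract"], search_words)
    if hits:
        conn.execute(
            "INSERT INTO articles (pmid, title, matched) VALUES (?, ?, ?)",
            (article["pmid"], article["title"], ",".join(sorted(hits))),
        )
    return hits
```

Here an article is assumed to be a plain dictionary with `pmid`, `title` and `abstract` keys. Only the metadata of matching articles is written, which mirrors the behaviour described in the post.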

The most challenging aspect of this project in terms of future development is optimisation. The crawler's bottleneck is the word-checking function, where every word of an abstract is compared against a list of thousands of search words. Granted, our cloud server is quite limited in processing power, so upgrading its CPU would likely improve crawling performance. A distributed processing system would also be worth considering.
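One common way to reduce this kind of cost, sketched below, is to keep the search words in a hash-based set instead of a list, so that each lookup no longer scans the whole collection. This is only an illustration: the post does not say how the word-checking function stores its words, so the list-based version is an assumption about the slow path, and both function names are hypothetical.

```python
def count_matches_list(abstract_words, search_list):
    """Naive check: each `in` test scans the list, so the cost grows
    with (number of abstract words) x (number of search words)."""
    return sum(1 for word in abstract_words if word in search_list)


def count_matches_set(abstract_words, search_set):
    """Set-based check: each `in` test is a hash lookup, so the cost grows
    roughly with the number of abstract words alone."""
    return sum(1 for word in abstract_words if word in search_set)


if __name__ == "__main__":
    import timeit

    # Synthetic data: 5000 made-up search words and a 300-word "abstract".
    search_list = [f"term{i}" for i in range(5000)]
    search_set = set(search_list)
    abstract_words = [f"term{i * 37}" for i in range(300)]

    slow = timeit.timeit(
        lambda: count_matches_list(abstract_words, search_list), number=20
    )
    fast = timeit.timeit(
        lambda: count_matches_set(abstract_words, search_set), number=20
    )
    print(f"list: {slow:.4f}s  set: {fast:.4f}s")
```

Both functions return the same count; only the lookup structure differs. Whether this helps the real crawler depends on how its word list is currently stored, which would need to be checked by profiling.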

In terms of schedule, our project is now only a couple of weeks away from completion. Over the last week or so we have been refactoring and documenting our code, and we have decided to dedicate the remaining hours of the project to further documentation and testing. We plan to deliver a well-documented and comprehensive product to our customer, which should hopefully make future development a breeze for new developers.

We will write one more devblog post at the very end of our project in late April to wrap everything up.

Until then, we wish you all a nice spring.


Jussi, Joni, Joona & Miikka


  1. Esko Kauppinen

    Great news and refreshing to see a programming project going ahead so smoothly.

    Nice spring to all of you.

  2. Tero Nieminen

    Many thanks for your work until now. I appreciate it very much.
    Nice spring also to You and hope to hear of that project again.

