Final stretch

Hello!

We have made quite a lot of progress since our last post. The crawler is complete, and all the functionality for our proof-of-concept is in place. It can now crawl PubMed in its entirety and compare the words in each article's abstract to the search words in our database. When there are matches, the metadata of the potentially interesting article is saved into our database.

Looking ahead, the most challenging aspect of future development is optimisation. The crawler's bottleneck is the word-checking function, where every word of the abstract is compared against a list of thousands of words. Granted, our cloud server is quite limited in processing power, so upgrading its CPU would likely improve crawling performance. A distributed processing system would also be worth considering.
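For readers curious what the word check looks like, here is a minimal sketch in Python. It is not our actual crawler code: the function names, the tokenisation rule and the sample search words are all made up for illustration. It assumes the search words fit in memory, and it shows one common way to avoid scanning a long list for every word, namely keeping the search words in a set and intersecting it with the words of the abstract.

```python
import re

def tokenize(abstract):
    """Lower-case the abstract and return its distinct words.

    Hypothetical tokenisation: runs of ASCII letters and digits.
    A real crawler may need something more careful.
    """
    return set(re.findall(r"[a-z0-9]+", abstract.lower()))

def matching_terms(abstract, search_terms):
    """Return the search terms that occur in the abstract.

    search_terms is assumed to be a set (or frozenset) of lower-case
    words, so the check is a set intersection rather than a scan of
    a list for every word.
    """
    return tokenize(abstract) & search_terms

# Illustrative data only, not taken from our database.
search_terms = frozenset({"insulin", "glucose", "metformin"})
abstract = "Metformin lowers blood glucose in patients with type 2 diabetes."

print(sorted(matching_terms(abstract, search_terms)))
# prints ['glucose', 'metformin']
```

Whether this would help in practice depends on how the search words are stored and how matching is defined (for example multi-word phrases), so treat it as a starting point for measurement rather than a fix.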

In terms of the schedule, our project is now only a couple of weeks away from completion. Over the last week or so we have been refactoring and documenting our code, and we have decided to dedicate the remaining hours of the project to further documentation and testing. We plan to deliver a well-documented and comprehensive product to our customer, which should make it much easier for new developers to continue the work.

We will write one more devblog post at the very end of our project in late April to wrap everything up.

Until then, we wish you all a nice spring.

Regards,

Jussi, Joni, Joona & Miikka

2 Comments

Esko Kauppinen April 10, 2015 at 11:27

Great news and refreshing to see a programming project going ahead so smoothly.

Nice spring to all of you.

Tero Nieminen May 8, 2015 at 09:19

Many thanks for your work until now. I appreciate it very much.
Nice spring also to You and hope to hear of that project again.
Tero

