
MWC Milestone 0.3 has been Released

Today is a good day :-) I have released Modular Web Crawler v0.3!

This is a huge and significant milestone: it takes MWC from a proof of concept to a working web crawler. You can now easily start a web crawl and it will work as it should. Advanced features are still missing, but a crawl can be initiated and the expected results will be returned to the user.

Beyond basic crawling, this release also implements some optimizations, mainly around downloading binary data.

What are the main new things introduced in this release?

  • Redirect support
  • Binary file handling
  • HEAD fetches
  • Fetch-by-size limiting
  • Many, many bugs squashed
  • Much better SSL handling
  • Upgraded code compatibility to Java 8
  • EditorConfig compatibility
  • Singleton services
  • Lombok usage for data boilerplate and logs
  • A .gitignore file
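
To make the HEAD fetch and fetch-by-size items a bit more concrete, here is a minimal sketch of the general idea using only the JDK. It is not MWC's actual code: the class name `HeadFetchSketch`, the methods `shouldDownload` and `probe`, and the "textual pages only" policy are all assumptions made up for this illustration. The crawler first asks the server for headers only, and decides from `Content-Type` and `Content-Length` whether the body is worth downloading.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper, not part of MWC: illustrates a HEAD probe plus a size limit.
final class HeadFetchSketch {

    /** Assumed policy: download only textual content that fits under maxBytes. */
    public static boolean shouldDownload(String contentType, long contentLength, long maxBytes) {
        // Without a Content-Type we cannot tell text from binary, so skip.
        if (contentType == null) {
            return false;
        }
        boolean textual = contentType.toLowerCase(Locale.ROOT).startsWith("text/");
        // A negative length means the server sent no Content-Length header;
        // the size is unknown, so the limit cannot reject the page up front.
        boolean withinLimit = contentLength < 0 || contentLength <= maxBytes;
        return textual && withinLimit;
    }

    /** Sends a HEAD request and applies the policy to the response headers. */
    public static boolean probe(String url, long maxBytes) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("HEAD");          // headers only, no body is transferred
            conn.setInstanceFollowRedirects(true);  // follow same-protocol redirects
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                return false;
            }
            return shouldDownload(conn.getContentType(), conn.getContentLengthLong(), maxBytes);
        } finally {
            conn.disconnect();
        }
    }
}
```

With a policy like this, a large image or archive costs one cheap HEAD round trip instead of a full download. A real crawler would still need to enforce the limit while reading the body, since servers can omit or misreport `Content-Length`.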

Special care was given to the unit tests in this release:

  • Tabulated results
  • Added an internal web server to host testing pages
  • Made all tests pass on every build
  • Lots of work on organizing the tests
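
For readers curious what an internal web server for test pages can look like, here is a small sketch built on the JDK's bundled `com.sun.net.httpserver.HttpServer`. This is only an assumption about one possible approach, not a description of how MWC's tests are wired; the class `TestPageServer` and its methods are hypothetical names invented for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper, not part of MWC: serves fixed HTML pages on localhost.
final class TestPageServer {
    private final HttpServer server;

    public TestPageServer() throws IOException {
        // Port 0 asks the OS for any free port, so parallel test runs do not collide.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    }

    /** Registers a static HTML page; intended for GET requests with a non-empty body. */
    public void addPage(String path, String html) {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        server.createContext(path, exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    public void start() {
        server.start();
    }

    public void stop() {
        server.stop(0); // stop immediately, do not wait for open exchanges
    }

    /** Base URL including the port the OS actually assigned. */
    public String baseUrl() {
        return "http://127.0.0.1:" + server.getAddress().getPort();
    }
}
```

A test would typically start the server in its setup step, point the crawler at `baseUrl()` plus a registered path, and call `stop()` in teardown, which keeps the tests independent of the real internet.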

A lot of attention was also given to the project itself.

For the full changelog, you are invited to browse the closed bugs grouped under the v0.3 milestone.


MWC Milestone 0.2 has been Reached

MWC has reached a new milestone: v0.2.

All milestones are important and this one is no different. MWC is now in a working state, though not a perfect one, as there are so many edge cases on the World Wide Web. Still, MWC can be easily started, its internal code structure makes sense, and the most basic crawling features are implemented.

That being said, there is still a lot of work to be done before MWC is in a presentable state.

Basic features still need polishing, exception handling needs to be improved, some 3rd-party implementations of the main components should be added, and of course there is the matter of advanced features (proxy support, authentication support, proper politeness support and many more).

Anyway, this is a start, and a good one at that, so I am pleased.
