Today is a good day :-) Today I released Modular Web Crawler v0.3!
This milestone is a big one: it takes MWC from a proof of concept to a working web crawler. You can now start a web crawl and it will work as it should. Advanced features are still missing, but a crawl can be initiated and the expected results will be returned to the user.
Beyond basic crawling, this release also implements some optimizations, mainly around downloading binary data.
What are the main new things introduced in this release?
- Redirect support
- Binary file handling
- HEAD fetches
- Fetch-by-size limiting
- Many, many bugs crushed
- Much better SSL handling
- Upgraded code compatibility to Java 8
- EditorConfig compatibility
- Singleton services
- Lombok usage for data boilerplate and logs
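
To give an idea of how HEAD fetches and size limiting can work together for binary data, here is a minimal sketch. It is not MWC's actual code: the class name `HeadCheck`, both method names, and the 2 MB limit are made up for illustration. The idea is to ask the server for headers only, then skip the full download when the response looks like a large or binary file.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

public class HeadCheck {
    // Hypothetical limit for this sketch; not an MWC setting.
    static final long MAX_BYTES = 2L * 1024 * 1024;

    /** Decides from header values alone whether a full GET is worthwhile. */
    static boolean shouldDownload(String contentType, long contentLength, long maxBytes) {
        // A length of -1 means the server did not report one, so it passes this check.
        if (contentLength > maxBytes) {
            return false;
        }
        // Unknown type: let the later GET decide.
        if (contentType == null) {
            return true;
        }
        String type = contentType.toLowerCase(Locale.ROOT);
        // Only text-like content is worth parsing for links in this sketch.
        return type.startsWith("text/") || type.startsWith("application/xhtml+xml");
    }

    /** Sends a HEAD request and applies the check above to the response headers. */
    static boolean probe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return shouldDownload(conn.getContentType(), conn.getContentLengthLong(), MAX_BYTES);
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping the decision in `shouldDownload` separate from the network call in `probe` means the rule can be unit tested without a live server.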
Special care was given to the unit tests in this release:
- Tabulated results
- Added an internal web server to host testing pages
- Made all tests pass on every build
- Lots of work on organizing the tests
And a lot of attention was given to the project itself:
- Added a GitHub project (for a more agile workflow)
- Added a CI/CD build in GitHub
- Much better readme.md
- Created a website home for MWC