Victor is a web page cleaning tool. It removes menus, ads, footers, headers, and similar boilerplate from HTML web pages, so that only the main page content remains. Victor is based on a conditional random fields (CRF) algorithm.
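As a rough illustration of how a linear-chain CRF can be used to clean a page, the sketch below labels a sequence of text blocks as "content" or "boilerplate" with Viterbi decoding and keeps only the content blocks. It is not Victor's implementation: the features, the hand-set weights, and the names block_features, viterbi, and clean are all assumptions made for this example, whereas a real CRF would learn its weights from annotated pages.

```python
# Illustrative sketch only: a tiny linear-chain model decoded with Viterbi.
# Features, weights, and function names are hypothetical, not taken from Victor.

LABELS = ("content", "boilerplate")

# Hand-set emission weights (a trained CRF would learn these from data).
EMISSION = {
    "content": {"long": 2.0, "short": -1.0, "has_period": 1.0},
    "boilerplate": {"long": -1.0, "short": 1.5, "has_period": -0.5},
}

# Hand-set transition weights: neighbouring blocks tend to share a label.
TRANSITION = {
    ("content", "content"): 0.5,
    ("content", "boilerplate"): -0.5,
    ("boilerplate", "content"): -0.5,
    ("boilerplate", "boilerplate"): 0.5,
}


def block_features(text):
    """Map one text block to a small dictionary of binary features."""
    n_words = len(text.split())
    return {
        "long": 1.0 if n_words >= 10 else 0.0,
        "short": 1.0 if n_words < 10 else 0.0,
        "has_period": 1.0 if "." in text else 0.0,
    }


def emission_score(label, feats):
    """Weighted sum of the block's features for the given label."""
    return sum(EMISSION[label][name] * value for name, value in feats.items())


def viterbi(blocks):
    """Return the highest-scoring label sequence for a list of text blocks."""
    if not blocks:
        return []
    feats = [block_features(b) for b in blocks]
    score = {y: emission_score(y, feats[0]) for y in LABELS}
    back = []  # back[i][y] = best previous label when block i+1 has label y
    for f in feats[1:]:
        new_score, pointers = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[(p, y)])
            new_score[y] = score[prev] + TRANSITION[(prev, y)] + emission_score(y, f)
            pointers[y] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def clean(blocks):
    """Keep only the blocks that the decoder labels as content."""
    return [b for b, y in zip(blocks, viterbi(blocks)) if y == "content"]


if __name__ == "__main__":
    page = [
        "Home | About | Contact",
        "Victor is a web page cleaning tool that removes menus, ads, "
        "footers and headers from pages.",
        "Copyright 2010",
    ]
    print(clean(page))
```

With these example weights, the short menu and copyright lines are labeled as boilerplate and only the middle block is kept. The transition weights are what distinguish this from classifying each block independently: a block's label depends on its neighbours as well as on its own features.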
Tool for creating own tools and resources
web data, text processing, web-service
Computational Linguistics, Linguistics
Charles University in Prague