
WebLech URL Spider


WebLech is a fully featured website download/mirror tool written in Java. It supports the features needed to download websites and emulates standard web-browser behaviour as closely as possible. WebLech is multithreaded and will feature a GUI console.

Similar in some respects to tools such as wget (in recursive retrieval mode), WebSuck or Teleport Pro, WebLech allows you to "spider" a website and recursively download all the pages on it. You can then browse the site offline at your convenience, or even "mirror" the website and re-publish it yourself. Note that WebLech is not suited to downloading single URLs; use wget for that kind of thing.
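Spidering boils down to a loop: download a page, find the links in it, resolve them against the page's own URL, and queue the ones not yet seen. The sketch below illustrates only the link-finding step. It is a hypothetical helper (the class `LinkExtractor` and method `extractLinks` are illustrative names, not part of WebLech), and it uses a simple regular expression for brevity, so it will miss unusual markup that a real HTML parser would handle. Relative links are resolved with the standard `java.net.URI.resolve`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a spider's link-extraction step (not WebLech's parser).
 * Finds href="..." / href='...' attributes and resolves them against the page URL.
 */
final class LinkExtractor {
    // Case-insensitive href attribute; the capture stops at a quote or '#',
    // so fragments are dropped and pure "#anchor" links are skipped entirely.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    static List<URI> extractLinks(URI pageUrl, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String target = m.group(1).trim();
            if (target.startsWith("mailto:") || target.startsWith("javascript:")) {
                continue; // nothing here that a spider can download
            }
            try {
                // resolve() turns "../img/logo.gif" into an absolute, normalized URI
                links.add(pageUrl.resolve(target).normalize());
            } catch (IllegalArgumentException e) {
                // skip malformed URLs instead of abandoning the whole page
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.example.com/docs/index.html");
        String html = "<a href=\"intro.html\">Intro</a> <a href='../img/logo.gif'>Logo</a>"
                + " <a href=\"mailto:someone@example.com\">Mail</a>";
        for (URI link : extractLinks(page, html)) {
            System.out.println(link);
        }
    }
}
```

Each extracted URI would then be checked against the spider's URL filter and added to its download queue.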

Latest news

WebLech 0.0.4 In-Progress 12th June 2004  
It's been a while, but we're now working towards a new release. This will include a fully functioning GUI and some fairly major code restructuring to support future enhancements such as regex URL filtering and JavaScript parsing. For any developers interested in extending WebLech, a working build of the new 0.0.4 version can be downloaded from SourceForge's CVS (as described below). [Tom Hey]
 
Help Wanted! 14th June 2002  
If you want to write a graphical console for configuring and running the WebLech Spider, please mail me and let me know. I've tried and just can't get stuck into Swing. Any help appreciated!
 
WebLech 0.0.3 Released 10th June 2002  
The new release has URL categorisation support for downloading "interesting" URLs first and leaving the "boring" ones until later, checkpointing so the Spider state can be saved and resumed without starting from scratch, and a number of bugfixes. Download now!
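To make the categorisation and checkpointing ideas concrete, here is a minimal sketch under stated assumptions. The class `CategorisedUrlQueue`, the substring-based rule for deciding what counts as "interesting" or "boring", and the use of standard Java serialization for the checkpoint are all hypothetical choices for illustration, not WebLech's actual implementation. The queue hands out interesting URLs first, then uncategorised ones, then boring ones, and its whole state (pending URLs plus the set of URLs already queued) can be written to a stream and restored later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: a URL queue that serves "interesting" URLs first,
 * "boring" ones last, and can be checkpointed with Java serialization.
 * Substring matching is an assumed categorisation rule, not WebLech's own.
 */
final class CategorisedUrlQueue implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Deque<String> interesting = new ArrayDeque<>();
    private final Deque<String> normal = new ArrayDeque<>();
    private final Deque<String> boring = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>(); // every URL ever queued
    private final String[] interestingMarkers;
    private final String[] boringMarkers;

    CategorisedUrlQueue(String[] interestingMarkers, String[] boringMarkers) {
        this.interestingMarkers = interestingMarkers.clone();
        this.boringMarkers = boringMarkers.clone();
    }

    /** Queues a URL in its category; URLs queued before are ignored. */
    synchronized void add(String url) {
        if (!seen.add(url)) {
            return;
        }
        if (containsAny(url, interestingMarkers)) {
            interesting.addLast(url);
        } else if (containsAny(url, boringMarkers)) {
            boring.addLast(url);
        } else {
            normal.addLast(url);
        }
    }

    /** Next URL to fetch: interesting, then normal, then boring; null when empty. */
    synchronized String next() {
        if (!interesting.isEmpty()) {
            return interesting.pollFirst();
        }
        if (!normal.isEmpty()) {
            return normal.pollFirst();
        }
        return boring.pollFirst(); // null if every queue is empty
    }

    private static boolean containsAny(String url, String[] markers) {
        for (String marker : markers) {
            if (url.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Saves pending queues and the seen set so a run can be resumed later. */
    synchronized void checkpoint(OutputStream out) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(this);
        oos.flush();
    }

    /** Rebuilds a queue from data written by checkpoint(). */
    static CategorisedUrlQueue restore(InputStream in) throws IOException, ClassNotFoundException {
        return (CategorisedUrlQueue) new ObjectInputStream(in).readObject();
    }
}
```

Because the seen set is saved along with the pending queues, a resumed run neither re-queues pages it has already handled nor loses the ones still waiting. The `synchronized` methods let several download threads share one queue.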

Features (or "Why do I need this?")

WebLech has a number of features that make it useful:

Status

WebLech is in a pre-alpha state at the moment. The basic spidering code is written and functional, so you can spider a website with it. Basic authentication and referer preservation are in, as are depth-first vs. breadth-first search and URL filtering to keep the spider on a single site. The multithreading code is in and works great, so you can download multiple URLs at once during a spider session. There is no GUI yet, so the Spider is configured using a simple configuration file.
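The difference between depth-first and breadth-first spidering, and the effect of a single-site filter, can be shown with a small sketch. Everything here is illustrative: the class `SpiderOrder`, the in-memory `Map` standing in for real page downloads, and the host-equality filter are assumptions, not WebLech's API or configuration. Both modes take the next URL from the front of one `ArrayDeque`; they differ only in whether newly found links go on the front (a stack, depth-first) or the back (a queue, breadth-first).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: depth-first vs breadth-first crawl order with a
 * single-site filter. The link map stands in for downloading real pages.
 */
final class SpiderOrder {
    static List<String> crawl(String start, Map<String, List<String>> links, boolean depthFirst) {
        String host = URI.create(start).getHost();
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> order = new ArrayList<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String url = frontier.pollFirst();
            if (!seen.add(url)) {
                continue; // already fetched via another path
            }
            order.add(url); // a real spider would download and parse the page here
            List<String> children = links.getOrDefault(url, Collections.emptyList());
            if (depthFirst) {
                // Stack: push in reverse so the page's first link is explored next.
                for (int i = children.size() - 1; i >= 0; i--) {
                    String child = children.get(i);
                    if (onSite(child, host) && !seen.contains(child)) {
                        frontier.addFirst(child);
                    }
                }
            } else {
                // Queue: append so pages are fetched level by level.
                for (String child : children) {
                    if (onSite(child, host) && !seen.contains(child)) {
                        frontier.addLast(child);
                    }
                }
            }
        }
        return order;
    }

    /** Single-site filter: follow only links whose host matches the start URL's host. */
    static boolean onSite(String url, String host) {
        try {
            return host != null && host.equalsIgnoreCase(URI.create(url).getHost());
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("http://a.com/", Arrays.asList("http://a.com/1", "http://a.com/2", "http://other.org/x"));
        site.put("http://a.com/1", Arrays.asList("http://a.com/1a"));
        site.put("http://a.com/2", Arrays.asList("http://a.com/2a"));
        System.out.println("depth-first:   " + crawl("http://a.com/", site, true));
        System.out.println("breadth-first: " + crawl("http://a.com/", site, false));
    }
}
```

For the small site in `main`, depth-first visits `/1` and its child `/1a` before moving on to `/2`, while breadth-first fetches both `/1` and `/2` before either child. In both cases the link to `other.org` is never followed.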

Get involved

You can get involved with the development of WebLech simply by downloading it and trying it out. Please send feedback to weblech@hotmail.com or visit the discussion forum on SourceForge. WebLech is in its infancy, and the more feedback we get, the better it will become. All comments are welcome (especially bad ones -- tell us what's broken and we'll fix it!).

If you're a Java developer, each of the releases contains the full Java source to WebLech. Patches and suggestions would be great! You can also access the code via CVS. Instructions for accessing via anonymous CVS are available here. The module you need to check out is "weblech". Please make sure any patches you create use unified diff format. To do this, use "cvs diff -u". Patches should be mailed to weblech@hotmail.com.


Last updated: 12th June 2004
CVS Header: $Header: /cvsroot/weblech/weblech/www/index.html,v 1.5 2002/06/13 19:23:26 weblech Exp $