KWiCFinder Related Links

Here is a selection of my links to topics and resources related to Internet searching and personal search engines / agents, distributed indexing and archiving, concordancing and corpus linguistics, copyright and digital libraries, programming, and useful utilities. This page is still in an initial stage of development.

Web datasets compiled with KWiCFinder

From the preliminary version of a Web corpus with 97,198,272 tokens and 525,509 types

PIE Web Corpus 2006 100 or more HTML
HTML version of list of 30,524 types occurring 100 or more times
PIE Web Corpus 2006 100 or more TAB
Tab-separated text version of list of 30,524 types occurring 100 or more times
PIE Web Corpus 2006 10 or more TAB
Tab-separated text version of list of 104,675 types occurring 10 or more times
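
The lists above distinguish tokens (running words) from types (distinct word forms) and keep only types at or above a frequency cutoff. The idea can be illustrated with a minimal sketch. Whitespace tokenization, the function names, and the two-column "type, count" layout are assumptions made for illustration; they do not describe how the PIE lists were actually produced or formatted.

```python
from collections import Counter


def frequency_list(tokens, min_count=1):
    """Return (type, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(tokens)
    rows = [(word, n) for word, n in counts.items() if n >= min_count]
    # Sort by descending count, then alphabetically to break ties.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def to_tab_separated(rows):
    """Render rows as tab-separated text, one type per line (assumed layout)."""
    return "\n".join(f"{word}\t{n}" for word, n in rows)


# Toy corpus: 11 tokens, 5 types.
tokens = "the cat saw the dog and the dog saw the cat".split()
print(len(tokens), "tokens,", len(set(tokens)), "types")
print(to_tab_separated(frequency_list(tokens, min_count=2)))
```

Raising min_count shortens the list in the same way that the cutoff of 100 yields far fewer types than the cutoff of 10 in the lists above.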
 

Internet Searching

Personal Internet Search Engines & Agents

Links for information only -- software not evaluated yet

Alkaline
Search engine precompiled for various flavors of Unix, Linux and Windows, free for non-commercial use.  Mentions indexing up to 500K pages.
ASPSeek
Freeware search engine for Linux. Crawls the Web, indexes pages, and provides user search facilities; claims to be capable of searching millions of pages.  Open source.
DataparkSearch
Open source search engine akin to ASPSeek and mnoGoSearch, but under more active development than ASPSeek (as of April 2004).
mnoGoSearch
Search engine released under the GNU General Public License; runs under Unix, Linux, or Windows.  Appears to have both free and paid versions.  Includes support for various languages.
Perlfect Search
Freeware Perl search engine script; runs under Unix, Linux, or Windows.
WebSPHINX
A Customizable Personal Web Crawler; GNU license freeware; Java (= multi-platform).
ht://Dig
Complete WWW indexing and searching system; GNU license freeware; requires Unix or Linux. Would be useful for monitoring a selection of sites or as the basis for a specialized search engine.

Distributed Web Crawling / Indexing / Archiving

Links for information only -- software not evaluated yet

Grub.org
"Grub provides a free for download, free to run, distributed crawling client, which is used to create an infrastructure (database + volunteers) that will eventually provide URL update status information for nearly every web page on the Internet. Grub's distributed crawler network will enable websites, content providers, and individuals to notify others that changes have occurred in their content, all in real time."
Herodotus
Timo Burkard, Herodotus: A Peer-to-Peer Web Archival System, MIT Master's Thesis, June, 2002. (.PDF file)
Building a Distributed Full-Text Index for the Web
Paper from the Tenth International World Wide Web Conference 1-5 May 2001, Hong Kong by Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina, Computer Science Department, Stanford University.

 

Concordancing and Corpus Linguistics

Copyright and Digital Libraries

KWiCFinder could be extended to support creation of special-purpose online corpora from online documents as outlined in this paper. Clarification of the copyright issues is an essential prerequisite to such an initiative. 

Internet Archive Copyright Links
Links to various resources, including the National Academy Press' book The Digital Dilemma: Intellectual Property in the Information Age, the Association for Computing Machinery's Intellectual Property page, and the Archive's amici curiae brief to the Supreme Court in the "Sonny Bono" Copyright Term Extension Act case.
Internet Archive "How People Envision Using Internet Libraries"
Discusses the many reasons for archiving the Internet and for digital libraries in general.
Kenneth D. Crews' Copyright Information Center
Legal opinions on topics related to intellectual property in a university framework, from Indiana University-Purdue University Indianapolis.  Prof. Crews is the principal investigator for copyright issues on the National Science Foundation-funded Digital Music Library project, whose Copyright Page links to papers and opinions on the issue of digital libraries.

Programming

PowerBasic
Superfast Windows 32 implementation of Basic which does all the "heavy lifting" for KWiCFinder. Very active and supportive user community. Highly recommended for programmers who want the power of C without the arcane 

Useful Utilities for Windows

BK ReplaceEm
Powerful search and replace utility which can operate on groups of files; supports regular expressions; freeware. Valuable for webmasters and programmers.


Feedback: Questions or Suggestions
Author: William H. Fletcher
Version: 8 December 2006
URL: http://KWiCFinder.com/RelatedLinks.html