Web Site Indexing and Searching

Servers : Web Site Indexing and Searching

Technology and systems for indexing web sites and page contents.

See Also:
Web Servers

* EWS - Excite for Web Servers
A free search engine for web servers, available on a wide variety of platforms and operating systems.
* CheckBot home page
Checkbot is a tool to verify links on a set of HTML pages. Checkbot can check a single document, or a set of documents on one or more servers.
* ht://Dig - WWW Search Engine Software
The ht://Dig system is a complete World Wide Web indexing and searching system for a small domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like AltaVista; it is meant to cover the search needs of a single company or campus.
* WebGlimpse
WebGlimpse adds search capabilities to your WWW site automatically and easily. A search can cover either the neighborhood of the current page or the whole site.
* The Harvest Information Discovery and Access System
Harvest is an integrated set of tools to gather, extract, organize, search, cache, and replicate relevant information across the Internet. With modest effort users can tailor Harvest to digest information in many different formats from many different machines, and offer custom search services on the web.
* Roll Your Own Search Engine
A walk-through of what it takes to write a search function for your web site.
* WWW::Search
WWW::Search is a collection of Perl modules which provide an API to WWW search engines like AltaVista, Lycos, Hotbot, WebCrawler, and so on. It includes two demo applications built from this library: AutoSearch (a program to automate tracking of search results over time), and a small demonstration program to drive the library.
* WWW::Search Windows Support Page
This site provides a self-installing Windows version of John Heidemann's WWW::Search API. The version of WWW::Search found on this site has been slightly edited for Windows and has been tested under both Windows 98 and NT4.
* AutoSearch WEB Searching
AutoSearch is a demo program which performs a web-based search and puts the results set in a web page. It periodically updates this web page, indicating how the search changes over time. AutoSearch is distributed as a part of WWW::Search, a Perl API to a set of web search engines.
* SWISH-E
SWISH-E is the "Simple Web Indexing System for Humans - Enhanced"! It's a fast, powerful, flexible, and easy to use system for indexing collections of Web pages or other text files.
* SWISH++: File indexer and searcher
An enhanced derivative of SWISH, considerably faster than SWISH-E, that automatically splits and merges large indexing jobs. It also has a utility to aid in indexing non-text files.
* Dublin Core Metadata
The Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources.
* eXtense
eXtense is a suite of internal search engines (some free, some commercial) for web sites and intranets.
* Worldwide Web Search Engines
A reasonably comprehensive list of World Wide Web search engines, many of them providing meta-searching capabilities.
* Alkaline Search Engine
Alkaline is a powerful search engine from Vestris Inc. in Switzerland, using a "cellular expansion" search algorithm! It is free for non-commercial use, and runs on NT, Linux, Solaris, and several other flavours of UNIX.
* ASF - Advanced Search Facility
The Advanced Search Facility provides tools for gathering and organizing information within and among information communities.
* Pavuk Home Page
Pavuk is a UNIX program used to mirror the contents of WWW documents or files. It transfers documents from HTTP, FTP, Gopher and HTTPS (SSL) servers. Pavuk is free software distributed under the terms of the GNU General Public License.
* ASF Administrator's Guide
This document describes how to set up and use the various components that make up the ASF software distribution.
* Advanced Search Facility FAQ
This is a list of Frequently Asked Questions about the Advanced Search Facility (ASF) project.
* Isearch
Isearch is software for indexing and searching text documents. It supports full-text and field-based search, relevance-ranked results, Boolean queries, and heterogeneous databases. Isearch can parse many kinds of documents "out of the box," including HTML, mail folders, list digests, SGML-style tagged data, and USMARC. It can be extended to support other formats by creating descendant classes in C++ that define the document structure. A CGI interface is also included for web-based searching.
* Zebra
The Zebra system is a fielded free-text indexing and retrieval engine with a Z39.50 frontend. You can use any compatible Z39.50 client, commercial or freeware, to access data stored in Zebra.
* The YAZ Toolkit
YAZ is a programmer's toolkit supporting the development of Z39.50v3 clients and servers. A sample client and server are included with the distribution.
* Z39.50 Maintenance Agency Home Page
The place to go for the latest official word on the state of the protocol!
The Z39.50 standard, "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification", is represented as both ANSI/NISO Z39.50 and ISO 23950. This page provides information pertaining to the development and maintenance of Z39.50 (existing as well as future versions) and the implementation and use of the Z39.50 protocol.
* Z39.50 and the World Wide Web
A brief discussion by Index Data in Denmark about the role of Z39.50 in the global information community.
* Online Component Repositories in the Global Engineering Enterprise
An analysis by Index Data in Denmark of the possible uses of Z39.50 in software re-use libraries. A good example of the versatility of the Z39.50 model.
* Lucene
Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
* The Lucene search engine
Lucene is a Java-based open source toolkit for text indexing and searching. It is easy to use, flexible, and powerful -- a model of good object-oriented software architecture. Powerful abstractions and useful concrete implementations make Lucene very flexible, and allow new users to get up and running quickly and painlessly. This JavaWorld article explores what Lucene does, how it works, and what software engineers can learn from its design.
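
Most of the indexing tools listed above (SWISH-E, Isearch, Lucene and others) are built around some form of inverted index: a mapping from each word to the documents that contain it. The following is only a minimal sketch of that idea in Python, not the method used by any of the listed packages. The page paths, page texts, and function names are all hypothetical, made up for illustration, and a real engine would add stemming, ranking, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page paths that contain it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths containing every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample pages, invented for this illustration.
PAGES = {
    "/index.html": "Welcome to our site about web servers",
    "/search.html": "Search the site for web indexing tools",
    "/contact.html": "Contact the webmaster",
}

INDEX = build_index(PAGES)
```

With this sample data, `search(INDEX, "web site")` returns both `/index.html` and `/search.html`, while `search(INDEX, "indexing")` returns only `/search.html`. Matching is on whole words, so a query for "web" does not match the page that only contains "webmaster".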

Copyright © 1996-2006 Software Technologies Ltd.
All rights reserved. All trademarks acknowledged. E & O E.
webmaster@SwTech.com http://www.SwTech.com/server/websvr/wsindex/