fravia's how to search ~ Lesson 10 ('light' version)
June 1998
Lesson 10 LET THE BOTS SEARCH FOR YOU
...and build your own search-bots :-)
'LIGHT' version
Based on some original private emailings from +ORC
Searchengines' strings cracked by Master Accmailer G.E. Boyd
Preceding lessons:
lesson_5 about general agora http:// retrieving ~ July 1996
lesson_6 about ftping files, agora queries and emailing altavista ~ December 1996
lesson_7 about the W3gate, search spiders, error messages and evaluation of results ~ March 1997
lesson_8 about advanced searching techniques (combing and klebing) ~ November 1997
lesson_9 about "effective" searching techniques (infoseek 'finalised' and dejanews filtering) ~ January 1998
WARNING: For reasons that are better explained in
my bot wars section, I have decided to publish openly only
'light' versions of my own work. Complete versions are of course available, but only
for those readers and searchers that exchange valuable knowledge with me... send your own tricks and essays
and you'll get
the 'full version' URLs... Yes, you'll have to pay... but not with useless money (quelle vulgarité):
you'll have to pay for my knowledge with the only real money that exists on this web of ours:
your own knowledge!
Go to Never forget the bots!
Go to The 'pasted stringsearch' method
Go to FTPmail: mailing and re-mailing :-)
Go to Is gopher dead?
'to know answers is easy, the difficult
part is knowing how to find any answer' (+ORC)
Never forget the bots!
I have decided to summarise some of the must-know
techniques for automated searching and data retrieval on the web for all those
readers that keep writing to me that some of the ftpmailers listed in my older lessons
don't work anymore. Kids: the Web is a Quicksand! Lotta sites and servers and bots
DISAPPEAR, but this does not mean anything at all: since you (should) know the sublime art
of how to search, you'll always be able to catch the same (or analogous) sites and services
elsewhere!
As you already know (since I assume you have read the preceding lessons and
have learned the basics of all 'getweb' techniques :-) there are many automated servers
out there that will send you pages/files/source code and/or will answer
your queries... of course for free, this is still 'our' web after all, the evil powers
of commercialisation and advertisement don't dominate the net (yet).
As usual, since you're going to work with email, first of all check how
much info you are leaking around with your own emails: send right now an email to
echo@tu-berlin.de
write 'test' in both the 'Subject' and the 'Text' fields and examine with attention
what you get back as automated answer in a couple of seconds from this German echo bot...
OK? Everything ok? Your emailing traces are nice enough?
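If you want to examine the echo bot's answer with a script rather than by eye, here is a minimal sketch in Python. The sample reply is completely made up (invented hostnames and mailer name), and the list of 'leaky' headers is only my own assumption about what is worth checking, not anything the echo bot defines:

```python
from email import message_from_string

# Made-up sample of what an echo bot might send back; NOT a real reply.
SAMPLE_REPLY = """\
Received: from dialup-42.example.net (dialup-42.example.net [192.0.2.42])
\tby mail.example.org; Mon, 1 Jun 1998 10:00:00 +0200
From: you@example.org
X-Mailer: ExampleMail 1.0
Subject: test

test
"""

# Assumption: headers that typically reveal your host or software.
LEAKY = ("received", "x-mailer", "x-sender", "organization", "message-id")


def leaked_headers(raw):
    """Return (name, value) pairs for headers that give information away."""
    msg = message_from_string(raw)
    return [(name, value) for name, value in msg.items()
            if name.lower() in LEAKY]


for name, value in leaked_headers(SAMPLE_REPLY):
    print(name, "->", " ".join(value.split()))
```

Paste the raw text of the real answer in place of SAMPLE_REPLY and see which lines betray your provider, your machine or your mail program.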
Now let's start this lesson 10...
Let's list the main services we'll deal with:
1) I wanna get pages, files and images from da net!
AGORA
agora@dna.affrc.go.jp [01]
agora@kamakura.mss.co.jp [02]
agora@www.eng.dmu.ac.uk [03]
AGORA-LIKE
w3mail@gmd.de [04]
w3mail@enigma.gex.gmd.de [04]
webmail@www.ucc.ie [05]
2) I wanna search da net
GETWEB
getweb@unganisha.idrc.ca [06]
getweb@lanic.utexas.edu [07]
getweb@usa.healthnet.org [08]
ILIAD
iliad@algol.jsc.nasa.gov [09]
iliad@rosy.tenet.utexas.edu [09]
3) I wanna patrol da net
E-MAIL-QUERY
Email-Queries@Reference.COM [10]
4) Oldies but useful
GOPHER SERVERS AND VERONIKAS
gophermail@eunet.cz [11]
gopher@dna.affrc.go.jp [12]
http://veronica.psi.net [13]
[01] the most used one by those who know this stuff
[04] a very powerful one for image retrieval
[06] a beautiful one for searches
[08] very fast but with a 200.000 bytes weekly quota
[09] iliad has a "get url" or an "iliad query" function
[10] a very powerful 'filter' possibility to automatically patrol usenet
Each one of the preceding services will give us the possibility to learn
a different face of searching... we'll now examine them all (only three in the 'light' version
of this lesson)
agora@dna.affrc.go.jp [01]
Who knows if these nice people from Japan really grasp how IMPORTANT their
fantastic service
is for any Internet user? This is the "mother of all agoras", because it's 'speedy
quick' and allows the three famous commands SEND (your
target URL's text), SOURCE (your
target URL with all its
HTML formatting, so that you can browse it off line, pretty important in order to
browse 'almost' anonymously a delicate target site :-) and DEEP
(one URL with
all linked URLs on it... yet watch it! You can get hundreds of emails if your target is
a page that links to a lot of pages, like my aca300.htm).
Agora allows the retrieval of zipped files as well, btw. If you for instance ask for:
send ftp://ftp.crl.com/users/iv/iverham/ua.zip
agora will deliver you Uzi Paz's famous (and invaluable) file on Usenet access, techniques and
newsgroups.
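The three commands are simple enough to generate from a script. A minimal sketch, assuming only what is said above (one command word followed by one URL per line); the function name and the example.com address are mine, purely for illustration:

```python
AGORA_COMMANDS = ("send", "source", "deep")


def agora_line(command, url):
    """Build one agora request line, e.g. 'send <url>'."""
    command = command.lower()
    if command not in AGORA_COMMANDS:
        raise ValueError("unknown agora command: " + command)
    return command + " " + url


# The text of an email asking for a zipped file and for a page's HTML source.
body = "\n".join([
    agora_line("SEND", "ftp://ftp.crl.com/users/iv/iverham/ua.zip"),
    agora_line("SOURCE", "http://www.example.com/"),  # hypothetical target
])
print(body)
```

DEEP is accepted by the sketch as well, but remember the warning above before you point it at a page full of links.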
The 'pasted stringsearch' method
So, how do you do a search with an agora? Well, the trick is to do a search
exactly as you would do it in your own browser... therefore you must first of all learn
how you should search using your own browser, which many readers still don't know: i.e. the
'pasted stringsearch' searching method... very useful indeed if you until
now only searched using the ready-made searchengines forms, like the altavista
one below or, if, even more slowly, you only used the
advertisement overloaded front pages of the search engines themselves :-)
1) copy the following line (highlight it and then CTRL+C):
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=bozo
2) paste it into your browser's "URL" small window (CTRL+V, duh)
3) replace the "bozo" keyword with your search phrase, separating different words with a plus (+)
sign, not with blanks... [ida+disassembler+regged] for instance... :-)
4) Press ENTER and up you go... much quicker than accessing altavista's real site isn't it?
Actually it's even quicker than using a form like my own one:
Try both the form and the 'pasted stringsearch' methods for searching on line
now... which one is quicker? :-)
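The same four steps can be done by a few lines of Python instead of by hand. This is only a sketch: the parameter names (pg, what, kl, q) are simply copied from the altavista string above, and the function name is my own invention:

```python
from urllib.parse import urlencode

ALTAVISTA = "http://www.altavista.digital.com/cgi-bin/query"


def pasted_search_url(phrase):
    """Build the 'pasted stringsearch' URL for a search phrase."""
    # urlencode joins the words of the phrase with '+', as step 3 requires.
    params = {"pg": "q", "what": "web", "kl": "XX", "q": phrase}
    return ALTAVISTA + "?" + urlencode(params)


print(pasted_search_url("ida disassembler regged"))
```

Every other search engine has its own string, of course, so for a different engine you would swap the base address and the parameter names.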
Now, the same 'stringsearch' method
can be used (with an agora server) per email. The
advantage in this case of course is NOT speed, it is automation... the following
pre-prepared email form can be your first 'home-made' generic search agent... just cut and
paste the following block as TEXT in an email to agora@dna.affrc.go.jp and you'll see what
I mean (send after having search-replaced [bots+source] with [your+own+searchstring], duh):
send http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web\
&kl=XX&q=bots+source
send http://webcrawler.com/cgi-bin/WebQuery?searchText=bots+source
send http://search.dejanews.com/dnquery.xp?QRY=bots+source\
&defaultOp=AND&svcclass=dncurrent&maxhits=20&ST=QS&format=terse&DBS=2
send http://search.dogpile.com/search?q=bots+source&fs=web&ss=stop\
&to=twenty
send http://www.excite.com/search.gw?trace=a&search=bots+source
send http://www2.infoseek.com/Titles?qt=bots+source&col=WW
send http://www.lycos.com/cgi-bin/pursuit?cat=lycos&query=bots+source\
&mtemp=lite
send http://www.metacrawler.com/cgi-bin/nph-metaquery?general=bots+source\
&method=0&sort=relevance&target=window&useFrames=1&iface=int1
send http://search.opentext.com/omw/simplesearch?SearchFor=bots+source\
&mode=and
send http://guaraldi.cs.colostate.edu:2000/search?KW=bots+source\
&Boolean=AND&Hits=10&Mode=MakePlan&df=normal&AutoStep=on
send http://search.yahoo.com/bin/search?p=bots+source
See?
Now you can automate the whole process: prepare a batch file that will compose your
'agora search' email, say every two days... with (some of) the selected search
engines above, with your
favourite search strings... and you are set for fishing the deep deep web without
much work...
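Such a 'batch file' could look more or less like the following Python sketch. It only COMPOSES the email (actually sending it, and running the script every two days, is left to your own mailer and scheduler). The three query strings are taken from the block above; the function names, the {q} placeholder and the 70-character line width are my own assumptions, and the backslash wrapping simply imitates the line breaks used in that block:

```python
from email.message import EmailMessage
from urllib.parse import quote_plus

AGORA = "agora@dna.affrc.go.jp"

# A few of the query strings listed above; {q} marks the search string.
ENGINES = [
    "http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q={q}",
    "http://webcrawler.com/cgi-bin/WebQuery?searchText={q}",
    "http://search.yahoo.com/bin/search?p={q}",
]


def wrap_send(url, width=70):
    """Return 'send <url>', broken before an '&' with a trailing backslash."""
    line = "send " + url
    parts = []
    while len(line) > width:
        cut = line.rfind("&", 1, width)  # last '&' that still fits
        if cut == -1:                    # nothing to break at: keep it long
            break
        parts.append(line[:cut] + "\\")
        line = line[cut:]
    parts.append(line)
    return "\n".join(parts)


def agora_search_body(terms):
    """One 'send' request per engine for the given search terms."""
    q = quote_plus(terms)  # "bots source" -> "bots+source"
    return "\n".join(wrap_send(e.format(q=q)) for e in ENGINES) + "\n"


def agora_search_email(terms, sender):
    """Compose (not send) the search email for the agora server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = AGORA
    msg.set_content(agora_search_body(terms))
    return msg


print(agora_search_body("bots source"))
```

Add or remove engines in the ENGINES list as they appear and disappear: the Web is a Quicksand, remember.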
getweb@unganisha.idrc.ca [06]
OK, admittedly the 'pasted searchstrings' method above has got strong competition
from the new 'breed' of getweb servers... unganisha, for instance, is
a beautiful Canadian robot. The getweb servers make it extremely easy to use any
form-based search engine, and moreover have integrated automated facilities for
three different search engines: SEARCH ALTAVISTA, SEARCH YAHOO and SEARCH INFOSEEK.
Just email getweb@unganisha.idrc.ca, leave the subject blank and
compose in your text the following:

begin

SEARCH YAHOO "automated retrieval" bots

end

Notice the blank lines BEFORE begin, after begin, before end and after end. Since these
blank lines are required by some of the getweb systems, you had better get used to using them with
EVERY getweb system, just in case. Of course you can substitute SEARCH ALTAVISTA or SEARCH INFOSEEK for
the SEARCH YAHOO command above. SEARCH INFOSEEK has two important additional switches
that will give more power to your search: NN (search the usenet) and NW (search
only among the past MONTH of news).
Just email getweb@unganisha.idrc.ca leave the subject blank and
compose in your text the following:

begin

SEARCH INFOSEEK NW "automated retrieval" bots

end

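Since the blank lines are so easy to forget, you may prefer to let a small script write the text for you. A minimal sketch in Python, based only on the commands and switches described above; the function name is hypothetical and I assume that the NN and NW switches belong to SEARCH INFOSEEK alone:

```python
ENGINES = ("ALTAVISTA", "YAHOO", "INFOSEEK")
INFOSEEK_SWITCHES = ("NN", "NW")


def getweb_body(engine, query, switch=None):
    """Text of a getweb search email, blank lines included."""
    engine = engine.upper()
    if engine not in ENGINES:
        raise ValueError("unknown engine: " + engine)
    if switch is not None and (engine != "INFOSEEK"
                               or switch not in INFOSEEK_SWITCHES):
        raise ValueError("switch not allowed here: " + switch)
    command = " ".join(p for p in ("SEARCH", engine, switch, query) if p)
    # Blank line before begin, after begin, before end and after end.
    return "\n".join(["", "begin", "", command, "", "end", ""]) + "\n"


print(getweb_body("infoseek", '"automated retrieval" bots', "NW"))
```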
Getweb's limits
There are limits on all these automated servers. These vary and
currently lie between 10 and 100 document requests
every week OR between 100.000 and 700.000 kilobytes every week. Of course
you can use different email accounts to multiply your allowed quotas. Weekly limits
reset seven days after you have exceeded them, NOT on Monday morning :-)
Email-Queries@Reference.COM [10]
The email query service provides a powerful interface that lets
you refine queries by author, author's organization, subject,
newsgroup or e-mail list.
So, how d'you use it? Well, first of all TRY IT right now with an
"on the fly" query...
FIND 'software reverse engineering' WHERE AGE <14 DAYS
And then send for HELP and learn how to create your own automated filtering bots...
here you have a very simple example:
DEFINE QUERY botsscri AS
FIND agents
AND scripts
AND source
AND NOT fan money jobs sell help buy god
END
Ok, that should be enough for a start... and I believe that
if you never used this service before
you'll thank me for a long time for this... more in the 'full' version of
this lesson...
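If you keep several such filters, a few lines of Python can write the definitions for you. This sketch does nothing more than reproduce the layout of the simple example above (DEFINE QUERY, FIND, AND, AND NOT, END); I make no claim about the rest of the query language, for which you should send for HELP, and the function name is my own:

```python
def define_query(name, required, excluded=()):
    """Write a DEFINE QUERY block laid out like the example above."""
    if not required:
        raise ValueError("at least one search word is needed")
    lines = ["DEFINE QUERY " + name + " AS", "FIND " + required[0]]
    lines += ["AND " + word for word in required[1:]]
    if excluded:
        lines.append("AND NOT " + " ".join(excluded))
    lines.append("END")
    return "\n".join(lines)


print(define_query(
    "botsscri",
    ["agents", "scripts", "source"],
    ["fan", "money", "jobs", "sell", "help", "buy", "god"],
))
```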
FTPmail: mailing and re-mailing :-)
This is the 'light' version, I'm sure you have had enough info for to-day...
Is gopher dead?
This is the 'light' version, I'm sure you have had enough info for to-day... anyway, you
should at least understand that gopher of course is not dead, the www notwithstanding... :-)
Should you want to retrieve large zip files (say huge MPEG files)
that are accessed via
a web page (and don't refer to any FTP site... else we should
use ftpmail :-) you should by all means learn what gophers are
and how to use them. The idea of downloading huge files on-line is
IMO pretty silly: the vagaries of the web and the number of accesses
to *ahem* pretty sensitive files make such downloads a very difficult
enterprise at times. Once you have mastered the gopher techniques you'll never
download huge files on line again (get them sent to you by an automated
bot that will automatically retry to connect every time its connection
breaks... isn't it nice?)
Go ahead, enjoy!
(c) fravia+ 1998, work in progress, all rights reserved nevertheless
(c)
Fravia 1995, 1996, 1997, 1998. All rights reserved