HOW TO SEARCH THE WEB
by fravia+
~
Letter 008 - November 1997
ADVANCED SEARCHING TECHNIQUES
(Combing and klebing)
(Based on some original private emailings from +ORC)
~
This stuff has been gathered and written by fravia+, so if you leech, copy,
use and spread it, have at least the decency to give credit.
__Combing__
(Some other specific combing examples are to be found inside my antismut pages)
What is combing?
Combing is a very effective search strategy: instead of simply searching,
you 'milk' (or 'comb') various other net resources:
- The continuously updated "Top 100", "Top 1000", "Top whatever" URL-locations
('Real' combing)
- Usenet newsgroups and their various "vigilant filters" and "short range queries"
(Usenet combing)
- Relevant site links pages.
(a form of 'crumbs gathering', see my anti-smut pages)
Real combing: The WebSideStory example
The best way to learn combing is to have a try by yourself.
Let's take as an example (yet there are THOUSANDS of these 'top whatever' sites)
one counter-related site that I have been using myself (it offers quick
text-only stats and awful graphic stats): WEBSIDE STORY.
Here is websidestory's self-praise:
Updated every four hours. Last update Mon Nov 3 12:00:01 1997 - PST
WebSideStory, Inc. currently monitors 30,783 sites who have 15,264,666 visitors per day.
There are 22,750 sites listed in 36 categories, averaging 1,996,241 visitors per day.
The problem with all these 'top whatever' sites is that you often
have to wade through a lot of pages to get to what interests you,
because the poor sods want you to read their awful ads.
Famous Listing of the Best Sites on the Internet
In the case of WebSideStory you'll, for instance, first land at
http://www.hitbox.com/wc/world.html:
WebSideStory's first page
Yet you'll eventually land inside this second page (divided by categories):
Websidestory's second page
And here you'll eventually be able to choose among the various categories
that this counter-related database depot has chosen, for instance the
following ones (I have of course chosen the ones I reckon could yield some results):
Now you have seen it...
Obviously combing is an important technique for whatever interest you may
have: quite effective, and pretty useful for sparing yourself an incredible
lot of Internet searching hours.
For combing purposes you may also use:
1) ftp search, looking for "hidden" subdirectories with relevant names
As anybody who knows how to use ftp search ("This server is located in
Trondheim, Norway") has already experienced, the ftp search approach (that
fishes hidden directories) can fish incredible (if tricky to interpret) results.
Just do a quick search for 'warez' and you'll see what I mean.
2) the "big page provider" search engines
(Like the search engines that work page specific for geocities at
http://www.geocities.com/search/
or for mygale, or for angelfire, or for fortunecity, or for chez, or for you name one
of the thousand existing free pages providers that have specific search engines)
There are THOUSANDS of 'top whatever' counters and many carry some form
of 'top site listing' within... you may want to examine a list with
MANY counters on this good page:
Web Counters and Trackers (Access Counters for Web Sites; Free Counters; Web site auditing)
Usenet combing
Usenet combing can work "on the fly" or "regularly" through the "Vigilant"
filter at
filter@vigilant.bc.ca
I'll show you for instance one of my favourite simple queries:
FIND how-to-search tutorial manual
NOT spam
NOT top position
NOT advertising
MAX 8
Such a query would give you useful information about "searching techniques" on
the Web. You may of course construct as many queries as you like and *register*
them (for free) with the vigilant filter, in order to get
the results of your usenet queries emailed to you every day or week or
month.
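To make the logic of such a query concrete, here is a minimal Python sketch of a filter of this kind. It is NOT vigilant's own code, and the semantics are assumed for the illustration: a message passes if it contains at least one FIND word and none of the NOT phrases, and MAX caps the number of messages returned. The function names (parse_query, matches, run_filter) are invented for this example.

```python
def parse_query(text):
    """Parse FIND / NOT / MAX lines into a small dict (assumed semantics)."""
    query = {"find": [], "not": [], "max": None}
    for line in text.strip().splitlines():
        keyword, _, rest = line.strip().partition(" ")
        keyword = keyword.upper()
        if keyword == "FIND":
            # Assumption: each FIND word is an independent search term
            query["find"].extend(rest.lower().split())
        elif keyword == "NOT":
            # Assumption: each NOT line is one phrase to exclude
            query["not"].append(rest.lower().strip())
        elif keyword == "MAX":
            # Assumption: MAX limits how many messages are returned
            query["max"] = int(rest)
    return query


def matches(query, message):
    """True if any FIND word occurs and no NOT phrase occurs."""
    body = message.lower()
    if not any(word in body for word in query["find"]):
        return False
    return not any(phrase in body for phrase in query["not"])


def run_filter(query, messages):
    """Return the matching messages, capped at MAX when it is set."""
    hits = [m for m in messages if matches(query, m)]
    return hits if query["max"] is None else hits[: query["max"]]
```

Fed with the five-line query above and a handful of messages, run_filter would keep a posting about a "tutorial on searching" and drop one promising a "top position" in the search engines.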
The vigilant robot
Learn the secrets of usenet FILTERING! Email filter@vigilant.bc.ca with
the word "help" inside BOTH subject and text and learn how to use it as soon
as you get vigilant's automated answer... this robot is capable of sending
you automatically ALL usenet messages that contain the wording that you
have chosen... vigilant is NOT a usenet depot, like Dejanews or
reference.com... vigilant will send you (obviously for free) "on-the-fly"
all usenet messages that transit around dealing with matters that may
interest you, at times inside newsgroups you do not even know the names
of... mastering its filter capabilities is quite tricky though... study it
and use it... you'll never regret it and I'm sure you'll thank me for this tip
UNFORTUNATELY DOWN SINCE THE BEGINNING OF AUGUST!
Why? Has anybody any clue? Are there other "vigilant" services? This is
another of the "mysteries" of the Web: good services are retired and
awful, bogus and useless "push" services abound :(
Dejanews
Remember that you can gather an INCREDIBLE amount of information through
the following Usenet "depot":
DejaNews
__ONE OF THE *SCARIEST* BIG BROTHER SNOOPERS ON THE WEB__
You'll use it a lot: it allows you to reconstruct a personality profile as
soon as somebody uses newsgroups (like all do). As a matter of fact I tried
to understand who the hell hides behind this service... have a look at my
deja.htm page if you are interested in this kind of thing too... hey, did
you know that there also exists a nice stalking page of mine where these
matters are explained a little more?
And did you know that you may even "snatch" information from people
browsing your pages?
Reference.com
Finally, you can gather an INCREDIBLE amount of information through the following
Usenet "depot":
reference.com
Here you'll be able to "register" your automated queries... and THAT,
believe me, is really useful to snoop on what's going on and where the sites
that you are looking for are...
In fact usenet combing could be translated as 'let other people do the
searches for me...': you'll simply find email snippets from people who have
found the solution to your query inside some usenet group you do not even
know the name of!
The Usenet queries that can be done through the two big Usenet "depots",
Dejanews
and
email query, are ALSO possible through the major search engines (if you know
how to use them) and using the 'klebing' technique explained below.
Many of the main search engines allow such querying
too, and they use (of course) the services of either Dejanews or emailquery.
NOTE THAT THERE ARE MANY MORE 'usenet depots'... I recently found an 'Italian'
one at http://www.mailgate.org/mailgate/index.html, and who
knows how many more there are around!
__klebing__
Fishing query strings and locations
Klebing is a 'reversing search' technique that goes way beyond
"combing", and which offers incredible value. We will make clear
what klebing is, below, using a ready-made example on a site
that you probably already know (it is an important hacker site and I link
to it myself inside my links page): here is the 'normal'
URL of that site: L0pht heavy industries.
We can use L0pht for this example because L0pht
has (publicly) the 'raw material' that we need for klebing: the 'remote connexions' list.
It is basically a very simple CGI-script that records inside its own database
(L0pht updates every day) all the "remote" URL locations (i.e. the sites the
various visitors come from) accessing any of the pages of a given site.
You may easily write an analogous spider and add
it to your site! In order to write quickly (and dirtily)
a 'crude' CGI-script like this you just need to list all the
var where = document.referrer
variables that any lamer's browser carries inside (well... not our reversed
and 'ameliorated' browsers... in order to learn the relevant techniques you
may want to have a look at Mammon_'s "Reversing Netscape's buttons and
menus" essay... my copy of Netscape carries for instance a different random
-and of course faked- document.referrer variable every time it accesses a new
site :-)
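A quick and dirty sketch of such a referrer tally, in Python rather than as a real CGI-script. Assumptions: each hit arrives as a (referrer, page) pair (server-side, what the browser exposes as document.referrer travels in the HTTP Referer header, which a CGI-script can read from the HTTP_REFERER environment variable), and the 'count | referrer -> page' report layout is only one possible choice. The names tally_referrers and format_report are invented for this example.

```python
from collections import Counter


def tally_referrers(hits):
    """Count (referrer, page) pairs from an iterable of hits."""
    counts = Counter()
    for referrer, page in hits:
        if referrer:  # skip visits that carry no referrer at all
            counts[(referrer, page)] += 1
    return counts


def format_report(counts):
    """Render one 'count | referrer -> page' line per pair, busiest first."""
    lines = []
    for (referrer, page), n in counts.most_common():
        lines.append(f"{n} | {referrer} -> {page}")
    return "\n".join(lines)
```

A faked or empty referrer (like the random one mentioned above) would of course land in such a tally exactly as sent, so the list is only as honest as the browsers feeding it.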
Well, have a look at the next link and you'll understand what I mean:
Here you have the real, updated L0pht location you'll use yourself
in order to perform your updated klebing endeavours:
http://www.l0pht.com/ref.html
And here you have a copy of it that you should examine
NOW in order to better follow what I'm telling you.
In order to discuss together with you some of the 'results' of
our klebing activities I have copied inside my site a 'still image' of this
continuously updating database, taken from the location above
on 4 Nov 1997 (today), here it is:
lophtrev.htm
So, now that you have had a look at it, let's say a couple of things:
1) The utility of such a script from the Webmaster's point of view
is obvious: he can immediately see WHO is sending hits to him and WHERE
inside his site they link to. He can also 'punish' any 'annoying' linking
inside his site simply by modifying the names of the branched pages, like
I'll do soon with the academy section of my site if you keep entering my
pages from the sides :-(
2) The utility of such a script (if publicly presented, like this one by
L0pht, or else if 'somehow' findable inside a /cgi subdirectory -see my
antismut page for the relevant CGI-cracking techniques :-) is for our search
purposes HUGE! If the site has some relevance to fields you are interested
in (and L0pht for sure has it with sites that may interest us!)
you are in for a surprise... in fact one wonders what's the point
of laboriously browsing the web in search of possible new interesting
sites where you could eventually learn something! Let
those same sites COME TO YOU all by themselves... isn't it nice?
In fact, what do we have here?
Let's have a look at some interesting little fishes:
Yahoo and Excite, for instance, both lead to this site through the cdc cult:
1409 | http://www.yahoo.com/Society_and_Culture/Religion/Humor/Parody_Religions/Cult_of_the_Dead_Cow/ -> /cdc.html
125 | http://www.excite.com/search.gw?trace=1&search=hackers -> /cdc.html
'our' astalavista is also present:
124 | http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez -> /lounge.html
Note that there is already something that may be interesting for you (albeit
well known by all search-experts): the FORM that an
excite or astalavista query takes!
Yes, if you have read my previous letters, you'll have seen that it is possible
to query search engines per email using URL addresses like:
http://lycos11.lycos.cs.cmu.edu/cgi-bin/flpursuit?first=1\\&maxhits=30\\
&minterms=1\\&minscore=0.01\\&terse=standard\\&query=linguistic+phenomena
Therefore we have here a simple 'template' that we can immediately use for OTHER
queries... c'mon, try it out: copy the following line:
http://www.excite.com/search.gw?trace=1&search=hackers
that we have found through our klebing work, and paste it inside the 'location'
window of your copy of navigator...
Have you done it?
Well, now backspace over hackers and type crackers instead.
Now press enter and have a look: your own ready-made excite search string!
And you'll find THOUSANDS of 'query string' possibilities, powerful and
frequent or funny and seldom used, through
this klebing method... d'you understand now how POWERFUL this can be?
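The same backspace-and-retype trick can be done in a few lines of Python. This is only a sketch using the standard urllib.parse module: swap_query_term is a made-up helper name, and which parameter carries the search words ("search" for the excite string above) is something you have to read off the fished URL yourself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def swap_query_term(url, field, new_value):
    """Return url with the value of one query-string field replaced."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Keep every parameter as found, changing only the chosen field
    swapped = [(key, new_value if key == field else value) for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(swapped)))


template = "http://www.excite.com/search.gw?trace=1&search=hackers"
print(swap_query_term(template, "search", "crackers"))
# prints http://www.excite.com/search.gw?trace=1&search=crackers
```

Note that urlencode turns spaces into '+', so a two-word string like 'email intercepting' comes out as search=email+intercepting.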
New strings
Back to our klebing page... as you can see, in order to land
somewhere at L0pht, some of these visitors have used Yahoo and have
searched for 'hackers', 'attress', 'spycamera' and more.
Now, some of these are banal, like 'hackers', yet some are quite
interesting, like 'email intercepting'.
This can also be quite interesting... I have quite a lot of ready-made
strings that I use with the search engines, and some of them I have gathered
by klebing sites... otherwise I would probably never have come up with some of these ideas.
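Gathering such strings by hand gets tedious, so here is a hedged Python sketch that pulls the search words out of lines in the 'count | referrer -> page' layout quoted earlier in this letter. The set SEARCH_FIELDS is an assumption: it holds only the two parameter names seen in those lines ('search' and 'srch'), and other engines will use other names. parse_log_line and harvest_terms are invented names.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumption: only the parameter names seen in the quoted lines are covered
SEARCH_FIELDS = {"search", "srch"}


def parse_log_line(line):
    """Split 'count | referrer -> page' into (count, referrer, page)."""
    count, _, rest = line.partition("|")
    referrer, _, page = rest.partition("->")
    return int(count), referrer.strip(), page.strip()


def harvest_terms(lines):
    """Return (search term, search engine host, landing page) triples."""
    terms = []
    for line in lines:
        _count, referrer, page = parse_log_line(line)
        parts = urlsplit(referrer)
        for key, value in parse_qsl(parts.query):
            if key in SEARCH_FIELDS and value:
                terms.append((value, parts.netloc, page))
    return terms
```

Lines whose referrer carries no query string (a plain Yahoo category page, for instance) simply yield nothing.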
Watch the watchers
Some of our enemies have sites somewhere that they use to check on us... it may
be quite interesting to snoop on those sites... through klebing you'll get
them... have a look at what we have here at L0pht:
316 | http://www.microsoft.com/security/ntprod.htm -> /advisories.html
103 | http://www.microsoft.com/security/issues.htm -> /advisories.html
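To see at a glance WHO is watching, one can add up the hits per referring host. A minimal Python sketch, assuming the 'count | referrer -> page' layout of the lines above; hits_by_host is a made-up name.

```python
from collections import Counter
from urllib.parse import urlsplit


def hits_by_host(lines):
    """Sum the hit counts of 'count | referrer -> page' lines per referring host."""
    totals = Counter()
    for line in lines:
        count, _, rest = line.partition("|")
        referrer = rest.partition("->")[0].strip()
        host = urlsplit(referrer).netloc
        if host:  # ignore lines without a usable referrer URL
            totals[host] += int(count)
    return totals.most_common()
```

For the two lines above this adds 316 and 103 into a single www.microsoft.com entry of 419 hits.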
Unknown mysteries
This one links to a cgi-bin page... why?:
105 | http://nowhere/nothing.html -> /cgi-bin/Count.cgi
Well, this tells us
FIRST that there is indeed a cgi-bin directory here with a Count.cgi script, and
SECOND that nowhere/nothing is interested in it.
Old friends
And who the hell is this next one? Our good old friend Bokler from Deja? (See my
deja.htm page)
14 | http://spider.bokler.com/bokler/crak_body.html -> /index.html
Well... rich fishing, isn't it?
And the following ones could be interesting too, don't you believe?
106 | http://astalavista.box.sk/cgi-bin/marek/robot/robot?srch=warez&submit=+search+ -> /lounge.html
114 | http://netfind.aol.com/search.gw
Yes, when you start klebing, you never finish experimenting! :-)
Go ahead, enjoy!
(c) fravia+ 1997, work in progress, all rights reserved nevertheless
fravia+ 04 Nov 97