[rbldnsd] long url

Wed Oct 19 13:40:33 MSK 2016

On Tue, Oct 18, 2016 at 08:12:54AM +0000, Nicola Piazzi wrote:
> Hi,
> How is it possible to load and use a list containing phishing URLs?
> 
> For example a line like this :
> http://www.m2pr.org/images/1-9/Atualizando/Cliente_Santander_845215245/Atualizacao_de_Seguranca_788151154/=9521314511/index1.php

This would be tricky at best.

To start with, the character set allowed in DNS is quite restricted
(basically letters, digits, hyphen, underscore). While you may
try to use other characters if you query rbldnsd directly, there
could be problems if the query is made through a standard resolver.
Then there are limits on the total length of a name (255 octets in
wire format) and on the length of each label (63 octets).
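These constraints can be checked mechanically. Below is a minimal Python sketch. It assumes the restricted character set described above (ASCII letters, digits, hyphen, underscore) and the RFC 1035 limits of 63 octets per label and 255 octets per name in wire format. The function name `fits_in_dns` is hypothetical and is not part of rbldnsd.

```python
import re

# Characters assumed safe in a label when queries go through a standard
# resolver: ASCII letters, digits, hyphen, underscore (see above).
LABEL_RE = re.compile(r"[A-Za-z0-9_-]+")

MAX_LABEL = 63   # octets per label (RFC 1035)
MAX_NAME = 255   # octets per name in wire format (RFC 1035)


def fits_in_dns(name: str) -> bool:
    """Return True if `name` respects the label charset and length limits."""
    labels = name.rstrip(".").split(".")
    for label in labels:
        # Rejects empty labels, disallowed characters, and over-long labels.
        if not LABEL_RE.fullmatch(label) or len(label) > MAX_LABEL:
            return False
    # Wire format: one length byte plus the label bytes for each label,
    # plus one final zero byte for the root.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= MAX_NAME


url = ("http://www.m2pr.org/images/1-9/Atualizando/Cliente_Santander_845215245/"
       "Atualizacao_de_Seguranca_788151154/=9521314511/index1.php")
print(fits_in_dns(url))             # False: ':' and '/' are not allowed
print(fits_in_dns("www.m2pr.org"))  # True
```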

You may attempt to get around these limitations with tricks such as
taking a hash of the URL (MD5, SHA etc.) and listing the hash. For
instance, SHA-1 produces a 20-byte hash, which gives you 40 hex
digits that you can break into five 8-digit tokens; you would then
query something like
dd936181.516c69ab.33846f9f.4236d092.6e8f4a3d.www.m2pr.org.<dnsbl_domain>.
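The hashing trick can be sketched in Python as follows. This is a minimal illustration, not rbldnsd or Spamhaus code. The function names and the zone `dnsbl.example` are hypothetical. The sketch also assumes that the URL is hashed exactly as written (UTF-8, no normalization), so the list publisher and the querying client must hash the identical string.

```python
import hashlib
from urllib.parse import urlsplit


def url_hash_name(url: str) -> str:
    """Map a URL to '<t1>.<t2>.<t3>.<t4>.<t5>.<hostname>' using SHA-1.

    Assumption: the URL string is hashed byte-for-byte as UTF-8, with no
    normalization of case, trailing slashes or percent-encoding.
    """
    hostname = urlsplit(url).hostname  # e.g. "www.m2pr.org" (lowercased)
    if hostname is None:
        raise ValueError("URL has no hostname: " + url)
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()  # 40 hex digits
    tokens = [digest[i:i + 8] for i in range(0, 40, 8)]     # five 8-digit labels
    return ".".join(tokens + [hostname])


def url_hash_query(url: str, zone: str) -> str:
    """Full query name: the hashed name followed by a (hypothetical) DNSBL zone."""
    return url_hash_name(url) + "." + zone + "."


url = ("http://www.m2pr.org/images/1-9/Atualizando/Cliente_Santander_845215245/"
       "Atualizacao_de_Seguranca_788151154/=9521314511/index1.php")
print(url_hash_query(url, "dnsbl.example"))
```

The publisher would list the output of `url_hash_name` for each URL, and the client would look up `url_hash_query`. Changing a single character of the URL yields an unrelated hash. This is why the approach only matches URLs that are exactly identical.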

*But* another much more serious problem is that URLs are very
often "personalized", coding the spam recipient inside, and you
do not know how the spammer would code it.  For instance, it
could be through a dictionary word just after the / following
the hostname.  Some spammers even personalize hostnames, sometimes
using other people's domains (wildcard DNS records injected on
compromised nameservers). So essentially it would be pointless to list
URLs, since every spam instance would have a different one.  Without
doubt, if major BLs started doing this, all spammers would quickly
adapt and get around it.  At the same time, personalization would
cause an immense blowup of lists fed by large-scale detectors,
and the lists would become so big as to be unmanageable.
In short, a predictable failure.

If you do this on a local basis using your own spam flow as
data source, you will not blow up anything but the effectiveness
is likely to be quite limited.

We believe that the current approach of going after domains or possibly
full hostnames, using separate return code spaces for spammer-owned
resources and compromised resources, is the most effective way to
get security problems solved, even if there could be occasional false
positives.

Alex Lasoriti
Spamhaus Technology