Current time: 11-29-2024, 01:02 PM Hello There, Guest! (LoginRegister)


Post Reply 
Mail traffic accounting
Author Message
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #1
Mail traffic accounting
While working on another issue, I stumbled upon some (possible) bug in the mail traffic accounting script.

The problem is that it depends tightly on the server name (in /etc/hosts) and whether the logs contain "localhost", "127.0.0.1" or "hostname" for the localhost.

This is an issue on etchfor example because the /etc/hosts file contains:

127.0.0.1 <servername.local> localhost

On sarge the same file is
127.0.0.1 <servername.local> localhost localhost.localdomain

There's another interaction with the postfix configuration file. If "localhost" or "127.0.0.1" is used in main.cf (for the hooks of antivirus/antispam etc) the resulting logs can contain either the "server name" or "127.0.0.1" or "localhost".

All these variations don't play game with the way mail logs are analyzed in ispcp-vrl-traff. On line 343 This code is present:
Code:
        if(exists  $main::cfg{'MAIL_LOG_INC_AMAVIS'} && $main::cfg{'MAIL_LOG_INC_AMAVIS'}) {
            # Logs including AMAVIS/SPAMASSASSIN/CLAMAV entries
            $cmd = "$cat $smtp_delivery_log | $awk '{if (\$5 != \"localhost\" && \$6 != \"virtual\" && \$6 != \"localhost\" && \$10 != \"?\") print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";

        } else {
            # Logs not including AMAVIS/SPAMASSASSIN/CLAMAV entries
            $cmd = "$cat $smtp_delivery_log | $awk '{if (\$5 != \"localhost\" && \$6 != \"localhost\" && \$10 != \"?\") print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";
An the awk script will cause problems depending on the above variables.

A patch to alleviate the problem is this
Code:
=== traffic/ispcp-vrl-traff
==================================================================
--- traffic/ispcp-vrl-traff     (revision 31)
+++ traffic/ispcp-vrl-traff     (local)
@@ -340,11 +340,11 @@

                if(exists  $main::cfg{'MAIL_LOG_INC_AMAVIS'} && $main::cfg{'MAIL_LOG_INC_AMAVIS'}) {
                        # Logs including AMAVIS/SPAMASSASSIN/CLAMAV entries
-               $cmd = "$cat $smtp_delivery_log | $awk '{if (\$5 != \"localhost\" && \$6 != \"virtual\" && \$6 != \"localhost\" && \$10 != \"?\") print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";
+               $cmd = "$cat $smtp_delivery_log | $awk '{if ( \$10 != \"?\" && !(\$5 ~ /localhost|127.0.0.1/ && \$6 ~ /localhost|127.0.0.1/)) print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";

                } else {
                        # Logs not including AMAVIS/SPAMASSASSIN/CLAMAV entries
-           $cmd = "$cat $smtp_delivery_log | $awk '{if (\$5 != \"localhost\" && \$6 != \"localhost\" && \$10 != \"?\") print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";
+           $cmd = "$cat $smtp_delivery_log | $awk '{if ( \$10 != \"?\" && \$6 !~ /virtual/ && !(\$5 ~ /localhost|127.0.0.1/ && \$6 ~ /localhost|127.0.0.1/)) print substr(\$3, index(\$3, \"@\")+1, length(\$3)),substr(\$4, index(\$4, \"@\")+1, length(\$4)),\$10;}' 1>$smtp_log";
                }

         $rs = sys_command($cmd);
But it won't fix the issue when "hostname.local" is reported in the logs.

The patch has a few "side effects" that I'm not sure are present in the original version:

1. The smtp traffic arriving and leaving via smtp (mail relaying) is counted twice. This is correct as mail arrives at the server and leaves to another server.

2. When MAIL_LOG_INC_AMAVIS is false, traffic arriving to a virtual user is counted once (this is correct), as this traffic should not be counted.

3. When relaying mail from the domain to an MX end server (using the MX patch) the traffic is counted twice, which is correct behaviour (mail arrives and leaves).

I have created a ticket for this issue here: http://www.isp-control.net/ispcp/ticket/1345

ispcomm.
06-13-2008 11:09 PM
Find all posts by this user Quote this message in a reply
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #2
RE: Mail traffic accounting
A fix for this issue and a fix for pop3 accounting under debian (which was broken in an attempt to fix fedora) is here: http://www.isp-control.net/ispcp/ticket/1215

Ispcomm

PS: I need dev access.
(This post was last modified: 06-14-2008 05:11 AM by ispcomm.)
06-14-2008 05:10 AM
Find all posts by this user Quote this message in a reply
kilburn Offline
Development Team
*****
Dev Team

Posts: 2,182
Joined: Feb 2007
Reputation: 34
Post: #3
RE: Mail traffic accounting
I've made some tests regarding mail traffic accounting and got some interesting results. From now on me@external.dmn is my main mail account, hosted outside the test server and mx.external.dmn it's primary MX. The server uses amavis as spam filter.

"after-awk" logs are the result of ispcp's awk processing command:
Code:
DEBUG: push_el() sub_name: sys_command('/bin/cat /var/log/ispcp/smtp_delivery.log | /usr/bin/awk '{if ( $10 != "?" && $6 !~ /virtual/ && !($5 ~ /localhost|127.0.0.1/ && $6 ~ /localhost|127.0.0.1/)) print substr($3, index($3, "@")+1, length($3)),substr($4, index($4, "@")+1, length($4)),$10;}' 1>/var/log/ispcp/smtp.log'), msg: Ending...

"final count" is the real current count for this test.

Test 1: user@dmn.tld is a domain mailbox
Code:
2008-06-17 23:23:47 me@external.dmn user@dmn.tld test.server.tld 127.0.0.1 SMTP - 1 778291
2008-06-17 23:23:47 me@external.dmn user@dmn.tld test.server.tld virtual SMTP - 1 778662
after-awk:
Code:
external.dmn dmn.tld 778291
Final count: 778291

Test 2: user@sub.dmn.tld is a subdomain mailbox
Code:
2008-06-17 23:23:58 me@external.dmn user@sub.dmn.tld test.server.tld 127.0.0.1 SMTP - 1 222773
2008-06-17 23:23:58 me@external.dmn user@sub.dmn.tld test.server.tld virtual SMTP - 1 223152
after-awk:
Code:
external.dmn sub.dmn.tld 222773
Final count: 222773

Test 3: user@als.tld is an alias mailbox
Code:
2008-06-17 23:24:09 me@external.dmn user@als.tld test.server.tld 127.0.0.1 SMTP - 1 225375
2008-06-17 23:24:09 me@external.dmn user@als.tld test.server.tld virtual SMTP - 1 225746
after-awk
Code:
external.dmn als.tld 225375
Final count: 0
Comment: als.tld traffic should be counted as dmn.tld traffic, but it isn't! We need to check the post-awk processing to account for this.

Test 4: redir@dmn.tld is a domain redirection to me@external.dmn
Code:
2008-06-17 23:24:20 me@external.dmn redir@dmn.tld test.server.tld 127.0.0.1 SMTP - 1 778299
2008-06-17 23:24:20 me@external.dmn me@external.dmn test.server.tld mx.external.dmn SMTP - 1 778670
after-awk
Code:
external.dmn dmn.tld 778299
external.dmn external.dmn 778670
Final count: 778299
Comment: Missing outgoing traffic (2nd line) because the logs don't show that the originating user is "redir@dmn.tld"

Test 5: redir@sub.dmn.tld is a subdomain redirection to me@external.dmn
Code:
2008-06-17 23:24:35 me@external.dmn redir@sub.dmn.tld test.server.tld 127.0.0.1 SMTP - 1 222781
2008-06-17 23:24:35 me@external.dmn me@external.dmn test.server.tld mx.external.dmn SMTP - 1 223152
after-awk
Code:
external.dmn sub.dmn.tld 222781
external.dmn external.dmn 223152
Final count: 222781
Comment: Missing outgoing traffic (2nd line) because the logs don't show that the originating user is "redir@sub.dmn.tld"

Test 6: redir@als.tld is an alias redirection to me@external.dmn
Code:
2008-06-17 23:24:44 me@external.dmn redir@als.tld test.server.tld 127.0.0.1 SMTP - 1 225383
2008-06-17 23:24:45 me@external.dmn me@external.dmn test.server.tld mx.external.dmn SMTP - 1 225754
after-awk:
Code:
external.dmn als.tld 225383
external.dmn external.dmn 225754
Final count: 0
Comment: Missing incoming traffic for the same reason as test 3 and outgoing traffic (2nd line) because the logs don't show that the originating user is "redir@als.tld"

Final comments
The missing alias counts should be an easy fix and the missing info on logs is much harder to track as it would involve fixing the awstats maillogconvert script.

PS to ispcomm: I'll rerun the tests tomorrow with a realyed domain (mx patch) to get some more results.
(This post was last modified: 06-18-2008 10:49 AM by kilburn.)
06-18-2008 10:40 AM
Visit this user's website Find all posts by this user Quote this message in a reply
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #4
RE: Mail traffic accounting
Good job Kilburn!

A real fix for the alias counts would mean writing code to track a "mail session" from the arriving of external mail to its final delivery. By tracking I mean referencing each queue-id to the preceding (i.e. in postfix a mail arrives and is iterated trough the filters via lmtp or smtp and is threated as separate mail with different queue id). This is something maillogconvert was not designed to do (afaik).

A quicker fix would probably be a post-processing of awk output on lines that cannot be immediately referenced to specific domains. We have the means (mail_users table) to seek these lines and reference them to a domain and count them.

This would require some sql queries to be executed and could possibly slow down the log analyzis.

Perhaps a hashed in-memory table of known domain/alias pairs would speed up the process at the expense of memory consumption.

ispcomm
06-18-2008 05:33 PM
Find all posts by this user Quote this message in a reply
kilburn Offline
Development Team
*****
Dev Team

Posts: 2,182
Joined: Feb 2007
Reputation: 34
Post: #5
RE: Mail traffic accounting
After some brainstorming I've come up with this possible solution to the unaccounted forwarded messages, based on the fact that any incoming mail to a forwarded address will generate mail_size * number of forwarders bytes of outgoing mail:

1) build an in-memory table of (forwarded_address, n_recipients) at the beggning of traffic accounting.
2) For every incoming smtp check if the address is in the in-memory table. If it is, add size * n_recipients as traffic for this mail account.
3) Ignore any non-localdomain to non-localdomain delivery

Example with test case 4:
In-memory built table has a record (redir@dmn.tld, 1) because "redir@dmn.tld" is forwarded to 1 address.
Code:
2008-06-17 23:24:20 me@external.dmn redir@dmn.tld test.server.tld 127.0.0.1 SMTP - 1 778299
2008-06-17 23:24:20 me@external.dmn me@external.dmn test.server.tld mx.external.dmn SMTP - 1 778670
after-"modified awk script" (as after-awk we don't have the address anymore)
Code:
external.dmn dmn.tld 1556598
external.dmn external.dmn 778670 <- Ignored
Final count: 1556598 (778299 + 778299*1)
REAL count: 1556969 (778299 + 778670)
Comment: The missing bytes are those of the headers added by our mail server + mail filter.

This solution is far easier to implement than tracking queue-id's (have you ever tried to do this? there are times when logs are not in real processing order so you have references to not-yet-seen queue-id's and such...).

The overhead is also minimal, as there's only 1 extra query, a really short in-memory record for each forwarded address and a lookup for each incoming record.

Comments?
(This post was last modified: 06-18-2008 09:23 PM by kilburn.)
06-18-2008 09:22 PM
Visit this user's website Find all posts by this user Quote this message in a reply
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #6
RE: Mail traffic accounting
Kilburn,

it could work the way you propose it. It won't be perfect as multiple deliveries to 1 external address could be combined (depending of ther options etc). However it's good enough for me.

As a different approach I could suggest to modify the awk script to retain the final destination as an extra field. That information can be referenced back to the ispcp domain and mail that generated the redirect with a hash-table. This could be also easy.

Quote:This solution is far easier to implement than tracking queue-id's (have you ever tried to do this? there are times when logs are not in real processing order so you have references to not-yet-seen queue-id's and such...).
Yes I have done this. I create the ID as soon as I see it and then fill the missing pieces when I see them in the logs. I keep incomplete sessions in memory and dump them to database when they're completed (or perhaps deferred). Whoever designed the logs didn't think too-much about processing them.

However my solution is a full-featured tracking application with searchable database and is too-much trouble to be integrated in ispcp.

For now I think your idea could be a quick fix and would work properly.

ispcomm.
06-18-2008 11:16 PM
Find all posts by this user Quote this message in a reply
kilburn Offline
Development Team
*****
Dev Team

Posts: 2,182
Joined: Feb 2007
Reputation: 34
Post: #7
RE: Mail traffic accounting
I tried to quick-fix trunk using my plan and (surprisingly for me) it turned out that the most difficult step is building the in-memory (address, n_forwarders) map. This is because "fowarded only" accounts do not have their "mail_addr" field filled so we must re-construct the full address using mail_users, subdomains and domain_aliasses table (you have to check the mail_type). Not a 1 query op anymore Sad
06-19-2008 12:57 AM
Visit this user's website Find all posts by this user Quote this message in a reply
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #8
RE: Mail traffic accounting
It's probably still a single query, but not a simple one tough. Perhaph a few joins will do it?

ispcomm.
06-19-2008 03:30 AM
Find all posts by this user Quote this message in a reply
ispcomm Offline
Junior Member
*

Posts: 93
Joined: Apr 2008
Reputation: 3
Post: #9
RE: Mail traffic accounting
Kilburn,

try something like this:
Code:
SELECT concat( mail_acc, '@', domain_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
WHERE mail_type = 'normal_forward'
UNION
SELECT concat( mail_acc, '@', subdomain_name, '.', domain_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
LEFT JOIN subdomain
USING ( domain_id )
WHERE mail_type = 'subdom_forward'
UNION
SELECT concat( mail_acc, '@', alias_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
LEFT JOIN domain_aliasses
USING ( domain_id )
WHERE mail_type = 'alias_forward'
This will give you a table if recipients including subdomains, a reference main domain and a mail_forward field (That you need to parse for commas to find out how many recipients are there).

Hope that helps.

ispcomm

EDIT: Typo in the sql query. Fixed.
EDIT2: Full query including domain aliases.
(This post was last modified: 06-19-2008 05:04 AM by ispcomm.)
06-19-2008 04:46 AM
Find all posts by this user Quote this message in a reply
kilburn Offline
Development Team
*****
Dev Team

Posts: 2,182
Joined: Feb 2007
Reputation: 34
Post: #10
RE: Mail traffic accounting
I had to slightly edit your query as a mail address can be both mailbox and forwarded at the same time so it's mail_type is ie 'normal_mail,normal_forward':

Code:
SELECT concat( mail_acc, '@', domain_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
WHERE mail_type LIKE '%normal_forward'
UNION
SELECT concat( mail_acc, '@', subdomain_name, '.', domain_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
LEFT JOIN subdomain
USING ( domain_id )
WHERE mail_type LIKE '%subdom_forward'
UNION
SELECT concat( mail_acc, '@', alias_name ) AS mail_from, domain_name, mail_forward
FROM mail_users
LEFT JOIN domain
USING ( domain_id )
LEFT JOIN domain_aliasses
USING ( domain_id )
WHERE mail_type LIKE '%alias_forward'

Working on the fix right now
06-19-2008 06:25 AM
Visit this user's website Find all posts by this user Quote this message in a reply
Post Reply 


Forum Jump:


User(s) browsing this thread: 2 Guest(s)