perl: October 2006 Archives

Perl warnings and IIS6

I just ran a quick, non-exhaustive test, and it looks like IIS6 sends warnings straight out to the browser. I used Fiddler (a very handy tool for seeing the headers and raw responses when viewing pages in IE) to see what was being returned. I ran the following test script first:
use strict;

warn "This is dumb!";

print "Content-type: text/html\n\n";
print "This is smart!";
That resulted in the following response from the server:
HTTP/1.1 200 OK
Date: Thu, 19 Oct 2006 21:05:40 GMT
Content-Type: text/html
Proxy-Connection: close
Server: Microsoft-IIS/6.0
This is dumb! at C:\Inetpub\cgi-bin\test.pl line 3.
X-Powered-By: ASP.NET

This is smart!

Notice how the warning has been thrown out in the middle of the headers? Next I overrode warn to do nothing and ran the modified script:

use strict;

BEGIN {
*CORE::GLOBAL::warn = sub { };  # do nothing
}

warn "This is dumb!";

print "Content-type: text/html\n\n";

print "This is smart!";

This returns what we expect:


HTTP/1.1 200 OK
Date: Thu, 19 Oct 2006 21:12:41 GMT
Content-Type: text/html
Proxy-Connection: close
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET

This is smart!

Next I need to figure out the best way to handle this curious behaviour. I could opt for one of the CPAN modules that pushes warnings and carps into the Event Log, but I don't like that idea. I like the idea of a plain text error_log just like mamma Apache used to make 'em. Hmm...
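One way to get that plain-text log, as a minimal sketch only: install a __WARN__ handler that appends each warning to a file instead of letting it reach STDERR. The log path, the WARN_LOG environment variable and the log_warning helper are all assumptions made up for illustration, and I have not tried this on IIS6.

```perl
use strict;
use warnings;

# Hypothetical log location; point this somewhere the web server user can write to.
my $log_file = $ENV{WARN_LOG} || 'error_log.txt';

# Append one timestamped line per warning to the plain-text log.
sub log_warning {
    my ($message) = @_;
    chomp $message;
    # Fail quietly if the log cannot be opened, so nothing leaks to the browser.
    open my $fh, '>>', $log_file or return;
    print {$fh} '[' . scalar(localtime) . "] $message\n";
    close $fh;
    return;
}

# Route every warn() through the logger instead of STDERR.
$SIG{__WARN__} = \&log_warning;

warn "This is dumb!";
```

The handler is installed at run time, so it only catches warnings raised after that line; compile-time warnings would need the same assignment inside a BEGIN block.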

Thanks to Nik for the talk about overloading and overriding CORE functions, btw!

HTML::Tidy - some observations


These are not complaints, merely some observations from my exciting evening of playing with HTML::Tidy last night. I'll be sending Andy Lester a slightly more sane write-up when I have a moment:

1. Passing it a config file

The documentation doesn't make this clear, but the config file you pass is one for HTML Tidy (libtidy, as opposed to HTML::Tidy). Have a look at the HTML Tidy quick reference documentation on SourceForge for what these files look like and what you can put in them.

2. Quiet in the config file

Setting quiet = yes in the config file greatly reduces the number of messages thrown back from libtidy. This is handy if you are trying to use the clean method to format your messy HTML into something clean and, erm, tidy.

3. use warnings in Tidy.pm

Because Tidy.pm has use warnings switched on, you get warnings thrown about if you pass an undefined (or no) argument to the clean method. I hacked use warnings out of my copy of Tidy.pm to test, and that worked fine.

4. The error parsing is out of date

The syntax of many of the error messages returned by libtidy has changed slightly since the latest release of HTML::Tidy. This means messages that HTML::Tidy intends to ignore are being warned out unnecessarily.

5. clean method returns cleaned up X/HTML

This is not in the documentation, but if clean works it returns your X/HTML nicely tidied up. This is the most useful feature of HTML::Tidy!

6. HTML Tidy (libtidy) has limited DTD support

It appears to support only HTML 4.01 and XHTML 1.0. You can tell HTML::Tidy which of the two you want clean to produce in the config file you supply.

7. clear_messages is good

Save memory when processing lots of X/HTML by making sure you call clear_messages after each document. Messages in HTML::Tidy are relatively large objects that pile up quite quickly.

I processed 14,000 HTML files of varying legibility with only one or two warnings thrown out, using a config file with quiet = yes and with use warnings hacked out of Tidy.pm. I do not recommend hacking Tidy.pm. Instead, make sure that you pass clean a defined variable: do the checking in your script rather than messing with Tidy.pm. Hopefully, once I give Andy something legible and useful, HTML::Tidy will be updated properly.
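To make the last two points concrete, here is a rough sketch of the kind of loop I mean: check the input in the script, call clean, then call clear_messages before moving on. The clean_documents helper and the hash-of-documents input are assumptions for illustration, not part of HTML::Tidy; the only HTML::Tidy calls it relies on are clean and clear_messages.

```perl
use strict;
use warnings;

# $tidy is any object with clean() and clear_messages(), e.g. an HTML::Tidy object.
# $docs is a hash reference of name => HTML source (a made-up input shape).
sub clean_documents {
    my ($tidy, $docs) = @_;
    my %cleaned;
    for my $name (sort keys %$docs) {
        my $html = $docs->{$name};
        # Do the checking here, so clean() never sees an undefined argument.
        next unless defined $html && length $html;
        $cleaned{$name} = $tidy->clean($html);
        # Drop the accumulated message objects before the next document.
        $tidy->clear_messages;
    }
    return \%cleaned;
}

# Usage, assuming HTML::Tidy is installed and tidy.cfg is your libtidy config:
#   require HTML::Tidy;
#   my $tidy    = HTML::Tidy->new({ config_file => 'tidy.cfg' });
#   my $cleaned = clean_documents($tidy, { 'index.html' => $source });
```

Passing the tidy object in, rather than creating it inside the helper, keeps one object alive for the whole batch, which is where calling clear_messages after every document matters.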
