- Don't try to tidy up weirdly tagged HTML with regular expressions: HTML::TreeBuilder is your friend. (But remember the implications if you use $node->replace_content.)
- Don't try to replace HTML entities (&aacute; and so on) with regular expressions: HTML::Entities is your friend.
- Remember that Encode::encode('utf8', $string) does not turn $string into a UTF-8 character string; it gives you an octet string, which after all is what HTTP::Message wants. You do not want Encode::decode here.
- If you have multiple parameters of the same name to send to a URL with POST, call HTTP::Request::Common::POST with "Content => [p => 1, p => 2]". Do not call it with "Content => {p => [1,2]}", for this subroutine regards an array reference as specifying a file to load. You will scratch your head wondering why your script is trying and failing to open a file named "1", and it will take you a moment to check back with the documentation and figure it out.
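Here is a minimal sketch of the points above, assuming HTML::Tree, HTML::Entities and the libwww-perl HTTP modules are installed from CPAN. The sample strings and the example.com URL are made up for illustration, and no request is actually sent.

```perl
use strict;
use warnings;
use Encode ();
use HTML::Entities ();
use HTML::TreeBuilder ();
use HTTP::Request::Common qw(POST);

# Let HTML::TreeBuilder sort out sloppy markup instead of a regex.
my $tree = HTML::TreeBuilder->new_from_content('<p>one<p><b>two');
my $tidy = $tree->as_HTML;    # serialized from the parsed tree
$tree->delete;                # free the tree when done

# Let HTML::Entities decode entities; the result is a character string.
my $text = HTML::Entities::decode_entities('caf&eacute;');

# Encode::encode returns octets, which is what goes in a message body.
my $octets = Encode::encode('utf8', $text);
printf "%d characters, %d octets\n", length($text), length($octets);

# Repeated parameter: an array reference of key/value pairs.
my $req = POST 'http://example.com/form', Content => [ p => 1, p => 2 ];
print $req->content, "\n";
```

The array reference of pairs keeps both values of "p" in the request body, in the order given.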
Thursday, February 2, 2012
CPAN is Your Friend
The past couple of weeks have offered some lessons in the use of Perl to manipulate HTML and use HTTP.