<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to bugs</title><link>https://sourceforge.net/p/php-crawler/bugs/</link><description>Recent changes to bugs</description><atom:link href="https://sourceforge.net/p/php-crawler/bugs/feed.rss" rel="self"/><language>en</language><lastBuildDate>Mon, 27 Jul 2009 21:39:59 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/php-crawler/bugs/feed.rss" rel="self" type="application/rss+xml"/><item><title>Problem with preg_match_all in phpcrawlerutils.class.php</title><link>https://sourceforge.net/p/php-crawler/bugs/3/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;In file: phpcrawlerutils.class.php 201&lt;br /&gt;
If strlen($source) &amp;gt;= 400000, preg_match_all produces a segmentation fault.&lt;/p&gt;
&lt;p&gt;I solved the problem by adding this:&lt;/p&gt;
&lt;p&gt;if ( strlen($source) &amp;gt; 50000 ) {&lt;br /&gt;
$source = substr($source, 0, 50000);&lt;br /&gt;
}&lt;br /&gt;
preg_match_all("/&amp;lt;[ ]{0,}a[ \n\r][^&amp;lt;&amp;gt;]{0,}(?&amp;lt;= |\n|\r)(?:".$match_part.")[ \n\r]{0,}=[ \n\r]{0,}[\"|']{0,1}([^\"'&amp;gt;&amp;lt; ]{0,})[^&amp;lt;&amp;gt;]{0,}&amp;gt;((?:(?!&amp;lt;[ \n\r]*\/a[ \n\r]*&amp;gt;).)*)&amp;lt;[ \n\r]*\/a[ \n\r]*&amp;gt;/ is", $source, $regs);&lt;/p&gt;
&lt;p&gt;You can contact me at: &amp;lt;a href="http://www.informaticaautonomos.com"&amp;gt;http://www.informaticaautonomos.com&amp;lt;/a&amp;gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Mon, 27 Jul 2009 21:39:59 -0000</pubDate><guid>https://sourceforge.netc5a0c3c04e5129e543d1d9302821b64b55d47415</guid></item><item><title>Base HREF not being considered and incorrect links result</title><link>https://sourceforge.net/p/php-crawler/bugs/2/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;If a document that contains a base href such as &amp;lt;base href="http://mysite.com/test/" /&amp;gt;&lt;/p&gt;
&lt;p&gt;With links such as &amp;lt;a href="test/test-things"&amp;gt;Things to test&amp;lt;/a&amp;gt;&lt;/p&gt;
&lt;p&gt;The makeFullQualifiedURL function does not build the URL correctly.&lt;/p&gt;
&lt;p&gt;Refer to the modified files for the fix.&lt;/p&gt;
&lt;p&gt;I made the main program loop look up the base href, and makeFullQualifiedURL now detects whether the baseHref exists and builds the URL accordingly.&lt;/p&gt;
&lt;p&gt;The crawler now traverses a site with base hrefs :)&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">zoltak</dc:creator><pubDate>Tue, 31 Jul 2007 07:10:09 -0000</pubDate><guid>https://sourceforge.net420233ac61eab392c0aa2b8637c5803b63c38bc4</guid></item></channel></rss>