Recently I ran into a problem: I wanted to generate a tag cloud from a set of keywords fetched from a database.
I was looking for an algorithm that would let me fit as many tags as possible into a given area, in this case a table cell.
In a tag cloud the input data varies: some tags have more characters and some have fewer. To make things messier, the font size also varies with the weight of the tag, whether that weight is based on popularity or some other ratio. So the width a tag occupies varies, and so does the height of the line it sits on.
On top of that, the text in a tag cloud also word-wraps, so the number of lines occupied varies too.
After two days of thinking, coding, testing, and trashing, I finally settled on the idea of fitting small 2D boxes into a given 2D rectangle.
As shown in the diagram below:
What the PHP code does is loop through the given height (top to bottom), and inside that loop run another loop across the given width (left to right).
Here is the code that does the trick right now.
<?php
error_reporting(E_PARSE);
$keyword[] = "Web Development";
$keyword[] = "Programming";
$keyword[] = "Information";
$keyword[] = "Webmaster Blog";
$keyword[] = "Blogging";
$keyword[] = "Search engines";
$keyword[] = "CMS";
$keyword[] = "Websites";
$keyword[] = "Java";
$keyword[] = "PHP";
$keyword[] = "Domain Name";
$keyword[] = "Web Hosting";
$keyword[] = "Control Panel";
$keyword[] = "Web Servers";
$keyword[] = "Shell access";
$keyword[] = "Internet";
$keyword[] = "Web Info";
$keyword[] = "Designing";
$keyword[] = "Marketing";
$keyword[] = "Optimization";
$keyword[] = "Algos";
$keyword[] = "Keywords";
$keyword[] = "Research";
$keyword[] = "Dynamic pages";
$keyword[] = "Banners";
echo "<table cellpadding='0' cellspacing='0' border='1' width='120' height='400'>
<tr><td valign='top'>";
$cur = 0; //Keyword counter
$height = 400; //Height
$width = 120; //Width
$w = 0; //Width counter
$h = 0; //Height counter
$hmax = 0; //Max height for given line.
$i = 1;
$total = count($keyword); //Number of keywords available
while($h <= $height && $cur < $total)
{
    $hmax = 0;
    while($w <= $width && $cur < $total) //Stop when the line is full or the keywords run out
    {
        $data = $keyword[$cur];
        $rnd = rand(12,18); //Random font size in px
        $thislength = strlen($data);
        $thisw = getsize($rnd, $thislength, 0); //Estimated width of this tag
        $thish = getsize($rnd, $thislength, 1); //Estimated height of this tag
        echo "<font style=\"font-size:{$rnd}px\">$data</font> ";
        $w = $w + $thisw; //Updates the current width.
        if($thish >= $hmax)
        {
            $hmax = $thish;
        }
        $cur = $cur + 1;
    }//while of w
    $h = $h + $hmax; //Updates the current height.
    $w = 0;
}//while of h
echo "</td></tr></table>";
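//Estimates the size of a tag: $size is the font size in px, $nums the number
//of characters, and $h selects the result (1 = line height, otherwise width).
function getsize($size, $nums, $h)
{
    switch($size)
    {
        case 10:
            $ans = 4.5 * $nums;
            $ansh = 16;
            break;
        case 11:
            $ans = 5.2 * $nums;
            $ansh = 18;
            break;
        case 12:
            $ans = 5.5 * $nums;
            $ansh = 20;
            break;
        case 13:
            $ans = 6.4 * $nums;
            $ansh = 20;
            break;
        case 14:
            $ans = 6.7 * $nums;
            $ansh = 21;
            break;
        case 15:
            $ans = 6.9 * $nums;
            $ansh = 22;
            break;
        case 16:
            $ans = 7 * $nums;
            $ansh = 25;
            break;
        case 17:
            $ans = 7.8 * $nums;
            $ansh = 25;
            break;
        case 18:
            $ans = 8.2 * $nums;
            $ansh = 25;
            break;
        case 19:
            $ans = 8.7 * $nums;
            $ansh = 26;
            break;
        case 20:
            $ans = 9 * $nums;
            $ansh = 27;
            break;
        default:
            //Sizes outside 10-20 are not in the table. This fallback is only a
            //rough guess (an assumption), so the variables are always defined.
            $ans = 0.45 * $size * $nums;
            $ansh = $size + 7;
            break;
    }//switch
    if($h == 1)
    {
        return $ansh;
    }
    else
    {
        return $ans;
    }
}//Function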
?>
I have used the rand() function to generate the different font sizes for this example, but the sizes can be fed from the database based on popularity or any other ratio, and the code will still work the same way. I have also included the function that estimates the size occupied by a tag here, but that data could be fetched from a database instead. If you are using a different font, the results might vary.
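To give an idea of how the font size could come from popularity instead of rand(), here is a minimal sketch. The function name fontSizeForWeight() and the example counts are made up for illustration; it simply assumes each tag has a usage count and spreads those counts linearly over the 12 to 18 px range used above.

```php
<?php
// Hypothetical helper (not part of the code above): maps a tag's usage count
// onto a font size between $minSize and $maxSize by linear interpolation.
function fontSizeForWeight($count, $minCount, $maxCount, $minSize = 12, $maxSize = 18)
{
    if ($maxCount <= $minCount) {
        // All tags are equally popular, so use the middle of the range.
        return (int) round(($minSize + $maxSize) / 2);
    }
    $ratio = ($count - $minCount) / ($maxCount - $minCount);
    $ratio = max(0, min(1, $ratio)); // Keep out-of-range counts inside 0..1
    return (int) round($minSize + $ratio * ($maxSize - $minSize));
}

// Example counts, invented for this sketch (not real data).
$counts = array('PHP' => 40, 'Java' => 10, 'CMS' => 25);
$min = min($counts);
$max = max($counts);
foreach ($counts as $tag => $count) {
    echo $tag . ': ' . fontSizeForWeight($count, $min, $max) . "px\n";
}
```

The returned size could then replace the $rnd value in the loop, since getsize() already has entries for every size from 12 to 18.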
It can still be polished for word-wrap issues, and a greedy algorithm could also be applied to minimize the whitespace as much as possible. Right now I have what I needed, so I will finish the rest of the work and then maybe someday come back to this and polish it further, giving it a self-learning algorithm that takes a new approach to the problem and works out the font sizes from the given tags along with the width and height they need to fit into. I expect that would make it almost 99% accurate.
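As a rough idea of what the greedy version might look like, here is a sketch that I have not wired into the code above. It assumes every tag already has an estimated width and height (for example from getsize()), ignores the space between tags, and does not handle word wrap. The function name packRows() and the box sizes in the example are hypothetical. For each line it takes the widest tag that still fits into the remaining width, then starts a new line.

```php
<?php
// Greedy row filling (a sketch under the assumptions stated above).
// Each box is array('tag' => string, 'w' => width, 'h' => height).
// Returns a list of rows, each row being a list of tag names.
function packRows(array $boxes, $width, $height)
{
    // Widest first, so every row tries the big boxes before the small ones.
    usort($boxes, function ($a, $b) {
        if ($a['w'] == $b['w']) {
            return 0;
        }
        return ($a['w'] > $b['w']) ? -1 : 1;
    });

    $rows = array();
    $usedHeight = 0;
    while (count($boxes) > 0) {
        $row = array();
        $remaining = $width;
        $rowHeight = 0;
        // foreach iterates over a copy, so unset() on $boxes is safe here.
        foreach ($boxes as $i => $box) {
            if ($box['w'] <= $remaining) {
                $row[] = $box['tag'];
                $remaining -= $box['w'];
                $rowHeight = max($rowHeight, $box['h']);
                unset($boxes[$i]);
            }
        }
        // Stop when nothing fits any more or the row would overflow the height.
        if (count($row) === 0 || $usedHeight + $rowHeight > $height) {
            break;
        }
        $rows[] = $row;
        $usedHeight += $rowHeight;
    }
    return $rows;
}

// Example boxes with invented sizes, packed into the 120 x 400 cell.
$boxes = array(
    array('tag' => 'Web Development', 'w' => 100, 'h' => 20),
    array('tag' => 'PHP',             'w' => 20,  'h' => 25),
    array('tag' => 'Programming',     'w' => 70,  'h' => 20),
    array('tag' => 'CMS',             'w' => 25,  'h' => 20),
    array('tag' => 'Java',            'w' => 30,  'h' => 22),
);
foreach (packRows($boxes, 120, 400) as $row) {
    echo implode(' ', $row) . "\n";
}
```

The trade-off is that the tags no longer appear in their original order, which is usually fine for a tag cloud but would matter if the order carries meaning.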