Finding the most frequent words and duplicate content

I wanted to collect the words from all my articles and count how often each one appears. With tags this is straightforward, but what if you want the most frequent words across all articles combined? Here is how I did it with PHP and MySQL, using CodeIgniter as the PHP framework.


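```php
public function find_words()
{
    // Only the titles are fetched here; select another column
    // (for example the article body) to count words in the full text.
    $query = $this->db->query('SELECT title FROM articles');
    $result = $query->result_array();

    // Concatenate every title into one string, separated by spaces.
    // Initialised up front so it is defined even when there are no rows.
    $wordsup = '';
    foreach ($result as $row) {
        $wordsup .= $row['title'] . ' ';
    }

    // Split into words (format 1 = array of words), count each word,
    // then sort from most to least frequent.
    $words = $this->utf8_str_word_count($wordsup, 1);
    $words = array_count_values($words);
    arsort($words);

    echo '<pre>';
    print_r($words);
    echo '</pre>';
    exit();
}
```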

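```php
// Unicode-aware replacement for str_word_count(). A word is a run of letters,
// combining marks, dashes and apostrophes (straight ' or curly ’); any extra
// characters passed in $charlist also count as word characters.
// $format = 0 returns the number of words, $format = 1 returns the words.
// Declared private so CodeIgniter does not expose it as a URL.
private function utf8_str_word_count($string, $format = 0, $charlist = null)
{
    $result = array();

    $pattern = '~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u';
    if (preg_match_all($pattern, $string, $matches) > 0) {
        // $matches[0] holds the full matches, i.e. the words themselves.
        $result = $matches[0];
    }

    if ($format == 0) {
        $result = count($result);
    }

    return $result;
}
```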
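If you want the count to ignore case, so that "PHP" and "php" are treated as one word, and to keep the logic independent of CodeIgniter, a small standalone function can do the same work on an array of titles. This is only a sketch: top_words() is a hypothetical name, it reuses the character class from utf8_str_word_count(), and it assumes the mbstring extension for UTF-8 lowercasing (falling back to plain strtolower(), which only folds ASCII, if mbstring is missing).

```php
<?php
// Hypothetical helper (not part of CodeIgniter): returns the $limit most
// frequent words across all titles, ignoring case, as word => count.
function top_words(array $titles, int $limit = 10): array
{
    // Join every title into one string, like the loop in find_words().
    $text = implode(' ', $titles);

    // Fold case so "PHP" and "php" count as the same word.
    // Assumes mbstring for UTF-8; strtolower() only handles ASCII letters.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // Same character class as utf8_str_word_count(): letters, combining
    // marks, dashes, straight and curly apostrophes.
    preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+~u', $text, $matches);

    $counts = array_count_values($matches[0]); // word => occurrences
    arsort($counts);                           // most frequent first
    return array_slice($counts, 0, $limit, true); // keep words as keys
}
```

Inside find_words() you could call it as top_words(array_column($result, 'title'), 20) and print the result instead of building $wordsup by hand.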

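To find duplicate content instead, select every row whose identifying value appears more than once. The query below finds duplicate SKUs in a 2009_product_catalog table; for articles, substitute your own table and the column that should be unique, such as the title: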

```sql
SELECT *
FROM 2009_product_catalog
WHERE sku IN (
    SELECT sku
    FROM 2009_product_catalog
    GROUP BY sku
    HAVING COUNT(sku) > 1
)
ORDER BY sku;
```
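When the rows are already in PHP (for example from result_array()), the same GROUP BY ... HAVING logic can be sketched in memory. find_duplicates() is a hypothetical helper, not a CodeIgniter method; unlike the SQL, it returns the duplicate rows grouped by the key value rather than as one flat list, and it needs PHP 7.4+ for the arrow function. For large tables it is better to let MySQL do the work with the query above.

```php
<?php
// Hypothetical in-memory equivalent of GROUP BY key HAVING COUNT(key) > 1:
// returns key value => list of rows sharing that value, for values that
// occur more than once.
function find_duplicates(array $rows, string $key): array
{
    $groups = [];
    foreach ($rows as $row) {
        $groups[$row[$key]][] = $row; // bucket rows by the key column
    }

    // Keep only buckets with more than one row (the HAVING clause).
    $dupes = array_filter($groups, fn (array $group) => count($group) > 1);

    ksort($dupes); // like ORDER BY sku
    return $dupes;
}
```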

Subscribe to You Live What You Learn
