Use Google Analytics Instead of the Statistics Module

I recently created a module that uses the Google Analytics API to capture the top ten nodes of various content types by day, week, and all time. This is a great option for any site that needs to use caching, and can’t use the Statistics module.
The module depends on the google_analytics_api module, which makes the job of capturing all the data extremely easy with the google_analytics_api_report_data() function. Here is some easy example code for building a report:
<?php
if (!$start_date) {
  $start_date = date('Y-m-d');
}
if (!$end_date) {
  $end_date = date('Y-m-d'); // H:i:s // can't include time... if before noon, include previous day
}
$dimensions = array('pagePath');
$metrics = array('visits');
$sort_metric = array('-visits');
$filter = 'pagePath =@ /blog/ || pagePath =@ /article/';
$start_index = 1;
$max_results = 20;
// Construct request array.
$request = array(
  '#dimensions' => $dimensions,
  '#metrics' => $metrics,
  '#sort_metric' => $sort_metric,
  '#filter' => $filter,
  '#start_date' => $start_date,
  '#end_date' => $end_date,
  '#start_index' => $start_index,
  '#max_results' => $max_results,
);
try {
  $entries = google_analytics_api_report_data($request);
}
catch (Exception $e) {
  return $e->getMessage();
}
?>
By default, today’s date is used for both the start and end date, to give today’s top content. GA requires both a start and end date, so to get all-time results, you will need to set the start date to the date you first started using GA with your site.
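As a rough sketch of how the day, week, and all-time ranges could be computed, here is a small helper. The function name example_ga_date_range(), the period names, and the $ga_first_date argument are assumptions made for illustration; they are not part of the google_analytics_api module.

```php
<?php
/**
 * Returns the start and end dates for a named reporting period.
 *
 * Hypothetical helper: the period names ('day', 'week', 'all') and the
 * $ga_first_date argument (the day GA tracking began on the site) are
 * illustrative assumptions.
 */
function example_ga_date_range($period, $ga_first_date, $now = NULL) {
  $now = ($now === NULL) ? time() : $now;
  $end_date = date('Y-m-d', $now);
  switch ($period) {
    case 'week':
      // Today plus the six days before it.
      $start_date = date('Y-m-d', strtotime('-6 days', $now));
      break;

    case 'all':
      // GA needs an explicit start date, so use the first day of tracking.
      $start_date = $ga_first_date;
      break;

    case 'day':
    default:
      $start_date = $end_date;
  }
  return array('#start_date' => $start_date, '#end_date' => $end_date);
}
```

The returned array uses the same '#start_date' and '#end_date' keys as the request array, so it could be merged into the request before calling google_analytics_api_report_data().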
To get the top content, sorted from most to least popular, set the dimensions variable to “pagePath” and use either a “visits” metric (for unique page views) or a “pageviews” metric (for all views). The sort_metric variable is set to “-visits” (or “-pageviews”); the “-” prefix tells Google Analytics to sort in descending order, from most visits to least.
Since I want to grab blogs and articles only, I have set the filter to match only paths that contain “/blog/” or “/article/”. Unfortunately, this is the only way to filter your node types, so it’s a good idea to use pathauto to ensure all node types have a specific path, and write some code that prevents any other node types from having the path you are targeting.
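The guard code mentioned above might look something like this minimal sketch. The function name example_path_allowed() and the prefix-to-type map are hypothetical; you would call it from whatever validation step your own module uses and reject the alias when it returns FALSE.

```php
<?php
/**
 * Checks whether a node type may use a given path alias.
 *
 * Hypothetical helper: the reserved segments mirror the "contains" filter
 * used in the GA request ('/blog/' and '/article/').
 */
function example_path_allowed($alias, $node_type) {
  // Map of reserved path segments to the only node type allowed to use each.
  $reserved = array(
    '/blog/' => 'blog',
    '/article/' => 'article',
  );
  // Normalize to a single leading slash so 'blog/x' and '/blog/x' match alike.
  $path = '/' . ltrim($alias, '/');
  foreach ($reserved as $segment => $type) {
    // The GA filter matches the segment anywhere in the path, so do the same.
    if (strpos($path, $segment) !== FALSE && $node_type !== $type) {
      return FALSE;
    }
  }
  return TRUE;
}
```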
In my case, there were also specific CCK fields I needed to use in order to filter out additional nodes. If you know ahead of time that this is going to happen, you can always inject something in the path for nodes that have the CCK fields you would like to exclude, and filter them out when retrieving the report. Otherwise, you will have to do what I did: retrieve more results than are needed in the final report (note that $max_results is set to 20, even though this will eventually be a top ten list), filter out the unwanted nodes with a database query, then unset whatever excess remains.
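The trimming step could be sketched roughly as follows. This assumes the GA entries have already been flattened into an array of path => visits in descending order, and that the database query has produced a list of paths to exclude; both of those inputs, and the function name example_top_paths(), are assumptions for illustration rather than my exact code.

```php
<?php
/**
 * Removes excluded paths from over-fetched results and trims to a limit.
 *
 * Hypothetical helper: $visits_by_path is keyed by page path and already
 * sorted from most to least visits; $excluded_paths comes from your own
 * database check.
 */
function example_top_paths(array $visits_by_path, array $excluded_paths, $limit = 10) {
  // Drop every path flagged by the site-side check; the order is preserved.
  $kept = array_diff_key($visits_by_path, array_flip($excluded_paths));
  // Keep only the first $limit rows.
  return array_slice($kept, 0, $limit, TRUE);
}
```

Fetching 20 rows for a top ten list only works if no more than ten of them get excluded, so pick $max_results with your own exclusion rate in mind.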
One other catch with using Google Analytics in place of Statistics is that it does not work well with cron. You can get it to run through cron when running cron.php manually, but I couldn't find a way to get it to work automatically, even using various spoofing methods. The method will finish without errors, but GA will not return any data.
Cache variables can save the day here! We can modify the code above with the following:
<?php
if ($cache = cache_get('ga_stats', 'cache_content')) {
  $stats = $cache->data;
}
else {
  // GA code from above goes here.
  $stats = array();
  if (!empty($entries)) {
    foreach ($entries as $entry) {
      $metrics = $entry->getMetrics();
      // Append one row per entry so later entries don't overwrite earlier ones.
      $stats[] = array('visits' => $metrics['visits']);
      // Grab any other data you want here.
    }
  }
  if (!empty($stats)) {
    cache_set('ga_stats', $stats, 'cache_content', CACHE_TEMPORARY);
  }
}
?>
Just replace ga_stats above with the cache ID you want to use. In fact, you can create separate cache entries for multiple individual pages as well, if you really want to study all the stats for specific pages. You may also want to replace cache_content with a different cache table, such as a custom one created in your own module.
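If you do keep stats for individual pages, one possible way to name the cache entries is sketched below. The function name example_ga_cache_id() and the 'ga_stats:' prefix are assumptions for illustration; hashing the path is just one option for keeping long paths within the cache ID length limit.

```php
<?php
/**
 * Builds a cache ID for site-wide stats or for one page's stats.
 *
 * Hypothetical helper: pass no argument for the site-wide entry, or a page
 * path for a per-page entry.
 */
function example_ga_cache_id($path = NULL) {
  if ($path === NULL) {
    return 'ga_stats';
  }
  // md5() gives a fixed-length suffix no matter how long the path is.
  return 'ga_stats:' . md5($path);
}
```

The result would then be passed as the first argument to cache_get() and cache_set() in place of the literal 'ga_stats'.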
This is only the beginning of what you can do with Google Analytics. If you plan your pages and URLs well, you can capture almost any data you want, even link clicks and page exits. The google_analytics_api module exposes plenty of request options, and the report API itself offers many more.
Here is the main developer page to learn about your report options:
http://code.google.com/apis/analytics/docs/gdata/gdataDeveloperGuide.html
I also found this page really handy:
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDataFeed....
Pay special attention to the filters section.
And here is a link to the google_analytics_api module:
http://drupal.org/project/google_analytics_api



Comments
The other way to do this is with an image whose src is a PHP file on your server, and that PHP file would just make a quick insert into the statistics table. But that still requires PHP/MySQL to be loaded on every page view.
With your method that's all completely bypassed. Real elegant, but it would be a bit trickier if you need to do more complex views based on "most popular".
This is great! Just yesterday I needed exactly this for a site where I have to count content impressions behind an external cache with Pressflow, and I thought a solution would be to use the Google Analytics API! I will definitely try your solution.
Thanks!!!
Google Analytics is cool, but I'm more and more impressed with Piwik. It's not as advanced as GA yet but your data stays in your own control and it's great for sites where you can't use Analytics, because of the nature of the site or the wishes of the customer. Haven't checked Piwik's API yet but the good thing is that you can probably scratch yourself if you have some sensible itches.
http://piwik.org/
I've been toying with this myself. Excellent work!
This is going to be vitally important as more and more sites start using Varnish and CDNs to serve anonymous pages. Drupal can't and shouldn't track those hits, but Google can certainly do so. Integrating that data back into the Drupal system is a major win.
Very very cool implementation, fellas. I've been thinking (& doodling) about doing something similar but y'all beat me to the punch.
Thanks for sharing.
ps. didn't cy'all @LADrupal the other day but I *almost* drove up to the last DUG that you host...cya soon.
I'm not a fella! :)