My views on the Drupal Statistics module

Back in January I opened an issue, "Remove Statistics module from D8 core", around the same time I was working on Project Verity. We wanted a page where content creators could go and see site statistics, the latest content, the latest comments and so on. We soon found that relying on the Statistics module in Drupal core was not the best way forward, so we started working with the Google Analytics API, although that is another story.

This all sparked my interest in the Statistics module: what it should do and how it should work. Last month another issue about the Statistics module in core popped up, "Improve statistics module for D8?". Last week, during DrupalCon, CHX wrote a blog post, "The Drupal 8 action plan", which featured the Statistics module as one of the core modules he would like to see "the imminent death of".

I agree with CHX: I would like to see "the imminent death of" the Statistics module too, although I feel there still needs to be something for statistics in Drupal core. I therefore also agree with the "Improve statistics module for D8?" issue thread.

Here is what I would like to see:

  • A core module for tracking many different aspects of a page request.
  • JavaScript-based interaction, so that it works with Varnish and similar caching.
  • The ability to interact with local or remote data sources.
  • Rules, settings and attributes to allow control of statistics gathering.
  • Views integration to display statistics.

My personal next step would be to remove the statistics module and start an initiative to rewrite the module from the ground up.

The first step for the new module would be to define a new hook, maybe hook_statistics. This hook would be called on every page load and passed details and statistics about the page being loaded and the machine loading it, gathered via JavaScript / AJAX. Many modules could then implement hook_statistics to save this data to the local database or any other data source.

I would love to hear some thoughts on this, and will therefore cross post in the issue queue.

Comments

I think any effort to make a statistics module that makes write-queries to the same MySQL server that the Drupal site runs on is doomed. It simply won’t scale for larger sites, and it will be a performance burden even for smaller sites.

I am working on a statistics package for a Drupal-site, but that is a simple Node.js app that reads and writes from a PostgreSQL database, which is separate from the MySQL instance Drupal runs on.

If you want to have statistics logging from within Drupal code (i.e. server-side), I think the nicest way to accomplish this would be to use something like http://www.zeromq.org/ or simple UDP packets sent from Drupal to a separate logging server.
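As a rough illustration of the UDP approach, the sketch below sends one event as a single datagram and reads it back on a loopback socket that stands in for the separate logging server. It is Python rather than Drupal's PHP, and the send_stat helper, the JSON payload format and the field names are all assumptions made for the example. It only shows the fire-and-forget pattern, not a production logger.

```python
import json
import socket


def send_stat(sock: socket.socket, address: tuple, event: dict) -> int:
    """Serialise one event and send it as a single UDP datagram.

    The sender does not wait for a reply, so a slow or missing logging
    server cannot hold up the page request (the event is simply lost).
    """
    payload = json.dumps(event).encode("utf-8")
    return sock.sendto(payload, address)


# Loopback receiver standing in for the separate logging server.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_stat(sender, receiver.getsockname(), {"path": "/node/1", "event": "view"})

data, _ = receiver.recvfrom(4096)
received = json.loads(data.decode("utf-8"))

sender.close()
receiver.close()
```

The trade-off is that UDP gives no delivery guarantee, which is usually acceptable for page-view counts but would not be for anything that must be exact.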

The performance tax is just too great to do it inside Drupal (and you'll have all sorts of nasty caching issues, which is why the stats-package I'm building is JavaScript-powered).

It's not just the *recording* of stats that can cause issues. If your site has any meaningful traffic, you will start generating data at an immense rate.

For example, my blog (http://www.thingy-ma-jig.co.uk/) had around 17,600 page hits in July and has already exceeded that for August. So in 2 months I have over 35,000 page hits. For my server to do that level of statistical analysis across 35,000 rows of data (covering hit time, IP, User Agent, Referrer and even some aggregated or computed fields) would take significant resources. And I have a small blog. What if your site had my monthly traffic in a day? In an hour?
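To put rough numbers on that question, here is a back-of-the-envelope sketch in Python. It assumes one stored row per page hit and a constant rate all year, both simplifications. Only the 17,600 figure comes from the blog above, and the variable names are just for illustration.

```python
# Assumption: one stored row per page hit, at a constant rate all year.
HITS = 17_600  # roughly one month of traffic on the blog above

rows_per_year = {
    "17,600 hits per month": HITS * 12,
    "17,600 hits per day": HITS * 365,
    "17,600 hits per hour": HITS * 24 * 365,
}

for rate, rows in rows_per_year.items():
    print(f"{rate}: {rows:,} rows per year")
```

Under those assumptions the table grows by about 211 thousand rows a year at the blog's current rate, about 6.4 million at the same traffic per day, and about 154 million at the same traffic per hour.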

IMHO, recording the data is the easy part - it's getting any kind of meaningful analysis FROM that data which causes difficulty and - for me - is the main reason I don't use the statistics module.

The other thing to bear in mind is "Why have a statistics module?" Why do we need it when there are dozens of free third-party alternatives out there which do a better job and solve the recording and analysing problems for us? Why re-invent the wheel? If people have privacy concerns about third-party tracking, maybe that could be better solved by a community contrib module? I would imagine the group of people that CAN'T use third-party tracking is a minor edge case, to the point where Core shouldn't be addressing it.

So, I think we agree that overall site analytics should be handled by a third-party service.

But what if you wanted to create a view of the most popular content? We need some way to do this without having to hook into the third-party service via an API. It also needs to be JavaScript-based to bypass caching.

The other thing is, does this really need to be in core?
I doubt it.

I will try to put in my sales pitch for Radioactivity module again :-) (I am not the maintainer).

Radioactivity supports node view counts, node comment counts, node comments and any entities in Drupal 7. It can write to memcache and so be scalable, and it can work with Boost, Varnish or any other static caching by using AJAX. Frankly, the module is really awesome. Look into it: I believe it can be a drop-in replacement for Statistics, and with a small UX effort it can have an easy UI as well.

I agree - Radioactivity is a fantastic module. Very well made and a great idea. Thoughts:
1) It's not really "statistics" in the traditional "logging" sense.
2) As awesome as it is - I don't think it should be "core". If it does get in core then some of the stuff that got removed should be re-evaluated (IMHO).

The issue with stats is that everyone wants/needs something different. Some people may want on-site integration - others may not need it. By the time you have decided on what all these use-cases are and have built the spec for such a module, you'll essentially need a Google Analytics clone as a module. It would be a BIG module to cover all use cases. If it's that big, should it be in core? Would it benefit from a contrib-style release cycle and development approach?

On the flip side, if it's small and simple enough to be maintainable in core - does it have enough common-use to make it core-worthy? One of the things about stats is that once you switch systems, you lose history for comparative purposes. Therefore a simple in-core solution would eventually lead you to needing a better service in the future, thus disconnecting your stats from historical data.
