analytalks is a Wikimedia Tools Labs project that generates a real-time sentiment network of editor conversations, intended for visual spam detection.

process flow

  1. Extract recently updated user talk pages from the English Wikipedia's MySQL DB
  2. Run a sentiment classifier on the most recent diff of each of those pages
    The sentiment classifier was trained on the Rotten Tomatoes movie review dataset
  3. Create a network of the user conversations and store it in JSON format with metadata
  4. Visualise using a force-directed graph created with d3.js

The code is written in Python. You can clone the source from here. Current attempts at creating a data visualisation can be seen here.
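
As an illustration of step 3 in the process flow, here is a minimal sketch of how classified messages could be turned into a node-link structure and serialised to JSON. It is not taken from the analytalks source: the function name `build_network`, the record fields (`sender`, `recipient`, `sentiment`) and the `metadata` keys are all hypothetical, and it assumes that the editor who changed a user talk page is the sender and the owner of the page is the recipient. The `nodes`/`links` layout with index-based `source` and `target` is one that d3.js force-directed graphs can commonly consume.

```python
import json
from datetime import datetime, timezone


def build_network(messages):
    """Turn (sender, recipient, sentiment) records into a node-link dict.

    Hypothetical helper: field names and metadata keys are assumptions,
    not the project's actual schema.
    """
    index = {}   # user name -> position in the nodes list
    nodes = []
    links = []
    for msg in messages:
        # Register each user the first time they appear.
        for user in (msg["sender"], msg["recipient"]):
            if user not in index:
                index[user] = len(nodes)
                nodes.append({"name": user})
        # One link per message, pointing from sender to recipient.
        links.append({
            "source": index[msg["sender"]],
            "target": index[msg["recipient"]],
            "sentiment": msg["sentiment"],
        })
    return {
        "metadata": {
            "generated": datetime.now(timezone.utc).isoformat(),
            "message_count": len(messages),
        },
        "nodes": nodes,
        "links": links,
    }


# Made-up sample records standing in for classifier output.
sample = [
    {"sender": "Alice", "recipient": "Bob", "sentiment": "positive"},
    {"sender": "Carol", "recipient": "Bob", "sentiment": "negative"},
]
network = build_network(sample)
graph_json = json.dumps(network, indent=2)
```

The resulting `graph_json` string can be written to a file and loaded by the visualisation. Keeping the sentiment label on each link lets the front end colour edges by sentiment.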

what's next?


If you would like to help out with the project, please get access to Tools Labs and request membership of the analytalks project by getting in touch with Deba.