Wikipedia Vandalism Detection

We identify vandalism on Wikipedia articles by classifying each edit as either regular or vandalism. This is naturally a binary classification task, but because regular edits heavily outnumber vandalism cases in the dataset, we also explore whether it can be modeled as an anomaly detection problem.

We use the corpus of vandalism cases found on Wikipedia, which is available [here]. The corpus consists of 32452 edits on 28468 Wikipedia articles, of which 2391 edits are labeled as vandalism and the rest are labeled as regular.
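To make the skew concrete, the following minimal sketch computes the class balance from the counts quoted above. It also shows the metrics of a hypothetical baseline that labels every edit as regular. The function names `class_balance` and `majority_baseline` are illustrative assumptions and are not part of the corpus or of any library.

```python
# Counts taken from the corpus description above.
TOTAL_EDITS = 32452
VANDALISM_EDITS = 2391


def class_balance(total, positives):
    """Summarize how skewed the two classes are (hypothetical helper)."""
    negatives = total - positives
    return {
        "regular": negatives,
        "vandalism": positives,
        "vandalism_rate": positives / total,
        "regular_per_vandalism": negatives / positives,
    }


def majority_baseline(total, positives):
    """Metrics for a hypothetical classifier that labels every edit as regular."""
    negatives = total - positives
    return {
        "accuracy": negatives / total,  # every regular edit is "correct"
        "vandalism_recall": 0.0,        # no vandalism edit is ever caught
    }


balance = class_balance(TOTAL_EDITS, VANDALISM_EDITS)
baseline = majority_baseline(TOTAL_EDITS, VANDALISM_EDITS)

print(f"regular edits:       {balance['regular']}")
print(f"vandalism edits:     {balance['vandalism']}")
print(f"vandalism rate:      {balance['vandalism_rate']:.2%}")
print(f"regular : vandalism: {balance['regular_per_vandalism']:.1f} : 1")
print(f"baseline accuracy:   {baseline['accuracy']:.2%}")
print(f"baseline recall:     {baseline['vandalism_recall']:.2%}")
```

Because the always-regular baseline already scores high accuracy while catching no vandalism, plain accuracy is a weak yardstick here. Precision and recall on the vandalism class are likely to be more informative, and the same skew is what motivates trying an anomaly detection formulation.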

Animesh Sinha
