Matthias Budde, Julio De Melo Borges, Stefan Tomov, Till Riedel, Michael Beigl
3rd International Workshop on Urban Computing (UrbComp 2014) in conjunction with the 20th ACM SIGKDD 2014
The graph dataset is the result of the data modeling presented in the paper. Each edge carries two attributes:
- distance: the spatial distance between two vertices (reports) in meters
- timedeltaDays: the temporal distance between two vertices (reports) in days
The maximum spatial distance is 1 km and the maximum temporal distance is 21 days. Users can then filter the edges by these two attributes. The graph contains 34690 vertices.
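The edge filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the on-disk format of the graph files is not specified here, so the sketch assumes the edges have already been loaded as (source, target, attributes) tuples. The function name filter_edges, the toy edges, and the inclusive comparison (<=) are hypothetical choices; only the attribute names distance and timedeltaDays come from the dataset description.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed in-memory edge representation: (source id, target id, attributes).
Edge = Tuple[int, int, Dict[str, float]]


def filter_edges(
    edges: Iterable[Edge],
    max_distance_m: float,
    max_timedelta_days: float,
) -> List[Edge]:
    """Keep only edges within both the spatial and the temporal threshold."""
    return [
        (u, v, attrs)
        for u, v, attrs in edges
        if attrs["distance"] <= max_distance_m
        and attrs["timedeltaDays"] <= max_timedelta_days
    ]


# Made-up toy edges, not taken from the dataset.
edges = [
    (1, 2, {"distance": 50.0, "timedeltaDays": 3.0}),
    (2, 3, {"distance": 400.0, "timedeltaDays": 2.0}),   # too far apart
    (3, 4, {"distance": 80.0, "timedeltaDays": 15.0}),   # too far apart in time
]

# Example thresholds: 90 m and 7 days.
kept = filter_edges(edges, max_distance_m=90, max_timedelta_days=7)
print([(u, v) for u, v, _ in kept])  # [(1, 2)]
```

Because the published graphs are already capped at 1 km and 21 days, thresholds above those values would have no further effect.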
1) Graph 1 (17.2 MB): Data modeled without any constraints (1140387 edges).
2) Graph 2 (6.3 MB): Data modeled with the neighbourhood search restricted to reports of the same category (389210 edges).
3) CSV (3.1 MB): Dataset as CSV
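To make the difference between Graph 1 and Graph 2 concrete, the following is a rough sketch of how such edges could be derived from report records. It is not the authors' implementation: the field names (id, lat, lon, created, category), the haversine distance, the brute-force pairwise search, and the function names are all assumptions made for illustration. Only the 1 km and 21 day limits, the same-category constraint, and the edge attribute names come from the description above.

```python
import math
from datetime import datetime
from itertools import combinations

MAX_DISTANCE_M = 1000.0     # 1 km limit from the dataset description
MAX_TIMEDELTA_DAYS = 21.0   # 21 day limit from the dataset description


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (assumed distance measure)."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_edges(reports, same_category_only=False):
    """Connect report pairs that are close in both space and time.

    Brute-force O(n^2) pairing, fine for a toy example only.
    """
    edges = []
    for a, b in combinations(reports, 2):
        if same_category_only and a["category"] != b["category"]:
            continue
        dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
        days = abs((a["created"] - b["created"]).total_seconds()) / 86400.0
        if dist <= MAX_DISTANCE_M and days <= MAX_TIMEDELTA_DAYS:
            edges.append((a["id"], b["id"], {"distance": dist, "timedeltaDays": days}))
    return edges


# Made-up reports with hypothetical field names.
reports = [
    {"id": 1, "lat": 41.3083, "lon": -72.9279, "created": datetime(2014, 1, 1), "category": "pothole"},
    {"id": 2, "lat": 41.3085, "lon": -72.9281, "created": datetime(2014, 1, 3), "category": "pothole"},
    {"id": 3, "lat": 41.3084, "lon": -72.9280, "created": datetime(2014, 1, 2), "category": "graffiti"},
    {"id": 4, "lat": 41.3083, "lon": -72.9279, "created": datetime(2014, 3, 1), "category": "pothole"},
]

unconstrained = build_edges(reports)                        # Graph 1 style
constrained = build_edges(reports, same_category_only=True)  # Graph 2 style
```

In this toy example the same-category constraint removes the edges that touch the graffiti report, which mirrors why Graph 2 has fewer edges than Graph 1.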
1) Clustering results of the SCF (SeeClickFix) dataset, obtained with community structure detection based on the leading eigenvector of the modularity matrix [1], using thresholds of 90 m and 7 days on Graph 2.
7) Description of attributes used in the supervised Random Forest
[1] M. E. J. Newman: Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74, 036104 (2006).
[2] M. E. J. Newman and M. Girvan: Finding and evaluating community structure in networks, Physical Review E 69, 026113 (2004).
[4] seeclickfix.com (dataset under CC Attribution-Noncommercial-Share Alike 3.0)