Datasets

Web communities

These datasets were extracted using a community extraction algorithm. Each dataset is provided as a tar.gz archive. The archive contains the HTML files of the sites found by the algorithm, an OPML file listing all the sites, a GDF file describing the graph structure of the community, and a CSV (comma-separated) file that gives the same information in the form of an adjacency matrix. The following communities were extracted in October 2009:
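As a rough illustration of how one of these archives might be opened, here is a minimal Python sketch. It groups the archive members by file extension and loads the CSV adjacency matrix. The function names are hypothetical, and the sketch assumes that each archive holds a single CSV file containing a purely numeric matrix with no header row or label column; the actual files may be laid out differently.

```python
import csv
import io
import os
import tarfile
from collections import defaultdict


def group_members_by_extension(archive):
    """Map each lowercase file extension to the member names in an open tar archive."""
    groups = defaultdict(list)
    for member in archive.getmembers():
        if member.isfile():
            ext = os.path.splitext(member.name)[1].lstrip(".").lower()
            groups[ext].append(member.name)
    return dict(groups)


def read_adjacency_csv(text):
    """Parse comma-separated text into a list of integer rows.

    Assumption: the matrix is purely numeric, without headers or labels.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            rows.append([int(cell) for cell in row])
    return rows


def load_community(path):
    """Return (members grouped by extension, adjacency matrix) for one archive."""
    with tarfile.open(path, "r:gz") as archive:
        groups = group_members_by_extension(archive)
        csv_name = groups["csv"][0]  # assumption: exactly one CSV file per archive
        text = archive.extractfile(csv_name).read().decode("utf-8")
    return groups, read_adjacency_csv(text)
```

Calling `load_community` on a downloaded archive would return the file listing grouped as `html`, `opml`, `gdf` and `csv`, together with the matrix as a list of rows.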

Some statistics on these datasets are summarized in the following table. Community size gives the number of nodes in the network and alpha the edge density of the network. The other figures concern the algorithm used.

|                | Comics (fr) | Scrapbooking (fr) | Food (us) | Politics (us) |
| Nb seed        | 100         | 100               | 50        | 50            |
| Community size | 1 263       | 1 130             | 1 681     | 1 884         |
| Graph size     | 41 435      | 20 611            | 55 061    | 105 197       |
| Fetched pages  | 2 177       | 1 739             | 2 813     | 3 601         |
| Level max      | 3           | 2                 | 5         | 4             |
| alpha          | 0.01821     | 0.01899           | 0.03560   | 0.02091       |
| beta           | 0.00093     | 0.00147           | 0.00091   | 0.00065       |
| gamma          | 0.03048     | 0.05579           | 0.03060   | 0.01808       |
Global statistics and results for the 5 communities studied.
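
To give a feel for what an edge density such as alpha measures, here is a small Python sketch that computes a density from an adjacency matrix. The page does not state the exact formula behind alpha (nor does it define beta and gamma), so this uses one common convention for directed graphs as an assumption: the number of non-zero off-diagonal entries divided by n(n-1). The function name is hypothetical and the result is not guaranteed to reproduce the values in the table.

```python
def edge_density(matrix):
    """Density of a directed graph given as a square adjacency matrix.

    Assumption: density = (non-zero off-diagonal entries) / (n * (n - 1)).
    Self-loops on the diagonal are ignored.
    """
    n = len(matrix)
    if n < 2:
        return 0.0  # density is undefined for fewer than two nodes; report 0
    edges = sum(
        1
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j and value != 0
    )
    return edges / (n * (n - 1))


# Tiny example: 3 nodes with edges 0 -> 1 and 1 -> 2
example = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
density = edge_density(example)  # 2 edges out of 6 possible ordered pairs
```

For an undirected graph stored as a symmetric matrix, the same ratio is obtained because both the edge count and the number of possible pairs are doubled.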

Other network dataset resources
