cheetah90/WikiSubarticle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 

Repository files navigation

WikiSubarticle

This is the repository for the CSCW 2017 paper: Lin, Y., Yu, B., Hall, A., and Hecht, B. (2017). Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia. Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2017). New York: ACM Press.

The repo contains three parts:

  1. The Java program WikiSubarticle in ./wikibrain_w_subarticle_plugin/
  2. The Python Flask program that serves the trained subarticle classifiers in ./flask_classifiers/
  3. The ground truth ratings of subarticle candidates in ./gold_standard_datasets/, which you can use to train your own subarticle classifiers

Requirements

Java >= 1.7
Maven >= 2
Postgres >= 8.6
Python >= 3.5
Flask >= 0.12

Instructions

Step 1 - Set up WikiSubarticle

The Java program WikiSubarticle uses WikiBrain to provide the technical infrastructure for accessing Wikipedia content. Please follow the WikiBrain setup instructions to set up this part.

Note: WikiSubarticle currently requires training WikiBrain's MilneWitten semantic relatedness module. Please refer to this page for details on how to train the module.

Quick summary of the essential steps (explanations can be found in the links above):

  1. mvn generate-sources
  2. mvn -f wikibrain-utils/pom.xml clean compile exec:java -Dexec.mainClass=org.wikibrain.utils.ResourceInstaller
  3. screen -S subarticle_ingestions
  4. export JAVA_OPTS="-d64 -Xmx128000M -server"
  5. ./wb-java.sh org.wikibrain.Loader -l en,sv,de,nl,fr,ru,it,es,pl,vi,ja,pt,zh,uk,ca,fa,no,ar,fi,hu,id,ro,cs,ko,sr,simple -s fetchlinks -s download -s dumploader -s redirects -s wikitext -s lucene -s phrases -s concepts -s universal -s wikidata -s spatial -c customized.conf
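If you only need a subset of the languages (for example, just "en" while testing), the Loader invocation in step 5 can be assembled programmatically. The following is a minimal Python sketch; the helpers build_loader_command and to_shell are hypothetical and not part of WikiBrain or this repository, and the sketch only reproduces the flags shown in step 5 rather than covering every Loader option.

```python
import shlex

# Language codes and Loader stages copied from step 5 above.
LANGS = ["en", "sv", "de", "nl", "fr", "ru", "it", "es", "pl", "vi", "ja", "pt",
         "zh", "uk", "ca", "fa", "no", "ar", "fi", "hu", "id", "ro", "cs", "ko",
         "sr", "simple"]
STAGES = ["fetchlinks", "download", "dumploader", "redirects", "wikitext",
          "lucene", "phrases", "concepts", "universal", "wikidata", "spatial"]


def build_loader_command(langs, stages, conf="customized.conf"):
    """Return the argv list for ./wb-java.sh org.wikibrain.Loader (hypothetical helper)."""
    if not langs:
        raise ValueError("at least one language code is required")
    cmd = ["./wb-java.sh", "org.wikibrain.Loader", "-l", ",".join(langs)]
    for stage in stages:
        cmd += ["-s", stage]  # each stage gets its own -s flag, as in step 5
    cmd += ["-c", conf]
    return cmd


def to_shell(argv):
    """Quote each argument so the result can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Prints the command instead of running it. To run it, pass the list to
    # subprocess.run(cmd, check=True) with JAVA_OPTS (step 4) set in the environment.
    print(to_shell(build_loader_command(LANGS, STAGES)))
```

With the default lists, the printed command is identical to step 5; build_loader_command(["en"], STAGES) gives an English-only variant.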

Step 2 - Set up Python Flask

From ./flask_classifiers/, run python classifiers_server.py. This serves the trained subarticle classifiers through Flask, so you do not need to train your own model.

Step 3 - Run the Subarticle Classifier

wb-java.sh org.wikibrain.cookbook.core.SubarticleClassifier [main article title] [lang_code] [type of dataset] [rating options] -c [configuration]

Specifications of the parameters:
[main article title]: the program will find the subarticles of this Wikipedia article
[lang_code]: three options, "en", "es", and "zh"
[type of dataset]: currently only one option, "popular"
[rating options]: three options, "2", "2.5", and "3"

The meaning of each parameter is explained in the paper.
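Because the classifier takes positional arguments with a small set of allowed values, checking them before launching the JVM can save a failed run. A minimal Python sketch follows, assuming it is run from the WikiBrain root (hence ./wb-java.sh) and that customized.conf from Step 1 is the configuration file; build_classifier_command and to_shell are hypothetical helpers, and the allowed values are simply those listed above.

```python
import shlex

# Allowed values as listed in this README; their meaning is explained in the paper.
LANG_CODES = {"en", "es", "zh"}
DATASET_TYPES = {"popular"}
RATING_OPTIONS = {"2", "2.5", "3"}


def build_classifier_command(title, lang, dataset, rating, conf="customized.conf"):
    """Validate the arguments and return argv for SubarticleClassifier (hypothetical helper)."""
    if lang not in LANG_CODES:
        raise ValueError("unsupported lang_code: %r" % lang)
    if dataset not in DATASET_TYPES:
        raise ValueError("unsupported type of dataset: %r" % dataset)
    if rating not in RATING_OPTIONS:
        raise ValueError("unsupported rating option: %r" % rating)
    # The title stays a single argument even if it contains spaces or commas.
    return ["./wb-java.sh", "org.wikibrain.cookbook.core.SubarticleClassifier",
            title, lang, dataset, rating, "-c", conf]


def to_shell(argv):
    """Quote each argument so the result can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "Portland, Oregon" is only an example title.
    print(to_shell(build_classifier_command("Portland, Oregon", "en", "popular", "2.5")))
    # ./wb-java.sh org.wikibrain.cookbook.core.SubarticleClassifier 'Portland, Oregon' en popular 2.5 -c customized.conf
```

Quoting matters here because many Wikipedia titles contain spaces; an unquoted title would be split into several positional arguments.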
