Build from Source with Maven

pablomendes edited this page Apr 27, 2012 · 9 revisions

Running DBpedia Spotlight Server with Maven2

Install pre-requisites:

  sudo apt-get install subversion maven2

If you also want to run the demo on your server, install Apache:

  sudo apt-get install apache2

Check out the release you would like to use. We recommend using a stable release from the /tags subdirectory (e.g. 0.5):

  svn co https://spotlight.svn.sourceforge.net/svnroot/dbp-spotlight/tags/release-0.5 dbpedia-spotlight-0.5

...or get the latest from trunk (which contains daily updates and often bugs). Use at your own risk!

  svn co https://spotlight.svn.sourceforge.net/svnroot/dbp-spotlight/trunk dbpedia-spotlight-latest

Edit the file ??dbpedia-spotlight-latest/pom.xml?? and leave only the modules core, rest and demo.

Run install through Maven

  cd dbpedia-spotlight-*
  mvn install

This mvn install from the parent pom.xml is important because it runs install-file for some jars distributed alongside the source code.

After installing the software, you also need the disambiguation index and the spotter lexicon in order to run a Web service on your machine. Get the necessary files from http://spotlight.dbpedia.org/download/ and change the ??conf/server.properties?? file to point to them.

The amount of RAM you need depends on the files you choose (small, medium, large). With the largest dictionary, you will need close to 16GB of RAM. The memory setting can be configured within the pom.xml inside the rest directory.

Finally, start the server from the rest directory:

  mvn scala:run '-DaddArgs=../conf/server.properties'

Generating DBpedia Spotlight Data with Maven2

Get the DBpedia extraction framework and install it:

  hg clone http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework
  cd extraction_framework
  mvn install

Edit the file ??dbpedia-spotlight-latest/pom.xml?? and leave only the modules core, index, rest and demo.

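Run install through Maven from the dbpedia-spotlight-* directory, as in the server instructions above (cd into the directory, then mvn install).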



Follow the instructions in ??dbpedia-spotlight-*/bin/index.sh??. See also the Data Generation Manual to learn more about the steps to create your own datasets for DBpedia Spotlight.

FAQ

Some frequently observed errors are collected below.

Whatever build error you get, check your Maven version

If you experience problems with missing dependencies while running mvn install on the project, check your installed version of Maven (mvn -version). So far the project seems to work only with Maven 2.2.1. Downgrading from 3.0.3 to 2.2.1 solved all my dependency issues.
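
The version check can be scripted before a build. The following is a minimal sketch, not part of the project: the function names maven_version and is_known_good_maven are made up for illustration, and the parsing assumes the first line of the mvn -version output has the usual "Apache Maven X.Y.Z" form.

```shell
# Extract the Maven version number from `mvn -version` output read on stdin.
# Reading stdin keeps the function usable without Maven installed.
maven_version() {
    sed -n 's/.*Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds only for 2.2.1, the one version this page reports as working.
is_known_good_maven() {
    case "$1" in
        2.2.1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (hypothetical wrapper, run before `mvn install`):
#   v=$(mvn -version 2>/dev/null | maven_version)
#   is_known_good_maven "$v" || echo "Warning: found Maven $v, expected 2.2.1" >&2
```

The check is deliberately strict. Other 2.x releases may work as well, but this page only confirms 2.2.1.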

Cannot find (maven) model file

Error:

  org.apache.maven.reactor.MavenExecutionException: Could not find the model file '/usr/local/spotlight/trunk/jung'. for project unknown

Solution: Remove the modules you do not need from the parent pom.xml. The only required modules for running the web service are core, rest and demo (demo only if you want the HTML interface as well). The only required modules for running indexing are core and index.
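
To see which module entries trigger this error, you can compare the modules listed in the parent pom.xml with the directories that are actually present. This is only a sketch under assumptions: the function name missing_modules is hypothetical, and the parsing expects one <module>name</module> entry per line and skips only single-line XML comments.

```shell
# Print every module listed in DIR/pom.xml that has no pom.xml of its own.
# Lines containing "<!--" are skipped; multi-line comments are not handled.
missing_modules() {
    dir=$1
    sed -n '/<!--/d; s:.*<module>\([^<]*\)</module>.*:\1:p' "$dir/pom.xml" |
    while read -r module; do
        [ -f "$dir/$module/pom.xml" ] || echo "$module"
    done
}

# Usage, from the directory that contains the parent pom.xml:
#   missing_modules .
```

Any module name it prints is a candidate for removal from the parent pom.xml, provided it is not one of the modules you need.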

Memory error

Error:

  Memory error, heap space

Solution: You may need to update the pom.xml inside the rest directory with adequate heap space for the dictionary file you are using.




How much memory?

The memory requirements are directly tied to your target lexicon, as our most rudimentary implementation loads the entire lexicon into memory in order to speed up spotting.

You can build a dictionary of People, Locations and Organizations with about 200M of RAM. See the one that I included in the distribution, for example. http://dbp-spotlight.svn.sourceforge.net/viewvc/dbp-spotlight/tags/release-0.5/dist/src/deb/control/data/usr/share/dbpedia-spotlight/spotter.dict?view=log

You can also download the dictionary built from URIs that occurred more than 75 times in Wikipedia: http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz

This should load with a lot less RAM (maybe 5x less) than the one we use in production, and it will still spot the most important things.

See: http://sourceforge.net/mailarchive/message.php?msg_id=28255247
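
Before choosing a dictionary, it can help to check how much physical memory the machine has. The sketch below is Linux-only and reads /proc/meminfo. The function names total_mem_mb and has_enough_mem are hypothetical, and the thresholds in the usage lines are rough readings of the figures on this page (about 200M for the small dictionary, close to 16GB for the largest), not measured requirements.

```shell
# Print total physical memory in MB from a /proc/meminfo-style file.
# MemTotal is reported in kB, so divide by 1024.
total_mem_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeeds if the machine has at least REQUIRED_MB of physical memory.
# The optional second argument is an alternative meminfo file (for testing).
has_enough_mem() {
    [ "$(total_mem_mb "$2")" -ge "$1" ]
}

# Usage (thresholds are assumptions based on the figures quoted on this page):
#   has_enough_mem 16384 || echo "Probably too little RAM for the largest dictionary" >&2
#   has_enough_mem 200   || echo "Probably too little RAM even for the small dictionary" >&2
```

Physical memory is only an upper bound. The heap space given to the JVM in pom.xml still has to be set to fit the dictionary.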

Could not resolve dependencies

For some dependencies that either did not have a maven repo or that we had to patch, we distribute the jars alongside our code, and install them via install-file in the parent pom.xml. Make sure you run ??mvn install?? from the parent directory (e.g. /home/user/workspace/dbpedia-spotlight-0.5/)

Error:

  (Failed to execute goal on project core: Could not resolve dependencies for project org.dbpedia.spotlight:core:jar:0.5)  dependencies are missing for:
  org.semanticweb.yars:nx-parser:jar:1.1
  com.aliasi:lingpipe:jar:4.0.0
  edu.umd:cloud9:jar:SNAPSHOT
  weka:weka:jar:3.7.3

Solution:

  cd /home/user/workspace/dbpedia-spotlight-0.5/
  mvn install
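
To confirm that the bundled jars were installed, you can look for them in the local Maven repository. The sketch below converts the coordinates printed in the error message into a repository path. The function names artifact_path and check_artifact are hypothetical; the path layout and the default location ~/.m2/repository are Maven's standard ones.

```shell
# Convert "groupId:artifactId:jar:version" (as printed in the error above)
# into the jar's path relative to the local repository root.
artifact_path() {
    echo "$1" | awk -F: '{
        group = $1
        gsub(/\./, "/", group)   # dots in the groupId become directories
        printf "%s/%s/%s/%s-%s.%s\n", group, $2, $4, $2, $4, $3
    }'
}

# Report whether an artifact is present under a repository root.
# The second argument is optional and defaults to ~/.m2/repository.
check_artifact() {
    repo=${2:-$HOME/.m2/repository}
    if [ -f "$repo/$(artifact_path "$1")" ]; then
        echo "installed: $1"
    else
        echo "missing: $1"
    fi
}

# Usage:
#   check_artifact com.aliasi:lingpipe:jar:4.0.0
#   check_artifact weka:weka:jar:3.7.3
```

If an artifact is still reported as missing after running mvn install from the parent directory, the install-file step did not run for it.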

Problems with installation of rest/jersey dependencies

Problems with jersey dependencies: "This is due to the glassfish repository, which is hardcoded in the jersey-server-1.1.5.pom, returning a junk artifact (some HTML with an nginx message instead of a real pom).

You can work around this by adding this to the "mirrors" section of your settings.xml:

    <mirror>
        <id>glassfish-mirror</id>
        <name>glassfish mirror</name>
        <url>http://maven.nuxeo.org/nexus/content/repositories/public-releases</url>
        <mirrorOf>glassfish-repository</mirrorOf>
    </mirror>

and removing all "com.sun.jersey" artifacts from your local repository (rm -rf ~/.m2/repository/com/sun/jersey)." (http://answers.nuxeo.com/questions/2195/cant-build-nuxeo-source-nuxeo-webengine-jax-rs-jersey-server-error)
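
Before deleting anything, you can check whether your local repository actually contains such junk artifacts. This is a heuristic sketch only: the function name find_junk_poms is hypothetical, and it assumes that a junk download contains an <html tag and no <project element, while a real pom contains <project.

```shell
# List .pom files under DIR that look like HTML pages rather than Maven poms.
find_junk_poms() {
    find "$1" -type f -name '*.pom' | while read -r pom; do
        if grep -qi '<html' "$pom" && ! grep -q '<project' "$pom"; then
            echo "$pom"
        fi
    done
}

# Usage:
#   find_junk_poms ~/.m2/repository/com/sun/jersey
```

If it prints any file, apply the mirror setting and remove the cached jersey artifacts as described above.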

Cannot find parent

If this problem occurs when installing DBpedia Spotlight, try running the following (in the root folder of the project):

 1) mvn --non-recursive clean install
 2) mvn clean install