Segment text using spaCy

This project segments text sent to it into sentences and verbal phrases. Currently, only German is supported. The primary aim is to provide a simple way of splitting text into verbal phrases as proposed in Vauth et al. (2021); sentence segmentation is provided as well.

If you use this in your academic work, please cite our paper:

@inproceedings{vauthAutomatedEventAnnotation2021,
  title = {Automated {{Event Annotation}} in {{Literary Texts}}},
  booktitle = {{{CHR}} 2021: {{Computational Humanities Research Conference}}},
  author = {Vauth, Michael and Hatzel, Hans Ole and Gius, Evelyn and Biemann, Chris},
  date = {2021-11-17/2021-11-19},
  series = {{{CEUR Workshop Proceedings}}},
  volume = {2989},
  pages = {333--345},
  location = {Amsterdam, The Netherlands},
  url = {http://ceur-ws.org/Vol-2989/short_paper18.pdf},
  eventtitle = {{{CHR}} 2021: {{Computational Humanities Research Conference}}}
}

Building the Docker Image

In the project's top-level directory, run:

docker build -t verby .

This builds a Docker image that you can run with:

docker run -p 8000:80 verby

The -p option maps port 80 in the container to port 8000 on your host, so you can access the API on port 8000.

HTTP API

After starting the server, either via Docker or in a development setup, you should be able to post your segmentation requests.

Using the CLI tool httpie:

http POST 127.0.0.1:8000/segment text="Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, spazieren. Du gehst nachhause."

Or from Python code:

import requests
response = requests.post("http://127.0.0.1:8000/segment", json={"text": "Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, spazieren. Du gehst nachhause."})
print(response.json())
# Prints: {'verbal_phrases': [[[0, 30], [60, 69]], [[31, 47]], [[71, 90]]], 'sentences': [[0, 70], [71, 90]]}

You will get a response object with the character offsets of sentences and verbal phrases. Each verbal phrase is a list of [start, end] spans, because verbal phrases may be discontinuous: in the example above, the inserted clause splits the first phrase into two spans.
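To turn the offsets back into text, you can slice the original string. The following is a minimal sketch that reuses the sample text and response from above; the helper name slice_spans is hypothetical, and treating the end offset as exclusive is an assumption inferred from that sample output.

```python
# Sample text and response copied from the example above.
text = (
    "Ich gehe auf einem Wagen, oder wie manche sagen einem Auto, "
    "spazieren. Du gehst nachhause."
)
response = {
    "verbal_phrases": [[[0, 30], [60, 69]], [[31, 47]], [[71, 90]]],
    "sentences": [[0, 70], [71, 90]],
}


def slice_spans(text, spans):
    """Return the substring for each [start, end] pair (end treated as exclusive)."""
    return [text[start:end] for start, end in spans]


# One string per sentence.
sentences = slice_spans(text, response["sentences"])
# One list of strings per verbal phrase; discontinuous phrases have several parts.
phrases = [slice_spans(text, phrase) for phrase in response["verbal_phrases"]]

for sentence in sentences:
    print(sentence)
for parts in phrases:
    print(parts)
```

Keeping the parts of a discontinuous phrase as a list, rather than joining them, preserves the information that the phrase was interrupted.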

Development Server

To run a development server, execute:

fastapi dev web.py

Library Usage

If you would prefer using verby as a library rather than via HTTP, you can use this sample code as a starting point.

import verby

nlp = verby.pipeline.build_pipeline("de")

doc = nlp("Sie lassen alle die krank sind nachhause gehen.")
for phrase in doc._.verbal_phrases:
    for span in phrase:
        print(span.start_char, span.end_char)
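If you want the library output in the same nested-list shape as the HTTP response, a small helper along these lines may be enough. This is only a sketch: phrase_offsets is a hypothetical name, it relies solely on the start_char and end_char attributes used in the sample above, and whether the HTTP endpoint builds its response this way is an assumption.

```python
from types import SimpleNamespace


def phrase_offsets(phrase):
    """Convert one verbal phrase (an iterable of spans) to [start, end] pairs."""
    return [[span.start_char, span.end_char] for span in phrase]


# Stand-in spans so the sketch runs without verby or a spaCy model installed;
# spaCy Span objects expose the same two attributes.
demo_phrase = [
    SimpleNamespace(start_char=0, end_char=30),
    SimpleNamespace(start_char=60, end_char=69),
]
print(phrase_offsets(demo_phrase))
```

With a real pipeline, you would apply the helper to each phrase in doc._.verbal_phrases instead of the stand-in list.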
