
cube-pipeline ignores + character when generating code list URIs #114

Open
jennet opened this issue Mar 16, 2020 · 6 comments


jennet commented Mar 16, 2020

Not sure if the problem is the value template tags or something else, but if the code list has a concept with a URI like http://example.org/def/concept/age/100+ and the input CSV has the value "100+" (which is correctly set up in columns.csv to convert to http://example.org/def/concept/age/{age}), the pipeline generates http://example.org/def/concept/age/100, losing the `+` and resulting in a broken link.

@jennet jennet added the bug label Mar 16, 2020

RickMoynihan commented Mar 17, 2020

I think this is potentially related to (though not the same as) your whitespace issues too:

#111
#113

`+` is unfortunately an alternative encoding for a space, so it might be that it is either being stripped for that reason, or being URL-decoded into http://example.org/def/concept/age/100 followed by a space, which is then trimmed off somewhere else.

https://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20
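For context, the ambiguity can be seen directly with the JDK's own codecs (a minimal illustration, not part of the pipeline): a query-string decoder treats `+` as a space, while correctly encoding a literal `+` requires percent-encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusDemo {
    public static void main(String[] args) {
        // In the query component, "+" is the legacy encoding for a space,
        // so a naive decode turns "100+" into "100 " (with trailing space):
        String decoded = URLDecoder.decode("100+", StandardCharsets.UTF_8);
        System.out.println("[" + decoded + "]"); // prints "[100 ]"

        // A literal "+" must be percent-encoded to survive:
        String encoded = URLEncoder.encode("100+", StandardCharsets.UTF_8);
        System.out.println(encoded); // prints "100%2B"
    }
}
```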

Not sure what to do about it; the right fix will depend on the case and is likely to be subtle. The best bet is possibly to change our slugify implementation to convert `+` into something like plus -- but that has problems too.

@RickMoynihan

In csv2rdf.uri-template the following happens:

```clojure
(expand-template (parse-template "http://example.org/def/concept/age/{age}")
                 {:age "100+"})
;; => #object[java.net.URI 0x54f684ed "http://example.org/def/concept/age/100%2B"]
```
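So csv2rdf itself appears to do the right thing: `%2B` in a URI path round-trips back to a literal `+`, which can be checked with plain `java.net.URI` (a small sanity check, not pipeline code):

```java
import java.net.URI;

public class Roundtrip {
    public static void main(String[] args) {
        // %2B in a URI path decodes back to a literal "+", so the
        // csv2rdf expansion above preserves the character correctly
        URI u = URI.create("http://example.org/def/concept/age/100%2B");
        System.out.println(u.getPath()); // prints "/def/concept/age/100+"
    }
}
```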


RickMoynihan commented Mar 17, 2020

@jennet ok, so as I think we suspected, this appears to be caused by table2qb's slugize.

I don't know exactly what code path is being used in your case (as I don't have your pipeline/config etc. available), but I suspect you're somehow calling slugize, which I think will end up using this implementation:

https://github.com/Swirrl/grafter-extra/blob/f4422fde5a8413be5313fb2bfdc3c9751a422be0/src/grafter/extra/cell/uri.clj#L5-L13

Essentially `+`s get replaced with `-`s, then any trailing `-`s are stripped off. The line responsible is `(-> string ,,, (clojure.string/replace #"[^\w/]" "-"))`, where `[^\w/]` matches any character that is not a word character or `/`; which includes `+`.

so:

```clojure
(slugize "100+") ;;=> "100"
```
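That behaviour can be sketched outside Clojure too. The following Java version is an assumption based on reading the linked grafter-extra source, not the actual implementation, but it reproduces the failure mode:

```java
public class SlugDemo {
    // Rough sketch of the slugize behaviour described above: replace every
    // character that is not a word character or "/" with "-", then strip
    // any trailing "-"s. "100+" -> "100-" -> "100".
    static String slugize(String s) {
        return s.replaceAll("[^\\w/]", "-")  // "+" becomes "-"
                .replaceAll("-+$", "");      // trailing "-"s are removed
    }

    public static void main(String[] args) {
        System.out.println(slugize("100+")); // prints "100"
    }
}
```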

Strictly speaking I don't know for certain that this is the cause of your problem, but if this code were being used it would produce exactly your issue, so I think it's highly likely this is the root cause.

Potential fixes:

  1. Do as Bill suggests and change the input to be 100 and over, which will slugize better.
  2. Introduce a different slugizer, configurable per column, that doesn't replace `+` with `-`.
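Option 2 could look something like this hypothetical sketch (not table2qb's API; the name and behaviour are assumptions): rather than dropping reserved characters, percent-encode them so `+` survives in the generated URI.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PreservingSlug {
    // Hypothetical column-level slugizer (not part of table2qb): spaces
    // still become "-", but other reserved characters are percent-encoded
    // instead of being replaced, so "+" survives as "%2B".
    static String slugize(String s) {
        return URLEncoder.encode(s.replace(' ', '-'), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(slugize("100+"));        // prints "100%2B"
        System.out.println(slugize("100 and over")); // prints "100-and-over"
    }
}
```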


jennet commented Mar 17, 2020

@RickMoynihan thanks for looking into this. I think option 1 is the safest for now, until we have got a more thorough development roadmap in place.

@RickMoynihan

Agreed. 2 is probably pretty easy to add, but we might as well work around it with 1 at this stage.

@Robsteranium

I feel like we did this deliberately for sns-graft, but I forget why.

Given that a `+` is a valid part of a URI path, we don't need to strip it out. Tbh, the slugize action is probably a bit aggressive in rejecting non-word characters anyway. Perhaps there's a version available somewhere written by someone who's been through all the RFCs!
