Problem with scraped UTF-encoded strings #664
Unanswered
BaronCzerny
asked this question in
Q&A
-
Can you share a link to a post that has this problem?
-
Hello,
when I scrape content written in Spanish, which contains accented letters and other special characters, the scraped strings contain escape sequences such as "\u00ed", which I think are the Unicode code points of the corresponding characters, in this case "í". I would like to have these sequences converted to the corresponding characters. Or is that unwise, and should I feed my MongoDB collections with these strings as they are?
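A minimal sketch of how such sequences typically arise, assuming they come from JSON serialization with ASCII-only output (an assumption; where the escapes are produced has not been confirmed in this thread). The `post` dictionary and its `text` field are hypothetical and only stand in for a scraped record:

```python
import json

# Hypothetical scraped record; the field name "text" is for illustration only.
post = {"text": "aquí"}

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps(post)
print(escaped)  # {"text": "aqu\u00ed"}

# Parsing the JSON again turns the escape back into the real character.
decoded = json.loads(escaped)
print(decoded["text"])  # aquí

# ensure_ascii=False keeps the characters unescaped when serializing.
readable = json.dumps(post, ensure_ascii=False)
print(readable)  # {"text": "aquí"}
```

If this is what is happening, the Python string itself already holds "í", and only its serialized form shows the escape, so nothing would need converting before the insert into MongoDB.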
I have seen that there is a CLI switch called "encoding". Can I also pass it as an argument to get_posts()? I have tried, but the scraper module then complains.
Thanks in advance for your help!
Miguel