Commit
Handle &lt; and &gt; for HTML in autodescribe
These are probably the two most common character entities, so we might as
well convert them. Using an HTML parser (or calibre) would be better, but
the simple text-scanning method for HTML is much more likely to have all
its required dependencies available, and it works just as well in most
real-world cases.
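
As a rough illustration of the text-scanning approach described above, the sketch below runs the updated `comment_html` pipeline against a throwaway file. The temporary file and its contents are assumptions made up for this example (they mirror `testfiles/type1.html`), and the `\>` word-boundary match assumes a sed that supports it, such as GNU sed. The entities are decoded only after the title text has been extracted, so a decoded `<` cannot confuse the title match.

```shell
#!/bin/sh
# Sketch of the updated comment_html pipeline, exercised on a temporary file.
# Assumes a sed that understands the \> word-boundary extension (e.g. GNU sed).

comment_html () {
    # 1. Extract the text between <title ...> and the closing tag.
    # 2. Decode the two handled entities, &lt; and &gt;.
    # 3. Keep only the first matching line.
    COMMENT=$(sed -n 's,^.*<[tT][iI][tT][lL][eE]\>[^>]*>\(.*\)</.*$,\1,p' < "$1" \
        | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        | head -1)
}

# Hypothetical sample input, modeled on testfiles/type1.html.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<html>
<head>
<TITLE>HTML title &lt;is&gt; description</TITLE>
</head>
</html>
EOF

comment_html "$tmpfile"
printf '%s\n' "$COMMENT"   # prints: HTML title <is> description
rm -f "$tmpfile"
```

Other entities, such as `&amp;` or numeric references, pass through this pipeline unchanged.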
dfandrich committed Sep 2, 2024
1 parent 29dc0f5 commit cd806c5
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion autodescribe
@@ -1059,7 +1059,7 @@ comment_first_line () {

# File type: html (HTML text)
comment_html () {
-COMMENT=$(sed -n 's,^.*<[tT][iI][tT][lL][eE]\>[^>]*>\(.*\)</.*$,\1,p' < "$1" | head -1)
+COMMENT=$(sed -n 's,^.*<[tT][iI][tT][lL][eE]\>[^>]*>\(.*\)</.*$,\1,p' < "$1" | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' | head -1)
}

# File type: kdenlive (Kdenlive file)
2 changes: 1 addition & 1 deletion test-autodescribe-expected
@@ -28,7 +28,7 @@
'testfiles/type1.gif' 'GIF comment'
'testfiles/type1.gnumeric' 'Gnumeric Title'
'testfiles/type1.gpx' 'GPX metadata name'
-'testfiles/type1.html' 'HTML title is description'
+'testfiles/type1.html' 'HTML title <is> description'
'testfiles/type1.ics' 'iCalendar summary'
'testfiles/type1.iso' 'volume_id'
'testfiles/type1.jar' 'Jar Application Name'
2 changes: 1 addition & 1 deletion testfiles/type1.html
@@ -1,6 +1,6 @@
<html>
<head>
-<TITLE>HTML title is description</TITLE>
+<TITLE>HTML title &lt;is&gt; description</TITLE>
</head>
<body>
Nothing to see here...