Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Translator: Australian Parliament House (Bills & Legislation) #3396

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

drzax
Copy link

@drzax drzax commented Nov 26, 2024

This translator adds support for importing documents related to bills and enacted legislation from the Australian
Parliament House website (aph.gov.au).

Copy link
Member

@AbeJellinek AbeJellinek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some comments.

"translatorID": "ee31d168-6213-4436-acf6-3e08513ba501",
"label": "Australian Parliament House (Bills & Legislation)",
"creator": "Simon Elvery",
"target": "https:\\/\\/www\\.aph\\.gov\\.au\\/Parliamentary_Business\\/Bills_Legislation\\/Bills_Search_Results.+",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"target": "https:\\/\\/www\\.aph\\.gov\\.au\\/Parliamentary_Business\\/Bills_Legislation\\/Bills_Search_Results.+",
"target": "^https://www\\.aph\\.gov\\.au/Parliamentary_Business/Bills_Legislation/Bills_Search_Results",

The regex passes if it matches any part of the string, so we need to explicitly anchor to the start and don't need padding at the end. And no need to escape slashes except in a JS regex literal.

Comment on lines +42 to +44
const results = doc.querySelector('.search-filter-results');
if (!status && !results) return false;
if (results) return 'multiple';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check getSearchResults(doc, true) explicitly before returning multiple for consistency. Can do that instead of this (similar but not identical) check.

Comment on lines +53 to +55
for (var i = 0; i < rows.length; i++) {
var href = attr(rows[i], 'href');
var title = text(rows[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (var i = 0; i < rows.length; i++) {
var href = attr(rows[i], 'href');
var title = text(rows[i]);
for (let row of rows) {
var href = row.href;
var title = row.textContent;

Can use for..of, and I think you're misunderstanding what attr() and text() do: they get an attribute/the text of a child matching a selector, not the element itself. text(elem, '.some-child') is roughly equivalent to elem.querySelector('.some-child')?.textContent || ''.

Comment on lines +66 to +72
const results = getSearchResults(doc, false);
if (!Object.keys(results).length) return;
let items = await Zotero.selectItems(results);
if (!items) return;
for (let url of Object.keys(items)) {
await scrape(await requestDocument(url), url);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just follow the web translator template:

Suggested change
const results = getSearchResults(doc, false);
if (!Object.keys(results).length) return;
let items = await Zotero.selectItems(results);
if (!items) return;
for (let url of Object.keys(items)) {
await scrape(await requestDocument(url), url);
}
let items = await Zotero.selectItems(getSearchResults(doc, false));
if (!items) return;
for (let url of Object.keys(items)) {
await scrape(await requestDocument(url));
}

}

function scrape(doc, url) {
let item = new Zotero.Item(detectWeb(doc,url));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let item = new Zotero.Item(detectWeb(doc,url));
let item = new Zotero.Item(detectWeb(doc, url));

Comment on lines +84 to +88
item.url = url;
var permalink = attr(doc, '#main_0_billSummary_permalink', 'href');
if (permalink.length > 0) {
item.url = permalink;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
item.url = url;
var permalink = attr(doc, '#main_0_billSummary_permalink', 'href');
if (permalink.length > 0) {
item.url = permalink;
}
item.url = attr(doc, '#main_0_billSummary_permalink', 'href') || url;

Comment on lines +90 to +95
item.history = Array.from(doc.querySelectorAll('#main_0_mainDiv thead tr')).reduce((all, el) => {
const chamber = text(el, 'th');
return all.concat(Array.from(el.querySelectorAll('tbody tr')).map((el) => {
return `${chamber.length ? chamber + ' ' : ''}${text(el, 'td:first-child')} ${text(el, 'td:last-child')}`;
}));
}, []);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I really have no idea what's going on here. Can we rewrite this in a more imperative style with comments? I like functional programming, but a web scraper isn't usually the right place for it - we just end up with way too many nested layers of logic and it's pretty hard to follow.

Comment on lines +97 to +109
item.attachments = Array.from(doc.querySelectorAll('div.bill-details tr')).reduce((acc, rowEl) => {
const name = text(rowEl, 'li', 0).replace(/\\n/g,'').replace(/\s+/g, ' ');
Array.from(rowEl.querySelectorAll('.format a')).forEach((aEl) => {
const formatName = attr(aEl, 'img', 'alt');
const mimeType = aEl.href.split(';').map(d => d.split('=').map(dd => decodeURIComponent(dd))).find(d => d[0] === 'fileType')[1];
acc.push({
title: `${name} (${formatName})`,
url: aEl.href,
mimeType
})
})
return acc;
},[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rewrite this using map or just iteration?

Comment on lines +140 to +203
{
"title": "First reading (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "First reading (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "Explanatory memorandum (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "Explanatory memorandum (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "2nd reading amendment STEGGALL, Zali, MP (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "2nd reading amendment STEGGALL, Zali, MP (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "2nd reading amendment DANIEL, Zoe, MP (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "2nd reading amendment DANIEL, Zoe, MP (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "2nd reading amendment SCAMPS, Sophie, MP (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "2nd reading amendment SCAMPS, Sophie, MP (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "2nd reading amendment TINK, Kylea, MP (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "2nd reading amendment TINK, Kylea, MP (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "Detail - Crossbench SHARKIE, Rebekha, MP (Word Format)",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
},
{
"title": "Detail - Crossbench SHARKIE, Rebekha, MP (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "Bills Digest (PDF Format)",
"mimeType": "application/pdf"
},
{
"title": "Snapshot",
"mimeType": "text/html"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We definitely don't want all of these. At the very least, we only want one format per attachment.
  • The "[...] Format" part of the title is unnecessary.

"items": [
{
"itemType": "statute",
"nameOfAct": "Aboriginal Land Grant (Jervis Bay Territory) Amendment (Strengthening Land and Governance Provisions) Bill 2022",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we aren't really filling in most of the bill/statute fields that we need to make a good citation. We should be able to get most of them from the page. See Help -> List All Fields for Item Type (or hover over an item type name in the code editor)

@drzax
Copy link
Author

drzax commented Jan 22, 2025

@AbeJellinek — thanks for all your comments on this, it's cleared a few things up for me. I'll work on fixing/committing your suggestions when I can.

As a first time translator author I found it very difficult to find good documentation about the structure and functions available to translators and the field mappings for different citation types. Can you recommend any good documentation I might have missed?

@AbeJellinek
Copy link
Member

There's general documentation and web translator-specific documentation, but it honestly all could use an update. As I mentioned in the comments, you should use Scaffold and build around the web translator template (+ button in the toolbar). Also take a look at some recent(ish)ly added/updated translators, like https://github.com/zotero/translators/blob/b947ce66c9da49c9019bafd6098dafe8335a98ea/PubFactory%20Journals.js and https://github.com/zotero/translators/blob/35fc547908c39f842e09b653f9f51f41b7326798/Retsinformation.js.

@AbeJellinek
Copy link
Member

AbeJellinek commented Jan 23, 2025

@AbeJellinek
Copy link
Member

@drzax: I've updated the translator development docs at https://www.zotero.org/support/dev/translators somewhat. Still more work to do, but I hope they're a little clearer now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants