New Translator: Australian Parliament House (Bills & Legislation) #3396

drzax · 2024-11-26T03:47:27Z

This translator adds support for importing documents related to bills and enacted legislation from the Australian
Parliament House website (aph.gov.au).

AbeJellinek

Thanks! Some comments.

AbeJellinek · 2024-12-03T20:05:01Z

Australian Parliament House (Bills & Legislation).js

+	"translatorID": "ee31d168-6213-4436-acf6-3e08513ba501",
+	"label": "Australian Parliament House (Bills & Legislation)",
+	"creator": "Simon Elvery",
+	"target": "https:\\/\\/www\\.aph\\.gov\\.au\\/Parliamentary_Business\\/Bills_Legislation\\/Bills_Search_Results.+",


Suggested change

"target": "https:\\/\\/www\\.aph\\.gov\\.au\\/Parliamentary_Business\\/Bills_Legislation\\/Bills_Search_Results.+",

"target": "^https://www\\.aph\\.gov\\.au/Parliamentary_Business/Bills_Legislation/Bills_Search_Results",

The regex passes if it matches any part of the string, so we need to explicitly anchor to the start and don't need padding at the end. And no need to escape slashes except in a JS regex literal.

AbeJellinek · 2024-12-03T20:07:02Z

Australian Parliament House (Bills & Legislation).js

+	const results = doc.querySelector('.search-filter-results');
+	if (!status && !results) return false;
+	if (results) return 'multiple';


Check getSearchResults(doc, true) explicitly before returning multiple for consistency. Can do that instead of this (similar but not identical) check.

AbeJellinek · 2024-12-03T20:09:38Z

Australian Parliament House (Bills & Legislation).js

+	for (var i = 0; i < rows.length; i++) {
+		var href = attr(rows[i], 'href');
+		var title = text(rows[i]);


Suggested change

for (var i = 0; i < rows.length; i++) {

var href = attr(rows[i], 'href');

var title = text(rows[i]);

for (let row of rows) {

var href = row.href;

var title = row.textContent;

Can use for..of, and I think you're misunderstanding what attr() and text() do: they get an attribute/the text of a child matching a selector, not the element itself. text(elem, '.some-child') is roughly equivalent to elem.querySelector('.some-child')?.textContent || ''.

AbeJellinek · 2024-12-03T20:11:12Z

Australian Parliament House (Bills & Legislation).js

+		const results = getSearchResults(doc, false);
+		if (!Object.keys(results).length) return;
+		let items = await Zotero.selectItems(results);
+		if (!items) return;
+		for (let url of Object.keys(items)) {
+			await scrape(await requestDocument(url), url);
+		}


Just follow the web translator template:

Suggested change

const results = getSearchResults(doc, false);

if (!Object.keys(results).length) return;

let items = await Zotero.selectItems(results);

if (!items) return;

for (let url of Object.keys(items)) {

await scrape(await requestDocument(url), url);

}

let items = await Zotero.selectItems(getSearchResults(doc, false));

if (!items) return;

for (let url of Object.keys(items)) {

await scrape(await requestDocument(url));

}

AbeJellinek · 2024-12-03T20:11:32Z

Australian Parliament House (Bills & Legislation).js

+}
+
+function scrape(doc, url) {
+	let item = new Zotero.Item(detectWeb(doc,url));


Suggested change

let item = new Zotero.Item(detectWeb(doc,url));

let item = new Zotero.Item(detectWeb(doc, url));

AbeJellinek · 2024-12-03T20:12:13Z

Australian Parliament House (Bills & Legislation).js

+	item.url = url;
+	var permalink = attr(doc, '#main_0_billSummary_permalink', 'href');
+	if (permalink.length > 0) {
+		item.url = permalink;
+	}


Suggested change

item.url = url;

var permalink = attr(doc, '#main_0_billSummary_permalink', 'href');

if (permalink.length > 0) {

item.url = permalink;

}

item.url = attr(doc, '#main_0_billSummary_permalink', 'href') || url;

AbeJellinek · 2024-12-03T20:13:41Z

Australian Parliament House (Bills & Legislation).js

+	item.history = Array.from(doc.querySelectorAll('#main_0_mainDiv thead tr')).reduce((all, el) => {
+		const chamber = text(el, 'th');
+		return all.concat(Array.from(el.querySelectorAll('tbody tr')).map((el) => {
+			return `${chamber.length ? chamber + ' ' : ''}${text(el, 'td:first-child')} ${text(el, 'td:last-child')}`;
+		}));
+	}, []);


Sorry, I really have no idea what's going on here. Can we rewrite this in a more imperative style with comments? I like functional programming, but a web scraper isn't usually the right place for it - we just end up with way too many nested layers of logic and it's pretty hard to follow.

AbeJellinek · 2024-12-03T20:14:57Z

Australian Parliament House (Bills & Legislation).js

+	item.attachments = Array.from(doc.querySelectorAll('div.bill-details tr')).reduce((acc, rowEl) => {
+		const name = text(rowEl, 'li', 0).replace(/\\n/g,'').replace(/\s+/g, ' ');
+		Array.from(rowEl.querySelectorAll('.format a')).forEach((aEl) => {
+			const formatName = attr(aEl, 'img', 'alt');
+			const mimeType = aEl.href.split(';').map(d => d.split('=').map(dd => decodeURIComponent(dd))).find(d => d[0] === 'fileType')[1];
+			acc.push({
+				title: `${name} (${formatName})`,
+				url: aEl.href,
+				mimeType
+			})
+		})
+		return acc;
+	},[])


Can we rewrite this using map or just iteration?

AbeJellinek · 2024-12-03T20:16:38Z

Australian Parliament House (Bills & Legislation).js

+					{
+						"title": "First reading (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "First reading (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "Explanatory memorandum (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "Explanatory memorandum (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "2nd reading amendment STEGGALL, Zali, MP (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "2nd reading amendment STEGGALL, Zali, MP (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "2nd reading amendment DANIEL, Zoe, MP (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "2nd reading amendment DANIEL, Zoe, MP (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "2nd reading amendment SCAMPS, Sophie, MP (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "2nd reading amendment SCAMPS, Sophie, MP (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "2nd reading amendment TINK, Kylea, MP (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "2nd reading amendment TINK, Kylea, MP (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "Detail - Crossbench SHARKIE, Rebekha, MP (Word Format)",
+						"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+					},
+					{
+						"title": "Detail - Crossbench SHARKIE, Rebekha, MP (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "Bills Digest (PDF Format)",
+						"mimeType": "application/pdf"
+					},
+					{
+						"title": "Snapshot",
+						"mimeType": "text/html"
+					}


We definitely don't want all of these. At the very least, we only want one format per attachment.

The "[...] Format" part of the title is unnecessary.

AbeJellinek · 2024-12-03T20:18:18Z

Australian Parliament House (Bills & Legislation).js

+		"items": [
+			{
+				"itemType": "statute",
+				"nameOfAct": "Aboriginal Land Grant (Jervis Bay Territory) Amendment (Strengthening Land and Governance Provisions) Bill 2022",


Seems like we aren't really filling in most of the bill/statute fields that we need to make a good citation. We should be able to get most of them from the page. See Help -> List All Fields for Item Type (or hover over an item type name in the code editor)

drzax · 2025-01-22T23:59:29Z

@AbeJellinek — thanks for all your comments on this, it's cleared a few things up for me. I'll work on fixing/committing your suggestions when I can.

As a first time translator author I found it very difficult to find good documentation about the structure and functions available to translators and the field mappings for different citation types. Can you recommend any good documentation I might have missed?

AbeJellinek · 2025-01-23T00:47:55Z

There's general documentation and web translator-specific documentation, but it honestly all could use an update. As I mentioned in the comments, you should use Scaffold and build around the web translator template (+ button in the toolbar). Also take a look at some recent(ish)ly added/updated translators, like https://github.com/zotero/translators/blob/b947ce66c9da49c9019bafd6098dafe8335a98ea/PubFactory%20Journals.js and https://github.com/zotero/translators/blob/35fc547908c39f842e09b653f9f51f41b7326798/Retsinformation.js.

AbeJellinek · 2025-01-23T00:48:55Z

(For reference, the web translator template is https://github.com/zotero/zotero/blob/ed045e640a74e2b2075e7b16fa32ea856f704ce5/chrome/content/scaffold/templates/newWeb.js)

AbeJellinek · 2025-02-07T20:28:05Z

@drzax: I've updated the translator development docs at https://www.zotero.org/support/dev/translators somewhat. Still more work to do, but I hope they're a little clearer now.

New Translator: Australian Parliament House (Bills & Legislation)

9746eaa

AbeJellinek requested changes Dec 3, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Translator: Australian Parliament House (Bills & Legislation) #3396

New Translator: Australian Parliament House (Bills & Legislation) #3396

drzax commented Nov 26, 2024

AbeJellinek left a comment

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

AbeJellinek Dec 3, 2024

drzax commented Jan 22, 2025

AbeJellinek commented Jan 23, 2025

AbeJellinek commented Jan 23, 2025 •

edited

Loading

AbeJellinek commented Feb 7, 2025

	"target": "https:\\/\\/www\\.aph\\.gov\\.au\\/Parliamentary_Business\\/Bills_Legislation\\/Bills_Search_Results.+",
	"target": "^https://www\\.aph\\.gov\\.au/Parliamentary_Business/Bills_Legislation/Bills_Search_Results",

	let item = new Zotero.Item(detectWeb(doc,url));
	let item = new Zotero.Item(detectWeb(doc, url));

New Translator: Australian Parliament House (Bills & Legislation) #3396

Are you sure you want to change the base?

New Translator: Australian Parliament House (Bills & Legislation) #3396

Conversation

drzax commented Nov 26, 2024

AbeJellinek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

drzax commented Jan 22, 2025

AbeJellinek commented Jan 23, 2025

AbeJellinek commented Jan 23, 2025 • edited Loading

AbeJellinek commented Feb 7, 2025

AbeJellinek commented Jan 23, 2025 •

edited

Loading