Fields in the buda-generated MARC records needed for DRS #227

TBRC-TimB · 2022-01-20T16:50:10Z

We have a few asks from Harvard for our MARC records to help smooth out the ingestion process.

Currently we have this for the 001 field:
<marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>

They are suggesting we break it out into two fields: the 001 for just the unique ID for the work (e.g. W00EGS1016181)
and the 003 for the organizational identifier, which they have as MaCbBDRC.

They also asked us to get rid of our 035 field. In most cases, a system will be able to build the contents of this field from the 001 and 003, or 010.
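For illustration, the proposed split could be sketched as below. This is only a sketch, not the BUDA export code: the function names are hypothetical, and it assumes every current 001 value has the shape `(BDRC)bdr:<ID>` shown above. Whether the `bdr:` prefix stays in the 001 is left as a flag.

```python
import re

# Assumed shape of the current 001 value: "(ORG)prefix:ID",
# e.g. "(BDRC)bdr:W00EGS1016181".
_CURRENT_001 = re.compile(r"^\((?P<org>[^)]+)\)(?P<prefix>[^:]+):(?P<local_id>.+)$")

# Organization code suggested by Harvard in this thread.
HARVARD_ORG_CODE = "MaCbBDRC"


def split_001(value, keep_prefix=False):
    """Return (new_001, new_003) for a current-style 001 value."""
    match = _CURRENT_001.match(value)
    if match is None:
        raise ValueError(f"unexpected 001 value: {value!r}")
    local_id = match.group("local_id")
    if keep_prefix:
        # Keep the namespace prefix, e.g. "bdr:W00EGS1016181".
        local_id = f"{match.group('prefix')}:{local_id}"
    return local_id, HARVARD_ORG_CODE


def control_fields_xml(value, keep_prefix=False):
    """Render the two control fields as MARCXML lines."""
    new_001, new_003 = split_001(value, keep_prefix)
    return (
        f'<marc:controlfield tag="001">{new_001}</marc:controlfield>\n'
        f'<marc:controlfield tag="003">{new_003}</marc:controlfield>'
    )


print(control_fields_xml("(BDRC)bdr:W00EGS1016181"))
```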


eroux · 2022-01-20T17:27:09Z

Thanks for the report! The MARC records as we have now (on BUDA) are the result of a veeeeery long back and forth with Columbia, and that was such a tedious endeavor that I'm a bit hesitant to dive into it again...

a few remarks:

in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181 so if possible I'd rather keep the bdr:
Columbia required that we had a 035 field but since we likely won't send them more records we can remove it
Columbia did rewrite the 001 field when they ingested the records in their own database, perhaps Harvard can do the same? (I don't know how this is usually done)
adding the 003 proposed by Harvard will be cool! I suppose the Cb in MaCbBDRC means Cambridge which isn't really the case anymore but that's not a big deal (it's actually a good case for non-semantic IDs!!)

wdyt?

TBRC-TimB · 2022-01-20T18:48:29Z

Oh I can imagine. The big issue with MARC records is that they are just standard enough that everyone thinks their way is the way everyone should do it.

The 001 field seems to be something they were really pushing. Like you mentioned, post-processing on their end might be possible. After all, they used to ingest our MARC records the way they were. But I imagine because the DRS process is so automated, reworking the fields might not be something they can easily do when batch ingesting into HOLLIS.
I'm also a fan of the 003. I'm willing to bet that the MaCbBDRC might be a standard Harvard is already using as a unique ID for bdrc. While not technically accurate, I think the semantic part of it is really meant to be arbitrary. But I completely agree, semantic unique IDs are a bad time.

DRS communications have been pretty dried up since the Summer. Once we have some of these changes rolled out we can get back in contact and hopefully resume the whole process.

eroux · 2022-01-24T08:49:42Z

@TBRC-TimB would https://purl.bdrc.io/resource/W00EGS1016181.mrcx be satisfactory?

TBRC-TimB · 2022-01-24T13:40:50Z

I can reopen communications with DRS and get their feedback. The only potential hiccup I see is the 001 field. I'll make the case that this is the way we have our unique item IDs in our own system. Thanks for looking into this. I'll report back once I hear from Harvard.

TBRC-TimB · 2022-02-07T14:09:25Z

I got some feedback on these MARC records from Harvard. Seems they want us to try and change a few other fields too.

This looks good. Two things:

The ISBN of the print original should not go in 020 $z. It should go into 776 like so: 776/0_ $cOriginal$z7540932317
Can you remove the extraneous space in the pagination? E.g. change "1 online resource (3, 347 pages)" to "1 online resource (3,347 pages)"
Thanks!

@eroux how possible is it to make these changes easily on our end? I imagine the second one might be tricky since it would involve correcting the actual content of the fields rather than just how they are presented in the marc record.

eroux · 2022-02-07T15:08:35Z

oh the second one looks like a mistake, thanks! interesting about the first change, I can make it yes, makes sense

eroux · 2022-02-07T15:49:29Z

I implemented these two changes, but unfortunately the issue with the extent statement is in the data: the book has 347 pages, but the extent statement says "3, 347 p.", I have no idea what the first "3" here refers to... but that's something else, not an issue with the Marc export

TBRC-TimB · 2022-02-07T16:08:56Z

Wow, I figured it was a typo but didn't think it would be off by a factor of 10.
So it sounds like a data input error. Is there a way to bulk query for that field to get a guess at how common that sort of extent error is? I'm hoping it was a typo and not an attempt to convey a different kind of information, i.e. 3 chapters, 347 pages.
If it's just a one-off error we can probably ignore it. Thanks again for looking into it.
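One rough way to estimate how common this is would be to scan exported extent statements for the same shape as "3, 347 p.". The sketch below assumes the extent statements have already been dumped into a mapping from record ID to text; the function name and the sample IDs are hypothetical, and the pattern is only a heuristic that will miss other kinds of errors.

```python
import re

# A 1-3 digit number, a comma, whitespace, then exactly three digits,
# e.g. "3, 347 p.". Roman-numeral front matter ("xii, 347 p.") is not matched.
SUSPICIOUS_EXTENT = re.compile(r"\b\d{1,3},\s+\d{3}\b")


def find_suspicious_extents(extents):
    """Return the (record_id, extent) pairs whose extent matches the pattern."""
    return [
        (record_id, text)
        for record_id, text in extents.items()
        if SUSPICIOUS_EXTENT.search(text)
    ]


# Hypothetical sample data, not taken from the database.
sample = {
    "EXAMPLE_1": "3, 347 p.",
    "EXAMPLE_2": "347 p.",
    "EXAMPLE_3": "xii, 347 p.",
}
for record_id, text in find_suspicious_extents(sample):
    print(record_id, text)
```

Each hit would still need a manual look, since the pattern cannot tell a stray space from a deliberate "3 leaves, 347 pages" style of statement.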

TBRC-TimB · 2022-02-07T17:17:11Z

oh just took a look at this. It looks like you replaced the ISBN from the 020 with a 760, not a 776 field. The content of the element looks good. Let me know when it is all squared and I'll send it off again for approval.

eroux · 2022-02-07T17:23:23Z

right, sorry for that! fixed

TBRC-TimB · 2022-02-07T17:52:01Z

Perfect! thanks again

eroux · 2022-03-08T13:56:32Z

After further discussions, we should:

VIAF

add the VIAF URI of persons when we have them. When we do, they should go in the $1 subfield of the person's field, which should be 100, for instance

100 1# $a Obama, Michelle, $d 1964- $e author. $1 http://viaf.org/viaf/81404344

see relevant documentation in the PCC Formulating URIs guide and the PCC Linked Data Best Practices report.
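As a sketch of what that could look like in MARCXML, the snippet below builds a 100 field with an optional $1 subfield using the standard library. The helper name and its parameters are hypothetical and this is not the actual export code; the $1 subfield is simply omitted when no VIAF URI is known.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def person_field(name, role, viaf_uri=None, dates=None):
    """Build a 100 datafield; $d and $1 are added only when a value is given."""
    field = ET.Element(
        f"{{{MARC_NS}}}datafield", {"tag": "100", "ind1": "1", "ind2": " "}
    )
    subfields = [("a", name), ("d", dates), ("e", role), ("1", viaf_uri)]
    for code, value in subfields:
        if value:
            sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
            sub.text = value
    return field


field = person_field(
    "Obama, Michelle,", "author.", "http://viaf.org/viaf/81404344", dates="1964-"
)
print(ET.tostring(field, encoding="unicode"))
```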

OCLC number

for the erecords only, they are provided by IA on URLs like

https://archive.org/metadata/bdrc-W3CN4988/metadata/external-identifier

and in order to get all the records one can search like this or use the advanced search or the ia search command line

The fields should probably go to 035_$a like in this example from IA
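A minimal sketch of turning IA's external identifiers into 035 $a values is below. It assumes the OCLC numbers appear in the external-identifier list as `urn:oclc:record:<digits>` and that the value has already been fetched and parsed from the metadata URL above; both function names are hypothetical. The `(OCoLC)` prefix is the usual convention for OCLC numbers in 035.

```python
import re

# Assumption: OCLC numbers are exposed as "urn:oclc:record:<digits>".
_OCLC_URN = re.compile(r"^urn:oclc:record:(\d+)$")


def oclc_numbers(external_identifiers):
    """Extract OCLC numbers from an external-identifier value (string or list)."""
    if isinstance(external_identifiers, str):
        external_identifiers = [external_identifiers]
    numbers = []
    for value in external_identifiers or []:
        match = _OCLC_URN.match(value.strip())
        if match:
            numbers.append(match.group(1))
    return numbers


def field_035_values(external_identifiers):
    """Format each OCLC number as a 035 $a value with the (OCoLC) prefix."""
    return [f"(OCoLC){number}" for number in oclc_numbers(external_identifiers)]


# Hypothetical identifiers, for illustration only.
print(field_035_values(["urn:oclc:record:1234567", "urn:isbn:0000000000"]))
```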

856 improvements

The links to BDRC on Worldcat (example) could look better. Having a proper $y, $3 and $7 would improve it

eroux · 2022-03-09T08:24:07Z

About VIAF, people at Harvard will make a request to OCLC to have a $1 subfield in 720, so that we can place the VIAF ID there. It will take about a year.

Let's add the OCLC number to the database and to the MARC records in that field.

For 856, the Harvard team thinks we should use 856 40 $3 Buddhist Digital Resource Center: $u http://purl.bdrc.io/resource/W1KG16654

I propose we also add a $7 when it's full access.

TBRC-TimB · 2022-03-09T14:46:56Z

If we are going to make some of these changes to our MARC records generally, should I hold off on building the MARC records for the Google Books process?

eroux · 2022-03-09T14:52:55Z

oh, very good question! I don't think we need to wait for the 720$1 field to be accepted by OCLC, but I'll make the other changes this week so that you can do the export, I'll keep you updated

eroux · 2022-03-10T08:54:07Z

@TBRC-TimB I've updated the MARC export (without the VIAF URLs), I think it's ready for the export to Google Books, tell me if you encounter issues

eroux · 2022-03-17T21:43:28Z

Harvard wants the 856 descriptive part to be in $y instead of $3
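As a sketch of the difference between the two variants, the helper below formats an 856 line with the descriptive label in either $y or $3. The function name is hypothetical, the output is a plain-text rendering rather than real MARC serialization, and the proposed $7 subfield is not covered.

```python
PURL_BASE = "http://purl.bdrc.io/resource/"


def format_856(resource_id, label, label_code="y"):
    """Render an 856 40 line with the label in $y (default) or $3."""
    if label_code not in ("y", "3"):
        raise ValueError(f"unsupported label subfield: {label_code!r}")
    return f"856 40 ${label_code} {label} $u {PURL_BASE}{resource_id}"


print(format_856("W1KG16654", "Buddhist Digital Resource Center"))
print(format_856("W1KG16654", "Buddhist Digital Resource Center:", label_code="3"))
```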

eroux · 2022-03-26T09:03:55Z

two other comments from Harvard:

In the records that have a 776, there should be a colon at the end of the $i information. For example in W1KG14512 you have =776 08$iElectronic reproduction of (manifestation)$w(DLC) 2010309067 You should have: =776 08$iElectronic reproduction of (manifestation):$w(DLC) 2010309067
W1PD159430 has an error in the 490 where it has "=490 0$v1-7" Since the 300 field states that this online resource is complete in 7 volumes, that information does not need to go into the series (490) field. This record should have no 490. This is connected to https://github.com/buda-base/library-issues/issues/424
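The 776 $i fix amounts to making sure the relationship text ends with a colon. A minimal sketch, with hypothetical helper names and the same text layout as the example above:

```python
def ensure_trailing_colon(relationship_info):
    """Return the $i text with exactly one trailing colon appended if missing."""
    text = relationship_info.rstrip()
    return text if text.endswith(":") else text + ":"


def format_776(relationship_info, control_number):
    """Render a 776 08 line in the layout used in the example above."""
    return f"=776 08$i{ensure_trailing_colon(relationship_info)}$w{control_number}"


print(format_776("Electronic reproduction of (manifestation)", "(DLC) 2010309067"))
```

The helper is idempotent, so running it over records that already have the colon leaves them unchanged.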

jimk-bdrc · 2022-08-24T20:09:19Z

Currently we have this for the 001 field: <marc:controlfield tag="001">(BDRC)bdr:W00EGS1016181</marc:controlfield>

They are suggesting we break it out into two fields. the 001 for just the unique ID for the work (e.g.: W00EGS1016181)

and @eroux replied

in the new database, the ID is really bdr:W00EGS1016181, not just W00EGS1016181 so if possible I'd rather keep the bdr:

It's not really a suggestion to drop the 'bdr:' namespace designator; it will make their current DRS holdings consistent with future ones that we deposit. DRS has rigid file naming conventions, and I don't know if having this prefix in their database will require that all the works and files we package for them have this prefix, or if it will invalidate DRS searches of our works.

eroux · 2022-08-24T21:02:01Z

There have been many discussions with Harvard about the MARC records. I expect they'll continue in September; there are some more things we need to change.

eroux added a commit that referenced this issue Jan 24, 2022

fix for #227

faa5b9a

eroux added a commit that referenced this issue Feb 7, 2022

fix for #227 (comment)

3b7796b

eroux added a commit that referenced this issue Feb 7, 2022

address #227 (comment)

c658b47

eroux changed the title ~~Fields in the buda-generated MARC records AO needs for DRS~~ Fields in the buda-generated MARC records needed for DRS Mar 9, 2022

eroux added a commit that referenced this issue Mar 10, 2022

[Marc] add OCLC & improve 856, #227

f748990

eroux added a commit that referenced this issue Mar 18, 2022

first implementation of 700 fields (#227)

2169fd1

eroux added a commit that referenced this issue Mar 28, 2022

fix to 776 for #227

bb8f009

eroux added a commit that referenced this issue Mar 28, 2022

fix to 776 for #227

058052c

eroux added a commit that referenced this issue Mar 28, 2022

fix for #227, buda-base/library-issues#424

9a7dbbc
