This page is out of date.
The best representation of the current (2019) schema is https://github.com/internetarchive/openlibrary-client/tree/master/olclient/schemata
The authoritative Open Library schema -- a specification of the database fields used to represent items like books and authors -- is a Python expression in the source repository, here.
A more readable version may be generated by executing that file; here it is as of 2007-08-30. (Asterisks indicate multi-valued fields. The types "string", "text", "url" and "date" are all currently represented in ThingDB as strings, but could be displayed or edited in different ways.)
edition
Field | Type | MARC Fields | Example (Description) |
---|---|---|---|
source_record_loc | string* | | "marc_records_scriblio_net/part01.dat:29834:543" (a locator for the source record data) |
source_record_id | string* | | "LC:DLC:00000006" (a record identifier that is globally unique and that also can be constructed consistently from the contents of a record and an identifier for its source catalog) |
author_identifier | string* | 100:abcd, 110:ab, 710:ab, 111:acdn, 711:acdn | "Twain, Mark, 1835-1910" (unique author id in some catalog) |
contributions | string* | 700:abcde | "Illustrated by: Steve Bjorkman" |
title | string | 245:a clean_name | "The adventures of Tom Sawyer" |
subtitle | string | 245:b clean_name | "a play in three acts" |
by_statement | string* | 245:c | "Herman Melville ; [illustrated by Barry Moser]" |
sort_title | string | | "adventures of Tom Sawyer" |
other_titles | string* | 246:a, 730:a-z, 740:apn | "Mark Twain's The Adventures of Tom Sawyer" |
work_title | string | 240:amnpr, 130:a-z | (The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.) |
edition | string | 250:ab | "2nd. editon" (information about this edition) |
publisher | string | 260:b clean_name | "W. W. Norton & Co." |
publish_place | string* | 260:a clean | "New York" |
publish_date | date | 008:7-10 | "2006" |
pagination | string | 300:a | "viii, 383 p. :" (full pagination information) |
number_of_pages | int | 300:a biggest_decimal | 383 (largest decimal found) |
subjects | string* | 600:abcd--x--v--y--z, 610:ab--x--v--y--z, 650:a--x--v--y--z, 651:a--x--v--y--z | "Runaway children -- Fiction" |
subject_place | string* | 651:a*, 650:z* | "Venice (Italy)" |
subject_time | string* | 600:y*, 650:y* | "20th century" |
genre | string* | 600:v*, 650:v*, 651:v* | "Biography" |
series | string* | 440:av, 490:av, 830:av | "Oxford world's classics" |
language | string | 008:35-37 "ISO" tag | "ISO: tel" (coded or human-readable description of the text's language) |
physical_format | string* | 245:h | |
notes | string* | 5XX!505!520:a-z | |
description | text | 520:a | |
exerpts | text* | ||
table_of_contents | text* | 505:art | |
cover_image | url | ||
scan_contributor | string | ||
scan_sponsor | string | ||
dewey_number | string* | 082:a | "914.3" |
LC_classification | string | 050:ab | "BJ1533.C4 L49" |
ISBN | string* | 020:a normalize_isbn, 024:a normalize_isbn | "9780393926033" (13-digit ISBN) |
UCC_13 | string | ||
UPC | string | ||
ISMN | string | ||
DOI | string | ||
LCCN | string | 010:a normalize_lccn | "2006285320" |
GTIN_14 | string | ||
oca_identifier | string | | "albertgallatinja00stevrich" |
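
The `number_of_pages` row above derives an integer from the 300:a pagination string by taking the "largest decimal found". A minimal sketch of that step, assuming the rule simply means the largest run of Arabic digits in the string (the function below is illustrative and is not taken from the Open Library source):

```python
import re
from typing import Optional


def biggest_decimal(pagination: str) -> Optional[int]:
    """Return the largest decimal number in a pagination string, if any.

    Roman numerals such as "viii" are ignored because they contain no digits.
    """
    numbers = [int(n) for n in re.findall(r"\d+", pagination)]
    return max(numbers) if numbers else None


print(biggest_decimal("viii, 383 p. :"))  # 383
```

A string with no digits at all, such as "xii p.", yields `None` in this sketch rather than 0, so a missing page count stays distinguishable from a real one.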
author
Field | Type | MARC Fields | Example (Description) |
---|---|---|---|
identifier | string* | | "Twain, Mark, 1835-1910" (unique id in some catalog) |
name | string | | "Mark Twain" (human-readable name) |
birth_date | date | | "1835" |
death_date | date | | "1910" |
bio | text | | |
EDITION
name | type | example/description |
---|---|---|
source_name | STRING | |
source_record_pos | INT | |
work | ID-REF | |
authors | ID-REFs | Tolkien, J. R. R. |
contributors | STRINGs | "Illustrated by: Steve Bjorkman" |
agencies/organizations | STRINGs | American Civil Liberties Union. Berkeley Chapter |
title | STRING | The adventures of Tom Sawyer |
"by" statement | STRINGs | Herman Melville ; [illustrated by Barry Moser] |
sort title | STRING | adventures of Tom Sawyer |
other titles | STRINGs | Mark Twain's The Adventures of Tom Sawyer |
edition | STRING | 2nd. editon |
publisher | STRING | W. W. Norton & Co., |
publish_place | STRING | New York : |
publish_date | DATE | c2007. |
number_of_pages | STRING | viii, 383 p. : |
subjects | STRINGs | Runaway children -- Fiction |
series | STRINGs | Oxford world's classics |
notes | STRINGs | |
BISAC_subject_categories | STRINGs | see definitions here |
language_code | STRING | code from ISO 639-2/B; e.g., "tel" |
language | STRING | human-readable description of the text's language, e.g., "Telugu" |
physical_format | STRING | |
description | HTML | |
table of contents | STRINGs | |
Dewey number | STRINGs | 914.3 |
LC Classification | STRING | BJ1533.C4 L49 |
cover_image | URL | |
scan_contributor | STRING | |
scan_sponsor | STRING | |
ISBN_10 | STRING | 0393926036 |
ISBN_13 | STRING | 9780393926033 |
UCC_13 | STRING | |
UPC | STRING | |
ISMN | STRING | |
DOI | STRING | |
LCCN | STRING | |
GTIN_14 | STRING | |
oca_identifier | STRING | "albertgallatinja00stevrich" |
New EDITION with MARC and ONIX fields
[1] The 240 "work title" is used in the OCLC FRBR algorithm. The 130 is also used, and there should be either a 130 or a 240 in a record, but not both. It would be ideal if we could pick up either for the work title.
[2] There are two sources in the MARC record for date of publication. The 260 $c may contain characters beyond the year ("c1997" or "1946 [reprinted 1965]"). Positions 07-10 of the 008 field have a normalized date ("1997" or "1946"). The dates as represented in the 260 will not be found outside of library records, so the 008 date can be substituted for it. For ONIX, the publication date often has month and day as well as year. For uses in terms of merging and for faceting, only the year should be used.
[3] MARC has a wide range of notes that appear in fields that begin with "5". All notes EXCEPT the 505 (table of contents) and 520 (summary) can be placed in a notes field. Notes fields can be repeatable.
[4] The ISBN field is not necessarily "clean" – there can be trailing data (0195144953 (alk. paper)). Take only the 10- or 13-character token, which should appear first. The token is all numeric EXCEPT that the final character can be "X".
[5] There are two possible locations for the ISBN_13 in MARC records. Records from some sources, including LC, will have the ISBN-13 in an 020 field. Many records will have two 020 fields, one with the ISBN-10 and one with the ISBN-13. Records from sources other than LC may have the ISBN-13 in the 024 field. There can be other 13-digit EANs in the 024 field, so the ISBN is identified by a "3" in the first indicator position.
[6] The LCCN field is not necessarily "clean" – there can be trailing data ($a 3400058678 /rev). If you wish to use the LCCN for matching, take only the numeric token from the subfield.
[7] This field is a potential facet for display and selection.
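
Notes [2], [4] and [6] each describe a small cleanup rule, which can be sketched as follows. This is a hedged illustration, not the Open Library implementation: the function names are hypothetical, the sample 008 string is made up, and "X" is accepted only as the last character of a 10-character ISBN token (a narrower reading of note [4], since ISBN-13 check digits are always numeric).

```python
import re
from typing import Optional


def year_from_008(field_008: str) -> Optional[str]:
    """Return the 4-digit year at positions 07-10 of an 008 field (note [2])."""
    year = field_008[7:11]  # MARC positions are counted from 0
    return year if re.fullmatch(r"\d{4}", year) else None


def clean_isbn(subfield: str) -> Optional[str]:
    """Keep only the leading ISBN token of an 020 $a value (note [4])."""
    parts = subfield.split()
    if not parts:
        return None
    token = parts[0].upper()
    # 10 characters with an optional final "X", or 13 digits.
    if re.fullmatch(r"\d{9}[\dX]|\d{13}", token):
        return token
    return None


def clean_lccn(subfield: str) -> Optional[str]:
    """Keep only the first numeric token of an 010 $a value (note [6])."""
    match = re.search(r"\d+", subfield)
    return match.group(0) if match else None


print(year_from_008("970101s1997    nyu"))       # 1997
print(clean_isbn("0195144953 (alk. paper)"))     # 0195144953
print(clean_lccn("3400058678 /rev"))             # 3400058678
```

Each function returns `None` when no usable token is found (for example an 008 date such as "19uu"), so a caller can skip the value for matching instead of merging on a malformed identifier.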
History
- Created April 9, 2008
- 4 revisions
October 28, 2019 | Edited by Tom Morris | Add link to current schema repository |
March 31, 2009 | Edited by Edward Betts | warning, out of date |
August 17, 2008 | Edited by Karen Coyle | added subtitle |
April 9, 2008 | Created by Alexis Rossi | adding page |