Preserving Web-Based Auction Catalogs

Gretchen Nadasky IMLS Grant Project Pratt Institute MSLIS LIS-698 Dr. Tula Giannini

Literature Review

Section 1:  Auction catalogs and auction houses


Auction catalogs are essential tools for the sale of art and foundational texts for art researchers (J. Glazer, personal communication, April 23, 2013). The auction catalog and the published sale price list are the primary records that contribute to the valuation of a work.  Dealers, appraisers, gallery owners, collectors, connoisseurs, investors, attorneys, investigators, insurers and tax adjusters all need the information found in auction house publications (Robinson, 1999).

The format of an auction catalog varies widely from house to house.  It can contain a wealth of information or a brief listing of ‘lots.’  In the case of large auction houses like Christie’s and Sotheby’s, the catalog entry contains the title of the work, artist attribution, description, exhibition history, provenance, a catalog note and the estimated price. Condition reports are created for interested buyers (Glazer, 2013).

Auction catalogs are marketing materials beautifully designed to entice buyers. The language of the catalog is very particular.  For example, “after” means similar to and “the circle of” means influenced by. These distinctions are made to determine a work’s value (McLeod, 2001).


The modern auction format can be traced to the Netherlands of the 16th and 17th centuries.  Auctions were an important part of the speculative art market in London in the 17th century, when King Charles I sold his collection and the Royal Academy of Art was established.   Christie’s dominated the London art auction market in the 19th and early 20th centuries. Later John Sotheby took over his uncle’s book-selling business and began auctioning letters, drawings and sketches. Now, Sotheby’s is an equal rival to Christie’s (McLeod, 2001).

Tools of modern art valuation like indexes and catalogs started to appear more frequently in the 18th century.  Determining provenance became paramount when collecting art was seen as a way to enhance social status (Raux, 2012).  In 1741 Pierre-Jean Mariette wrote the first sales catalog, called Recueil Crozat, for the auction of wealthy art patron Pierre Crozat. Mariette is credited with establishing the practice of including classification of a work by school and provenance (McLeod, 2001). The commercialization of art was upsetting to some at the time.  One critic complained that “such sterile knowledge dazzles only an ignoramus” (Dézallier, 1745).


In spite of some naysayers, the value of auction catalogs continued to be recognized. Frits Lugt recorded the essential details of sales catalogs published from 1600–1900 for public auctions in Europe and North America in his famous Répertoire des catalogues de ventes publiques intéressant l’art ou la curiosité. He numbered each catalog, creating the convention of the “Lugt number” (Hugenholtz, 2005). His work is considered a valuable resource for art history even today and is offered on-line by Brill publishing in a database of auction catalogs that also includes contributions from the Frick.


Helen Clay Frick started the Frick Art Library in 1920.  She initially built the collection by buying rare auction catalogs and other materials from collectors. Miss Frick traveled extensively through Europe and had a network of friends and agents who would acquire materials for her. Among her friends, Mme. Clotilde Brière-Misme, the wife of a Louvre curator, gathered many of the 17th and 18th century catalogs that are part of the collection today (Reist, 2004).

Currently, the Frick Art Reference Library (FARL) has a collection of over 90,000 auction catalogs.  Auction catalogs are the most requested resource at the library. In 2012 library staff filled 2,836 requests for auction materials and received 10-15 requests a month from the inter-library loan system. An increase in object-based research, interest in art as a financial commodity and greater use of primary resources have brought a wider audience to the library (S. Massen, personal communication, April 2, 2013).  The Frick recently opened the Center for the History of Collecting in response to inquiries about the past habits of art buyers.

Prior sale prices are the best way to predict how much a work of art will sell for at auction because appraising works is so difficult. Before auction houses both large and small made sale information available on-line, only an elite group of people took part in setting values. Now, more people contribute to valuation.  It is less likely in the internet age to find a completely undiscovered work because so many experts can see and evaluate it (Glazer, 2013).


The post-war US art market was considered inferior to Europe until 1964, when Sotheby’s bought New York’s most venerable auction house, Parke-Bernet (Rathbone, 2010). American art and European art were still considered two wholly different things and thus the concept of departments for auction houses was born. At that time auctions were considered a “wholesale” business and most of the buyers were dealers. Catalogs contained fewer details and sometimes no price ranges because they were serving customers who were expected to use their own judgment. A turning point came to the American market when Frederic Church’s The Icebergs sold at a 1979 auction for a record $2.5 million and brought new prominence to American art (Harvey, 2002).

The tone turned again when Alfred Taubman bought Sotheby’s in 1983, bringing a more populist and commercial attitude to the auction business.  Retail trade practices like taking art on a consignment system and giving sellers money up front were introduced.  More people came into the art market in the eighties and nineties, driving prices ever higher. However, the good times for auction houses were briefly interrupted by a massive price-fixing scandal in 2000. Taubman himself was convicted, and Sotheby’s and Christie’s paid $512 million in fines (Bertoni, 2012).


Following the price-fixing scandal of 2000, Christie’s and Sotheby’s were under pressure to find new sources of revenue and shore up their reputations.  Both houses launched new initiatives to pursue more private sales, expand into emerging markets and hold on-line auctions. These initiatives changed the overall market and art scholarship in different ways.

It is estimated that in 2013 private sale revenue will exceed $1bn each at Christie’s and Sotheby’s.  Private sales take market share from the auction business and lure clients from galleries. In general, catalogs aren’t produced for individual transactions so the trail of provenance can disappear. Some private sales are orchestrated through curated selling shows but they are rare and the sale prices are not always disclosed. Private sales move the auction house specialists’ role toward that of trusted advisers and help the auction houses keep control over the rapidly growing market for art (Tully, 2013).

Much of the new demand for art comes from international markets. Additionally, the last 15 years have seen international competition with the ascendance of Asian companies like China Guardian and Beijing Poly International.  Foreign players have brought newly-minted tycoons from Asia, Russia, and South America into the market.  More far-flung buyers make it difficult to track the journey of works of art and also increase the potential for fraud.  The hegemony of the larger shops also makes it harder for smaller companies to survive and those may ultimately get pushed out of the market entirely (Tully, 2013).

New technology has played a big role in the evolution of the auction business.  E-commerce platforms such as Artsy, VIP Art, and Vintage + Modern have become vehicles for art sales (French, 2012).  Christie’s and Sotheby’s have both staged on-line auctions, including one running concurrently with the salesroom event for Elizabeth Taylor’s estate.  Away from the Big 2, Heritage Auctions has leveraged the internet, with 50% of the winning bids coming through its website (Tully, 2013).  “The digital economy is all about individual customization, with companies striving to create a direct relationship with their clients” (Gennochio, 2013).

Section 2: On-line auction catalogs 


There are several reasons why on-line catalogs make sense for auction houses.  They save on printing costs and time to market, can create a viral buzz, and are a convenient way to keep in touch with clients. Most houses still print catalogs as well as producing them on-line, so it is important that both documents contain the same information and that corrections can be made directly to both versions. Information is now available to everyone, everywhere (Glazer, 2013).


Art research has changed with the availability of on-line resources and the ability to aggregate information on the web.  Several indexes of auction catalogs are now available electronically. Artnet provides pricing data and analytics from 700 auction houses around the world with some price data going back to the 1980s (French, 2012). ArtPrice is an index to results and images from auction sales of art from about 1990 to the present with some coverage of auctions from the 1980s.

The availability of information has brought more buyers into the market and changed the auction landscape but valuing art is still a complicated process. Mike Moses, founder of the Mei Moses index comments “No one can predict the value of an individual work of art.  Sotheby’s and Christie’s have the experts to do that—they offer high and low estimates in the catalog, and yet two-thirds of the works sold fall outside of these estimates. Who knows who’s in the sales room at the day and how much they want the painting?” (French, 2012).


Deborah Kempe, Manager of Collections at FARL, began the “Reframing Collections for a Digital Age” project because of concern about the stability of the on-line auction catalog. The art world in general has been sanguine until recently about the safety of works of art and materials that are produced solely on-line (Kempe, 2013). A new sense of urgency is emerging as more librarians and scholars feel the loss of digital assets themselves.  A study completed in 2009 reviewed on-line citations in 5- and 10-year-old research papers.  It found that 25.89% of papers from around 2004 were missing web references and that 74.14% of the missing citations resulted in an HTTP 404 (page not found) error (Bhat, 2009).  Self-published websites are much less likely to be found over time.
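
A citation check of the kind reported by Bhat can be sketched in a few lines of Python. This is only an illustration and not the study’s actual method: the function names `fetch_status` and `summarize_link_rot` are hypothetical, and treating any HTTP status of 400 or above (or no response at all) as “missing” is an assumption made for the example.

```python
from collections import Counter
from typing import Callable, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, or 0 if no response arrives."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error code such as 404.
        return error.code
    except (URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return 0


def summarize_link_rot(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> dict:
    """Count cited URLs that no longer resolve (assumption: status 0 or >= 400)."""
    statuses = [fetch(url) for url in urls]
    missing = [status for status in statuses if status == 0 or status >= 400]
    checked = len(statuses)
    return {
        "checked": checked,
        "missing": len(missing),
        "missing_pct": 100.0 * len(missing) / checked if checked else 0.0,
        "by_status": Counter(missing),
    }
```

Passing a different `fetch` function makes it possible to try the summary on recorded status codes without touching the network.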

It is clear that the web requires stewardship for preservation but the big questions are: “How do we Archive the Web, Who Should Capture the Web, How Do We Pay for Archiving the Web, and How Do People Find Captured Things in our Library Delivery Systems?” (Kempe, 2012).


As early as 2002, the Bibliothèque Nationale de France (BNF) wanted to archive websites pertaining to the French presidential elections. Event-based websites are by their nature ephemeral but contain important information that can be used later by the media, political scientists, sentiment trackers, and researchers. The BNF used the Internet Archive’s Heritrix software to crawl the web and harvest webpages, and used the Archive-It interface for access. Capturing all of the information that they wanted was a time-consuming and manual process requiring adjustment of the crawl parameters and downloading webpages individually (Masanès, 2005).

The BNF study identified the difficulties in archiving election websites, including: inability to track changes to the site, embedded files and links that need to be maintained, diversity of technologies needed to open files, infinite amounts of content, the difficulty of filtering, software that blocks collection, and finding content located deep within the site’s structure. The conclusion of this study was that web content acquisitions must be analyzed on a case-by-case basis (Masanès, 2005).


Art scholarship continues to evolve as more institutions are able to collaborate and share resources on-line. FARL is involved with several global initiatives to allow remote users to find its treasures and reduce redundancy of records. The Future of Art Bibliography Initiative (FAB) initiated by the Getty Research Institute seeks to “promote open access to art historical content especially digital content” and sees itself as an umbrella organization to capture the electronic records of ever-expanding networks of art consortiums. The goal is to aggregate bibliographic data to create a united art catalog (Simane, 2013).  In order to be an important contributor to such global initiatives, FARL must be able to offer high-quality records or digital surrogates of its collection. The opportunity to expand its global influence is another motivation for FARL to continue the “Reframing Collections for a Digital Age” project.

Section 3: Preserving the web using Archive-It


The Internet Archive is a non-profit started by tech billionaire Brewster Kahle, who sold two data storage companies before founding it. Having experience with digital files, Kahle recognized the instability of the web and the need to preserve digitally-created content. The Internet Archive itself crawls the web at various intervals with Heritrix software that takes snapshots of individual pages. The pages can then be searched via URL using the Wayback Machine interface (McClure, 2006).  The Wayback Machine currently holds 240 billion webpages of archived content (Internet Archive, 2013).
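
Searching the Wayback Machine by URL can also be done programmatically. The sketch below assumes the Internet Archive’s public availability endpoint at `https://archive.org/wayback/available` and the JSON layout it returns (`archived_snapshots` and `closest`); the helper names `availability_query` and `closest_snapshot` and the sample response are made up for illustration. It only builds the query and reads a response, leaving the actual network request to the reader.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build a query asking for the snapshot of `url` closest to `timestamp`.

    `timestamp` uses the Wayback form YYYYMMDD (optionally with hhmmss).
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: str) -> Optional[str]:
    """Return the replay URL of the closest snapshot, or None if there is none."""
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


# Illustrative response in the assumed layout; not real archived data.
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130426000000/http://example.com/",
            "timestamp": "20130426000000",
            "status": "200",
        }
    }
})
```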

Initially, the Internet Archive developed an archival standard for gathering digital resources. The standard was later revised into an open source format called WebARChive (WARC) under the auspices of the International Internet Preservation Consortium (IIPC) working groups. These tools make up the standard tool-kit for archival web capture around the world. The IIPC was started in 2003 by the National Library of France and now has members from 25 countries whose web-preservation initiatives can be accessed on-line.
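
To give a sense of what a WARC file holds, the following sketch assembles a single, simplified WARC/1.0 “resource” record: a block of named header fields, a blank line, and then the captured bytes. It is an illustration of the record layout only, written under the assumption that a plain `resource` record is enough to show the idea. Crawlers such as Heritrix write richer records (for example, full HTTP requests and responses), and the function name `build_warc_resource_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(
    target_uri: str,
    payload: bytes,
    content_type: str = "text/html",
) -> bytes:
    """Assemble one simplified WARC/1.0 'resource' record as bytes."""
    capture_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {capture_time}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; an empty line separates them from the content.
    header = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs.
    return header + payload + b"\r\n\r\n"
```

A WARC file is then simply a sequence of such records, which is why one file can hold many captured pages along with the date each was collected.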

Some IIPC members use Heritrix software but others have developed their own. The National Library of Australia (NLA) uses its own system called PANDORA. Although PANDORA is not capable of harvesting on the scope of Heritrix, users at the NLA feel that it is more easily customizable and requires less quality control. In addition, there is more flexibility for access since PANDORA does not rely on the Wayback Machine for the front-end (Webb, 2013).

For those who want to initiate their own web preservation projects, the Internet Archive offers the Archive-It toolkit by subscription. Archive-It allows users to enter “seed” URLs that the software will access and harvest. It also captures URLs that are associated with the seed site. The interface has a reporting tool for users to do quality assurance and an interface to allow access to the harvested materials. There are four types of crawls depending on the goals for the archive:

  • Broad crawling: High-bandwidth crawls to collect a large number of pages with limited quality control.
  • Focused crawling: Small-medium sized crawls of around 10 million pages where the entire site needs to be collected.
  • Continuous crawling: Small-medium crawls of entire sites, possibly around an event.
  • Experimental crawling: Used by specific groups who want to investigate crawling and archiving websites that address collections (Mohr, 2004).
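
The idea of a seed and the pages “associated with” it can be made concrete with a small scoping rule. The sketch below is a hypothetical simplification and does not reproduce Archive-It’s or Heritrix’s actual scoping logic: it assumes a discovered link is in scope when it sits on the seed’s host and under the seed’s directory, with an optional switch to widen the scope to subdomains. The function name `in_scope` and the example URLs are invented for illustration.

```python
from urllib.parse import urlsplit


def in_scope(candidate: str, seed: str, allow_subdomains: bool = False) -> bool:
    """Decide whether a discovered URL belongs to the crawl defined by a seed."""
    seed_parts = urlsplit(seed)
    candidate_parts = urlsplit(candidate)
    seed_host = (seed_parts.hostname or "").lower()
    candidate_host = (candidate_parts.hostname or "").lower()
    if not seed_host or not candidate_host:
        return False

    same_host = candidate_host == seed_host
    is_subdomain = allow_subdomains and candidate_host.endswith("." + seed_host)
    if not (same_host or is_subdomain):
        return False

    # Treat the seed's directory as the scope: drop a trailing file name.
    seed_path = seed_parts.path or "/"
    if not seed_path.endswith("/"):
        seed_path = seed_path.rsplit("/", 1)[0] + "/"
    return (candidate_parts.path or "/").startswith(seed_path)
```

Under these assumptions, a seed of `http://auctions.example.com/catalogs/` would pull in individual lot pages beneath that directory but leave out the house’s unrelated pages, and images served from a separate subdomain would be missed unless the scope is expanded. This is the kind of adjustment that makes capturing the desired content a matter of trial and error.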


Web archiving activities are similar to other library tasks like appraisal and acquisition, organization, storage, creating records and providing access. Web-archiving is different from ‘digital preservation’ because it deals with materials that were ‘born-digital’ (Niu, 2012). Cataloging a digital object is iterative, as websites are dynamic and subject to change, unlike traditional media. Additionally, capturing the desired content can take some trial and error. The basic steps of a web-archiving initiative are (Bragg, 2013):

Set Vision and Objectives

Given that web content is essentially infinite it is very important to set clear project objectives and clearly identify priorities for the particular institution.   Organizations need to be sure that the assets they collect support the needs of their users.

Resources and Workflow

Web-preservation adds a monetary and time cost for library staff and is by no means automated. Library staff must make assessments, run crawls, analyze results, create metadata or records and build a user interface or add to existing finding mechanisms. The commitment of resources and workflow guidelines should be developed so the preservation process continues seamlessly.


Mechanisms need to be put in place to assess copyright and permissions issues. Internet Archive hosts the content of the crawl but permissions must be set. Some organizations design a separate user interface for access to the archived websites while others create hyper-linked MARC records in their OPAC (A. Thurman, personal communication, April 26, 2013). Others link to Archive-It through their websites. Two examples are the Arizona State Library, Archives and Public Records and the Michigan Government Web Collection (Niu, 2012).


Downloading archived content into a digital repository is still rare. Most rely on Archive-It to maintain the content and information. However, continued adjustments to the crawl scope are part of the on-going preservation efforts.

Risk Management

While Section 108 of the Copyright Act provides libraries with the ability to archive materials without asking permission, it doesn’t address digital preservation (Grotke, 2011). The Section 108 Study Group recommended that changes be made to the Act but, in the meantime, the Internet Archive developed a set of recommendations called the Oakland Archive Policy. These guidelines suggest mechanisms for content holders to opt out of being crawled or harvested or to allow for limits to public access (School of Information Management and Systems, U.C. Berkeley, 2002).
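
One common opt-out mechanism is a site’s robots.txt file, which tells crawlers what they may collect. The sketch below uses Python’s standard `urllib.robotparser` module to test a rule. It rests on two assumptions that should be checked against current practice: that the archive honors robots.txt for the site in question, and that its crawler identifies itself as `ia_archiver`. The helper `crawler_may_fetch` and the sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a robots.txt file permits `user_agent` to collect `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt asking one crawler (assumed name) to stay away
# from the whole site while leaving other crawlers unrestricted.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""
```

For a collecting institution this cuts both ways: the same file that lets a publisher opt out can silently exclude an auction house’s catalogs from a crawl, so checking it is a sensible early step in appraisal.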

Individual tasks associated with web-archiving are: Appraisal and Selection, Scoping, Host-constraining or expanding, Data Capture, Storage and Organization, Quality Assurance and Analysis (Bragg, 2013).


The internet is a shared repository for digital culture. Much of the information created on the web is unavailable anywhere else. The life of the content is dependent on the content creator and the duration of a website can be less than 2 months.  These resources are enormous and cannot be preserved by any single entity.  The Library of Congress, under the National Digital Stewardship Alliance, is working in partnership with many other organizations to take on the task of preserving web-based materials (NDIIPP, 2013). However, there are many challenges in capturing the totality and essence of the web. For auction catalogs in particular, the problem is that web crawlers are good at capturing websites but are less successful at capturing web-based documents. A group in Australia described the need to accept the limits of what can be saved, and any web archiving initiative can keep this advice in mind:

“Therefore, the NLA accepts that what is to be preserved is not a mirror representation of the web nor of a website but, rather, a snapshot of content that was once arranged and published as a website, with only limited functionality of the original. The archived artifact is formed out of the collecting process which is inevitably lossy. Our aim is to define and control this loss. In addition, the way in which the content is collected and displayed places a significant limitation on the presentation of the archived artifact as an authentic record of the publisher’s original data or of the version of that data originally published on the web” (Webb, 2013).


There are several large scale web archiving initiatives that have been collecting for several years.  Some of the English-language sites are:

  • IA Wayback Machine (started in 1996)
  • Preserving and Accessing Networked Documentary Resources of Australia (PANDORA) Web Archive created by the National Library of Australia (started in 1996)
  • UK Government Web Archive created by the UK National Archives (started in 1997)
  • New Zealand Web Archive of the National Library of New Zealand (started in 1999)
  • Library of Congress Web Archives (started in 2000)
  • Web Archiving Service (WAS) of the California Digital Library (started in 2003)
  • Government of Canada Web Archive created by Library and Archives Canada (started in 2005)
  • Web Archive Collection Service (WAX) created by Harvard University Library (launched in 2009, piloted in 2006)
  • UK Web Archive provided by the British Library in partnership with the National Library of Wales, Joint Information Systems Committee and the Wellcome Library (started in 2005)

There are now 247 organizations archiving websites through the Internet Archive. They are listed  on the Internet Archive website.  One of the most extensive is Columbia University’s Human Rights Archive.


References

About the Wayback Machine. Retrieved, April 26, 2013 from
Aubry, Sara. (2010). Introducing web archives as a new library service: The experience of the National Library of France. Liber Quarterly, 20:2, 179–199.
Bertoni, Steven. (May 3, 2011) How former Sotheby’s boss Al Taubman shook up the art world. Forbes Magazine.  Retrieved April 24, 2013, from
Bhat, Mohammad Hanief. (2009). Missing web references-A case study of five scholarly journals. Liber Quarterly, 19:2. Retrieved, April 25, 2013 from
Bragg, Molly and Kristine Hanna, et. al. (2013). The Web Archiving Life Cycle Model. San Francisco: Internet Archive.
Dézallier d’Argenville, A.J. (1745). Abrégé de la Vie des plus Fameux Peintres. Paris: De Bure.
French, Kristen. (2012, May 16). Building a better art index. Retrieved March 4, 2013 from
Gennochio, Benjamin. (2013, April 21) What will the future of the art world look like? Reading the tea leaves. Art + Auction Magazine. Retrieved May 5, 2013 from
Grotke, Abbie. (December, 2011). Web archiving at the library of congress. Computers in Libraries, 31:10.  15-20.
Harvey, Eleanor Jones. (2002) The Voyage of the Iceberg: Frederic Church’s Arctic Masterpiece. Dallas: Dallas Museum of Art.
Hugenholtz, Liesbeth (Spring 2005) Historical art sales catalogues: The migration of a primary source into an online research tool. Visual Resource Association Bulletin, 31:3. 56-57.
International Internet Preservation Consortium (IIPC). (2013). About IIPC.  Retrieved from:
Jacobsen, Grethe. (2008).  Web archiving: Issues and problems in collection building and access. Liber Quarterly 20:2. P179-199.
Kempe, Deborah. (2011).  “Adventures in Web Archiving: Capturing born-digital content from auction house websites.” SCIPIO Meeting, May 2011.
Kempe, Deborah. (2012). “Reframing Collections for a Digital Age: The challenges of web-based art resources”  Art Libraries Net, Paris, September 2012.
“Lugt, [Frederick Johannes] ‘Frits’”. Dictionary of Art Historians, Lee Sorensen, ed. Retrieved April 24, 2013,
Masanès, Julien. (Summer 2005). Web archiving methods and approaches: A comparative study. In Deborah Woodward-Robinson (Ed.) Digital Preservation: Finding Balance. Library Trends, 54:1.  72-90.
Masanès, Julien, (2010) Andreas Rauber and Marc Spaniol. Proceeding, 10th International Workshop, Vienna, Austria. September 22-23, 2010.
Library of Congress. Sustainability of digital formats planning for Library of Congress Collections. Retrieved April 26, 2013, from
McLeod, Lynda. (2001).  Auction catalogs.  In Simon Ford (Ed.), Information Sources in Art, Art History and Design. Munchen: KG Saur. 90-95.
Mohr, Gordon, (2004). Introduction to Heritrix: An open source archival quality web crawler. San Francisco: Internet Archive.
NDIIPP. (2013). Preserving Digital Culture: Digital Preservation. Retrieved  April 29, 2013, from:
Niu, Jinfang. (2012). Functionalities of web archives. D-Lib Magazine, 18:3/4. Accessed January 10, 2013.
Reist, Inge (2004) Auction Catalogs: helping scholars find the facts. The Frick Collection Members’ Magazine, Winter 2004, p64.
Robinson, Lee. (1999). “Auction catalogs and indexes as reference tools”.  Art Documentation, 18:1. P. 24-28.
Rathbone, Peter (2010). Buying at Auction. In Diane McManus Jensen (Ed.), The Art of Collecting. New York: Janson Fine Arts, p 193-196.
Raux, Sophie. (2012). “From Mariette to Joullain: Provenance and Value in the 18th Century French Auction Catalogs”. Gail Feigenbaum and Inge Reist (eds.)  Provenance: an alternate history (p 86-101). Los Angeles, Getty Institute.
Simane, Jan. (2013) The Year of the Snake.  ARLIS/NA Conference, Pasadena, CA.
Webb, Colin, David Pearson and Paul Koerbin. (January/February 2013). ‘Oh, you wanted us to preserve that?!’ Statements of Preservation Intent for the National Library of Australia’s Digital Collections. D-Lib Magazine, 19:1/2.
School of Information Management and Systems, U.C. Berkeley. (2002) Oakland Archive policy. Recommendations for Managing Removal Requests And Preserving Archival Integrity. Accessed, April 20, 2013.
Tully, Judd. (2013, March 12). The Auction House 2.0: How New Strategies, and Growth, May Shift an Old Duopoly. Art + Auction. Retrieved April, 1, 2013 from



