Preserving Web-Based Auction Catalogs

Gretchen Nadasky, IMLS Grant Project, Pratt Institute MSLIS, LIS-698, Dr. Tula Giannini

Findings and Data

The raw data is included in the attached spreadsheet (auctionwebsiteanalysisdata).

Introduction:

Auction catalogs are an essential tool in the sale of art and a foundational text for art researchers. The Frick Art Reference Library (FARL) in New York has one of the largest collections of sales catalogs and is a recognized leader in art scholarship globally. Web-publishing of auction catalogs is both an opportunity and a preservation concern for art libraries. The “Reframing Collections for a Digital Age” project seeks to identify how born-digital auction catalogs can be preserved, archived, described and made accessible to researchers.

This phase of the project focused on prioritizing auction catalog websites for preservation using Archive-It. Web-archiving is not an automated process and can be costly and time-consuming. Identifying specific websites to be preserved can be viewed as a collection development process. However, to avoid duplicating preservation work already done by the Internet Archive, an experiment was conducted on the Wayback Machine to determine whether a link to catalogs in the Wayback Machine could be added to FARL records instead of running a customized archiving program.

Methodology:

Three studies were done on 137 auction house websites:

  • A study of the profile of the websites
  • A survey to ascertain if FARL still receives the catalog in print form
  • An investigation of archived auction catalogs in Internet Archive’s Wayback Machine.
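Part of the third study could be scripted. The sketch below is not the method used in this project, which was a manual investigation; it only illustrates how a first-pass check against the Internet Archive's public Wayback availability endpoint might look. The endpoint address and response shape follow the Internet Archive's published JSON format as I understand it, and the sample responses and `example.com` addresses are illustrative, not project data.

```python
import json
from urllib.parse import urlencode

# Public Wayback availability endpoint (assumed stable; check before relying on it).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(catalog_url):
    """Build the request URL that asks whether a page has any snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": catalog_url})


def closest_snapshot(response_text):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    payload = json.loads(response_text)
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative responses in the expected shape; these are not real captures.
captured_response = json.dumps({
    "url": "example.com/catalog",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/catalog",
            "timestamp": "20130919044612",
        }
    },
})
missing_response = json.dumps({"url": "example.com/catalog", "archived_snapshots": {}})

query = availability_query("example.com/catalog")
found = closest_snapshot(captured_response)
not_found = closest_snapshot(missing_response)
```

A snapshot returned this way only shows that the address was captured at least once. It does not show that the catalog itself, its images, or its links were preserved, so each hit would still need to be examined by hand.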

Concurrent with this research was the continuation of a live web-archiving pilot project. The final product will be a priority list to web-archive. (The priority list will not be made public).

Results:

Auction Websites Profile Summary: The study found that the primary language of 75.9% of the websites is English or German and that 76% of the websites contained at least some ‘archive-friendly’ HTML catalog content.

Language Distributions of FARL Auction House Websites:

[Chart: language distribution of the auction house websites]

Print Acquisition: The survey revealed that FARL continues to receive print catalogs from 72% of the auction houses whose websites we examined. A brief review of acquisition costs was also undertaken. Prices for auction subscriptions can range from $25 per catalog to $4,000 a year.

Auction House | Print Status | Auction House | Print Status | Auction House | Print Status
Galerie Hassfurther | No longer receive | Aste di antiquariato Boetto | On request | Zürichsee Auktionen | Discontinued
AB Stockholms Auktionsverket | No longer receive | Goteborgs Auktionsverk | On request | Finarte Casa d’Aste | Discontinued
Castells & Castells | No longer receive | Bukowskis | On request | Auction of illustration | On-line only
ALIS Auction House | No longer receive | James R Bakker | On request | Nagyházi Galéria | With payment
Important American fine art | No longer receive | Auktion Schneider-Henn | On request | Farsetti arte | With payment
Leonard Joel | No longer receive | Jeschke Meinke & Hauff | On request | Hampel | With payment
Mü-Terem Galéria | No longer receive | Lawson’s | On request | Bloomsbury Auctions | With payment
Hagelstam | No longer receive | Hodgins Art Auctions Ltd | On request | Kieselbach | With payment
Auktionshaus Stahl | No longer receive | Shinwa Art Auction | On request | |
Cheffins Fine art auctioneers | No longer receive | Schmidt Kunstauktionen | On request | |
Galerie Národní 25 | No longer receive | Auktionshaus Ineichen | On request | |
Bay East Auctions | No longer receive | Casa d’aste Babuino | On request | |
Castellana, Subastas de Arte | No longer receive | | | |
Semenzato Casa d’aste | No longer receive | | | |
Fernando Duran, Subastas de Arte | No longer receive | | | |

Wayback Machine Preservation: The investigation showed that the catalogs of 27% of the auction house websites are being “automatically” (though usually only partially) captured by the Wayback Machine. The chart below shows some of the technical issues that can arise in web-archiving. The top three issues are: the software opens into the live website, the website is archived but the catalogs are not, and links are broken.

Crawl Issues

[Chart: crawl issues]

Results from the Archive-It subscription crawls [Specific details can be found in the Weekly Report section of this website]

Initiating and maintaining a web-crawling schedule can be complex and requires time and attention to detail. Once the candidate site, or ‘seed’, is chosen, a test crawl is run to determine which URLs, if any, can be harvested. The results of the test crawl reveal whether the seed is blocked by robots.txt, whether an excess of unusable pages is being collected, and whether the crawl can capture a large portion of the site. If the test crawl looks successful, the seed is added to the regular schedule. Once the crawl is completed, several reports must be examined to see whether adjustments are needed. At this point a user can also examine the archived materials for completeness. One advantage is that the archived site has been indexed by the program and can be keyword-searched.
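The robots.txt check can be previewed before a test crawl is even scheduled. The following is a minimal sketch using Python's standard `urllib.robotparser`; it is not part of Archive-It. The crawler name `archive.org_bot`, the site `auction.example`, and the sample robots.txt rules are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent string for the archiving crawler; confirm the real one
# in the crawl service's documentation before using this check.
CRAWLER_AGENT = "archive.org_bot"


def blocked_paths(robots_txt, site_root, paths, agent=CRAWLER_AGENT):
    """Return the subset of paths that robots.txt forbids for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, site_root + p)]


# Hypothetical robots.txt that shuts every crawler out of the catalog area.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /catalogs/
"""

blocked = blocked_paths(
    SAMPLE_ROBOTS,
    "https://auction.example",
    ["/", "/catalogs/sale-101.pdf", "/about"],
)
print(blocked)
```

In this hypothetical case the home page and the "about" page are crawlable but the catalog PDF is not, which matches the pattern reported below where web pages were captured but the catalogs were not.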

The crawls that were run from September 2012-2013 had varying results and levels of success. Some issues were:

  • Web pages were captured but not the catalogs
  • Catalogs were archived without images
  • The crawler ran out of time and there were many documents in the ‘queue’
  • Catalogs of importance were identified as ‘out-of-scope’
  • Seeds were wholly or partially blocked by robots.txt
  • The catalog area of the site was protected by password (this issue was addressed in the latest version)

Some of these issues could be addressed by changing the crawl scope, but overall the process was cumbersome. The best results came from smaller sites that used mostly HTML technology and from sites that posted PDF versions of the catalog on pages close to the home page.

Conclusions

The results suggest that auction catalogs made available on-line can become an important resource for FARL, and preserving these materials is a priority of the library. Although most auction houses still send print catalogs, the trend is starting to change, so the shift to collecting web-based materials will need to accelerate. FARL and other institutions cannot rely on large-scale web-archiving initiatives like the Internet Archive to preserve auction catalogs. The difficulty of accessing catalogs underscores that web-crawlers are designed to capture web pages as opposed to documents posted to the web.

Executing web-crawls using Archive-It indicates that even archive-friendly materials can be hard to capture. Web-archiving initiatives require an investment of time to customize collection policy and crawl scope. Therefore, priorities must be set on which websites can and should be archived. The attached data provides criteria that can be sorted to design an efficient web-archiving strategy. The priority list delivered to FARL is confidential.
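As an illustration of how such criteria might be sorted, here is a minimal sketch. It does not reproduce the confidential priority list or the method behind it: the site names are placeholders, and the three fields and their weights are assumptions loosely based on the three studies (HTML catalog content, print status, Wayback capture).

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    html_catalogs: bool     # has 'archive-friendly' HTML catalog content
    print_received: bool    # FARL still receives the print catalog
    wayback_captured: bool  # catalogs already captured by the Wayback Machine


def priority_score(site):
    """Hypothetical weighting: favor sites at risk of loss that are feasible to crawl."""
    score = 0
    if not site.print_received:
        score += 2  # no print copy, so the web version is the only one
    if not site.wayback_captured:
        score += 2  # nobody else is preserving it
    if site.html_catalogs:
        score += 1  # more likely to crawl successfully
    return score


def rank(sites):
    """Sort by descending score, breaking ties alphabetically by name."""
    return sorted(sites, key=lambda s: (-priority_score(s), s.name))


# Placeholder records, not data from the study.
sites = [
    Site("Example House B", html_catalogs=True, print_received=True, wayback_captured=True),
    Site("Example House A", html_catalogs=True, print_received=False, wayback_captured=False),
    Site("Example House C", html_catalogs=False, print_received=False, wayback_captured=True),
]
ranked_names = [s.name for s in rank(sites)]
```

With these placeholder records, a site with no print copy and no Wayback capture rises to the top, while one that is already covered in print and in the Wayback Machine falls to the bottom. Any real weighting would be a collection development decision for the library.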

Next Steps

The next phase of the “Reframing Collections for a Digital Age” project might include using the priority list developed here to establish a formal preservation and access program for born-digital art resources at FARL. It would also involve the next step in the workflow: adding item-level metadata to the archived materials and creating a record in FRESCO. Regular quality assurance checks are necessary to make sure that the links work and that the crawls continue to gather the appropriate materials. Ideally, a dedicated member of the FARL staff would undertake these steps while also engaging with collaborators in the art community globally.
