Preserving Web-Based Auction Catalogs

Gretchen Nadasky, IMLS Grant Project, Pratt Institute MSLIS, LIS-698, Dr. Tula Giannini

Using Archive-It


THE TOOLS – The Internet Archive crawls the web at various intervals using its Heritrix crawler software, which takes snapshots of individual pages. The captured pages can be looked up by URL through the Wayback Machine interface. A customizable version of the service, Archive-It, lets users enter “seed” URLs for the software to access and harvest. The crawler also captures URLs associated with the seed site. Archive-It includes a reporting tool that users can rely on for quality assurance, along with an interface for accessing the harvested materials.
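As a rough illustration of looking up a captured page by URL, the sketch below builds a query for the Wayback Machine's public availability endpoint and reads the closest snapshot out of its JSON reply. The function names are hypothetical, chosen only for this example, and it does not touch Archive-It's own collection tools; it is a minimal sketch, not a full client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed reply, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Hypothetical helper: fetch the reply over the network and parse it."""
    with urlopen(build_availability_url(page_url, timestamp)) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com", "20130426")` would ask for the snapshot nearest that date; whether one exists depends on what the crawler happened to capture.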


This entry was posted on April 26, 2013.
