Preserving Web-Based Auction Catalogs

Gretchen Nadasky, IMLS Grant Project, Pratt Institute MSLIS, LIS-698, Dr. Tula Giannini

Using Archive-It


THE TOOLS: The Internet Archive crawls the web at various intervals using Heritrix, crawler software that takes snapshots of individual pages. The captured pages can be looked up by URL through the Wayback Machine interface. A customizable version called Archive-It lets users enter “seed” URLs that the software will access and harvest. The crawler also captures URLs that are associated with the seed site. Archive-It includes a reporting tool that users can use for quality assurance and an interface for accessing the harvested materials.
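To make the "searched by URL" step concrete, here is a minimal Python sketch of a Wayback Machine lookup. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which is separate from the Archive-It interface described above. The function names and the example.com address are placeholders made up for illustration.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint (an assumption of this sketch;
# it is not part of the Archive-It partner interface).
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived snapshot URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Fetching the address returned by availability_query (for example with urllib.request.urlopen) and passing the response body to closest_snapshot would give the address of the nearest capture, if one exists. Keeping the network call out of the two functions makes them easy to check against a saved response.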



This entry was posted on April 26, 2013.


Gretchen Nadasky
