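Ray Bankoski
Thanks to the digital age, we have access to an unprecedented amount of information. Although a single online database can hold millions of digital images, users can quickly and easily find the exact piece they need by entering a simple search term. The process may seem simple to the user, but an incredible amount of work goes into creating the ease of access so many take for granted. This seemingly magical solution is metadata, and it is the key to the world-renowned Gale Digital Collections product line produced by Gale, part of Cengage Learning.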
Defining Documents
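Metadata is a flexible model that varies greatly depending on the type of document it describes. The characteristics associated with a book differ from those of a newspaper, those of a newspaper differ from those of a diary, and so on. Each type of primary source is described in a file called a Document Type Definition (DTD), and the captured metadata is output as an Extensible Markup Language (XML) file.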
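To make the idea of per-type metadata concrete, here is a minimal Python sketch. It is only an illustration: the element names and the `FIELDS_BY_TYPE` table are hypothetical stand-ins for what a real DTD would define, and nothing here reflects Gale's actual schemas or tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical field sets per document type; a real DTD would define these.
FIELDS_BY_TYPE = {
    "book": ["title", "chapter_heading", "page_number"],
    "newspaper": ["publication", "article_title", "page_number"],
}


def build_record(doc_type, values):
    """Return an XML string holding only the fields allowed for doc_type."""
    allowed = FIELDS_BY_TYPE[doc_type]
    unknown = set(values) - set(allowed)
    if unknown:
        raise ValueError(f"fields not defined for {doc_type}: {sorted(unknown)}")
    root = ET.Element("document", {"type": doc_type})
    # Emit fields in the order the type definition lists them.
    for name in allowed:
        if name in values:
            ET.SubElement(root, name).text = str(values[name])
    return ET.tostring(root, encoding="unicode")


record = build_record(
    "newspaper", {"article_title": "Food Prices", "page_number": 3}
)
print(record)
```

The same call with `"book"` and an `article_title` value raises an error, which mirrors the point that each document type carries its own set of characteristics.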
Capturing Metadata
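“Capturing metadata” is a tad misleading. Metadata elements are actually defined by manually entering information about each and every page image into a file. This is a time-consuming task that requires a team of hundreds. Each page is placed under careful scrutiny, requiring an approach unique to its content type. Among the elements captured are page numbers, chapter headings, article titles, and graphic captions (when available). The process requires multiple operators to key in information from the same pages, compare their input, and review the results to ensure the capture is correct.
After the first level of capture has passed quality control checks, the second level begins. This involves shifting the focus from words to image segments in a process called smart tagging. First, all graphics receive an assigned graphic type, such as cartoon, portrait, or chart, and the captions associated with them are recorded. With this added step, users can search for specific graphics, such as a portrait of Abraham Lincoln or a chart of food prices in 1856. The same smart tagging technique applies to newspaper, periodical, and magazine articles. Possible classifications include advertisements, illustrations, and birth notices. As a result, end users can limit their searches to return only the desired classifications, making it easy to find the exact information they are looking for.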
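The compare-and-review step can be pictured with a short Python sketch. This is an assumption about how such a comparison might look, not a description of Gale's workflow; the function name `compare_keyings` and the sample field names are hypothetical.

```python
def compare_keyings(first, second):
    """Return the fields where two operators' entries disagree."""
    mismatches = {}
    for field in sorted(set(first) | set(second)):
        a, b = first.get(field), second.get(field)
        if a != b:
            # Keep both values so a reviewer can decide which is correct.
            mismatches[field] = (a, b)
    return mismatches


# Two operators key the same page independently.
operator_a = {"page_number": "12", "article_title": "Market Report"}
operator_b = {"page_number": "12", "article_title": "Market Reprot"}

needs_review = compare_keyings(operator_a, operator_b)
print(needs_review)
```

Fields that match are accepted as keyed; only the disagreements are passed on for review.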
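As a rough illustration of how tagged graphic types and classifications let a user narrow a search, consider the Python sketch below. The `Segment` class and `search` function are hypothetical and assume a simple in-memory list; a production search system would work very differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    classification: str  # e.g. "portrait", "cartoon", "chart"
    caption: str


def search(segments, term, classification=None):
    """Match captions on term, optionally limited to one classification."""
    term = term.lower()
    return [
        s
        for s in segments
        if term in s.caption.lower()
        and (classification is None or s.classification == classification)
    ]


segments = [
    Segment("portrait", "Abraham Lincoln"),
    Segment("cartoon", "Abraham Lincoln at the polls"),
    Segment("chart", "Food prices in 1856"),
]

# Without a classification, both Lincoln items match; with one, only the portrait.
portraits = search(segments, "lincoln", classification="portrait")
print(portraits)
```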
Recognizing Hand Scripts
Hand-written notes and documents present special challenges, since this material cannot be captured using the Optical Character Recognition (OCR) software that is used to transform printed text into searchable words. At Gale, we have developed an exciting, new proprietary process that aids in the capture of people’s names, place names, and dates found on these documents. The result is that a wealth of contextual information is being made discoverable for the very first time, thus opening the door to new research opportunities that just weren’t possible before.
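A Continued Commitment to Quality
Gale captures millions of pages of material each year and is committed to providing high-quality metadata that helps users discover those pages with ease and speed. Our workflow allows basic information to be captured quickly and accurately, and it draws on subject matter experts when more complex capture is necessary. Every metadata element collected is also reviewed for accuracy by a quality control facility operated by Gale.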
Yes, Gale truly has made finding a needle in a large data stack possible. So the next time you use a Gale Digital Collection, consider all that has gone into giving you a quick and accurate search experience.
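About the Author
In his role as Vice President, Electronic Asset Management, for Cengage Learning, Ray oversees all aspects of the acquisition, conversion, and quality assurance of rare materials for the Gale Digital Collections product line.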