Keven’s Blog: Digital Library Research

November 24, 2005

Report on the First Online Conference on Metadata and Semantics Research

Filed under: Theses (Thesis Writing); posted by keven @ 9:18 pm

The conference's full name is: First on-Line conference on Metadata and Semantics Research.

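I learned about this conference from a mailing list I subscribe to. Registration normally required a fee, and apparently not a cheap one, but shortly before it opened I received an email saying there were 20 free places, so I applied for one. This gave me a chance to see how an online conference is run and to get a feel for the "virtual" atmosphere.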

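After looking through the content, I found this to be a high-level conference with quite a few leading figures in semantics research and applied practice, and I feel honored to be able to take part.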

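This online conference runs on a web-based e-learning platform that works much like a bulletin board. It has four areas: Announcements, Documents, Forums and Links. There was originally also a User area, which was later removed, possibly for privacy reasons. The Announcements area is run mainly by Miguel-Angel Sicilia; as the organizer, he posts all important reminders and announcements.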

The Forums area is the conference's main space for interaction. Discussion falls into three categories:

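1. Invited talks: three experts (Dr. Amit P. Sheth, Dr. Tom Gruber and Dr. Ambjörn Naeve) were invited to give one talk each, on 23, 25 and 28 November 2005. The speakers' CVs and the main content of the talks are available for download.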

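2. Questions to the organizers: how the conference is organized, how the discussions work, why the Dokeos platform was chosen, self-introductions, preparation before the conference, uploading files and presentations, and so on.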

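3. Topic forums: there are currently seven tracks (listed below). In each one, the authors of submitted papers prepare PPT presentations, and participants then discuss them in forum style.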

  • Special session: Open research (note: this is a new concept, a new form of scientific research in a networked environment). Starts on the 28th; chaired by Dr. Miltiadis Lytras.
  • Metadata-intensive applications of Semantic Web technologies. Started on the 22nd.
  • Ontologies for the annotation of particular kinds of resources. Starts tomorrow (the 25th).
  • Semantic Web approaches to Information Systems (in effect, how information systems can apply Semantic Web technologies). Started today.
  • Learning Object metadata studies (this has long been a hot topic; the IEEE LOM and DC Learning AP communities will probably take part). Already under way.
  • Web Services and Learning Objects (surely Semantic Web Services; that goes without saying).
  • Metadata and schemas for cultural heritage (exactly what we care about most).

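In my view, the invited talks are the highlight of this conference. As of this post, only one invited talk has taken place; preparation and discussion for the second are under way, and I don't yet know the title of the third (perhaps I just couldn't find it). Judging from the proceedings (only 18 papers so far), the quality of the conference is uneven, and the questions being asked suggest it is fairly well suited to newcomers: much of it is still concept clarification and outreach. Still, it is rare to hear some of these leading figures give their views on the most basic questions about ontologies, metadata and Semantic Web applications, so the conference is very worthwhile. Overall it is not large: the organizers hoped for 100 to 300 participants, and because discussion has not been lively, they have repeatedly sent notices and emails asking everyone to post more.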

Here is where the invited talks stand so far:
Talk 1:
Speaker: Dr. Amit P. Sheth, LSDIS Lab, The University of Georgia and Semagix
Title: Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications
Summary: Dr. Amit Sheth reviewed Semantic Web applications in industry and scientific research that his lab and company have helped develop, focusing on the requirements, capabilities and state-of-the-art technologies related to the following:

  • expressiveness of knowledge representations (ontology representation language),
  • development of large populated ontologies that are regularly updated,
  • automatic metadata extraction and annotation involving heterogeneous textual as well as scientific experiment data,
  • high-performance and scalable query and rule processing
  • reasoning that computes semantic associations leading to identification or discovery of patterns or interesting/suspicious paths and complex relationships,
  • semantic visualization and semantic virtual interfaces for high-bandwidth user interactions with heterogeneous data, metadata and ontologies, and
  • the role of standards including RDF, RDFS, OWL, SPARQL, SWRL, etc.

It also states: We will also review our experiences in building practical ontologies that have involved hundreds of classes to a few million instances/assertions, and the approaches to deal with scalability and performance challenges in building real-world applications.
References/background:

*****Here is a very good demo showing how Semantic Web technologies can be applied to enterprise knowledge management: http://www.semagix.com/downloads/product_demo.htm

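The discussion of this talk contains some good Q&A (many of the questions are introductory and don't require much technical background). I haven't had time to translate it, so here it is as posted: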

Two Questions for Amit by Tanja Sieber – extracting semantics / success of SW?
NAME:Tanja Sieber
ORGANIZATION:University of Miskolc-Hungary and SAP AG, Germany
Country:Hungary/Germany
E-mail (to be notified for the replies on your question):tanja.sieber@t-dos.de
QUESTION:
1. Concerning your topic ‘Semantics Enabled Industrial Applications’ I would be very interested to know if there are any research activities concerning the matching of extracting semantics out of existing ISO-Standards. As an example I would like to mention the process of documentation within an enterprise – depending on the kind of the product different standards have to be considered.
2. Out of a paper of Siegfried Handschuh, 2003: ‘The success of the Semantic Web crucially depends on the easy creation, integration and use of semantic data’ – what do you think, is the actual situation?

re: Two Questions for Amit by Tanja Sieber – extracting semantics / success of SW?
Tanja:
Both questions are important.
(a) Exploiting existing standards such as those by ISO can be highly valuable.
My group has developed a semantic web test bed ontology (Google: SWETO), and an extended version of it called SWETO-GS uses two ISO standards in the geographic information area. (Google: SWETO-GS, you will get to the paper that gives details.) I do not know of any use of ISO standards in the domain you mentioned.
In a related matter, Martin Hepp has done good work on exploiting existing product taxonomies in developing ontologies. Such efforts give insights into the possible success of strategies that could exploit ISO and other standards. His paper comes out in the next issue, issue 2(1) of International Journal on Semantic Web and Information Systems (Google: IJSWIS).
(b) Siegfried is right. There are three fundamental issues in enabling and realizing SW:
1. Creation of ontologies (esp. domain specific ontologies) that capture domain knowledge (it is important that these ontologies are populated with instances, schema alone has little value; many ontologies LSDIS lab and Semagix have created have hundreds of thousands to well over one million instances).
2. Automatic extraction of semantic metadata (in some exceptional cases such as medical literature, the community is doing an exceptional job in manual annotation/tagging, but it is unlikely that many more fields can afford to do that).
3. Reasoning techniques (but here, DL-inferencing in my view is of limited value, what is used even more often is graph analytic techniques, such as path computation for “connecting the dots” applications, subgraph mining, etc).
Hope this adequately responds to your questions.
Amit
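Sheth's third point favors graph-analytic techniques such as path computation over DL inferencing. As a minimal sketch of what "connecting the dots" can mean, assuming instance data is available as (subject, predicate, object) triples (the entities and relations below are hypothetical), a breadth-first search returns the shortest chain of relationships linking two instances. This is a generic textbook BFS, not the LSDIS or Semagix implementation.

```python
from collections import deque

# Hypothetical instance data as (subject, predicate, object) triples.
TRIPLES = [
    ("PersonA", "worksFor", "CompanyX"),
    ("CompanyX", "locatedIn", "CityY"),
    ("PersonB", "livesIn", "CityY"),
    ("PersonB", "memberOf", "ClubZ"),
]

def build_graph(triples):
    """Index triples as an undirected adjacency list that keeps the predicate labels."""
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((p, s))  # allow traversal against the edge direction
    return graph

def connect_the_dots(graph, start, goal):
    """Return the shortest path as (node, predicate, node) hops in traversal order, or None."""
    if start == goal:
        return []
    parent = {start: None}  # node -> (previous node, predicate used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for pred, nxt in graph.get(node, []):
            if nxt in parent:
                continue  # already visited
            parent[nxt] = (node, pred)
            if nxt == goal:
                # Walk the parent links back from the goal to rebuild the path.
                hops, cur = [], goal
                while parent[cur] is not None:
                    prev, p = parent[cur]
                    hops.append((prev, p, cur))
                    cur = prev
                return list(reversed(hops))
            queue.append(nxt)
    return None  # no connection found

if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for hop in connect_the_dots(graph, "PersonA", "PersonB"):
        print(hop)
```

Because BFS explores the graph level by level, the first path it finds has the fewest hops; a real "connecting the dots" application would also rank or filter paths, which this sketch does not attempt.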

Do we need in Sw to speed up the process of knowledge creation and dissemination?
Dear Amit: (and MTSR friends)
I have a simple question: why do you feel that activities like the W3C Semantic Web Activity do not produce the knowledge required to promote the SW as fast as may be needed, at a time when the whole world is really thirsty for such knowledge?
Best
Miltiadis Lytras

re: Do we need in Sw to speed up the process of knowledge creation and dissemination?
SW activity focuses on the creation of standards, e.g., right now there is a lot of debate on representation for Semantic Web Services (http://lists.w3.org/Archives/Public/public-sws-ig). E.g., recently my group and IBM submitted a proposal for WSDL-S, and earlier OWL-S, WSMO and FLOWS were submitted.
Another active area is semantic web for life sciences and healthcare (they have posted several showcase applications; see: http://www.w3.org/2005/04/swls/).
However, to my knowledge W3 is not actively involved in creation/aggregation of knowledge (I am a W3 Advisory Committee member).
amit

Metadata quality?
Dear Amit,
What is your experience with the quality of metadata extracted automatically by algorithms or manually by domain experts? Can they be seen as equal, or does automatic extraction always need a human evaluation? How did you address this problem in your various projects?
Thanks.
Sebastian

re: Metadata quality?
Dear Sebastian:
Excellent question.
Let’s start with the most well known example of human/manual metadata annotation, that of PubMed, in which all medical documents are annotated by trained persons who use a well-crafted specialist lexicon and taxonomy. Humans also make judgements such as whether a term is a primary term for that document. Such high-quality, well-disciplined metadata annotation cannot be matched in any broad domain (such as the medical domain) in the foreseeable future.
On the other hand, if one needs to extract from and tag 1 million documents per hour, clearly human tagging won’t scale. The annotation that commercial systems such as the semantic enhancement engine (Google: Semantic Enhancement Engine) do is pretty good at identifying entities and phrases of interest, especially because the match/extraction is constrained to a prebuilt, highly populated ontology (often over a million concepts/instances). Unlike humans, these systems are not quite capable of extracting relationships, although we are doing some research in the context of PubMed where prior domain knowledge captured in the schema helps us identify instance-level relationships.
Hope this is an adequate initial reply to this question which has many points of discussions.
Amit
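Sheth attributes the good entity identification of commercial annotators to matching that is constrained to a prebuilt, highly populated ontology. A minimal sketch of that idea, assuming the ontology is reduced to a hypothetical label-to-instances dictionary, is longest-match dictionary lookup over the text. It is not how the Semantic Enhancement Engine works internally, and like the systems he describes it finds entities but no relationships; the ambiguous "georgia" label shows why disambiguation is still needed.

```python
import re

# Hypothetical fragment of a populated ontology: label -> list of (instance id, class).
# One label may map to several instances, so a match can be ambiguous.
ONTOLOGY_LABELS = {
    "university of georgia": [("org:UGA", "University")],
    "georgia": [("geo:GeorgiaUS", "State"), ("geo:GeorgiaCountry", "Country")],
    "amit sheth": [("person:Sheth", "Researcher")],
}

def annotate(text, labels):
    """Spot ontology labels in text, preferring the longest label at each token position."""
    tokens = re.findall(r"\w+", text.lower())
    max_len = max(len(label.split()) for label in labels)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in labels:
                annotations.append((candidate, labels[candidate]))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return annotations

if __name__ == "__main__":
    text = "Amit Sheth works at the University of Georgia; Georgia is also a country."
    for label, candidates in annotate(text, ONTOLOGY_LABELS):
        print(label, candidates)
```

Preferring the longest match keeps "University of Georgia" from also being tagged as the bare place name "Georgia".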

re: re: Metadata quality?
Hi,
I want to post a follow-up question, please: are there any metrics, like the ones used in software engineering or in other areas, that are used to measure the quality of automatically generated metadata? If not, can you please advise on the best way to evaluate auto-generated metadata?
Thanks
Hend

re: re: re: Metadata quality?
This is a difficult topic, just as measuring a quality of ontology is a difficult topic.
In semantic metadata extraction, the question can be posed as follows:
Given a document and an ontology with respect to which semantic extraction is performed, what is the quality (precision and recall) of extraction.
If simple techniques such as exact string match are used, precision would be reasonably good (although an ontology may have two instances for the same label, e.g. two persons with the same name) and can be further improved with disambiguation techniques, but recall would be poor. If advanced matching techniques are used (e.g. fuzzy match, starting with string/lexical matching), recall would improve but precision would suffer.
There are some emerging efforts that deal with the quality and cleanness of ontologies (google: OntoQA ; google: Ontoclean), but I have not seen efforts that quantify semantic metadata extraction quality.
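The exact-versus-fuzzy trade-off Sheth describes can be made concrete with a small sketch. It assumes a hypothetical hand-labelled set of phrases and uses Python's `difflib.SequenceMatcher` similarity ratio as a stand-in for "fuzzy match"; the 0.75 threshold is an arbitrary illustrative choice, not a recommended value.

```python
from difflib import SequenceMatcher

# Hypothetical ontology instance labels.
LABELS = ["Amit Sheth", "Tom Gruber", "Semagix"]

# Hypothetical evaluation set: (phrase found in a document, correct instance or None).
GOLD = [
    ("Amit Sheth", "Amit Sheth"),
    ("Amit P. Sheth", "Amit Sheth"),
    ("Tom Gruber", "Tom Gruber"),
    ("Semagix Inc", "Semagix"),
    ("Tim Gruber", None),  # a different person: should stay unlinked
]

def link_exact(phrase):
    """Link only when the phrase equals a label (case-insensitive)."""
    for label in LABELS:
        if phrase.lower() == label.lower():
            return label
    return None

def link_fuzzy(phrase, threshold=0.75):
    """Link to the most similar label if its similarity ratio reaches the threshold."""
    best, best_score = None, 0.0
    for label in LABELS:
        score = SequenceMatcher(None, phrase.lower(), label.lower()).ratio()
        if score > best_score:
            best, best_score = label, score
    return best if best_score >= threshold else None

def precision_recall(linker):
    """Precision over the links made; recall over phrases that have a correct instance."""
    made = [(linker(p), g) for p, g in GOLD if linker(p) is not None]
    correct = sum(1 for pred, g in made if pred == g)
    relevant = sum(1 for _, g in GOLD if g is not None)
    precision = correct / len(made) if made else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    print("exact:", precision_recall(link_exact))
    print("fuzzy:", precision_recall(link_fuzzy))
```

On this toy data exact matching gives precision 1.0 and recall 0.5, while fuzzy matching raises recall to 1.0 but lowers precision to 0.8 because "Tim Gruber" is wrongly linked to "Tom Gruber".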

Talk 2: Ontology of Folksonomy: A Mash-up of Apples and Oranges (http://tomgruber.org/writing/ontology-of-folksonomy.htm)
By Tom Gruber
Ontologies are enabling technology for the Semantic Web. They are a means for people to state what they mean by formal terms used in data that they might generate or consume. Folksonomies are an emergent phenomenon of the social web. They are created as people associate terms with data that they generate or consume. Recently the two ideas have been put in opposition, as if they were right and left poles of a political spectrum. This piece is an attempt to shed some cool light on the subject, and to preview some new work that applies the two ideas together to enable an Internet ecology for folksonomies.

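Tom Gruber surely needs no introduction: a leading figure in ontology, a true heavyweight. If you want to learn more, see his homepage: http://tomgruber.org/
Q&A on this talk (only one so far):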

Hans LENZ asking Tom: Modern repositories used in DW/DBMS are missing semantic metadata
NAME:Hans-J. LENZ
ORGANIZATION:Business Intelligence Group, Free Univ. Berlin
Country:Germany
E-mail (to be notified for the replies on your question):hjlenz@wiwiss.fu-berlin.de
QUESTION:Modern repositories used in DW/DBMS are missing semantic metadata useful for math. modelling, prediction and monitoring schemes. Where are the contributions to a UML /ERD modelling of metadata expanding ideas of Lenz (1984) in SSDBM 1984? How much to expand repositories w.r.t. semantic metadata?

re: Hans LENZ asking Tom: Modern repositories used in DW/DBMS are missing semantic metadata
Relational databases are highly optimized machines. They can store metadata of almost any type, but they only perform a special class of inferences. Commercial DBMSs have been moving in the direction of reasoning over special-purpose metadata, such as information retrieval (IR), geographic information (GIS), or parametric search. However, in these cases, special-purpose software outside of the database usually outperforms the DB at the same task (except when there is a need for tight integration with storage, such as realtime).
New types of metadata, such as those you mentioned for math or UML modeling, have been the province of special applications such as CAD systems, whose reasoning engines are also above and outside of the database. I don’t see this changing, as long as the database does its core job well. If the commercial object-oriented databases had survived the 1990’s, perhaps they would have been the storage medium for these modeling metadata systems.
I suspect this was not exactly what you were asking, but I hope it gets at the issue and perhaps others could respond.
tom

1 Comment »

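  1. [...] The online discussion of the MTSR conference (http://mtsr.sigsemis.org/) was extended and ended today (December 5). Since I don't know whether the website will remain accessible, I want to summarize the conference discussions here: mainly the three invited talks and the seven tracks, as a supplement to the November 24 post "Report on the First Online Conference on Metadata and Semantics Research". Because the original texts are attached, it is rather long, so this first part covers additional discussion of the three invited talks. [...]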

    Pingback by Keven’s Blog: Digital Library Research » MTSR Conference Follow-up Report: Discussion of the Three Invited Talks, December 24, 2005 @ 11:42 pm

