Collaborative Semantic Analysis.
Nov. 8th, 2004 08:35 pm
I just had a random thought that I figured I should write down before I forget it. When _sps_ was here on Saturday, I mentioned to him a paper I had seen on the difficulty of storing and retrieving scientific papers that are relevant to a field of research. It has gotten so bad in mathematics that it is now often easier to spend a year re-solving a tricky problem than it is to find an existing paper with the solution. There is a (woefully underfunded) institute that tries to produce a controlled-vocabulary description of the semantic elements in each new paper and record it. They keep falling further and further behind.
Anyway, _sps_ had some not-unreasonable ideas on how to encode useful indexes of these math papers so that relevant material could be searched for. The big question is: how do you do the semantic analysis? For something like math, you need a human, and one who understands the math as well. Plus, it would help if they happened to know all of the other bits of math that the paper overlaps, even if those are in other fields and use different nomenclature.
It suddenly occurred to me that it might be possible (I'm not sure how) to design a mathematics-paper search engine and browser with the express purpose of eliciting from a mathematician information about the nature of the paper being studied, and how closely its contents match that mathematician's current work. This would be done, not by asking questions, but by letting the mathematician categorize his searches by project, and by paying attention to how long he spends studying various sections of the paper. As well, if we provided various renaming and renomenclaturing tools, we might get further information by observing the transformations performed on the paper.
In the end, I would hope the gathered data from a large number of mathematicians could be used to build a fuzzy index of any given paper, and to let us build a map of which things seemed to be close to each other in a semantic space. I don't know, ultimately, how well such a system would work, but I think it would be worth giving it a try.
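To make the "fuzzy index" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the observation format `(project, paper, seconds_read)`, the function names `build_fuzzy_index` and `similarity`, and the choice of normalized reading time with cosine similarity as the measure of closeness. It is one possible way to turn gathered reading data into a map of nearby papers, not a tested design.

```python
from collections import defaultdict
from math import sqrt


def build_fuzzy_index(observations):
    """Build {paper: {project: weight}} from (project, paper, seconds_read) tuples.

    Hypothetical scheme: a paper's weight within a project is its total
    reading time divided by the longest reading time any paper received
    in that project, so weights fall in (0, 1].
    """
    totals = defaultdict(lambda: defaultdict(float))
    for project, paper, seconds in observations:
        totals[paper][project] += seconds

    # Longest total reading time seen per project, used for normalization.
    max_per_project = defaultdict(float)
    for projects in totals.values():
        for project, seconds in projects.items():
            max_per_project[project] = max(max_per_project[project], seconds)

    return {
        paper: {
            project: seconds / max_per_project[project]
            for project, seconds in projects.items()
            if max_per_project[project] > 0
        }
        for paper, projects in totals.items()
    }


def similarity(index, paper_a, paper_b):
    """Cosine similarity between two papers' project-weight vectors."""
    a = index.get(paper_a, {})
    b = index.get(paper_b, {})
    dot = sum(weight * b.get(project, 0.0) for project, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example data: three papers read under three projects.
observations = [
    ("knots", "paper_A", 600),
    ("knots", "paper_B", 300),
    ("topology", "paper_A", 100),
    ("pde", "paper_C", 500),
]
index = build_fuzzy_index(observations)
```

Two papers that get read under the same projects end up close together in this space, even if they share no keywords, which is the kind of cross-nomenclature link the controlled-vocabulary approach has trouble finding.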
no subject
Date: 2004-11-08 06:56 pm (UTC)
no subject
Date: 2004-11-08 07:08 pm (UTC)
no subject
Date: 2004-11-08 07:13 pm (UTC)
Mathematical knowledge management is needed
no subject
Date: 2004-11-08 07:16 pm (UTC)
Equating "Hobson's group" with "Gerber manifold" sounds like you'd need to have mathematicians constantly updating the keyword lists to indicate synonyms, but that's much easier than having them review the papers.
Those numbers are a little startling, though - I knew there was overloading, but that's a hell of a difference.
no subject
Date: 2004-11-08 07:30 pm (UTC)
no subject
Date: 2004-11-08 07:56 pm (UTC)
Interesting problem of taxonomy
Date: 2004-11-09 06:58 am (UTC)
Studying eye movements across a browser and time spent on the subject while reading would provide some structure on which to base the metadata, but wouldn't you get as much from studying the references (e.g. bibliography, citations, etc.) as well? I'm not sure examining the reader of the paper would be as worthwhile as studying the writer. Certainly having both as part of the generation of the metadata would provide interesting results, but don't you normally have to read an entire paper before deciding if it fits within the context of what you are researching? How do you discount those who read the paper and found it was not helpful? If these are a small number, I suppose they'd work themselves out of your index; but it could lead to some very quirky results.
Wee. That was fun. Thanks for the brain stretch.
Re: Interesting problem of taxonomy
Date: 2004-11-09 08:34 am (UTC)
As to the question of detecting perceived relevance, that shouldn't be too hard. My own experience of doing paper searches is that one first reads the abstract to see if the article might be relevant, then skims the article to get the gist of what it's really about and whether it even talks about the area of interest. Finally, if all looks good, one reads the whole thing carefully. If you measure how long someone spends on each of these activities, you get a good idea of how relevant they think it is. As well, if they drop the item into a 'relevant citation' list, you note that. You also let them drop it into a 'checked but irrelevant' list so they know they've already looked at it, and that gives you added data. If done carefully, you can do it all with an instrumented browser; an eye tracker isn't even necessary (although it would help).
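The abstract / skim / careful-read idea above could be scored roughly as follows. This is only a sketch under loud assumptions: the `ReadingSession` fields, the stage weights, and the time caps are all invented for illustration, and real values would have to come from observing actual readers.

```python
from dataclasses import dataclass


@dataclass
class ReadingSession:
    """What a hypothetical instrumented browser recorded for one paper."""
    abstract_seconds: float = 0.0
    skim_seconds: float = 0.0
    careful_seconds: float = 0.0
    marked_relevant: bool = False     # dropped into the 'relevant citation' list
    marked_irrelevant: bool = False   # dropped into the 'checked but irrelevant' list


# Assumed weights: later stages say more about perceived relevance.
STAGE_WEIGHTS = {"abstract": 0.1, "skim": 0.3, "careful": 0.6}
# Assumed caps (seconds) beyond which extra time adds no further signal.
STAGE_CAPS = {"abstract": 60.0, "skim": 300.0, "careful": 1800.0}


def relevance_score(session):
    """Return a perceived-relevance estimate between 0.0 and 1.0."""
    # Explicit list drops override the timing signal; 'irrelevant' wins a tie.
    if session.marked_irrelevant:
        return 0.0
    if session.marked_relevant:
        return 1.0
    score = 0.0
    for stage, weight in STAGE_WEIGHTS.items():
        seconds = getattr(session, f"{stage}_seconds")
        score += weight * min(seconds / STAGE_CAPS[stage], 1.0)
    return score


# Someone who read the abstract, skimmed briefly, and moved on.
glance = ReadingSession(abstract_seconds=45, skim_seconds=30)
# Someone who went through all three stages.
deep_read = ReadingSession(abstract_seconds=60, skim_seconds=300, careful_seconds=1800)
```

Readers who bail out after the abstract contribute a low score rather than none, which is one way to handle people who looked at a paper and found it unhelpful.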
Re: Interesting problem of taxonomy
Date: 2004-11-09 09:22 am (UTC)
no subject
Date: 2004-11-09 09:24 am (UTC)