[personal profile] swestrup
I just had a random thought that I figured I should write down before I forget it. When [livejournal.com profile] _sps_ was here on Saturday, I mentioned to him a paper I had seen on the difficulty of storing and retrieving scientific papers that are relevant to a field of research. It has gotten so bad in the field of mathematics that it is now often easier to spend a year re-solving a tricky mathematical problem than it is to find an existing paper with the solution. There is a (woefully underfunded) institute that tries to produce a controlled-vocabulary description of the semantic elements in new papers, and record them. They keep falling further and further behind.

Anyway, [livejournal.com profile] _sps_ had some not-unreasonable ideas on how to encode useful indexes of these math papers so that relevant materials could be searched for. The big question is: how do you do the semantic analysis? For something like math, you need a human, and one who understands the math as well. Plus, it would help if they just happened to know of all of the other bits of math that the paper overlapped, even if those are in other fields and use different nomenclature.

Anyway, it suddenly occurred to me that it might be possible (I'm not sure how) to design a mathematics-paper search engine and browser with the express purpose of eliciting from a mathematician information about the nature of the paper being studied, and how closely its contents match that mathematician's current work. This would be done not by asking questions, but by allowing the mathematician to categorize his searches by project, and by paying attention to how long he spent studying various sections of the paper. As well, if we provided various renaming and renomenclaturing systems, we might get further information by observing the transformations that were performed on the paper.

In the end, I would hope the data gathered from a large number of mathematicians could be used to build a fuzzy index of any given paper, and to let us build a map of which things seem to be close to each other in a semantic space. I don't know, ultimately, how well such a system would work, but I think it would be worth a try.
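
To make the idea a bit more concrete, here is a rough sketch of how reading-time observations might be folded into a fuzzy index and a crude closeness measure. Everything in it is an assumption made for illustration: the (project, paper, seconds) observation format, the function names, and the choice of cosine similarity as the measure of closeness. It is not a worked-out design.

```python
from collections import defaultdict
from math import sqrt


def build_fuzzy_index(observations):
    """Fold (project, paper, seconds_spent) observations into a fuzzy index.

    Returns {paper: {project: weight}}, where each paper's weights sum to 1.
    The observation format is hypothetical; a real browser would log much more.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for project, paper, seconds in observations:
        totals[paper][project] += seconds

    index = {}
    for paper, by_project in totals.items():
        total = sum(by_project.values())
        if total == 0:
            continue  # nobody actually spent time on this paper
        index[paper] = {proj: secs / total for proj, secs in by_project.items()}
    return index


def similarity(index, paper_a, paper_b):
    """Cosine similarity between the project weightings of two papers."""
    a, b = index[paper_a], index[paper_b]
    dot = sum(weight * b.get(proj, 0.0) for proj, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


# Made-up observations: which project a reader filed a paper under, and for how long.
observations = [
    ("knots", "P1", 600),
    ("knots", "P2", 300),
    ("lattices", "P2", 300),
    ("lattices", "P3", 900),
]
index = build_fuzzy_index(observations)
print(similarity(index, "P1", "P2"))  # papers sharing a project score above zero
print(similarity(index, "P1", "P3"))  # papers with no project in common score zero
```

Papers that readers keep filing under the same projects end up close together, which is one crude way to get the map of a semantic space. Whether reading time is a good enough signal is exactly the part I don't know.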

Date: 2004-11-08 06:56 pm (UTC)
From: [identity profile] joenotcharles.livejournal.com
Hmm. Seems to me that math, using a technical vocabulary, should be fairly amenable to straight keyword analysis. What am I missing?

Date: 2004-11-08 07:16 pm (UTC)
From: [identity profile] joenotcharles.livejournal.com
For things like "end", "form" and "set", most of the distinct meanings would commonly be found with another set of keywords - the names are often a giveaway, for instance.

Equating "Hobson's group" with "Gerber manifold" sounds like you'd need to have mathematicians constantly updating the keyword lists to indicate synonyms, but that's much easier than having them review the papers.
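
A minimal sketch of what a maintained synonym list could look like in practice, assuming keyword phrases have already been extracted from each paper. The paper IDs, the synonym table, and the function names are all hypothetical, and this is plain dictionary lookup, nothing smarter:

```python
def canonicalize(term, synonyms):
    """Map a keyword to its canonical form using a hand-maintained synonym table."""
    key = term.lower()
    return synonyms.get(key, key)


def build_index(papers, synonyms):
    """papers: {paper_id: [keyword phrases]}. Returns {canonical keyword: set of paper_ids}."""
    index = {}
    for paper_id, keywords in papers.items():
        for keyword in keywords:
            index.setdefault(canonicalize(keyword, synonyms), set()).add(paper_id)
    return index


def search(index, term, synonyms):
    """Return the sorted IDs of papers filed under the term or any of its synonyms."""
    return sorted(index.get(canonicalize(term, synonyms), set()))


# Hypothetical synonym table: mathematicians would add a line whenever two names
# turn out to describe the same object.
synonyms = {"gerber manifold": "hobson's group"}

papers = {
    "paper-a": ["Hobson's group"],
    "paper-b": ["Gerber manifold", "lattice"],
}

index = build_index(papers, synonyms)
print(search(index, "Gerber manifold", synonyms))  # finds both papers
```

The only thing a mathematician has to maintain here is the synonym table; the index is rebuilt mechanically from it.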

Those numbers are a little startling, though - I knew there was overloading, but that's a hell of a difference.

Date: 2004-11-08 07:30 pm (UTC)
From: [identity profile] joenotcharles.livejournal.com
Hmm. I think he points out a problem that needs solving, yes, but I still think you can make big strides with a more intelligent search system before needing to go through and add metadata to every document by hand. He seems to be assuming that "full text search" can be discarded simply because it doesn't work all that well right now.

Interesting problem of taxonomy

Date: 2004-11-09 06:58 am (UTC)
From: [identity profile] ketherian.livejournal.com
Sounds like a problem that's grown beyond the reasonable bounds of taxonomy and metadata. (Good primer here.) It may even go beyond the abilities of semiotics, in which case I could never see a human being able to create a controlled vocabulary large enough and flexible enough to keep up with the field.

Studying eye movements across a browser and time spent on the subject while reading would provide some structure on which to base the metadata, but wouldn't you get as much from studying the references (e.g., bibliography, citations, etc.) as well? I'm not sure examining the reader of the paper would be as worthwhile as studying the writer. Certainly having both as part of the generation of the metadata would provide interesting results - but don't you normally have to read an entire paper before deciding if it fits within the context of what you are researching? How do you discount those who read the paper and found it was not helpful? If these are a small number, I suppose they'd work themselves out of your index; but it could lead to some very quirky results.

Wee. That was fun. Thanks for the brain stretch.

Re: Interesting problem of taxonomy

Date: 2004-11-09 09:22 am (UTC)
From: [identity profile] sps.livejournal.com
I can't see this. Bibliographies are political things: academics are promoted based on how often they are cited, so when you are considering what to put in a bibliography you are thinking, is this person my friend? Do I owe them a favour? Do I want them to owe me? Can I leave out this person because I hate them so much, or will they then vote against me on the committee? And even then you can only include things you can remember, which for a mathematician (who is trained to work things out over again) isn't usually much!

Date: 2004-11-09 09:24 am (UTC)
From: [identity profile] sps.livejournal.com
Mathematics uses very few concepts, which come up over and over again. Progress in maths is almost entirely about collapsing things together. In fact, since proofs are in principle automatically checkable once found, you could argue that finding the minimal set of keywords is the only content of mathematics (of course, the keywords would be structured into lambda-trees...).
