
#118 quality of keywords

open
5
2007-09-11
2007-09-11
DianeE_OU
No

I have just been checking the results of KWE against the metadata already assigned for the LOs I had hoped to use for the KWE trial. This is an example for one of the LOs.

The keywords bear no resemblance to those assigned before.
Also, this week's results are not as good as they were before the latest changes at the end of last week.

Discussion

  • DianeE_OU

    DianeE_OU - 2007-09-11

    the results from examination of LOs - current keywords vs. previously assigned

     
  • Alex Killing

    Alex Killing - 2007-09-11

    Lukasz, could this somehow be related to the latest changes?

     
  • Alex Killing

    Alex Killing - 2007-09-11
    • assigned_to: alexkill --> rybencjusz
     
  • Łukasz Degórski

    1. Claudia changed the English linguistic model (that is, which POS tags qualify words or sequences as potential keyword candidates). The new model is more compact, so some candidates that qualified before the changes may be disqualified now. On the other hand, we hoped it would filter out more junk. This is a matter to discuss with Claudia, as I only put her changes into the code.

    2. A bug was fixed (for all languages) that made the qualifier ignore the "CMLU" category (POS tags that qualify a word as a potential part of a multi-word keyword candidate, but not as a single-word candidate; for instance VBP, VBD, VBZ, VBN in the current model). Before the fix, CMLUs were treated as MLUs, i.e. they could form one-word keywords. The bugfix can only improve the results, provided the linguistic model is sane: the only change it can cause is that some one-word keywords (whose POS is assigned the CMLU category) proposed by the old version are no longer proposed. If that is what happened here, the linguistic model should be changed so that the appropriate POS tags are assigned the MLU category, not CMLU. CMLU means "I don't want words of this POS to be one-word keywords".
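The MLU/CMLU distinction described above can be illustrated with a minimal sketch. This is not the KWE code: the `CATEGORY` table and the `qualifies` function are hypothetical, only the four verb tags are taken from this comment, and the MLU entries are assumed for illustration. It also assumes, as a simplification, that any sequence of two or more MLU/CMLU tags is an acceptable multi-word candidate.

```python
# Hypothetical category table. Only the CMLU verb tags come from the
# discussion; the MLU entries are assumptions made for this sketch.
CATEGORY = {
    "NN": "MLU", "NNS": "MLU", "JJ": "MLU",
    "VBP": "CMLU", "VBD": "CMLU", "VBZ": "CMLU", "VBN": "CMLU",
}


def qualifies(pos_tags, buggy=False):
    """Return True if a tag sequence is accepted as a keyword candidate.

    buggy=True mimics the old behaviour, where CMLU was treated as MLU
    and could therefore form a one-word keyword.
    """
    cats = [CATEGORY.get(tag) for tag in pos_tags]
    # Empty sequences and tags without a category never qualify.
    if not cats or any(cat is None for cat in cats):
        return False
    if len(cats) == 1:
        # A single word needs MLU (or CMLU, but only under the old bug).
        return cats[0] == "MLU" or (buggy and cats[0] == "CMLU")
    # Simplification: every multi-word MLU/CMLU sequence is accepted.
    return True
```

With this table, `qualifies(["VBN"], buggy=True)` is accepted while `qualifies(["VBN"])` is rejected, which is the kind of one-word keyword that would have disappeared after the fix.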

    I can't find the comparison part ("current keywords vs. previously assigned") in the file. Nonetheless, it is another thing to analyse with knowledge of the English POS categories: which candidates disappeared, what their "POS pattern" was (e.g. "JJ NN"), and whether that pattern is forbidden now; which candidates appeared, what their POS pattern was, whether it was allowed before, and so on.
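One possible way to organise such a comparison is sketched below. It assumes the candidates from each run are available as a mapping from candidate text to POS pattern; the function name `diff_by_pattern` and the sample data are invented for illustration and do not come from the attached file.

```python
from collections import Counter


def diff_by_pattern(old, new):
    """Count POS patterns of candidates that disappeared or appeared.

    old, new: dicts mapping candidate text -> POS pattern, e.g. "JJ NN".
    Returns (disappeared, appeared) as Counters keyed by POS pattern.
    """
    disappeared = Counter(p for cand, p in old.items() if cand not in new)
    appeared = Counter(p for cand, p in new.items() if cand not in old)
    return disappeared, appeared


# Invented sample data, only to show the shape of the input.
old_run = {"learning object": "NN NN", "based": "VBN", "metadata": "NN"}
new_run = {"learning object": "NN NN", "metadata": "NN",
           "keyword extraction": "NN NN"}

disappeared, appeared = diff_by_pattern(old_run, new_run)
```

Patterns that dominate the `disappeared` counter are the ones to check against the new linguistic model first.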

    Another possibility is that Mirek did not build the current English language model from scratch (AFAIK he did, but he hasn't clearly answered Claudia's question yet).

     
  • Alex Killing

    Alex Killing - 2008-03-16

    Can we close this one?

     
