Thread: [PyIndexer] Thoughts on MySQL Implementation
Status: Pre-Alpha
Brought to you by: cduncan
From: Casey D. <cas...@ya...> - 2001-12-16 17:00:54
A couple of observations:

The index population seems pretty inefficient to me in sqlindexer. You are essentially sending one SQL INSERT per word when something is indexed. I think it would be worthwhile to test an implementation that uses a single LOAD DATA statement when the text is indexed. This would involve writing the rows to be inserted into textindex to a temporary file and then using LOAD DATA to batch-load it into MySQL. See the following for docs on LOAD DATA:

http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#LOAD_DATA

On the search end, it seems to me that app-side processing will be best for positional matches, such as for phrases. I'd imagine such processing should eventually be coded in C, but it should be acceptable in Python if done efficiently (using Python arrays, IISets or some such). You might want to check out this Guido essay for some ideas here:

http://www.python.org/doc/essays/list2str.html

I agree that storing a document count for each word could help with optimizing, since you could start with the smallest dataset first and prune it from there. Perhaps IISets could be used to get UNION/INTERSECT functionality efficiently if MySQL can't do it for you.

As for partial indexes, they may help, especially by reducing memory paging and improving cache usage. Here are the docs for that:

http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#CREATE_INDEX

It would probably be worthwhile to test times for a relatively large prefix size (32 bytes?) vs. a small one (4 or 8 bytes?). Sometimes less is more 8^). It's near impossible to tell which would be faster on a given architecture without real-world testing.

BTW: Have you seen MySQL's built-in full-text indexing support?

http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

Not sure whether it does everything we need, but it's probably worth a look...

-Casey
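[ A minimal sketch of the LOAD DATA idea above, assuming a textindex(document_id, word_id, position) table and a DB-API cursor such as MySQLdb's. The function and column names here are illustrative, not from the actual sqlindexer code. ]

```python
import os
import tempfile

def write_index_rows(rows):
    """Write (document_id, word_id, position) rows to a tab-delimited
    temporary file and return its path, ready for LOAD DATA."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    with os.fdopen(fd, "w") as f:
        for row in rows:
            f.write("\t".join(str(col) for col in row) + "\n")
    return path

def bulk_load(cursor, path):
    """Batch-load the file into textindex with a single LOAD DATA.
    LOCAL means the file lives client-side; the server must permit
    local-infile for this to work."""
    cursor.execute(
        "LOAD DATA LOCAL INFILE %s INTO TABLE textindex "
        "(document_id, word_id, position)",
        (path,),
    )
    os.remove(path)
```

One LOAD DATA replaces thousands of per-word INSERTs, at the cost of needing somewhere writable to put the temporary file.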
From: Marcus C. <ma...@wr...> - 2001-12-16 22:54:03
On Sun, 16 Dec 2001 at 09:00:53 -0800, Casey Duncan wrote:

[ Data population and LOAD DATA ]

This would definitely be a lot faster than individual INSERTs. The main reason for the speed-up is that in this case MySQL treats the whole thing as one transaction, so it defers flushing the key buffers until all the data is loaded. The same can be accomplished (although not quite as efficiently) by wrapping the whole lot of inserts inside a transaction block (if using BDB or InnoDB tables) or LOCK/UNLOCK TABLES (if using non-transaction-safe tables). Use BEGIN/COMMIT in the former case.

[ App-side processing ]

> On the search end, it seems to me that app-side
> processing will be best for positional matches, such
> as for phrases. I'd imagine such processing should
> eventually be coded in C, but it should be acceptable in
> Python if done efficiently (using Python arrays,
> IISets or some such).

I'd tend to do more of the processing on the database side, as described, in which case use of C or Python becomes less of an issue.

> http://www.python.org/doc/essays/list2str.html

Very interesting and inspiring :-)

> I agree that storing a document count for each word
> could help with optimizing, since you could start with
> the smallest dataset first and prune it from there.
> Perhaps IISets could be used to get UNION/INTERSECT
> functionality efficiently if MySQL can't do it for
> you.

I reckon MySQL can do it pretty efficiently given a set of ids, but doing such processing as coalescing duplicates, etc. on the app side would likely be faster than letting the MySQL query parser and optimiser do it for you. That said, it would only be a matter of ms improvement; the main benefit of app-side processing is in eliminating joins.

[ prefix indexes ]

> Sometimes less is more 8^).

:-)

> impossible to tell which would be faster on a given
> architecture without real-world testing.

Indeed, although I don't see a knob to change the size of the key block, so AFAIK it's fixed at 1024 bytes. Caching will likely have different effects on different platforms, though, and the setting of key_buffer_size will affect MySQL's own caching.

[ MySQL's FULLTEXT index ]

> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

AFAIK, it doesn't do phrase matching. Their relevance calculation looks very interesting, though.

Cheers

--
Marcus
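[ The transaction-wrapping alternative sketched in code, assuming the same hypothetical textindex schema. This takes any DB-API 2.0 connection; the placeholder argument accommodates the driver's paramstyle ('%s' for MySQLdb, '?' for others), and the MySQL-specific LOCK TABLES path is only taken for non-transaction-safe tables. ]

```python
def batch_insert(conn, rows, transactional=True, placeholder="%s"):
    """Insert many (document_id, word_id, position) rows with key-buffer
    flushing deferred to the end: one transaction for BDB/InnoDB tables,
    LOCK/UNLOCK TABLES for non-transaction-safe ones (e.g. MyISAM)."""
    sql = ("INSERT INTO textindex (document_id, word_id, position) "
           "VALUES (%s, %s, %s)" % (placeholder, placeholder, placeholder))
    cur = conn.cursor()
    if not transactional:
        # No transactions available, but a write lock defers key flushes
        # in much the same way.
        cur.execute("LOCK TABLES textindex WRITE")
    try:
        cur.executemany(sql, rows)
        if transactional:
            conn.commit()  # one implicit BEGIN ... COMMIT around the lot
    finally:
        if not transactional:
            cur.execute("UNLOCK TABLES")
```

Not as fast as LOAD DATA, but it avoids the temporary-file question entirely.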
From: Chris W. <ch...@ni...> - 2001-12-17 12:59:30
Casey Duncan wrote:
>
> This would involve writing the rows to be inserted
> into textindex to a temporary file and then using LOAD
> DATA to batch-load it into MySQL. See the following for
> docs on LOAD DATA:
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#LOAD_DATA

Well, my reluctance on this is that it needs a temporary file. Where do we put this temporary file? How do we know we're going to be allowed to write to the filesystem?

That said, the only other option is to build a big

INSERT INTO tbl_name VALUES (expression,...),(...),...

...and then we have to worry about max SQL length, I guess? Anyone know the maximum length of SQL you can shove down a single c.execute()?

> On the search end, it seems to me that app-side
> processing will be best for positional matches, such
> as for phrases.

Can you elaborate?

> I agree that storing a document count for each word
> could help with optimizing, since you could start with
> the smallest dataset first and prune it from there.

Does this still hold true when you're OR'ing terms together?

> Perhaps IISets could be used to get UNION/INTERSECT
> functionality efficiently if MySQL can't do it for
> you.

I'd prefer not to require the BTrees module, as it's not part of the standard Python library, but if needs must ;-)

> As for partial indexes, they may help, especially
> by reducing memory paging and improving cache usage. Here
> are the docs for that:
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#CREATE_INDEX

Thanks :-) Once I get the initial implementation and scalability testing package finalized, maybe you guys could try some tweakage?

> BTW: Have you seen MySQL's built-in full-text indexing
> support?
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

Yeah, sadly it doesn't do phrase matching :-S

We could use this and do the usual cheap hack that ZCatalog does, where matching the phrase "x y z" simply gets turned into "x" AND "y" AND "z", but that won't be good enough for the specific application I need the indexer for :-S

Thanks for the comments :-)

Chris
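[ The big-INSERT alternative could be kept under the length limit by chunking, along these lines. A sketch only: it assumes integer values, so it skips the escaping a real implementation would need from the driver, and the table/column names are again the hypothetical textindex schema. ]

```python
def chunked_inserts(rows, max_len=16 * 1024 * 1024):
    """Yield multi-row INSERT statements for textindex, each kept under
    max_len bytes (think MySQL's max_allowed_packet ceiling)."""
    prefix = "INSERT INTO textindex (document_id, word_id, position) VALUES "
    values = []
    length = len(prefix)
    for row in rows:
        tup = "(%d,%d,%d)" % row
        # +1 accounts for the comma joining value tuples
        if values and length + len(tup) + 1 > max_len:
            yield prefix + ",".join(values)
            values, length = [], len(prefix)
        values.append(tup)
        length += len(tup) + 1
    if values:
        yield prefix + ",".join(values)
```

Each yielded statement fits in one c.execute() call, however many rows there are in total.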
From: Casey D. <c.d...@nl...> - 2001-12-17 14:05:13
On Monday 17 December 2001 07:57 am, Chris Withers allegedly wrote:
[snip load data stuff]
> Well, my reluctance on this is that it needs a temporary file. Where do we
> put this temporary file? How do we know we're going to be allowed to write
> to the filesystem?

Yup, other than using mktemp or creating some sort of var directory in the install, dunno. That said, I don't think it's too much to ask for a program to have write access to the temp dir... 8^) I think you could potentially see a big speed improvement, and even more so if the MySQL server is on another box...

> That said, the only other option is to build a big
> INSERT INTO tbl_name VALUES (expression,...),(...),...
>
> ...and then we have to worry about max SQL length, I guess?
> Anyone know the maximum length of SQL you can shove down a single
> c.execute()?

I dunno. Perhaps the C API would allow you to do a LOAD DATA from an in-memory data structure? Just a thought; we can't be the only ones trying to stuff the proverbial goose here 8^)

> > On the search end, it seems to me that app-side
> > processing will be best for positional matches, such
> > as for phrases.
>
> Can you elaborate?

My thought was that you would treat the query just like A AND B AND C on the SQL side of things, but bring in word position information with the queries, so that you could determine on the application side which results actually satisfy the phrase match. I think there will be relatively few comparisons there for any meaningful search terms. I just think that trying to do that type of vertical comparison on the SQL side will be a pain.

Also, how are you planning to deal with stop words in phrase searches? I notice you have included a previous-word reference in the index. I'm assuming stop words are thrown out of both the word index and the search words, correct?

> > I agree that storing a document count for each word
> > could help with optimizing, since you could start with
> > the smallest dataset first and prune it from there.
>
> Does this still hold true when you're OR'ing terms together?

For simplicity, I would just treat each part of the OR as a separate query and combine the results on the application side. I think ORs are going to be expensive no matter what you do, so you might as well keep it simple. So I guess that would be a no 8^)

> I'd prefer not to require the BTrees module, as it's not part of the standard
> Python library, but if needs must ;-)

I hear you. Just thinking out loud.

> Thanks :-) Once I get the initial implementation and scalability testing
> package finalized, maybe you guys could try some tweakage?

What are friends for?

[snip full-text]
> Yeah, sadly it doesn't do phrase matching :-S

8^(

> We could use this and do the usual cheap hack that ZCatalog does, where matching
> the phrase "x y z" simply gets turned into "x" AND "y" AND "z", but that
> won't be good enough for the specific application I need the indexer for
> :-S

Bleah. Real phrase matching is a definite requirement for me too. I wonder if MySQL stores any positional info in its full-text index that you could get your hands on... I'd imagine it is using a table internally to store the index...

/---------------------------------------------------\
Casey Duncan, Sr. Web Developer
National Legal Aid and Defender Association
c.d...@nl...
\---------------------------------------------------/
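[ The app-side phrase check described above might look like this, once the per-word position lists for a candidate document have been fetched. The representation (one list of positions per phrase word) is an assumption, not the actual indexer's data structure. ]

```python
def satisfies_phrase(position_lists):
    """Given one list of word positions per phrase term, all taken from
    the same document, return True if the terms appear consecutively.
    position_lists[i] holds the positions of the i-th phrase word."""
    if not position_lists:
        return False
    # Candidate start positions come from the first word; each later
    # word must appear exactly i positions further on.
    candidates = set(position_lists[0])
    for i, positions in enumerate(position_lists[1:], start=1):
        candidates &= {p - i for p in positions}
        if not candidates:
            return False
    return True
```

Since the SQL side has already narrowed the candidates to documents containing every phrase word, the number of position comparisons here should indeed stay small for meaningful search terms.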
From: Marcus C. <ma...@wr...> - 2001-12-17 15:51:27
On Mon, 17 Dec 2001 at 09:08:06 -0500, Casey Duncan wrote:

[ LOAD DATA and temporary files ]
> Yup, other than using mktemp or creating some sort of var directory in the
> install, dunno. That said, I don't think it's too much to ask for a program
> to have write access to the temp dir... 8^)

I think you can use a FIFO with LOAD DATA LOCAL INFILE... Anyone got experience with this? Probably something for the mysql list ;-)

[ snip separating app and db server ]

A good idea, particularly if the app is handling some of the grunt work :-) TCP/IP comms between app and server may be slower than communication over a Unix socket, though.

> > That said, the only other option is to build a big
> > INSERT INTO tbl_name VALUES (expression,...),(...),...
> >
> > ...and then we have to worry about max SQL length, I guess?
> > Anyone know the maximum length of SQL you can shove down a single
> > c.execute()?

Whatever you set it to ;-) The buffer starts off at net_buffer_length and grows to a maximum of max_allowed_packet (default 16MB, which should be enough ;-). See the manual entries for SHOW VARIABLES.

[ app-side processing for positional matches ]
> My thought was that you would treat the query just like A AND B AND C on the
> SQL side of things, but bring in word position information with the
> queries, so that you could determine on the application side which results
> actually satisfy the phrase match. I think there will be relatively few
> comparisons there for any meaningful search terms.
>
> I just think that trying to do that type of vertical comparison on the
> SQL side will be a pain.

IIRC a search for 'windows 2000' on the test data Chris has been using returns ~10K rows for each of 'windows' and '2000'. Doing that processing app-side would be costly (first you have to match doc ids, then compare each in O(n**2)), whereas using the previous-word id in textindex, when you already know what that word must be, should be cheaper.

> Also, how are you planning to deal with stop words in phrase searches? I
> notice you have included a previous-word reference in the index. I'm assuming
> stop words are thrown out of both the word index and the search words,
> correct?

The same splitter is applied to source docs and to the search string.

> For simplicity, I would just treat each part of the OR as a separate query
> and combine the results on the application side. I think ORs are going to be
> expensive no matter what you do, so you might as well keep it simple.

Fully agree. I'm not sure what the other DBMSs do with ORs in the WHERE clause, but this looks like the best way of doing it all round.

[ MySQL's FULLTEXT index ]
> Bleah. Real phrase matching is a definite requirement for me too. I wonder if
> MySQL stores any positional info in its full-text index that you could get
> your hands on... I'd imagine it is using a table internally to store the
> index...

No, MySQL doesn't store the position; it uses relative occurrences only in determining relevance, and does so at the row level.

Cheers

--
Marcus
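[ The smallest-dataset-first pruning and the app-side OR combination agreed on above could be sketched as follows, assuming the per-word doc-id sets have already been fetched (e.g. via the stored document counts and one query per word). ]

```python
def and_query(doc_sets):
    """Intersect per-word doc-id sets, starting from the smallest
    (fewest documents) so the working set shrinks as fast as possible
    and we can bail out early on an empty result."""
    ordered = sorted(doc_sets, key=len)
    result = set(ordered[0]) if ordered else set()
    for s in ordered[1:]:
        result &= s
        if not result:
            break
    return result

def or_query(doc_sets):
    """Treat each OR term as a separate query and combine the
    results on the application side."""
    result = set()
    for s in doc_sets:
        result |= s
    return result
```

This is the same UNION/INTERSECT functionality IISets would give, using only plain Python sets.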