On Wed, Mar 4, 2009 at 12:24 PM, John Machin <sjmac...@lexicon.net> wrote:

> On 4/03/2009 2:48 PM, Peng Huang wrote:
> > On Wed, Mar 4, 2009 at 11:40 AM, John Machin <sjmac...@lexicon.net>
> wrote:
> >
> >> On 4/03/2009 2:12 PM, Peng Huang wrote:
> >>> Hi Igor Tandetnik,
> >>>
> >>> Thanks for your quick reply.
> >>>
> >>> Your solution works. But in some cases, each y%d may has two or three
> >>> choices. So the SQL will become very complex, we need ( 2 * 2 * 2 * 2)
> >> sub
> >>> where statements. Does SQLite have some build-in features to optimize
> >> those
> >>> kinds of SQL statements? Or do you have other suggestions to optimize
> the
> >>> database of SQL statements?
> >>>
> >>> For example:
> >>> "SELECT * FROM py_phrase WHERE ylen = ? and (y0 = ? or y0 = ?) and (y1
> =
> >> ?
> >>> or y1 =? ) and (y2 = ? or y2 = ?) and (y3 = ? or y3 = ? or y3 = ?)
> ORDERY
> >> BY
> >>> user_freq DESC, freq DESC"
> >>>
> >> (y3 = ? or y3 = ? or y3 = ?)
> >> is *logically* equivalent to
> >> (y3 IN (?, ?, ?))
> >>
> >> Does it run at the same speed?
> >
> >
> > Its improvement is not obviously. :(
>
> OK, next try:
>
> Input is 1 to 4 pinyin characters ... with tone or without tone?
> Even with tone, (number of different pinyin characters (a few hundred,
> about 410 IIRC)) * 5 tones will fit easily in 16 bits. So a possible
> pronunciation of a 4-character phrase will easily fit in a 64-bit number.
>
> So instead of complications with y0, y1, y2, y3 just use one composite
> number e.g. (((y0 * 10000 + y1) * 10000) + y2 * 10000) + y3 -- note that
> for phrases shorter tan 4 characters, you would need a special code
> (zero) for "no character at all".
>
> Instead of complicated logic to generate varing sizes and shapes of
> <query1> union all <query 2> ...., you just have to produce a list of
> the possible composite keys and select * from the_table where
> composite_key in (?, ?, ?,...)
>

This is a good idea to make thing easily. But it is not suitable for other
kinds of use cases. I don't know if you know Chinese and Hanyu Pinyin. Let
me explain it for you. Most of Chinese chars' pronunciation (Pinyin) has two
parts - vowel and consonant. One use case is some users will not type
complete Pinyin for each Chinese chars. For example: User types complete
Pinyin for first char, but just types consonant for second and third chars.
In this case, I will use below SQL cmd. The columns s%d is for chars'
consonant. So composing four y%d into one is not suitable. :(


>
> I'd be interested to get Igor's thoughts on how much speed up might be
> achievable by doing this.


For Igor's idea. I did not write test code. Just type SQL command in sqlite3
program. The improvement is obviously. I think it is acceptable for user
typing.


>
>
> Another thought: if you have enough memory (is this for a PC or a
> phone?), an in-memory hash-table might be a good idea.


It is a desktop service. Other applications need it to type Chinese. So it
should not use many memory. BTW, the database of phrases is about 100M
includes index.


>
>
> HTH,
>
> John
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to