Re: [Zope-dev] Re: Spitter.c Hack

2001-01-05 Thread Casey Duncan

Jason Spisak wrote:
> 
> Zopists,
> 
> I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex.
> I have about 50,000 objects in that index, and search performance is nearly
> instantaneous still.  I am running on a big machine though.  If anyone
> wants those changes they're really easy.  Just mail me directly, since it's
> a long file to post.

Could you maybe post just the diff for posterity?

-- 
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`-->

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )




Re: [Zope-dev] Re: Spitter.c Hack

2001-01-05 Thread Jason Spisak

Casey Duncan:

It truly is nothing more than cutting out the two parts that eliminate
single letter words and numbers:

*** Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c 
--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c 
***
*** 169,192 
  len = PyString_Size(word) - 1;
  
  len = PyString_Size(word);
- /*if(len < 2)  Single-letter words are stop words!
- {
-   Py_INCREF(Py_None);
-   return Py_None;
- } */
- 
- /*
-   Test whether a word has any letters.   */
  
  for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
- /*if (len < 0)
- {
- Py_INCREF(Py_None);
- return Py_None;
- }
- 
-  * If no letters, treat it as a stop word.
-  */
  
  Py_INCREF(word);
  
--- 169,176 
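
For readers without the source at hand, the effect of the two removed checks can be sketched roughly as below. This is only an illustration, not the code in Splitter.c: keep_word() and its strict flag are hypothetical names, and the real function works on Python string objects and does more than this. With strict set, a word shorter than two characters or containing no letters is dropped as a stop word; with strict cleared (the patched behaviour), both checks are skipped, so "310" and a lone "c" (presumably what is left of 'C++' after splitting) get indexed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the stop-word checks discussed above.
 * Returns 1 if the word would be indexed, 0 if it would be dropped.
 * strict != 0 models the stock checks, strict == 0 models the patch. */
int keep_word(const char *word, int strict)
{
    size_t len = strlen(word);
    size_t i;

    if (len == 0)
        return 0;               /* nothing to index (assumption of this sketch) */
    if (!strict)
        return 1;               /* patched: both checks are gone */
    if (len < 2)
        return 0;               /* single-letter words are stop words */
    for (i = 0; i < len; i++) {
        if (isalpha((unsigned char)word[i]))
            return 1;           /* at least one letter: keep the word */
    }
    return 0;                   /* no letters at all: stop word */
}
```

Calling keep_word("310", 1) models the stock filter rejecting a purely numeric term, while keep_word("310", 0) models the patched splitter accepting it.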


All my best,


Jason Spisak
CIO
__ ___   ____
   / // (_)_/_  __/__ / /  ___  ___  __ _
  / _  / / __/ -_) / / -_) __/ _ \(_-<_/ __/ _ \/  ' \
 /_//_/_/_/  \__/_/  \__/\__/_//_/___(_)__/\___/_/_/_/

6151 West Century Boulevard
Suite 900
Los Angeles, CA 90045
P. 310.665.3444
F. 310.665.3544

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.





Re: [Zope-dev] Re: Spitter.c Hack

2001-01-05 Thread Jason Spisak

Erik,

> [Jason Spisak]
> 
> | I am running on a big machine though.  If anyone wants those changes
> | they're really easy.  Just mail me directly, since it's a long file
> | to post.
> 
> Hi.  I would be interested in the file :-).
> 

Okay, here's the diff. It truly is nothing more than cutting out the two
parts that eliminate single letter words and numbers:

*** Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c 
--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c 
***
*** 169,192 
  len = PyString_Size(word) - 1;
  
  len = PyString_Size(word);
- /*if(len < 2)  Single-letter words are stop words!
- {
-   Py_INCREF(Py_None);
-   return Py_None;
- } */
- 
- /*
-   Test whether a word has any letters.   */
  
  for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
- /*if (len < 0)
- {
- Py_INCREF(Py_None);
- return Py_None;
- }
- 
-  * If no letters, treat it as a stop word.
-  */
  
  Py_INCREF(word);
  
--- 169,176 



> Would you also be willing to share some statistics on how many objects
> you have in how many indexes, and how much time "complex" searches
> take?  I do understand if this is not possible, but it'd be appreciated
> if it was possible. :-)
> 
> Thanks.

Well, here's some of the output of the "Status" tab in the Catalog.

Subtransactions are Disabled

 Subtransactions

  -

Index Status

   * 48205 object are indexed in bobobase_modification_time
   * 48205 object are indexed in calendar_date
   * 48205 object are indexed in calendar_day
   * 48205 object are indexed in call_date
   * 48205 object are indexed in curators
   * 48205 object are indexed in data
   * 48205 object are indexed in id
   * 48205 object are indexed in meta_type
   * 48205 object are indexed in resume_in
   * 48205 object are indexed in status
   * 48205 object are indexed in users_calendar

The only TextIndex is the 'data' index though.  It is the one that gets
hammered.

Let's see...time stats...hmmm

I put a REQUEST.set with the ZopeTime at the top of the search page and at
the bottom after the 'in' tag for the Catalog. 

Search terms are:  los and angeles and C++ and MFC and 310

Subtracting the floats of the two times, I get 1.85400104523.  I'm not sure
what unit that is in; I think it's a fraction of a day, because of
DateTime.
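
As a quick sanity check on the unit, the two readings of that figure can be compared with a little arithmetic. This is a sketch under assumptions: days_to_seconds() and plausible_as_days() are hypothetical helpers, and whether the float of a DateTime is in seconds or days should be confirmed against the DateTime documentation. Read as days, 1.854 would be about 160,000 seconds (roughly 44 hours), which does not fit a search that returns almost at once, so seconds looks like the more plausible reading.

```c
/* Illustration only: compares the two possible readings of the delta. */
#define SECONDS_PER_DAY 86400.0

/* Convert a difference expressed in days to seconds. */
double days_to_seconds(double days)
{
    return days * SECONDS_PER_DAY;
}

/* Returns 1 if the delta, read as days, stays within limit_seconds. */
int plausible_as_days(double delta, double limit_seconds)
{
    return days_to_seconds(delta) <= limit_seconds;
}
```

For example, plausible_as_days(1.85400104523, 10.0) asks whether the days reading could fit inside a ten-second page load.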

The server stats:

Dual Intel 400 MHz Xeon w/ 1MB cache each
LVD RAID 5 7200 RPM disk array
1GB RAM
RedHat Linux 6.1 with some kernel updates...
And the best piece of open source software I know:  Zope 2.2.4 binary
release
 
Hope that helps.


All my best,


Jason Spisak




Re: [Zope-dev] Re: Spitter.c Hack

2001-01-06 Thread Tres Seaver

> From: "Jason Spisak" <[EMAIL PROTECTED]> wrote:
> 
> Zopists,
> 
> I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex. 
> I have about 50,000 objects in that index, and search performance is nearly
> instantaneous still.  I am running on a big machine though.  If anyone
> wants those changes they're really easy.  Just mail me directly, since it's
> a long file to post.

Can you post a patch, or upload it to your Zope.org member folder
and post the link?

  cvs -q diff -u lib/python/SearchIndex/Splitter.c

would do it, if you were working in a CVS sandbox for Zope.

Tres.
-- 
===
Tres Seaver          [EMAIL PROTECTED]
Digital Creations "Zope Dealers"   http://www.zope.org





Re: [Zope-dev] Re: Spitter.c Hack

2001-01-08 Thread Jason Spisak

Tres:

Okay, I uploaded it to my member folder.

http://www.zope.org/Members/jspisak/Splitter/

I wasn't using a sandbox for this; I just downloaded the source for 2.2.4.

Here's the diff -u though:

--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c	Thu Jan  4 10:41:15 2001
+++ Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c	Fri Jan  5 17:29:43 2001
@@ -169,24 +169,8 @@
 len = PyString_Size(word) - 1;
 
 len = PyString_Size(word);
-/*if(len < 2)   Single-letter words are stop words!
-{
-  Py_INCREF(Py_None);
-  return Py_None;
-} */
-
-/*
-  Test whether a word has any letters.   */
 
 for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
-/*if (len < 0)
-{
-Py_INCREF(Py_None);
-return Py_None;
-}
-
- * If no letters, treat it as a stop word.
- */
 
 Py_INCREF(word);
 


Let me know what else I can do.  Did you see my other mails regarding
stats?

> > From: "Jason Spisak" <[EMAIL PROTECTED]> wrote:
> > 
> > Zopists,
> > 
> > I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex. 
> > I have about 50,000 objects in that index, and search performance is nearly
> > instantaneous still.  I am running on a big machine though.  If anyone
> > wants those changes they're really easy.  Just mail me directly, since it's
> > a long file to post.
> 
> Can you post a patch, or upload it to your Zope.org member folder
> and post the link?
> 
>   cvs -q diff -u lib/python/SearchIndex/Splitter.c
> 
> would do it, if you were working in a CVS sandbox for Zope.
> 
> Tres.
> -- 
> ===
> Tres Seaver          [EMAIL PROTECTED]
> Digital Creations "Zope Dealers"   http://www.zope.org

All my best,


Jason Spisak