Pat could you take a look at: http://dl.dropbox.com/u/1175879/output.zip
As you can see, this is an output for just 1 restaurant and it's totally weird :\ Best Regards, Pedro On Mar 3, 6:57 am, Pat Allan <[email protected]> wrote: > Hi Pedro > > Not that it's particularly helpful, but this is definitely not how Sphinx > behaves. It definitely should not take an hour to index 15 documents. It > shouldn't even take a minute, unless the query is particularly complex - and > this isn't what I'd call complex! > > The reason for adding the group_by call, is that PostgreSQL is strict with > the SQL it expects - anything that isn't within an aggregation function > should be mentioned in the GROUP BY clause. > > Thinking Sphinx will do this automatically for any fields it references to - > but it can't look into SQL snippets to figure out which columns are used, > hence why you need to manually add it yourself, because of your CRC'd state > attribute. > > The repeated data is definitely odd. Thinking Sphinx concatenates data from > columns in associations, but the query shouldn't duplicate data (unless the > record is attached to many associated records that look the same - but that's > not a SQL problem, that's a data problem). > > When you limit the query to one restaurant, can you confirm it's joining only > on the menu categories and menu items you're expecting? (Add the id columns > from both tables to the start of your query, and to the GROUP BY clause). > > -- > Pat > > On 03/03/2011, at 6:13 AM, Pedro Cunha wrote: > > > > > > > > > Hello Pat, > > > Have been investigating last days and I'm almost at the same place. > > > I'm using PostgreSQL. > > > As far as I read we need to trust how postgres analyses queries and he > > can choose or not to use indexes. > > > Herehttp://explain.depesz.com/s/sA9is a bit (only a bit..) more > > detailed the EXPLAIN ANALYSE. > > > I've been digging through the query that TS generates, and for the 10 > > entities (restaurants) i'm trying to index, 8500~ rows are created. > > > What is the reason for when including all "has x" to use "group by"? > > > Also I noticed something weird, imagine I pick one of the lines of the > > output, and now pick a column, is it supposed to see lots of > > information repeated? like: > > > ".. Negócios Negócios Negócios Negócios Negócios Negócios Sair a dois > > Negócios Negócios Negócios Negócios Negócios Negócios Negócios Sair a > > dois Negócios Negócios Negócios Sair a dois Negócios Sair a dois > > Descontrair Sair a dois Sair a dois Sair a dois Sair a dois Sair a > > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > > dois Sair a dois Sair a dois ..." > > > Production database is at the moment with 15 restaurants.. and it's > > taking 1h to index > > > As DB grows, time is not incremental but exponential and this is > > worrying me a lot. Also because delta indexing is taking around 15~ > > mins. > > > Is it supposed that 1:N associations decay so badly? > > > I'm running out of ideas and considering breaking references for this > > case to save one inner join to happen. > > > Restaurant (1 : N ) Menu Categories > > Menu Categories ( 1 : N ) Menu Items <====== Shortcut this one > > > Best regards, > > Pedro > > > On Feb 24, 3:59 am, Pat Allan <[email protected]> wrote: > >> Hi Pedro > > >> Wow, PostgreSQL's EXPLAIN output is pretty complex! > > >> I'm not spotting anything obviously odd, but I'm not sure what to look for. > > >> Perhaps you try it all again, but comment out all the fields and > >> attributes except those from menu items? Indexing should be a bit faster, > >> but I'd imagine there's still a speed hit to some extent. I'm not sure if > >> that'll shine any light on the situation though... > > >> Is this all on your development environment? If it's production or > >> staging, are you sure that the database indexes on the foreign keys exist? > >> (ie: latest migrations have been run) > > >> -- > >> Pat > > >> On 24/02/2011, at 11:47 AM, Pedro Cunha wrote: > > >>> Hello Pat, > > >>> Here is the link: > >>>https://gist.github.com/841520 > > >>> With: > >>> - define_index block > >>> - sql query on conf > >>> - explain > >>> - explain analyse > > >>> Thanks in advance > > >>> On Feb 23, 11:51 pm, Pat Allan <[email protected]> wrote: > >>>> Hi Pedro > > >>>> Perhaps post the EXPLAIN and EXPLAIN ANALYSE outputs to a gist? To be > >>>> honest, given I've not used the PostgreSQL equivalents, I'm not sure how > >>>> much sense I'll be able to make out of it, but let's start with that. > > >>>> Also, can you add the SQL query to the gist as well? > > >>>> Cheers > > >>>> -- > >>>> Pat > > >>>> On 24/02/2011, at 6:07 AM, Pedro Cunha wrote: > > >>>>> Hey Pat, I'm using PostgresSQL, and command is the same. > > >>>>> I tried looking at the output and it kinda looks like chinese to me. > >>>>> From what I see, I can see no reference to indexes whatsoever. That > >>>>> can be a problem. According to Postgres docs I must do an ANALYZE > >>>>> <table_name> to make pg server aware of an index on that specific > >>>>> table. > > >>>>> After I ve done that and run the EXPLAIN and EXPLAIN ANALYSE I still > >>>>> can't see any reference to indexes. That might be a problem. > > >>>>> Do you want me to copy paste here the whole log or is there some way I > >>>>> can attach that file to this thread? (it's a bit lengthy) > > >>>>> Best regards > > >>>>> On Feb 23, 12:31 am, Pat Allan <[email protected]> wrote: > >>>>>> Hi Pedro > > >>>>>> Given you've got the database indexes, this definitely feels far too > >>>>>> slow. Are you using MySQL or PostgreSQL? > > >>>>>> If you're using MySQL, then I would recommend taking the generated > >>>>>> sql_query value from the restaurant_core_0 source in your sphinx > >>>>>> configuration file, remove the $start and $end parts of the WHERE > >>>>>> clause, and add EXPLAIN to the start of the query. Then, run this > >>>>>> query in a MySQL console: > > >>>>>> EXPLAIN SELECT ... > > >>>>>> This should give you some information on what database indexes are > >>>>>> being used for the query. > > >>>>>> I'm not sure what the equivalent of the EXPLAIN command for PostgreSQL > >>>>>> is, but I'd be surprised if such a thing didn't exist. > > >>>>>> Cheers > > >>>>>> -- > >>>>>> Pat > > >>>>>> On 23/02/2011, at 1:32 AM, Pedro Cunha wrote: > > >>>>>>> Hello there, > > >>>>>>> I'm having a weird slow indexing. Tried looking for similar issues on > >>>>>>> this group and couldn't find. > > >>>>>>> When I index an has_many through association, perfomance decays a lot. > >>>>>>> To help trace this issue I ll give as much info as I can. > > >>>>>>> So the association looks like this: > > >>>>>>> Restaurant (1 : N ) Menu Categories > >>>>>>> Menu Categories ( 1 : N ) Menu Items > > >>>>>>> Index on table menu_categories for restaurant_id (DB level) > >>>>>>> Index on table menu_items for menu_category_id (DB level) > > >>>>>>> define_index on restaurant > >>>>>>> ... > >>>>>>> indexes menu_categories.menu_items(:name), :as => :menu_item_name > >>>>>>> indexes menu_categories.menu_items(:subtitle), :as > >>>>>>> => :menu_item_subtitle > >>>>>>> indexes menu_categories.menu_items(:description), :as > >>>>>>> => :menu_item_description > >>>>>>> ... > >>>>>>> end > > >>>>>>> Without the indexes before, and with a lot of other stuff I get this > >>>>>>> perfomance: > > >>>>>>> indexing index 'restaurant_core'... > >>>>>>> collected 12 docs, 3.0 MB > >>>>>>> collected 122936 attr values > >>>>>>> sorted 0.2 Mvalues, 100.0% done > >>>>>>> sorted 2.2 Mhits, 100.0% done > >>>>>>> total 12 docs, 2973437 bytes > >>>>>>> total 16.741 sec, 177611 bytes/sec, 0.71 docs/sec > >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping. > > >>>>>>> With the indexes > > >>>>>>> indexing index 'restaurant_core'... > >>>>>>> collected 12 docs, 15.9 MB > >>>>>>> collected 423920 attr values > >>>>>>> sorted 0.8 Mvalues, 100.0% done > >>>>>>> sorted 11.4 Mhits, 100.0% done > >>>>>>> total 12 docs, 15885288 bytes > >>>>>>> total 320.783 sec, 49520 bytes/sec, 0.03 docs/sec > >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping. > > >>>>>>> table ids are pretty small so I think that issue is not causing > >>>>>>> problems.. still I tried adding the sql step config and performance > >>>>>>> stays the same > > >>>>>>> Properties on define_index are: > > >>>>>>> set_property :min_prefix_len => 3 > >>>>>>> set_property :stopwords => "#{RAILS_ROOT}/config/sphinx/ > >>>>>>> stoplist.txt" > >>>>>>> set_property :delta => ThinkingSphinx::Deltas::ResqueDelta > > >>>>>>> Tried commenting min_prefix_len and perfomance keeps the same.. > > >>>>>>> Running out of ideas :( > > >>>>>>> Best regards, > >>>>>>> Pedro > > >>>>>>> -- > >>>>>>> You received this message because you are subscribed to the Google > >>>>>>> Groups "Thinking Sphinx" group. > >>>>>>> To post to this group, send email to [email protected]. > >>>>>>> To unsubscribe from this group, send email to > >>>>>>> [email protected]. > >>>>>>> For more options, visit this group > >>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. > > >>>>> -- > >>>>> You received this message because you are subscribed to the Google > >>>>> Groups "Thinking Sphinx" group. > >>>>> To post to this group, send email to [email protected]. > >>>>> To unsubscribe from this group, send email to > >>>>> [email protected]. > >>>>> For more options, visit this group > >>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. > > >>> -- > >>> You received this message because you are subscribed to the Google Groups > >>> "Thinking Sphinx" group. > >>> To post to this group, send email to [email protected]. > >>> To unsubscribe from this group, send email to > >>> [email protected]. > >>> For more options, visit this > > ... > > read more » -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
