Hi Walter
If you're trying to match on words deep in the contents field, then you may
need to have a read through this part of the docs:
http://pat.github.com/ts/en/common_issues.html#mysql_large_fields
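
(The gotcha, roughly: MySQL truncates GROUP_CONCAT output at
group_concat_max_len, which defaults to just 1024 bytes, so any database
session that doesn't raise it only ever sees the first kilobyte or so of a
concatenated field. You can check what a given connection is using from a
Rails console with something like this:)

  # Show the GROUP_CONCAT limit for the current database session.
  # The MySQL default is 1024 bytes, which silently truncates anything larger.
  limit = ActiveRecord::Base.connection.select_value(
    "SELECT @@session.group_concat_max_len"
  ).to_i
  puts "group_concat_max_len for this session: #{limit} bytes"

If I remember the generated config correctly, the set_property call in your
index definition only raises that limit for the indexer's own connection, so
other clients (Sequel Pro included) still get the default unless they set it
themselves.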
If you're not seeing results returned when searching on the title field, then
that's probably a different matter. Either way, let me know how you go.
Cheers
--
Pat
On 07/09/2012, at 5:44 PM, Walter Davis wrote:
> I have the following relationships:
>
> Title has_many :contents
> Content belongs_to :title
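>
> (i.e. the models are just the plain ActiveRecord pairing, roughly:)
>
> class Title < ActiveRecord::Base
>   has_many :contents
> end
>
> class Content < ActiveRecord::Base
>   belongs_to :title
> end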
>
> In title.rb, I have the following index declaration:
>
> define_index do
>   set_property :group_concat_max_len => 10.megabytes
>
>   indexes :title, :sortable => true
>   indexes teaser
>   indexes role.person(:name), :as => :author, :sortable => true
>   has contents(:plain), :as => :contents
>   has created_at, updated_at
>   where sanitize_sql(["publish", true])
> end
>
> When I run the index, it appears to work:
>
> indexing index 'title_core'...
> WARNING: collect_hits: mem_limit=0 kb too low, increasing to 14880 kb
> collected 1466 docs, 0.6 MB
> collected 3668728 attr values
> WARNING: sort_mva: merge_block_size=8 kb too low, increasing mem_limit may
> improve performance
> sorted 7.3 Mvalues, 100.0% done
> sorted 0.1 Mhits, 100.0% done
> total 1466 docs, 560458 bytes
> total 62.465 sec, 8972 bytes/sec, 23.46 docs/sec
> skipping non-plain index 'title'...
> total 445634 reads, 0.159 sec, 0.1 kb/call avg, 0.0 msec/call avg
> total 1836 writes, 0.092 sec, 34.1 kb/call avg, 0.0 msec/call avg
> Started successfully (pid 17630).
>
> But searches do not match any of the content strings.
>
> When I run the generated query directly in SQL:
>
> SELECT SQL_NO_CACHE `titles`.`id` * CAST(4 AS SIGNED) + 3 AS `id`,
>   `titles`.`title` AS `title`,
>   `titles`.`teaser` AS `teaser`,
>   `people`.`name` AS `author`,
>   `titles`.`id` AS `sphinx_internal_id`,
>   0 AS `sphinx_deleted`,
>   3942078319 AS `class_crc`,
>   IFNULL('Title', '') AS `sphinx_internal_class`,
>   IFNULL(`titles`.`title`, '') AS `title_sort`,
>   IFNULL(`people`.`name`, '') AS `author_sort`,
>   GROUP_CONCAT(DISTINCT IFNULL(`contents`.`plain`, '0') SEPARATOR ' ') AS `contents`,
>   UNIX_TIMESTAMP(`titles`.`created_at`) AS `created_at`,
>   UNIX_TIMESTAMP(`titles`.`updated_at`) AS `updated_at`
> FROM `titles`
>   LEFT OUTER JOIN `roles` ON `roles`.`id` = `titles`.`role_id`
>   LEFT OUTER JOIN `people` ON `people`.`id` = `roles`.`person_id`
>   LEFT OUTER JOIN `contents` ON `contents`.`title_id` = `titles`.`id`
> WHERE (`titles`.`id` >= $start AND `titles`.`id` <= $end AND publish)
> GROUP BY `titles`.`id`
> ORDER BY NULL
>
> I get some genuinely odd results in the contents column. This may be an
> artifact of Sequel Pro, but the column doesn't appear to be very large at
> all: at most there are a dozen lines of text, and some rows contain just a
> single 0 character. The contents table holds thousands of rows, and each row
> has up to 400 lines of text in it. When concatenated, the combined contents
> range from 300K to 8MB per title.
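>
> (Those figures, incidentally, come from a rough console check along these
> lines, just summing the plain text per title:)
>
> Title.find_each do |t|
>   bytes = t.contents.inject(0) { |sum, c| sum + c.plain.to_s.bytesize }
>   puts "Title #{t.id}: #{bytes} bytes of concatenated plain content"
> end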
>
> Can anyone suggest a way forward here? Is there a better way to index text
> (XML) documents than slurping out their content into MySQL so that Sphinx
> can index them?
>
> Thanks in advance,
>
> Walter
>