I'm not sure I'm out of memory per se. It just feels like I'm incurring a huge cost going out to the DB row-by-row when the system could be doing a batch SELECT from the DB and calculating/caching locally. But really, I'm not sure.
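To make concrete what I mean by "batch SELECT and cache locally", here is a rough sketch in plain Java. It is not Mahout's API: the names PrefCache, Row, group and cosine are made up for illustration, the list of rows stands in for the result of one SELECT over the whole preference table, and cosine similarity is just a placeholder for whichever UserSimilarity is actually in use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Mahout: keeps all preferences in memory. */
final class PrefCache {

    /** One (user_id, item_id, preference) row, as a single batch SELECT would return it. */
    static final class Row {
        final long userId;
        final long itemId;
        final double value;

        Row(long userId, long itemId, double value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Groups rows by user once, so later lookups never go back to the database. */
    static Map<Long, Map<Long, Double>> group(List<Row> rows) {
        Map<Long, Map<Long, Double>> byUser = new HashMap<>();
        for (Row r : rows) {
            byUser.computeIfAbsent(r.userId, k -> new HashMap<>()).put(r.itemId, r.value);
        }
        return byUser;
    }

    /** Cosine similarity of two preference maps; items missing from a map count as zero. */
    static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a user with no preferences is similar to no one
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

With about 61,000 rows the grouped map should fit in memory comfortably, so every similarity computation after the initial load would be a pure in-memory operation.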
Is a UserSimilarity approach expected to be this slow with the amount of data I have? Is an item-based approach preferable when considering speed?

On Tue, Jul 12, 2011 at 11:00 PM, Lance Norskog <[email protected]> wrote:
> Mysql has some quirk about reading in batches. See this in the Solr
> wiki about it:
>
> http://wiki.apache.org/solr/DataImportHandlerFaq?highlight=%28mysql%29#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
>
> I don't know how to set special properties in the JDBC data source.
>
> On Tue, Jul 12, 2011 at 10:09 PM, Salil Apte <[email protected]> wrote:
>> Oh yea, at runtime, I'm getting back a BasicDataSource object for my
>> DataSource. Is that correct?
>>
>> On Tue, Jul 12, 2011 at 9:59 PM, Salil Apte <[email protected]> wrote:
>>> So I started actually looking at performance today and it is pretty
>>> horrendous. I've got about 61,000 rows in my database which I'm
>>> assuming isn't *that* many rows. But recommendations are taking > 20
>>> seconds. Is there some way to ensure pooling is turned on? What else
>>> is a big driver for performance? My tables are setup so that I have a
>>> multiple index (for uniqueness) for <user_id, item_id> pairs. That
>>> way, there cannot be two entries with the same <user_id, item_id>. I'm
>>> not sure where to go from here.
>>>
>>> Thanks for the help!
>>>
>>> On Tue, Jul 12, 2011 at 12:47 AM, Sean Owen <[email protected]> wrote:
>>>> You can ignore it. It just doesn't know for sure you have a pool.
>>>> I believe I have even removed this in a recent refactoring.
>>>>
>>>> On Tue, Jul 12, 2011 at 2:21 AM, Salil Apte <[email protected]> wrote:
>>>>
>>>>> So I keep getting this warning from either Mahout or the server (I'm
>>>>> guessing the former):
>>>>>
>>>>> WARNING: You are not using ConnectionPoolDataSource. Make sure your
>>>>> DataSource pools connections to the database itself, or database
>>>>> performance will be severely reduced.
>>>>>
>>>>> I'm not really sure why this is happening. I have the following
>>>>> resource in my webapp's context.xml file. Is there anything else I
>>>>> need to do enable connection pooling with a JNDI resource?
>>>>>
>>>>> <Resource name="jdbc/offline-local" auth="Container"
>>>>> type="javax.sql.DataSource" username="root" password=""
>>>>> driverClassName="com.mysql.jdbc.Driver"
>>>>>
>>>>> url="jdbc:mysql://localhost:3306/offlinedevel?autoReconnect=true&cachePreparedStatements=true&cachePrepStmts=true&cacheResultSetMetadata=true&alwaysSendSetIsolation=false&elideSetAutoCommits=true"
>>>>> validationQuery="select 1" maxActive="16" maxIdle="4"
>>>>> removeAbandoned="true" logAbandoned="true" />
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> -Salil
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Lance Norskog
> [email protected]
>
