Thanks!

On Thu, Jul 16, 2015 at 1:59 PM Vetle Leinonen-Roeim <ve...@roeim.net>
wrote:

> By the way - if you're going this route, see
> https://github.com/datastax/spark-cassandra-connector
>
> On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim <ve...@roeim.net>
> wrote:
>
>> You'll probably have to install it separately.
>>
>> On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker <jem.tuc...@gmail.com> wrote:
>>
>>> Hi Vetle,
>>>
>>> As far as I am aware, IndexedRDD is persisted in the same way as other RDDs.
>>> Do you know whether Cassandra can be embedded in my application, or does it
>>> have to be a standalone database that is installed separately?
>>>
>>> Thanks,
>>>
>>> Jem
>>>
>>> On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim <ve...@roeim.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Not sure how IndexedRDD is persisted, but you may be better off using a
>>>> NoSQL database for lookups (for example Cassandra, with the Cassandra
>>>> connector). That should give you good lookup performance, but persisting
>>>> those billion records will probably take some time in any case.
>>>>
>>>> Regards,
>>>> Vetle
>>>>
>>>>
>>>> On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <jem.tuc...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have been using IndexedRDD as a large lookup (1 billion records) to
>>>>> join with small tables (1 million rows). The performance of IndexedRDD is
>>>>> great until it has to be persisted on disk. Are there any alternatives to
>>>>> IndexedRDD or any changes to how I use it to improve performance with big
>>>>> data volumes?
>>>>>
>>>>> Kindest Regards,
>>>>>
>>>>> Jem
>>>>>
>>>>
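
Note on the lookup-store suggestion in this thread: the idea is to replace a full join against the 1-billion-record dataset with point lookups driven by the keys of the small table. Below is a minimal Python sketch of that pattern only. The dict is a stand-in for an external key-value store such as Cassandra, and the names `lookup_join`, `store`, and `small_table` are hypothetical; none of them come from IndexedRDD or the Cassandra connector.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def lookup_join(
    small_table: Iterable[Tuple[int, str]],
    store: Dict[int, str],
) -> List[Tuple[int, str, Optional[str]]]:
    """Left-join small_table against store using one keyed read per row."""
    joined = []
    for key, value in small_table:
        # Point lookup by key; the large side is never scanned in full.
        joined.append((key, value, store.get(key)))
    return joined


# Stand-in for the large lookup dataset (assumption: keyed by an integer id).
store = {1: "alpha", 2: "beta", 3: "gamma"}
# Stand-in for the small table being joined.
small_table = [(2, "x"), (4, "y")]
result = lookup_join(small_table, store)
```

In this sketch the work grows with the number of rows in the small table rather than with the size of the lookup dataset, which is the property the thread hopes to get from a keyed store.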
