Re: need help with store.CassandraStore

kaveh minooie Fri, 09 Aug 2013 19:55:29 -0700

:) Yes, I do regenerate the job file. I actually have scripts that make a fresh copy from git, apply my changes, and run ant to generate the job file every time I make a change. The Cassandra cluster that I am trying to use here consists of 10 servers, and the Hadoop cluster on which I run Nutch has 11 nodes as well. As you can imagine, every node, or role for that matter, is found through DNS, and anything localhost is kinda meaningless here. (The only things in my /etc/hosts files are localhost and 127...)
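One way to confirm that a regenerated job file really carries the edited settings is to read gora.properties straight out of the archive. The following is a minimal sketch, assuming the Nutch job file is an ordinary zip/jar archive; the function name and the job file path in the usage note are hypothetical and not taken from this thread.

```python
import zipfile


def read_properties_from_job(job_path, entry_name="gora.properties"):
    """Return {archive entry: {key: value}} for each entry named `entry_name`.

    Assumption: the job file is a zip-format archive (as jar files are).
    """
    found = {}
    with zipfile.ZipFile(job_path) as job:
        for name in job.namelist():
            # Compare only the file name so nested copies are found too
            if name.rsplit("/", 1)[-1] != entry_name:
                continue
            props = {}
            text = job.read(name).decode("utf-8", errors="replace")
            for raw in text.splitlines():
                line = raw.strip()
                # Skip blank lines and comment lines
                if not line or line.startswith(("#", "!")):
                    continue
                key, sep, value = line.partition("=")
                if sep:
                    props[key.strip()] = value.strip()
            found[name] = props
    return found
```

Calling `read_properties_from_job("path/to/nutch.job")` (hypothetical path) and printing the result shows which `gora.cassandrastore.servers` value, if any, was packaged into the job.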

I would be surprised if you remember me, since I can see on the lists how many emails you go through every day, but I was trying to do a similar thing with HBase a couple of months ago. That turned out to be very unstable under some load (over 50 million pages), and it mostly has to do with the fact that Gora does not support the new version of HBase, which supposedly doesn't have this problem anymore. By the way, if you could point me in the right direction, I would like to start working on updating HBase support for Gora. I should say that you are actually the reason that I am trying with Cassandra this time, because at the time, I remember, you said you were using Cassandra, so I figured at least I know of one person who is successfully doing this :) . That automatically means that I am gonna have better odds this time, since I didn't know of anyone, and I still don't, who was actually using HBase in production for this purpose. (Nutch load, at least as it is now and as long as it does the filtering on its own, is very particular, don't you agree?)

Anyway, as for this issue at hand, I am going back a bit in git commits, on the very narrow chance that this is because of recently broken wiring or whatever. If it starts working at some point, it could let us isolate the issue, but so far no luck. I am pretty sure I am doing something stupid somewhere and it is going to be hell finding it :). So I guess this would be a good time for me to thank and apologize in advance to you and all the other people who spend time here for their attention and the amount of spam that I am gonna be generating on the list.




On 08/09/2013 06:14 PM, Lewis John Mcgibbney wrote:

I am assuming that you are regenerating your job file if this is in Nutch distributed mode? If not, and you're running this as a local Nutch server, then also please check that there are no temp files

ls -al
gora.properties
gora.properties~

The entry in gora.properties should be
gora.cassandrastore.servers=localhost:9160 (if running locally)

and the host in gora-cassandra-mapping.xml should reflect the host you use here.
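The requirement that both files name the same host can be checked mechanically. This is a rough sketch, not part of Gora or Nutch: the helper names are made up for illustration, and treating the property value as a comma-separated list of host:port pairs is an assumption.

```python
import xml.etree.ElementTree as ET


def servers_from_properties(text, key="gora.cassandrastore.servers"):
    """Return the host names listed under `key`, with ports stripped.

    Assumption: the value is a comma-separated list of host:port pairs.
    """
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out entries
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return [hp.strip().rsplit(":", 1)[0]
                    for hp in value.split(",") if hp.strip()]
    return []


def hosts_from_mapping(xml_text):
    """Return the host attribute of every <keyspace> element."""
    root = ET.fromstring(xml_text)
    return [ks.get("host") for ks in root.iter("keyspace")]


def mismatched_hosts(properties_text, mapping_text):
    """Return mapping hosts that do not appear in the properties entry."""
    servers = servers_from_properties(properties_text)
    return [h for h in hosts_from_mapping(mapping_text) if h not in servers]
```

An empty list from `mismatched_hosts(...)` means the two files agree; anything else names the keyspace host that the properties file does not list.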

You can check that the host maps properly by looking into /etc/hosts
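Besides reading /etc/hosts by eye, name resolution and reachability of the Thrift port can be tested from the machine that submits the job. The sketch below uses only the Python standard library; the function name is hypothetical, and 9160 is simply the port used in the configuration above.

```python
import socket


def check_endpoint(host, port=9160, timeout=3.0):
    """Resolve `host` and attempt a TCP connection.

    Returns (addresses, reachable): the resolved IP addresses and whether
    a connection to host:port could be opened within `timeout` seconds.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name did not resolve through /etc/hosts or DNS
        return [], False
    addresses = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True
    except OSError:
        # Covers connection refused, timeouts and unreachable networks
        return addresses, False
```

If `check_endpoint("cass-node")` reports the node as reachable while the job still logs `localhost(127.0.0.1):9160`, the network is probably fine and the configured host is not being picked up.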
hth
Lewis

On Fri, Aug 9, 2013 at 5:04 PM, kaveh minooie <ka...@plutoz.com> wrote:


    nope. same exact result and I tried 'cassandraStore' with both
    uppercase and lowercase ( they are case sensitive, right? )


    13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: starting at
    2013-08-09 16:59:17
    13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: Injecting
    urlDir: /2locos/temp/url
    13/08/09 16:59:19 INFO connection.CassandraHostRetryService:
    Downed Host Retry service started with queue size -1 and retry
    delay 10s
    13/08/09 16:59:19 ERROR connection.HConnectionManager: Could not
    start connection pool for host localhost(127.0.0.1):9160
    13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Host
    detected as down was added to retry queue: localhost(127.0.0.1):9160
    13/08/09 16:59:19 WARN connection.CassandraHostRetryService:
    Downed localhost(127.0.0.1):9160 host still appears to be down:
    Unable to open transport to localhost(127.0.0.1):9160 ,
    java.net.ConnectException: Connection refused
    13/08/09 16:59:19 INFO service.JmxMonitor: Registering JMX
    me.prettyprint.cassandra.service_Test
    Cluster:ServiceType=hector,MonitorType=hector
    13/08/09 16:59:19 ERROR store.CassandraStore: All host pools
    marked down. Retry burden pushed out to client.
    13/08/09 16:59:19 ERROR store.CassandraStore:
    [Ljava.lang.StackTraceElement;@7a6b653f
    13/08/09 16:59:19 INFO crawl.InjectorJob: InjectorJob: Using class
    org.apache.gora.cassandra.store.CassandraStore as the Gora storage
    class.
    13/08/09 16:59:20 INFO input.FileInputFormat: Total input paths to
    process : 1
    13/08/09 16:59:20 INFO util.NativeCodeLoader: Loaded the
    native-hadoop library
    13/08/09 16:59:20 WARN snappy.LoadSnappy: Snappy native library
    not loaded
    13/08/09 16:59:21 INFO mapred.JobClient: Running job:
    job_201308091131_0009
    13/08/09 16:59:22 INFO mapred.JobClient:  map 0% reduce 0%
    13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Not
    checking that localhost(127.0.0.1):9160 is a member of the ring
    since there are no live hosts
    13/08/09 16:59:29 WARN connection.CassandraHostRetryService:
    Downed localhost(127.0.0.1):9160 host still appears to be down:
    Unable to open transport to localhost(127.0.0.1):9160 ,
    java.net.ConnectException: Connection refused
    13/08/09 16:59:29 INFO connection.CassandraHostRetryService:
    Downed Host retry status false with host: localhost(127.0.0.1):9160
    13/08/09 16:59:30 INFO mapred.JobClient:  map 100% reduce 0%
    13/08/09 16:59:31 INFO mapred.JobClient: Job complete:
    job_201308091131_0009
    13/08/09 16:59:31 INFO mapred.JobClient: Counters: 19



    On 08/09/2013 04:51 PM, Renato Marroquín Mogrovejo wrote:

        Could you please try with this one:

        gora.cassandraStore.host=cass-node:9160




        2013/8/9 kaveh minooie <ka...@plutoz.com>


            so it is not working:

            from gora.properties:

            #############################
            # CassandraStore properties #
            #############################

            gora.cassandrastore.servers=cass-node:9160

            #gora.cassandra.servers=cass-node:9160


            #######################
            # MemStore properties #


            from gora-cassandra-mapping.xml:


            <keyspace name="host" cluster="DoslocosCluster" host="cass-node">
                <family name="mtdt" type="super"/>
                <family name="il" type="super"/>
                <family name="ol" type="super"/>
            </keyspace>


            and inject output:

            13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
            13/08/09 16:38:34 ERROR connection.HConnectionManager: Could not start connection pool for host localhost(127.0.0.1):9160
            13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Host detected as down was added to retry queue: localhost(127.0.0.1):9160
            13/08/09 16:38:34 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
            13/08/09 16:38:35 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
            13/08/09 16:38:35 ERROR store.CassandraStore: All host pools marked down. Retry burden pushed out to client.
            13/08/09 16:38:35 ERROR store.CassandraStore: [Ljava.lang.StackTraceElement;@20c449e3
            13/08/09 16:38:35 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
            13/08/09 16:38:36 INFO input.FileInputFormat: Total input paths to process : 1
            13/08/09 16:38:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
            13/08/09 16:38:36 WARN snappy.LoadSnappy: Snappy native library not loaded
            13/08/09 16:38:36 INFO mapred.JobClient: Running job: job_201308091131_0007
            13/08/09 16:38:37 INFO mapred.JobClient:  map 0% reduce 0%
            13/08/09 16:38:44 INFO connection.CassandraHostRetryService: Not checking that localhost(127.0.0.1):9160 is a member of the ring since there are no live hosts
            13/08/09 16:38:44 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
            13/08/09 16:38:44 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: localhost(127.0.0.1):9160
            13/08/09 16:38:47 INFO mapred.JobClient:  map 100% reduce 0%
            13/08/09 16:38:49 INFO mapred.JobClient: Job complete: job_201308091131_0007


            I tried 10.0.0.10 instead of cass-node, and I got the same result.
            It just goes for localhost!!!

            :(

            On 08/09/2013 04:37 PM, Lewis John Mcgibbney wrote:

                Both properties should certainly match up.


                On Fri, Aug 9, 2013 at 4:28 PM, Renato Marroquín Mogrovejo
                <renatoj.marroq...@gmail.com> wrote:

                     You are right, it'd be redundant. But I guess the idea behind this
                     is that at some point you'd be able to read or write to different
                     clusters from the same application, but that feature is not in yet.
                     @Lewis, do we even have a JIRA for such a thing? Or am I just crazy?


                     Renato M.


                     2013/8/9 kaveh minooie <ka...@plutoz.com>

                         if you are talking about :

                         <keyspace name="host" cluster="Test Cluster" host="localhost">
                             <family name="mtdt" type="super"/>
                             <family name="il" type="super"/>
                             <family name="ol" type="super"/>
                         </keyspace>

                         in the gora-cassandra-mapping.xml file, the answer is no.

                         thanks lewis,


                         P.S. So it needs to be set in both places (gora.properties &
                         gora-cassandra-mapping.xml)? Isn't it redundant?



                         On 08/09/2013 04:17 PM, Lewis John Mcgibbney wrote:

                             Hi

                             On Fri, Aug 9, 2013 at 4:13 PM, kaveh minooie
                             <ka...@plutoz.com> wrote:

                                  gora.cassandrastore.servers


                             The one above is correct


                             What about the host value in your gora-cassandra-mapping.xml file?
                             Is it set properly as well?


                         --
                         Kaveh Minooie





                --
                /Lewis/


            --
            Kaveh Minooie

--
Kaveh Minooie





--
/Lewis/
