:) Yes, I do regenerate the job file. I actually have scripts that make
a fresh copy from git, apply my changes, and run ant to generate the
job file every time I make a change. The Cassandra cluster that I am
trying to use here consists of 10 servers, and the Hadoop cluster on
which I run Nutch has 11 nodes. As you can imagine, every node (or
role, for that matter) is found through DNS, and anything localhost is
kind of meaningless here. (The only things in my /etc/hosts files are
localhost and 127...)
I would be surprised if you remember me, since I can see on the lists
how many emails you go through every day, but I was trying to do a
similar thing with HBase a couple of months ago. That turned out to be
very unstable under some load (over 50 million pages), and it mostly
has to do with the fact that Gora does not support the new version of
HBase, which supposedly doesn't have this problem anymore. By the way,
if you could point me in the right direction, I would like to start
working on updating HBase support for Gora. I should say that you are
actually the reason I am trying Cassandra this time, because at the
time, I remember, you said you were using Cassandra, so I figured at
least I know of one person who is successfully doing this :). That
automatically means I am going to have better odds this time, since I
didn't know of anyone, and I still don't, who was actually using HBase
in production for this purpose. (The Nutch load, at least as it is now
and as long as it does the filtering on its own, is very particular,
don't you agree?)
Anyway, as for the issue at hand, I am going back a bit in the git
commits, on the very narrow chance that this is because of some
recently broken wiring or whatever. If it starts working at some point,
that could let us isolate the issue, but so far no luck. I am pretty
sure I am doing something stupid somewhere and it is going to be hell
finding it :). So I guess this would be a good time for me to thank,
and apologize in advance to, you and all the other people who spend
time here, for their attention and for the amount of spam that I am
going to be generating on the list.
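One way to confirm what a regenerated job file actually carries is to look inside it. The following is a minimal sketch, assuming the job file is an ordinary zip/jar archive with a gora.properties packed somewhere inside it; the function name and the demo archive are made up for illustration and are not part of Nutch or Gora.

```python
import os
import tempfile
import zipfile


def gora_settings_in_job(job_path):
    """Return the active gora.cassandra* lines of every gora.properties in the archive."""
    settings = {}
    with zipfile.ZipFile(job_path) as archive:
        for name in archive.namelist():
            if name.endswith("gora.properties"):
                text = archive.read(name).decode("utf-8", errors="replace")
                # Commented-out lines start with '#', so they never match the prefix.
                settings[name] = [
                    line.strip()
                    for line in text.splitlines()
                    if line.strip().startswith("gora.cassandra")
                ]
    return settings


if __name__ == "__main__":
    # Build a tiny stand-in archive (hypothetical content) so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as workdir:
        job_path = os.path.join(workdir, "demo.job")
        with zipfile.ZipFile(job_path, "w") as archive:
            archive.writestr(
                "gora.properties",
                "#gora.cassandra.servers=cass-node:9160\n"
                "gora.cassandrastore.servers=cass-node:9160\n",
            )
        for entry, lines in gora_settings_in_job(job_path).items():
            print(entry, lines)
```

If the archive turns out to contain more than one gora.properties, or one that still points at localhost, that would be worth ruling out before digging through older commits.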
On 08/09/2013 06:14 PM, Lewis John Mcgibbney wrote:
I am assuming that you are regenerating your job file if this is in
Nutch distributed mode?
If not, and you're running this as a local Nutch server, then also
please check that there are no temp files
ls -al
gora.properties
gora.properties~
The entry in gora.properties should be
gora.cassandrastore.servers=localhost:9160 (if running locally)
and the host in gora-cassandra-mapping.xml should reflect the host
you use here.
You can check that the host maps properly by looking in /etc/hosts
hth
Lewis
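The temp-file check described above can also be scripted. This is only a sketch under assumptions: the helper names are invented here, the parsing handles plain key=value lines only, and the demo directory and the "old-host" value are hypothetical stand-ins for a real conf directory.

```python
import os
import tempfile


def stray_copies(conf_dir, name="gora.properties"):
    """List editor leftovers such as gora.properties~ that sit next to the real file."""
    return sorted(
        entry
        for entry in os.listdir(conf_dir)
        if entry != name and entry.startswith(name)
    )


def active_entries(path, prefix="gora.cassandra"):
    """Collect uncommented key=value lines whose key starts with the given prefix."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(prefix) and "=" in line:
                key, value = line.split("=", 1)
                entries[key.strip()] = value.strip()
    return entries


if __name__ == "__main__":
    # Hypothetical conf directory, created on the fly so the sketch is runnable.
    with tempfile.TemporaryDirectory() as conf_dir:
        with open(os.path.join(conf_dir, "gora.properties"), "w", encoding="utf-8") as fh:
            fh.write("gora.cassandrastore.servers=localhost:9160\n")
        with open(os.path.join(conf_dir, "gora.properties~"), "w", encoding="utf-8") as fh:
            fh.write("gora.cassandrastore.servers=old-host:9160\n")
        print(stray_copies(conf_dir))
        print(active_entries(os.path.join(conf_dir, "gora.properties")))
```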
On Fri, Aug 9, 2013 at 5:04 PM, kaveh minooie <ka...@plutoz.com> wrote:
nope. same exact result and I tried 'cassandraStore' with both
uppercase and lowercase ( they are case sensitive, right? )
13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: starting at 2013-08-09 16:59:17
13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: /2locos/temp/url
13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
13/08/09 16:59:19 ERROR connection.HConnectionManager: Could not start connection pool for host localhost(127.0.0.1):9160
13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Host detected as down was added to retry queue: localhost(127.0.0.1):9160
13/08/09 16:59:19 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:59:19 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
13/08/09 16:59:19 ERROR store.CassandraStore: All host pools marked down. Retry burden pushed out to client.
13/08/09 16:59:19 ERROR store.CassandraStore: [Ljava.lang.StackTraceElement;@7a6b653f
13/08/09 16:59:19 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
13/08/09 16:59:20 INFO input.FileInputFormat: Total input paths to process : 1
13/08/09 16:59:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/09 16:59:20 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/09 16:59:21 INFO mapred.JobClient: Running job: job_201308091131_0009
13/08/09 16:59:22 INFO mapred.JobClient: map 0% reduce 0%
13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Not checking that localhost(127.0.0.1):9160 is a member of the ring since there are no live hosts
13/08/09 16:59:29 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: localhost(127.0.0.1):9160
13/08/09 16:59:30 INFO mapred.JobClient: map 100% reduce 0%
13/08/09 16:59:31 INFO mapred.JobClient: Job complete: job_201308091131_0009
13/08/09 16:59:31 INFO mapred.JobClient: Counters: 19
On 08/09/2013 04:51 PM, Renato Marroquín Mogrovejo wrote:
Could you please try with this one:
gora.cassandraStore.host=cass-node:9160
2013/8/9 kaveh minooie <ka...@plutoz.com>
so it is not working:
from gora.properties:
#############################
# CassandraStore properties #
#############################
gora.cassandrastore.servers=cass-node:9160
#gora.cassandra.servers=cass-node:9160
#######################
# MemStore properties #
from gora-cassandra-mapping.xml:
<keyspace name="host" cluster="DoslocosCluster"
host="cass-node">
<family name="mtdt" type="super"/>
<family name="il" type="super"/>
<family name="ol" type="super"/>
</keyspace>
and inject output:
13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
13/08/09 16:38:34 ERROR connection.HConnectionManager: Could not start connection pool for host localhost(127.0.0.1):9160
13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Host detected as down was added to retry queue: localhost(127.0.0.1):9160
13/08/09 16:38:34 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:38:35 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
13/08/09 16:38:35 ERROR store.CassandraStore: All host pools marked down. Retry burden pushed out to client.
13/08/09 16:38:35 ERROR store.CassandraStore: [Ljava.lang.StackTraceElement;@20c449e3
13/08/09 16:38:35 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
13/08/09 16:38:36 INFO input.FileInputFormat: Total input paths to process : 1
13/08/09 16:38:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/09 16:38:36 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/09 16:38:36 INFO mapred.JobClient: Running job: job_201308091131_0007
13/08/09 16:38:37 INFO mapred.JobClient: map 0% reduce 0%
13/08/09 16:38:44 INFO connection.CassandraHostRetryService: Not checking that localhost(127.0.0.1):9160 is a member of the ring since there are no live hosts
13/08/09 16:38:44 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:38:44 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: localhost(127.0.0.1):9160
13/08/09 16:38:47 INFO mapred.JobClient: map 100% reduce 0%
13/08/09 16:38:49 INFO mapred.JobClient: Job complete: job_201308091131_0007
I tried 10.0.0.10 instead of cass-node, and I got the same result.
It just goes for localhost!!!
:(
On 08/09/2013 04:37 PM, Lewis John Mcgibbney wrote:
Both properties should certainly match up.
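A quick way to test whether the two settings line up is to compare them programmatically. The sketch below is an illustration, not Gora code: the function names are invented, the key and the keyspace host attribute are taken from the snippets quoted in this thread, and the comma-separated handling of the servers value is an assumption.

```python
import xml.etree.ElementTree as ET


def servers_from_properties(text, key="gora.cassandrastore.servers"):
    """Return the host:port values set for the key, skipping commented lines."""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            # Assumption: several servers, if present, are comma-separated.
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


def keyspace_hosts(mapping_xml):
    """Return the host attribute of every <keyspace> element in the mapping."""
    root = ET.fromstring(mapping_xml)
    return [keyspace.get("host") for keyspace in root.iter("keyspace")]


def hosts_match(properties_text, mapping_xml):
    """True when every keyspace host also appears among the configured servers."""
    configured = {item.split(":", 1)[0] for item in servers_from_properties(properties_text)}
    return bool(configured) and set(keyspace_hosts(mapping_xml)) <= configured


if __name__ == "__main__":
    props = "gora.cassandrastore.servers=cass-node:9160\n"
    mapping = (
        '<keyspace name="host" cluster="DoslocosCluster" host="cass-node">'
        '<family name="mtdt" type="super"/>'
        "</keyspace>"
    )
    print(hosts_match(props, mapping))
```

A passing check only rules out a mismatch between the two files; it says nothing about which copy of each file the running job actually loads.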
On Fri, Aug 9, 2013 at 4:28 PM, Renato Marroquín Mogrovejo
<renatoj.marroq...@gmail.com> wrote:
You are right, it'd be redundant. But I guess the idea behind this
is that at some point you'd be able to read or write to different
clusters from the same application, but that feature is not in yet.
@Lewis, do we even have a JIRA for such a thing? Or am I just crazy?
Renato M.
2013/8/9 kaveh minooie <ka...@plutoz.com>
if you are talking about :
<keyspace name="host" cluster="Test Cluster"
host="localhost">
<family name="mtdt" type="super"/>
<family name="il" type="super"/>
<family name="ol" type="super"/>
</keyspace>
in the gora-cassandra-mapping.xml file, the answer is no.
thanks lewis,
P.S. so it needs to be set in both places (gora.properties &
gora-cassandra-mapping.xml)? isn't it redundant?
On 08/09/2013 04:17 PM, Lewis John Mcgibbney wrote:
Hi
On Fri, Aug 9, 2013 at 4:13 PM, kaveh minooie <ka...@plutoz.com> wrote:
gora.cassandrastore.servers
The one above is correct
What about the host value in your gora-cassandra-mapping.xml file?
Is it set properly as well?
--
Kaveh Minooie
--
/Lewis/