Ah yeah, I found the answer to that question: https://stackoverflow.com/questions/3762347/
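For reference, the point from that Stack Overflow answer can be checked with a small self-contained program (this is my own illustration, not code from the thread): in Java source, "\t" inside a string literal is already a literal tab character, so Pattern.compile("[\t ]") and Pattern.compile("[\\t ]") end up compiling equivalent patterns — both match a tab or a space.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TabPatternCheck {
    public static void main(String[] args) {
        // "\t" is a tab character in the Java string literal itself;
        // "\\t" passes the two characters '\' and 't' to the regex engine,
        // which interprets them as a tab. Both patterns therefore match tabs.
        Pattern literalTab = Pattern.compile("[\t ]");
        Pattern escapedTab = Pattern.compile("[\\t ]");

        String line = "1\t0 2";  // mixed tab and space separators

        System.out.println(Arrays.toString(literalTab.split(line)));  // [1, 0, 2]
        System.out.println(Arrays.toString(escapedTab.split(line)));  // [1, 0, 2]
    }
}
```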
So I don't think that bit is a bug. I'm not really sure why inputs with tabs don't work for you. I'm using Hadoop 1.0.4 and jdk1.6.0_30 on Ubuntu 12.04 x64, if that helps you.

Young

On Mon, Mar 31, 2014 at 4:50 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
> Hey,
>
> Yes, when originally debugging the code I thought to check what \t actually
> split by, and created my own test class:
>
> import java.util.regex.Pattern;
>
> class App
> {
>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>
>     public static void main(String[] args)
>     {
>         String line = "1 0 2";
>         String[] tokens = SEPARATOR.split(line);
>
>         System.out.println(SEPARATOR);
>         System.out.println(tokens.length);
>
>         for (String token : tokens) {
>             System.out.println(token);
>         }
>     }
> }
>
> and the pattern split on tabs and spaces as I thought it should.
>
> I'll try your test as well to double-check.
>
>
> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young....@uwaterloo.ca> wrote:
>
>> Weird, inputs with tabs work for me right out of the box. Either the "\t"
>> is not the cause or it's some Java-version-specific issue. Try this toy
>> program:
>>
>> import java.util.regex.Pattern;
>>
>> public class Test {
>>     public static void main(String[] args) {
>>         Pattern SEPARATOR = Pattern.compile("[\t ]");
>>         String[] tokens = SEPARATOR.split("3 4 5 6 7");
>>
>>         for (int i = 0; i < tokens.length; i++) {
>>             System.out.println("--" + tokens[i] + "--");
>>         }
>>     }
>> }
>>
>> Does it split the tabs properly for your Java?
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>>
>>> Yep, you're right, it is a bug with all the InputFormats, I believe. I just
>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>> and the example ConnectedComponents class, and it worked like a charm with
>>> just the normal spacing.
>>>
>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>>>
>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs? If
>>>> so, that bug is in all the input formats...
>>>>
>>>> Happy to help :)
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I removed the spaces and it worked! I don't understand, though. I'm
>>>>> sure the separator pattern means that it splits on tabs as well as spaces?
>>>>>
>>>>> Thanks for all your help; somewhat relieved now!
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Ghufran
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>> userlogs say?
>>>>>>
>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>> instead of tabs (for your input graph)?
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>>> down to 50% and gives me the error:
>>>>>>>
>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:   Job Counters
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1238858
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:     Failed map tasks=1
>>>>>>>
>>>>>>> I did a check to make sure the graph was being stored correctly by doing:
>>>>>>>
>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>>> 1 2
>>>>>>> 2 1 3 4
>>>>>>> 3 2
>>>>>>> 4 2
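One caveat with a check like the `dfs -cat` above: tabs and spaces render identically in a terminal, so it cannot tell you which separator the stored file actually contains. A small sketch (my own addition, not from the thread; the file name `input.txt` is just an illustration — e.g. a local copy fetched with `hadoop dfs -get`) that makes the whitespace bytes visible:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ShowWhitespace {
    public static void main(String[] args) throws IOException {
        // Pass the path to a local copy of the graph file, or fall back
        // to the illustrative default "input.txt".
        String path = args.length > 0 ? args[0] : "input.txt";
        for (String line : Files.readAllLines(Paths.get(path))) {
            // Replace tabs and spaces with visible markers so the two
            // kinds of separator can be told apart at a glance.
            System.out.println(line.replace("\t", "<TAB>").replace(" ", "<SP>"));
        }
    }
}
```

On a line like `1<TAB>2`, a `[\t ]`-based input format would split fine, while a format that splits only on a literal space would not, so this quickly narrows down whether the input or the parser is at fault.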