Hey,

Yes. When I was originally debugging the code, I checked what \t actually splits on by writing my own test class:
    import java.util.regex.Pattern;

    class App {
        private static final Pattern SEPARATOR = Pattern.compile("[\t ]");

        public static void main(String[] args) {
            String line = "1 0 2";
            String[] tokens = SEPARATOR.split(line);
            // Print the pattern, the number of tokens, then each token.
            System.out.println(SEPARATOR);
            System.out.println(tokens.length);
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }

The pattern split the line on tabs, as I thought it should. I'll try your test as well to double-check.

On Mon, Mar 31, 2014 at 9:34 PM, Young Han <young....@uwaterloo.ca> wrote:
> Weird, inputs with tabs work for me right out of the box. Either the "\t"
> is not the cause or it's some Java-version-specific issue. Try this toy
> program:
>
> import java.util.regex.Pattern;
>
> public class Test {
>     public static void main(String[] args) {
>         Pattern SEPARATOR = Pattern.compile("[\t ]");
>         String[] tokens = SEPARATOR.split("3 4 5 6 7");
>
>         for (int i = 0; i < tokens.length; i++) {
>             System.out.println("--" + tokens[i] + "--");
>         }
>     }
> }
>
> Does it split the tabs properly for your Java?
>
> Young
>
>
> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>
>> Yep, you're right. I believe it is a bug in all the InputFormats. I just
>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>> and the example ConnectedComponents class, and it worked like a charm with
>> just normal spacing.
>>
>>
>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>>
>>> Huh, it might be a bug in the code. Could it be that Pattern.compile has
>>> to take "[\\t ]" (note the double backslash) to properly match tabs? If so,
>>> that bug is in all the input formats...
>>>
>>> Happy to help :)
>>>
>>> Young
>>>
>>>
>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I removed the spaces and it worked! I don't understand, though. I'm sure
>>>> the separator pattern means that it splits by tab spaces?
>>>>
>>>> Thanks for all your help, though; somewhat relieved now!
>>>>
>>>> Kind regards,
>>>>
>>>> Ghufran
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <young....@uwaterloo.ca> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>> userlogs say?
>>>>>
>>>>> And just to rule out weirdness, what happens if you use spaces instead
>>>>> of tabs (for your input graph)?
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <ghufran1ma...@gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>> down to 50% and gives me the error:
>>>>>>
>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient: map 0% reduce 0%
>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient: map 50% reduce 0%
>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient: map 100% reduce 0%
>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient: map 50% reduce 0%
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job Counters
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1238858
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Launched map tasks=2
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Failed map tasks=1
>>>>>>
>>>>>>
>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>> doing:
>>>>>>
>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>> 1 2
>>>>>> 2 1 3 4
>>>>>> 3 2
>>>>>> 4 2
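For reference, here is a small standalone check of the two separator spellings discussed in this thread. It is a sketch under assumptions, not Giraph's code: the class name SeparatorDemo and the WHITESPACE_RUN alternative are hypothetical names used only for illustration. In Java source, "\t" inside a string literal is already turned into a TAB character by the compiler, and the regex engine also reads the escape \t as TAB, so "[\t ]" and "[\\t ]" should match exactly the same characters. What the sketch does show is that two separators in a row (for example a tab followed by a space) produce an empty token. If the input format parses every token with Integer.parseInt (an assumption, not confirmed in this thread), that empty token would throw a NumberFormatException, which is one possible reason removing the extra spaces made the job work.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Giraph.
class SeparatorDemo {
    // javac turns "\t" into a real TAB character before the regex engine sees it.
    static final Pattern LITERAL_TAB = Pattern.compile("[\t ]");
    // Here the regex engine itself interprets the escape \t as TAB.
    static final Pattern ESCAPED_TAB = Pattern.compile("[\\t ]");
    // Hypothetical, more tolerant alternative: any run of whitespace.
    static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String[] split(Pattern p, String line) {
        return p.split(line);
    }

    public static void main(String[] args) {
        String tabs = "2\t1\t3\t4";
        // Both spellings split a tab-separated line identically.
        System.out.println(Arrays.toString(split(LITERAL_TAB, tabs)));  // prints [2, 1, 3, 4]
        System.out.println(Arrays.toString(split(ESCAPED_TAB, tabs)));  // prints [2, 1, 3, 4]

        // A tab followed by a space yields an empty token between them.
        String mixed = "2\t 1";
        System.out.println(Arrays.toString(split(LITERAL_TAB, mixed))); // prints [2, , 1]

        // Integer.parseInt("") throws NumberFormatException, so code that
        // parses every token would fail on a line like this one.
        try {
            for (String token : split(LITERAL_TAB, mixed)) {
                Integer.parseInt(token);
            }
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }

        // Trimming and splitting on runs of whitespace avoids empty tokens.
        System.out.println(Arrays.toString(split(WHITESPACE_RUN, " 2\t 1 ".trim()))); // prints [2, 1]
    }
}
```

Note that Pattern.split drops trailing empty strings but keeps a leading one, so trailing spaces on a line are harmless while a leading space or a doubled separator is not.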