The current mechanism is geared towards fast virtual address ->
symbol names lookup. This is fine for the normal use cases
(BUG_ON, WARN_ON, etc), but for xSplice - where we need to find
hypervisor symbols - it is slow.

To understand this patch, a description of the existing
method is explained first. For folks familar go to 'NEW CODE:'.

HOW IT WORKS:

The symbol table lookup mechanism uses a simple encoding mechanism
where it extracts the common ascii characters that the symbol's use.

This saves us space. The lookup mechanism is geared towards looking
up symbols based on address. We have one 0..N (where N is
the number of symbols, so 6849 for example) table:

symbols_addresses[0..N]

And an 1-1 (in a loose fashion) of the symbols (encoded) in a
symbols_names stream of size N.

The N is variable (later on that below)

The symbols_names are sorted based on symbols_addresses, which
means that the decoded entries inside symbols_names are not in
ascending or descending order.

There is also the encoding mechanism - the table of 255 entries
called symbols_token_index[]. And the symbols_token_table which
is an stream of ASCIIZ characters, such as (it really
is not a table as the values are variable):

@0   .asciz  "credit"
@6   .asciz  "mask"
..
@300 .asciz  "S"

And the symbols_token_index:
@0        .short  0
@1        .short  7
@2        .short  12
@4        .short  16
...
@84         .short  300

The relationship between them is that the symbols_token_index
gives us the offset to symbols_token_table.

The symbol_names[] array is a stream of encoded values. Each value
follows the same pattern - <len> followed by <encoding values>.
And the another <len> followed by <encoding values>.

Hence to find the right one you need to read <len>, add <len>
(to skip over), read <len>, add <len>, and so on until one
finds the right tuple offset.

The <encoding values> are the indicies into the symbols_token_index.

Meaning if you have:
  0x04, 0x54, 0xda, 0xe2, 0x74
  [4, 84, 218, 226, 116 in human numbering]

The 0x04 tells us that the symbol is four bytes past this one (so next
symbol offset starts at 5). If we lookup symbols_token_index[84] we get 300.
symbols_token[300] gets us the "S". And so on, the string eventually
end up being decode to be 'S_stext'. The first character is the type,
then optionally follwed by the filename (and # right after filename)
and then lastly the symbol, such as:

tvpmu_intel.c#core2_vpmu_do_interrupt

Keep in mind that there are two fixed sized tables:
symbols_addresses[0..symbols_num_syms], and
symbols_markers[0..symbols_num_syms/255].

The symbols_markers is used to speed searching for the right address.
It gives us the offsets within symbol_names that start at the <len><encoded 
value>.

The way to find a symbol based on the address is:
1) Figure out the 'tuple offset' from symbols_address[0..symbols_num_syms].
   This table is sorted by virtual addresses so finding the value is simple.
2) Get starting offset of symbol_names by retrieving value of
   symbol_markers['tuple offset' / 255].
3). Iterate up to 'tuple_offset & 255' in symbols_markers stream starting
   at 'offset'.
4). Decode the <len><encoded value>

This however does not work very well if we want to search the other
way - we have the symbol name and want to find the address.

NEW CODE:

To make that work we add three tables:
1) symbols_addresses_index_sorted[0..symbols_num_syms]. The values are
 indicies into symbols_addresses array. This is a 1-1 table:

 symbols_addresses_index_sorted:
     .long   6315
     .long   6097

 To get the address of a symbol at offset 0 in symbols_name_sorted
 (see below) we look in symbols_addresses_index_sorted[0] to get
 6315 and then look in symbols_addresses_index[6315] to find it.

2) The symbols_names_sorted stream - same format as symbols_names but
 constructued by sorting the <type>[filename#]<symbol> by <symbol>,
 then [filename#] and lastly type. Recall that symbol_names is sorted
 based on addresses of the symbols.

3) To make it simpler to search we also add an symbols_markers_sorted
 array (of size symbols_num_syms/255).

Searching for symbols is simplified as we can do a binary search
on symbol_names_sorted (and using symbols_markers_sorted). Since the
symbols are sorted it takes on average 13 calls to symbols_expand_symbol.

Doing small tests (search for five different symbols) using the old
and new mechanism show 2ms vs 0ms improvement.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
CC: Keir Fraser <k...@xen.org>
CC: Jan Beulich <jbeul...@suse.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>

v8: New
 - Remove the debug code
 - Return the 'mid' index in symbol_addresses, not the 'low'.
---
---
 xen/arch/x86/Makefile      |  3 ++
 xen/common/Kconfig         | 12 ++++++
 xen/common/symbols-dummy.c |  6 +++
 xen/common/symbols.c       | 75 +++++++++++++++++++++++++++++++++----
 xen/tools/symbols.c        | 93 +++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 176 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 57c93e1..4ce2346 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,6 +74,9 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h 
-o \
 
 ifdef CONFIG_XSPLICE
 all_symbols = --all-symbols
+ifdef CONFIG_FAST_SYMBOL_LOOKUP
+all_symbols = --all-symbols --add-extra-sorted
+endif
 else
 all_symbols =
 endif
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index fea33d3..639dc4b 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -200,4 +200,16 @@ config XSPLICE
 
          If unsure, say Y.
 
+config FAST_SYMBOL_LOOKUP
+       bool "Fast symbol lookup (bigger binary)"
+       default y
+       depends on XSPLICE
+       ---help---
+         When searching for symbol addresses we can use the built-in system
+         that is optimized for searching symbols using addresses as the key.
+         However using it for the inverse (find address using the symbol name)
+         it is slow. This extra data (~128kB) speeds up the search.
+         The only user of this is xSplice.
+
+         If unsure, say Y.
 endmenu
diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c
index 5090c3b..18d572c 100644
--- a/xen/common/symbols-dummy.c
+++ b/xen/common/symbols-dummy.c
@@ -14,6 +14,12 @@ const unsigned long symbols_addresses[1];
 const unsigned int symbols_num_syms;
 const u8 symbols_names[1];
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+const u8 symbols_names_sorted[1];
+const unsigned int symbols_addresses_index_sorted[1];
+const unsigned int symbols_markers_sorted[1];
+#endif
+
 const u8 symbols_token_table[1];
 const u16 symbols_token_index[1];
 
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 781765d..cb792cd 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -31,6 +31,12 @@ extern const unsigned long symbols_addresses[];
 extern const unsigned int symbols_num_syms;
 extern const u8 symbols_names[];
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+extern const u8 symbols_names_sorted[];
+extern const unsigned int symbols_addresses_index_sorted[];
+extern const unsigned int symbols_markers_sorted[];
+#endif
+
 extern const u8 symbols_token_table[];
 extern const u16 symbols_token_index[];
 
@@ -38,13 +44,13 @@ extern const unsigned int symbols_markers[];
 
 /* expand a compressed symbol data into the resulting uncompressed string,
    given the offset to where the symbol is in the compressed stream */
-static unsigned int symbols_expand_symbol(unsigned int off, char *result)
+static unsigned int symbols_expand_symbol(const u8 *array, unsigned int off, 
char *result)
 {
     int len, skipped_first = 0;
     const u8 *tptr, *data;
 
     /* get the compressed symbol length from the first symbol byte */
-    data = &symbols_names[off];
+    data = array + off;
     len = *data;
     data++;
 
@@ -77,14 +83,19 @@ static unsigned int symbols_expand_symbol(unsigned int off, 
char *result)
 
 /* find the offset on the compressed stream given and index in the
  * symbols array */
-static unsigned int get_symbol_offset(unsigned long pos)
+static unsigned int get_symbol_offset(unsigned long pos, bool_t sorted)
 {
     const u8 *name;
     int i;
 
     /* use the closest marker we have. We have markers every 256 positions,
      * so that should be close enough */
-    name = &symbols_names[ symbols_markers[pos>>8] ];
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    if (sorted)
+        name = &symbols_names_sorted[ symbols_markers_sorted[pos>>8] ];
+    else
+#endif
+        name = &symbols_names[ symbols_markers[pos>>8] ];
 
     /* sequentially scan all the symbols up to the point we're searching for.
      * Every symbol is stored in a [<len>][<len> bytes of data] format, so we
@@ -93,6 +104,11 @@ static unsigned int get_symbol_offset(unsigned long pos)
     for(i = 0; i < (pos&0xFF); i++)
         name = name + (*name) + 1;
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    if (sorted)
+        return name - symbols_names_sorted;
+#endif
+
     return name - symbols_names;
 }
 
@@ -136,7 +152,7 @@ const char *symbols_lookup(unsigned long addr,
         --low;
 
         /* Grab name */
-    symbols_expand_symbol(get_symbol_offset(low), namebuf);
+    symbols_expand_symbol(symbols_names, get_symbol_offset(low, 0), namebuf);
 
     /* Search for next non-aliased symbol */
     for (i = low + 1; i < symbols_num_syms; i++) {
@@ -195,10 +211,10 @@ int xensyms_read(uint32_t *symnum, char *type,
         next_offset = next_symbol = 0;
     if ( next_symbol != *symnum )
         /* Non-sequential access */
-        next_offset = get_symbol_offset(*symnum);
+        next_offset = get_symbol_offset(*symnum, 0);
 
     *type = symbols_get_symbol_type(next_offset);
-    next_offset = symbols_expand_symbol(next_offset, name);
+    next_offset = symbols_expand_symbol(symbols_names, next_offset, name);
     *address = symbols_address(*symnum);
 
     next_symbol = ++*symnum;
@@ -208,8 +224,51 @@ int xensyms_read(uint32_t *symnum, char *type,
     return 0;
 }
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
 unsigned long symbols_lookup_by_name(const char *symname)
 {
+    char namebuf[KSYM_NAME_LEN + 1] = { 0 };
+    unsigned long low, high;
+    static const char *filename_token = "#";
+
+    if ( symname == '\0' )
+        return 0;
+
+    /* Unsupported search for filename in symbol right now. */
+    if ( strpbrk(symname, filename_token) )
+        return 0;
+
+    low = 0;
+    high = symbols_num_syms;
+    while ( low < high )
+    {
+        unsigned long mid = low + ((high - low) / 2);
+        unsigned long offset;
+        const char *p;
+        int rc;
+
+        offset = get_symbol_offset(mid, 1);
+        (void)symbols_expand_symbol(symbols_names_sorted, offset, namebuf);
+        /* Format is: [filename]#<symbol>. symbols_expand_symbol eats type.*/
+        p = strpbrk((const char *)namebuf, filename_token);
+        if ( !p )
+            p = namebuf;
+        else
+            p++; /* Skip # */
+        rc = strcmp(symname, p);
+        if ( rc < 0 )
+            high = mid;
+        else if ( rc > 0 )
+            low = mid + 1;
+        else
+            return symbols_address(symbols_addresses_index_sorted[mid]);
+    }
+    return 0;
+}
+
+#else
+unsigned long symbols_lookup_by_name(const char *symname)
+ {
     char name[KSYM_NAME_LEN + 1] = { 0 };
     uint32_t symnum = 0;
     char type;
@@ -231,7 +290,7 @@ unsigned long symbols_lookup_by_name(const char *symname)
 
     return 0;
 }
-
+#endif
 /*
  * Local variables:
  * mode: C
diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c
index 196db74..0a56c5e 100644
--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -40,6 +40,10 @@ struct sym_entry {
        unsigned long long addr;
        unsigned int len;
        unsigned char *sym;
+       unsigned char *symbol;
+       unsigned char *filename;
+       unsigned int idx;
+       unsigned char type;
 };
 #define SYMBOL_NAME(s) ((char *)(s)->sym + 1)
 
@@ -47,8 +51,10 @@ static struct sym_entry *table;
 static unsigned int table_size, table_cnt;
 static unsigned long long _stext, _etext, _sinittext, _einittext, _sextratext, 
_eextratext;
 static int all_symbols = 0;
+static int extra_sorted = 0;
 static char symbol_prefix_char = '\0';
 static enum { fmt_bsd, fmt_sysv } input_format;
+static int compare_name(const void *p1, const void *p2);
 
 int token_profit[0x10000];
 
@@ -79,6 +85,7 @@ static int read_symbol(FILE *in, struct sym_entry *s)
        char *sym, stype;
        static enum { symbol, single_source, multi_source } last;
        static char *filename;
+       unsigned int len;
        int rc = -1;
 
        switch (input_format) {
@@ -165,18 +172,29 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 
        /* include the type field in the symbol name, so that it gets
         * compressed together */
-       s->len = strlen(str) + 1;
+       s->len = len = strlen(str) + 1;
        if (islower(stype) && filename)
                s->len += strlen(filename) + 1;
        s->sym = malloc(s->len + 1);
+       if (extra_sorted)
+               s->symbol = malloc(len + 1);
        sym = SYMBOL_NAME(s);
        if (islower(stype) && filename) {
                sym = stpcpy(sym, filename);
                *sym++ = '#';
-       }
+               if (extra_sorted) {
+                       s->filename = malloc(strlen(filename) + 1);
+                       strcpy((char *)s->filename, filename);
+               }
+       } else
+               s->filename = NULL;
+
        strcpy(sym, str);
+       if (extra_sorted) {
+               strcpy((char *)s->symbol, str);
+               s->type = stype; /* As s->sym[0] ends mangled. */
+       }
        s->sym[0] = stype;
-
        rc = 0;
 
  skip_tail:
@@ -276,6 +294,36 @@ static int expand_symbol(unsigned char *data, int len, 
char *result)
        return total;
 }
 
+/* Sort by symbol names, then filename, and lastly type. */
+static int compare_name_orig(const void *p1, const void *p2)
+{
+       const struct sym_entry *sym1 = p1;
+       const struct sym_entry *sym2 = p2;
+       int rc;
+
+       rc = strcmp((char *)(sym1->symbol), (char *)(sym2->symbol));
+
+       if (!rc) {
+               if (sym1->filename && sym2->filename)
+                       rc = strcmp((char *)sym1->filename, (char 
*)sym2->filename);
+               else if (!sym1->filename && sym2->filename)
+                       rc = -1;
+               else if (sym1->filename && !sym2->filename)
+                       rc = 1;
+
+               if (!rc) {
+                       if (sym1->type < sym2->type)
+                               rc = -1;
+                       else if (sym1->type > sym2->type)
+                               rc = 1;
+                       else
+                               rc = 0;
+               }
+       }
+
+       return rc;
+}
+
 static void write_src(void)
 {
        unsigned int i, k, off;
@@ -334,7 +382,6 @@ static void write_src(void)
                printf("\t.long\t%d\n", markers[i]);
        printf("\n");
 
-       free(markers);
 
        output_label("symbols_token_table");
        off = 0;
@@ -350,6 +397,40 @@ static void write_src(void)
        for (i = 0; i < 256; i++)
                printf("\t.short\t%d\n", best_idx[i]);
        printf("\n");
+
+       if (!extra_sorted) {
+               free(markers);
+               return;
+       }
+
+       /* Sorted by original symbol names, filename, and lastly type. */
+       qsort(table, table_cnt, sizeof(*table), compare_name_orig);
+
+       /* Lookup table to symbols_addresses (or symbols_offsets).*/
+       output_label("symbols_addresses_index_sorted");
+       for (i = 0; i < table_cnt; i++) {
+               printf("\t.long\t%d\n", table[i].idx);
+       }
+       printf("\n");
+
+       output_label("symbols_names_sorted");
+       off = 0;
+       for (i = 0; i < table_cnt; i++) {
+               if ((i & 0xFF) == 0)
+                       markers[i >> 8] = off;
+               printf("\t.byte 0x%02x", table[i].len);
+               for (k = 0; k < table[i].len; k++)
+                       printf(", 0x%02x", table[i].sym[k]);
+               printf("\n");
+               off += table[i].len + 1;
+       }
+       printf("\n");
+       output_label("symbols_markers_sorted");
+       for (i = 0; i < ((table_cnt + 255) >> 8); i++)
+               printf("\t.long\t%d\n", markers[i]);
+       printf("\n");
+
+       free(markers);
 }
 
 
@@ -409,7 +490,7 @@ static void compress_symbols(unsigned char *str, int idx)
 
                len = table[i].len;
                p1 = table[i].sym;
-
+               table[i].idx = i;
                /* find the token on the symbol */
                p2 = memmem_pvt(p1, len, str, 2);
                if (!p2) continue;
@@ -561,6 +642,8 @@ int main(int argc, char **argv)
                                input_format = fmt_sysv;
                        else if (strcmp(argv[i], "--sort") == 0)
                                unsorted = true;
+                       else if (strcmp(argv[i], "--add-extra-sorted") == 0)
+                               extra_sorted = 1;
                        else if (strcmp(argv[i], "--warn-dup") == 0)
                                warn_dup = true;
                        else
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to