Hi all.

Here is my take on how to generate a sitemap.
The base code comes from this post:
https://groups.google.com/forum/#!searchin/web2py/sitemap/web2py/TUMn6R3BJ10/TRSLCY_JQ8UJ

With a lot of URLs (more than 2000 in my case, because of the many products),
Google Webmaster Tools reported an error for my XML sitemap. Splitting the
sitemap into several files was a bit tricky, so I switched to a plain-text
sitemap. No more errors.
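
For anyone who does want to split, here is a rough sketch of the chunking step. The 50,000-URL default is the per-file limit from the sitemap protocol; `split_urls` is a hypothetical helper name, not something from web2py or from my code below.

```python
def split_urls(urls, chunk_size=50000):
    """Yield successive lists holding at most chunk_size URLs each."""
    for start in range(0, len(urls), chunk_size):
        yield urls[start:start + chunk_size]


# Each chunk would then be written to its own file (sitemap1.txt, ...).
chunks = list(split_urls(['u1', 'u2', 'u3', 'u4', 'u5'], chunk_size=2))
```

Each chunk can then go into a separate file; I did not need this in the end because a single text file was accepted.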
To get a fresh sitemap every day, I run the function from the scheduler.
It works fine. These products are sold all over the world by a lot of online
merchants, and my website is always in first or second place in a Google
search.
Here is the code; I hope it helps someone:

def sitemap_txt_auto():
    import os
    from gluon.myregex import regex_expose
    # Functions URLs
    exclusions = ['index', 'user', 'unsubscribe', 'download', 'call', 'data',
                  'upload', 'browse', 'delete']
    ctldir = os.path.join(request.folder,"controllers")
    ctls=os.listdir(ctldir)
    if 'appadmin.py' in ctls: ctls.remove('appadmin.py')
    if 'manage.py' in ctls: ctls.remove('manage.py')
    sitemap='http://www.mydomain.com/it/index.html'
    sitemap += '\r\n'
    sitemap += 'http://www.mydomain.com/en/index.html'
    for ctl in ctls:
        if not ctl.endswith(".bak"):
            filename = os.path.join(ctldir, ctl)
            # Read the controller source and close the file afterwards
            with open(filename, 'r') as fh:
                data = fh.read()
            functions = regex_expose.findall(data)
            for f in functions:
                if f not in exclusions:  # skip excluded function names
                    sitemap += '\r\n'
                    sitemap += 'http://www.mydomain.com/it/%s' % (f)
                    sitemap += '\r\n'
                    sitemap += 'http://www.mydomain.com/en/%s' % (f)
    # Products
    products = db().select(db.products.ALL, orderby=db.products.id)
    pdf_paths = []
    for item in products:
        # Product pages
        sitemap += '\r\n'
        sitemap += ('http://www.mydomain.com/it/products_listing/view/products/%s'
                    % item.id)
        sitemap += '\r\n'
        sitemap += ('http://www.mydomain.com/en/products_listing/view/products/%s'
                    % item.id)
        # PDF files
        # Some products share the same PDF file, so list each path only once
        if item.pdf_path not in pdf_paths:
            pdf_paths.append(item.pdf_path)
            sitemap += '\r\n'
            sitemap += item.pdf_path
    path = os.path.join(request.folder, 'static', 'sitemaps', 'sitemap.txt')
    with open(path, 'w') as out:
        out.write(sitemap)
    db.commit()
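
The function above builds the text by repeated string concatenation and keeps the PDF paths in a list. A possible alternative, sketched here, is to collect every URL in a list first and let one small helper drop repeats and join the lines. `build_sitemap_text` is a hypothetical name of mine, and it joins with '\n' by default where my code above uses '\r\n'.

```python
def build_sitemap_text(urls, newline='\n'):
    """Return text-sitemap content: one URL per line, repeats dropped."""
    seen = set()
    lines = []
    for url in urls:
        # Skip empty values (e.g. a product without a PDF) and duplicates
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    return newline.join(lines) + newline if lines else ''


text = build_sitemap_text([
    'http://www.mydomain.com/it/index.html',
    'http://www.mydomain.com/en/index.html',
    'http://www.mydomain.com/it/index.html',  # repeated on purpose
    None,
])
```

With this approach the order of the first occurrences is kept, so the home pages still come first in the file.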

from gluon.scheduler import Scheduler
# Keep a reference to the scheduler so tasks can be queued later
scheduler = Scheduler(db, dict(sitemap_txt_auto=sitemap_txt_auto))
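
Registering the function is only half of it: a task also has to be queued before a worker will run it. Here is a minimal sketch of queuing it once a day, assuming the Scheduler instance is available in a variable and passed in. `queue_daily_sitemap` and the 300-second timeout are my own assumptions; `queue_task` with `period`, `repeats` and `timeout` is the web2py scheduler call.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def queue_daily_sitemap(scheduler, task_name='sitemap_txt_auto'):
    """Queue the sitemap task so that it repeats once a day.

    `scheduler` is expected to be the web2py Scheduler instance from the
    model. Call this once, not on every request, or the task is queued
    again each time.
    """
    return scheduler.queue_task(
        task_name,               # name registered in the Scheduler's task dict
        period=SECONDS_PER_DAY,  # seconds between two runs
        repeats=0,               # 0 means repeat without limit
        timeout=300,             # assumed upper bound for one run, in seconds
    )
```

A worker still has to be running (for example started with `python web2py.py -K appname`) for the queued task to execute.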


