Webmastersite.net

DMOZ importer
and regenerating after it

Comments on DMOZ importer

mariow
Forum Regular

Usergroup: Customer
Joined: Jul 09, 2008

Total Topics: 22
Total Comments: 110
mariow
#16 - Quote - Permalink
Posted Jul 15, 2008 - 4:32 AM:

The first one I imported was my country's computer category.
That was over 2 MB and 5400 links.
Paul
developer

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Paul
#17 - Quote - Permalink
Posted Jul 15, 2008 - 3:53 PM:

TulipChain seems to work better now than it used to for some reason; I can do fairly large categories in it now.

The DMOZ importer will run about 50 times faster in today's release, and thus handle much larger files. It can also handle the report file format of your previous post.
mariow
#18 - Quote - Permalink
Posted Jul 15, 2008 - 4:03 PM:

You upgraded the DMOZ importer?
You're fast... :)
Is that a single file download, Paul?

(Note: it sure is positive that the importer can handle big files better, but we still can't gain time on the regenerate process.)
mariow
#19 - Quote - Permalink
Posted Jul 15, 2008 - 6:35 PM:

That sounds good, huh...
But that's local... generally a server would take longer, I think.
But Paul... it was so fast I can hardly imagine the import went well...
I'm off to bed and will see in the morning, as it has to regenerate about 3900 cats and 22000 links altogether.

EDIT: OK, the file I did while sleeping was 4.6 MB and had 8857 links, but the category shows 3,817.
It's still regenerating... and I don't know if that has an effect.
It looks like everything is imported, as I see that every letter of the alphabet is inserted...
The last cat from the big file (letter W) shows.
But strangely it doesn't show 8857 links.

And... when I went to my computer I should have seen the starting page of the regenerator, but saw a bunch of java stuff with the name dynamicdrive.
So... I didn't expect any problems, as you upgraded the file and I have max execution time set to 0.
Paul
#20 - Quote - Permalink
Posted Jul 16, 2008 - 12:18 PM:

A server should take about the same time, and that's about what I'd expect from my tests. Cutting directly to MySQL insertions instead of creating objects makes a huge difference. In my tests the number shown matches the number in the report file once regeneration is fully complete. If it doesn't for you, send me your report file.
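
As a rough illustration of why skipping per-link objects helps, here is a minimal Python sketch of batch insertion. It uses the standard library's sqlite3 as a stand-in for MySQL, and the `links` table, its columns, and `bulk_insert_links` are all hypothetical names; this is not the importer's actual code.

```python
import sqlite3


def bulk_insert_links(conn, rows):
    """Insert (title, url, description) tuples in one batch.

    No per-link object is built; the rows go straight to the database
    inside a single transaction. The schema here is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT, description TEXT)"
    )
    with conn:  # commits once for the whole batch
        conn.executemany(
            "INSERT INTO links (title, url, description) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Made-up rows, sized like the 5400-link file mentioned earlier in the thread.
rows = [
    (f"Site {i}", f"http://example.com/{i}", f"Description {i}")
    for i in range(5400)
]
total = bulk_insert_links(conn, rows)
print(total)
```

The gain comes from doing one prepared statement and one commit for the whole file instead of constructing and saving an object per link.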

And... when I went to my computer I should have seen the starting page of the regenerator, but saw a bunch of java stuff with the name dynamicdrive.

That sounds like a page dying before it could complete, maybe, which would mean regeneration didn't finish. If you could give me the actual HTML source of what you saw, and then tell me what you see when you press the back button, I'd have a better idea. Could be you had some sort of intensive process that ran at a set time which temporarily killed your local server.
mariow
#21 - Quote - Permalink
Posted Jul 16, 2008 - 12:31 PM:

Hi Paul, well, copying the error is what I normally do, but I completely forgot as I had just got out of bed when I saw it... :D
But the final count did show, though only after a long time regenerating and after it ran the cats counting again by itself... that's what happened when I inserted another file.
All seems to be working, but the regenerating is a killer.
It takes too much time, especially when the totals increase.

Amazing how you solved the importing, because it goes with such speed.
But a solution to the regenerating would be a good thing.

Isn't it possible for it to continue counting from its last run
after a big file is added?
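
To make the question concrete, here is a minimal Python sketch of what "continue counting from the last run" could look like: only the categories that received new links, plus their ancestors, get their stored totals bumped. The data structures and the `apply_import` function are hypothetical and say nothing about how the script actually stores its counts.

```python
from collections import Counter


def apply_import(counts, parents, imported_category_ids):
    """Add newly imported links to stored totals without a full recount.

    counts: dict of category id -> stored link total (subcategories included)
    parents: dict of category id -> parent id, or None for a top-level category
    imported_category_ids: one category id per newly imported link
    """
    for cat_id, added in Counter(imported_category_ids).items():
        node = cat_id
        while node is not None:  # walk up so ancestor totals stay in sync
            counts[node] = counts.get(node, 0) + added
            node = parents.get(node)
    return counts


# Made-up example: category 1 is the parent of categories 2 and 3.
parents = {1: None, 2: 1, 3: 1}
counts = {1: 5, 2: 3, 3: 2}
apply_import(counts, parents, [2, 2, 3])
print(counts)
```

The trade-off is that incremental totals can drift if links are deleted or moved outside this path, which is one reason a full regeneration is still useful as a fallback.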
mariow
#22 - Quote - Permalink
Posted Jul 17, 2008 - 7:48 AM:

OK Paul...
I finished another category.
Before I started I had 34878 links and 9 categories.
The file was 5.22 MB and had 13219 links in it.
I started the regeneration at 03:04 (my time) and it was finished 8 hours later.
Total now: 48097 links in 10 categories.
That's a bit long, huh...
That's local, with a dual-core processor, so the machine is fast enough...
Paul
#23 - Quote - Permalink
Posted Jul 17, 2008 - 11:51 AM:

MySQL is the bottleneck for regeneration, so there's very little I can do about it. I'll check over the code to combine updates and remove anything unnecessary in 5.0, but I doubt the improvement will be very significant.

Is it categories that are slowest to regenerate, or links? I can probably speed up links more than categories.

Running MySQL table optimization (near the top of advanced options) could also speed it up.
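
For readers wondering what "combining updates" might mean in practice, here is a small Python sketch that recounts every category's links with a single set-based UPDATE instead of one query per category. It uses sqlite3 as a stand-in for MySQL; the `categories` and `links` tables, the `link_count` column, and `recount_all` are assumptions for illustration, and the sketch counts only direct links, not subcategory rollups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, not the script's real tables.
conn.executescript(
    """
    CREATE TABLE categories (
        id INTEGER PRIMARY KEY, name TEXT, link_count INTEGER DEFAULT 0
    );
    CREATE TABLE links (
        id INTEGER PRIMARY KEY, category_id INTEGER, url TEXT
    );
    CREATE INDEX idx_links_category ON links (category_id);
    """
)
conn.executemany(
    "INSERT INTO categories (id, name) VALUES (?, ?)",
    [(1, "Computers"), (2, "Games"), (3, "Empty")],
)
conn.executemany(
    "INSERT INTO links (category_id, url) VALUES (?, ?)",
    [
        (1, "http://example.com/a"),
        (1, "http://example.com/b"),
        (2, "http://example.com/c"),
    ],
)


def recount_all(conn):
    """Refresh every category's direct link count in one statement."""
    with conn:
        conn.execute(
            "UPDATE categories SET link_count = ("
            "SELECT COUNT(*) FROM links "
            "WHERE links.category_id = categories.id)"
        )
    return dict(conn.execute("SELECT name, link_count FROM categories"))


result = recount_all(conn)
print(result)
```

The index on `category_id` matters as much as the single statement: without it, each per-category count has to scan the whole links table.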
mariow
#24 - Quote - Permalink
Posted Jul 17, 2008 - 12:03 PM:

Well, it's hard to tell which one is slowing it down.
Now I have 10 categories (4827 counting subcategories),
but I do have a feeling the cats are slower now... and as it goes 10 at a time, it takes a long time.
mariow
#25 - Quote - Permalink
Posted Jul 17, 2008 - 12:11 PM:

Hi Paul,

I just noticed something weird...
For a lot of the imported links, the site description isn't carried over.
I checked the import file and the links there do include a site description...

Is there any way to do a mass "get meta description" somehow?
Paul
#26 - Quote - Permalink
Posted Jul 17, 2008 - 6:22 PM:

Looks like an issue with the new dmoz importer... I really should've only put it in 5.0. Fixed shortly.

No way to get descriptions after the fact.
mariow
#27 - Quote - Permalink
Posted Jul 17, 2008 - 11:25 PM:

Ah, I see...
So it's kind of messed up now...
So I have to start over again... :(
mariow
#28 - Quote - Permalink
Posted Jul 17, 2008 - 11:28 PM:

BTW Paul... it happened again...
It was just a file with 8000 links, and when regenerating I received the following error.
Not sure if it was with the cats or the links.


function toggleBox(szDivID) {
  var obj = document.getElementById(szDivID);
  if (obj.style.visibility == "visible") iState = 0; else iState = 1;
  obj.style.visibility = iState ? "visible" : "hidden";
  obj.style.height = iState ? "100%" : "0px";
}

/*
Select and Copy form element script- By Dynamicdrive.com
For full source, Terms of service, and 100s DTHML scripts
Visit http://www.dynamicdrive.com
*/

//specify whether contents should be auto copied to clipboard (memory)
//Applies only to IE 4+
//0=no, 1=yes
var copytoclip=1

function HighlightAll(theField) {
  var tempval=eval("document."+theField)
  tempval.focus()
  tempval.select()
  if (document.all&&copytoclip==1){
    therange=tempval.createTextRange()
    therange.execCommand("Copy")
    window.status="Contents highlighted and copied to clipboard"
    setTimeout("window.status=''",3800)
  }
}

function checkall() {
  var boxes = document.getElementsByName('selection[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('link[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('linkedit[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('linkid[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('cat[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('comment[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('member[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('feed[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('rating[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('attach[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('event[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
  var boxes = document.getElementsByName('quote[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = true;
}

function uncheckall() {
  var boxes = document.getElementsByName('selection[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = false;
  var boxes = document.getElementsByName('link[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = false;
  var boxes = document.getElementsByName('linkedit[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = false;
  var boxes = document.getElementsByName('linkid[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = false;
  var boxes = document.getElementsByName('cat[]'); for (i=0; i < boxes.length; i++) boxes[i].checked = false;
  var boxes = document.getElementsByName('comment[]'); for (i=0; i < boxes.length; i++)

Paul
#29 - Quote - Permalink
Posted Jul 18, 2008 - 2:15 AM:

Fixed descriptions... but out of a report file with 16,839 links I only get 1,268 loaded, so I'm going to put the old DMOZ importer back into the 4.1 series and keep the experimenting to 5.0.
Paul
#30 - Quote - Permalink
Posted Jul 18, 2008 - 2:21 AM:

Implemented (in 5.0) a couple of new speedup option checkboxes, then tested with 21096 links, 1408 subcategories. "Regenerate everything" took 76 minutes. Links are definitely the slowest spot for me, just due to the number of them.

At any rate, figure out what you want to import and do it all at once so you only have to regenerate once.
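
A quick back-of-the-envelope check of the numbers in this thread, as a Python sketch. The link counts and durations are the ones quoted above; treating mariow's 8-hour run as a regeneration of all 48097 links, and treating every full regeneration as costing roughly the same 76 minutes, are simplifying assumptions, and the two runs were on different machines and versions.

```python
def links_per_minute(link_count, minutes):
    """Average regeneration throughput for one full run."""
    return link_count / minutes


# Figures quoted in this thread.
paul_rate = links_per_minute(21096, 76)        # 5.0 test, "Regenerate everything"
mariow_rate = links_per_minute(48097, 8 * 60)  # local run after the 5.22 MB import

# Hypothetical plan: three imports, each followed by its own full
# regeneration, versus the same three imports and one regeneration.
minutes_per_full_regeneration = 76  # assumed constant for simplicity
separate = 3 * minutes_per_full_regeneration
combined = 1 * minutes_per_full_regeneration
saved = separate - combined

print(f"{paul_rate:.0f} vs {mariow_rate:.0f} links/min")
print(f"batching three imports saves about {saved} minutes")
```

In practice earlier regenerations run on a smaller database and finish sooner, so the saving is an upper bound, but the direction of the advice holds: regenerate once, after the last import.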