
DMOZ import

Comments on DMOZ import


Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada

Total Topics: 12
Total Comments: 32
Posted Sep 27, 2005 - 5:01 PM:

I just installed WSN Links and tried to import using dmozimporter.php.

It appears it does not parse the report file from Tulip correctly.

I get the correct descriptions, but

URLS look like [<a href="http://dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.h....etc...

and titles look like this also [<a href="http://dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww......etc....

It does appear to extract the DMOZ category name correctly.

It only extracts the first category, and does not find any sub-categories.

It also does not complete importing as it finds an error in category relations - but spidering completed, and it scanned the correct number of sites from the category group.

"A category other than your first one cannot find a parent, and is trying to add to the top level. This indicates a problem in your report file -- likely incomplete spidering. "

I've tried a few different categories and keep getting the same errors.

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Sep 28, 2005 - 5:24 AM:

Make sure your Tulip settings are the same as in the screenshot, and that you're using 6.02.

Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada

Total Topics: 12
Total Comments: 32
Posted Sep 28, 2005 - 6:24 AM:

I am using 6.02 - I've also used 6.03 and an older version.

I just did another test on a single category with only 4 links.

Here is part of the report.html file - the whole file is attached. It creates new site links with the title and URLs including all the HTML - e.g. the part starting [<a href="http://editors.dmoz.org/editors/editurl.cgi?url

---- code snip follows

<big>[<a href="http://editors.dmoz.org/editors/editcat.cgi?cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <a href="http://dmoz.org/Arts/Animation/Experimental/Digital/Shockwave/">Arts: Animation: Experimental: Digital: Shockwave</a></big><ul>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.flashtoons.net&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.flashtoons.net/">Flashtoons.net</a> - Flash animation movies. Enjoy the latest new cartoon characters of the millennium.</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.fonztv.nl%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.fonztv.nl/">Fons Schiedon Visuals</a> - Illustration, graphic design and experiments in animation and webdesign.</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.easystreet.com%2F%7Ejoanna%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.easystreet.com/%7Ejoanna/">Priestley Motion Pictures</a> - Animations by Joanna Priestley (Shockwave plug-in required)</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.thewoodcutter.com%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.thewoodcutter.com/">Thewoodcutter.com</a> - Shockwave based interactive illustrations.</li>
<br clear="all">
<table summary="Totals" border="1">
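For anyone checking a similar report by hand, here is a minimal Python sketch of how the site URL and title could be pulled out of a list item like the ones above while skipping the [EDIT] links. It is not how dmozimporter.php works internally (the thread does not show that); the class and function names are hypothetical, and the only assumption about the report is the structure visible in the snippet. Because it uses a real HTML parser, it does not care whether attributes use single or double quotes.

```python
from html.parser import HTMLParser


class ReportLinks(HTMLParser):
    """Collect (url, title) pairs for listed sites, skipping [EDIT] links.

    Hypothetical helper: assumes each site sits in an <li> as in the
    report.html snippet quoted in this thread.
    """

    def __init__(self):
        super().__init__()
        self.sites = []
        self._in_li = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
        elif tag == "a" and self._in_li:
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False
        elif tag == "a" and self._href is not None:
            # Editor links point at editurl.cgi; real sites do not.
            if "editurl.cgi" not in self._href:
                self.sites.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_sites(report_html):
    """Return a list of (url, title) tuples found in the report text."""
    parser = ReportLinks()
    parser.feed(report_html)
    parser.close()
    return parser.sites
```

Feeding it one of the `<li>` lines above should give back just the site's own URL and title, not the editurl.cgi link that was ending up in the imported titles.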

Attached Files:
Forum Regular

Usergroup: Customer
Joined: Aug 05, 2005

Total Topics: 94
Total Comments: 272
Posted Sep 28, 2005 - 2:35 PM:

Under the Include tab, make sure all boxes are checked.
Under the Options tab, make sure Show empty categories is checked.

That should then work.

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Sep 28, 2005 - 3:02 PM:

I don't see what you have different, but here's a valid report file for comparison.

Attached Files:

Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada

Total Topics: 12
Total Comments: 32
Posted Sep 28, 2005 - 10:16 PM:

The file you supplied appears to import correctly on my system.

There are two differences I see.

You have 'http://editors.dmoz.org:8080 .... '

and I have "http://editors.dmoz.org ...."

I do not think the :8080 is the problem, but I can change that in Tulip - however I'm not sure why your file has single quotes, and mine has double. I did a quick manual edit of my file, and so far it

Note the double/single quotes are in several places e.g. <span style="background-color

If I globally change those in my report.html file, then so far my tests indicate that it imports. I will play some more.
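The global quote change described above can be sketched in Python as follows. This is only an illustration under the assumption, taken from the poster's observation, that the importer wants single-quoted attributes; the function name is hypothetical. It rewrites quotes inside tags only, so double quotes that appear in a site description are left alone, which a plain search-and-replace would not do.

```python
import re

# Matches one HTML tag (opening or closing).
TAG = re.compile(r"<[^<>]+>")
# Matches a double-quoted attribute value that contains no quote characters.
ATTR_DOUBLE = re.compile(r'''=\s*"([^"']*)"''')


def single_quote_attributes(html):
    """Rewrite attr="value" as attr='value' inside tags only."""

    def fix_tag(match):
        return ATTR_DOUBLE.sub(r"='\1'", match.group(0))

    return TAG.sub(fix_tag, html)
```

Running this over a copy of report.html (never the only copy) and importing the result would be one way to confirm that quoting is the only difference that matters.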


Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada

Total Topics: 12
Total Comments: 32
Posted Sep 28, 2005 - 10:41 PM:

OK, the problem is Firefox. After I generate the page and go to save the HTML, the default is [Web Page, complete], and that causes the single quotes created by Tulip to change to double quotes. If I change the save option to [Web Page, HTML only], then it saves the page without change.

I also removed the 8080 option [that's in Tulip/Editor server and port and that works with or without

This solves part of the import: it now imports the first part of the file correctly, and then fails because of lines of code showing unreviewed sites. Now I get these created even if I uncheck that option in Tulip. You do not have these in your sample HTML file.

I'll have to look at this tomorrow.

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Sep 29, 2005 - 10:28 AM:

Don't save the HTML through a browser; it's senseless to create duplicates. Use the file you're looking at, the location of which is shown in your browser's address bar.

Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada

Total Topics: 12
Total Comments: 32
Posted Oct 06, 2005 - 5:39 AM:

The other issue with 'unreviewed' entries seems to be a Windows 98 problem with Tulip - or possibly some hidden settings in cookies on my Win98 setup. If I rerun Tulip on the same machine after booting into Win XP, the file imports without problems.