Comments on DMOZ import
Member
Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada
Total Topics: 12
Total Comments: 32
I just installed WSN Links and tried to import using dmozimporter.php.
It appears that it does not parse the report file from Tulip correctly.
I get the correct descriptions, but
URLs look like [<a href="http://dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.h....etc...
and titles look like this as well: [<a href="http://dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww......etc....
It does appear to extract the DMOZ category name correctly.
It only extracts the first category, and does not find any sub-categories.
It also does not complete the import, because it finds an error in category relations, even though spidering completed and it scanned the correct number of sites from the category group.
"A category other than your first one cannot find a parent, and is trying to add to the top level. This indicates a problem in your report file -- likely incomplete spidering. "
I've tried a few different categories and keep getting the same errors.
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
Make sure your Tulip settings are the same as in the screenshot, and that you're using 6.02.
Member
Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada
Total Topics: 12
Total Comments: 32
I am using 6.02 - I've also used 6.03 and an older version.
I just did another test on a single category with only 4 links.
Here is part of the report.html file; the whole file is attached. The import creates new site links whose title and URL include all the HTML, e.g. the part starting [<a href="http://editors.dmoz.org/editors/editurl.cgi?url
---- code snip follows
document.write("</big></p>");
document.styleSheets[0].cssRules[0].style.display="none";
}
--></script>
<big>[<a href="http://editors.dmoz.org/editors/editcat.cgi?cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <a href="http://dmoz.org/Arts/Animation/Experimental/Digital/Shockwave/">Arts: Animation: Experimental: Digital: Shockwave</a></big><ul>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.flashtoons.net&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.flashtoons.net/">Flashtoons.net</a> - Flash animation movies. Enjoy the latest new cartoon characters of the millennium.</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.fonztv.nl%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.fonztv.nl/">Fons Schiedon Visuals</a> - Illustration, graphic design and experiments in animation and webdesign.</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.easystreet.com%2F%7Ejoanna%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.easystreet.com/%7Ejoanna/">Priestley Motion Pictures</a> - Animations by Joanna Priestley (Shockwave plug-in required)</li>
<li>[<a href="http://editors.dmoz.org/editors/editurl.cgi?url=http%3A%2F%2Fwww.thewoodcutter.com%2F&cat=Arts/Animation/Experimental/Digital/Shockwave/">EDIT</a>] <span style="background-color: rgb(255, 255, 255);">[]</span> <a href="http://www.thewoodcutter.com/">Thewoodcutter.com</a> - Shockwave based interactive illustrations.</li>
</ul>
<br clear="all">
<table summary="Totals" border="1">
Attached Files:
Forum Regular
Usergroup: Customer
Joined: Aug 05, 2005
Total Topics: 94
Total Comments: 272
Under the Include tab, make sure all boxes are checked.
Under the Options tab, make sure Show empty categories is checked.
That should then work.
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
I don't see what you have different, but here's a valid report file for comparison.
Attached Files:
Member
Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada
Total Topics: 12
Total Comments: 32
The file you supplied appears to import correctly on my system.
There are two differences I see.
You have 'http://editors.dmoz.org:8080 .... '
and I have "http://editors.dmoz.org ...."
I do not think the :8080 is the problem, but I can change that in Tulip. However, I'm not sure why your file has single quotes and mine has double. I did a quick manual edit of my file, and so far it
Note the double/single quotes are in several places, e.g. <span style="background-color
If I globally change those in my report.html file, my tests so far indicate that it imports. I will play some more.
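For anyone who wants to script that global change rather than do it by hand, here is a rough Python sketch. It assumes the importer only needs the attribute quotes inside tags switched from double to single, which is what my manual edit suggests but is not confirmed. The function name single_quote_attrs is my own and is not part of WSN Links or Tulip.

```python
import re

# attr="value" where the value contains no quotes or angle brackets
_DQ_ATTR = re.compile(r'([\w:-]+)="([^"\'<>]*)"')
# Anything that looks like a tag: <...> with no nested angle brackets
_TAG = re.compile(r'<[^<>]+>')


def single_quote_attrs(html: str) -> str:
    """Rewrite attr="value" as attr='value', touching only text inside tags.

    Assumption: the importer expects single-quoted attributes, as in the
    sample report file. Double quotes outside tags (descriptions, script
    strings) are left alone.
    """
    def fix_tag(match: re.Match) -> str:
        return _DQ_ATTR.sub(r"\1='\2'", match.group(0))

    return _TAG.sub(fix_tag, html)
```

To use it, read report.html into a string, pass it through single_quote_attrs, and write the result to a new file so the original stays untouched.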
Member
Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada
Total Topics: 12
Total Comments: 32
OK, the problem is Firefox. After I generate the page and go to save the HTML, the default is [Web Page, complete], and that causes the single quotes created by Tulip to change to double quotes. If I change the save option to [Web Page, HTML only], then it saves the page without change.
I also removed the 8080 option [that's in Tulip/Editor server and port], and the import works with or without it.
That solves part of the import problem: it now imports the first part of the file correctly, and then fails because of lines of code showing unreviewed sites. I now get these lines even if I uncheck that option in Tulip. You do not have these in your sample HTML file.
I'll have to look at this tomorrow.
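A quick way to tell whether a saved report has been rewritten by the browser is to count which quote character opens each href value. This is only a sketch: attr_quote_counts is a made-up helper, and it assumes that a report straight from Tulip uses single quotes throughout, as the sample file does.

```python
import re


def attr_quote_counts(html: str) -> dict:
    """Count href attributes by the quote character that opens their value."""
    counts = {"single": 0, "double": 0}
    for quote in re.findall(r'href\s*=\s*(["\'])', html):
        counts["single" if quote == "'" else "double"] += 1
    return counts
```

If the "double" count is non-zero for a file that should have come directly from Tulip, it was probably re-saved through the browser.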
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
Don't save the HTML through a browser; it's senseless to create duplicates. Use the file you're looking at, the location of which is shown in your browser's address bar.
Member
Usergroup: Customer
Joined: Sep 27, 2005
Location: Canada
Total Topics: 12
Total Comments: 32
The other issue with 'unreviewed' entries seems to be a Windows 98 problem with Tulip, or possibly some hidden settings in cookies on my Win98 setup. If I rerun Tulip on the same machine after booting into Win XP, the file imports without problems.