Summary: | wp2git: Import Wikipedia page history to git | ||
---|---|---|---|
Product: | New/proposed packages | Reporter: | Ivan Zakharyaschev <imz> |
Component: | Regular repository | Assignee: | Andrey Cherepanov <cas> |
Status: | NEW --- | QA Contact: | Andrey Cherepanov <cas> |
Severity: | normal | ||
Priority: | P3 | CC: | viy |
Version: | unspecified | ||
Hardware: | all | ||
OS: | Linux | ||
URL: | http://blog.thecybershadow.net/2010/06/16/import-wikipedia-page-history-to-git/ | ||
Bug Depends on: | |||
Bug Blocks: | 31414 |
Description
Ivan Zakharyaschev
2015-11-09 13:26:28 MSK
* It could also be used to import into Git repos some of ALT's documentation that is being edited at http://altlinux.org

* As for me, I'm going to use it to import the text of the GOST which is implemented by the LaTeX package in https://bugzilla.altlinux.org/show_bug.cgi?id=31414 from wikisource (https://ru.wikisource.org/wiki/%D0%93%D0%9E%D0%A1%D0%A2_7.32%E2%80%942001 ), where it is collaboratively maintained.

BTW, when I try to use it, there are some problems. I can't post an issue to the project at github, probably because it is a fork. That fork is where the issue belongs, though, because the problem looks Python-related.

Here are the errors I get (the last run is successful -- with English Wikipedia; perhaps my default is Russian because of the locale). For now, I have no idea whether something can be fixed in this program or in my environment.

$ wp2git.py --help
usage: wp2git.py [-h] [-n] [-o OUT] [--lang LANG | --site SITE] article_name

Create a git repository with the history of the specified Wikipedia article.

positional arguments:
  article_name

optional arguments:
  -h, --help         show this help message and exit
  -n, --no-import    Don't invoke git fast-import; only generate fast-import
                     data stream
  -o OUT, --out OUT  Output directory or fast-import stream file
  --lang LANG        Wikipedia language code (default ru)
  --site SITE        Alternate site (e.g.
                     http://commons.wikimedia.org[/w/])

$ wp2git.py --site https://ru.wikisource.org 'ГОСТ 7.32—2001'
Connected to https://ru.wikisource.org/w/
Traceback (most recent call last):
  File "/home/imz/bin/wp2git.py", line 110, in <module>
    main()
  File "/home/imz/bin/wp2git.py", line 63, in main
    page = site.pages[args.article_name]
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 156, in __getitem__
    return self.get(name, None)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 166, in get
    namespace = self.guess_namespace(name)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 178, in guess_namespace
    if name.startswith(u'%s:' % self.site.namespaces[ns].replace(' ', '_')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

$ locale
LANG=ru_RU.utf8
LC_CTYPE="ru_RU.utf8"
LC_NUMERIC="ru_RU.utf8"
LC_TIME="ru_RU.utf8"
LC_COLLATE="ru_RU.utf8"
LC_MONETARY="ru_RU.utf8"
LC_MESSAGES=POSIX
LC_PAPER="ru_RU.utf8"
LC_NAME="ru_RU.utf8"
LC_ADDRESS="ru_RU.utf8"
LC_TELEPHONE="ru_RU.utf8"
LC_MEASUREMENT="ru_RU.utf8"
LC_IDENTIFICATION="ru_RU.utf8"
LC_ALL=

$ wp2git.py --site http://ru.wikisource.org 'ГОСТ 7.32—2001'
Connected to http://ru.wikisource.org/w/
Traceback (most recent call last):
  File "/home/imz/bin/wp2git.py", line 110, in <module>
    main()
  File "/home/imz/bin/wp2git.py", line 63, in main
    page = site.pages[args.article_name]
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 156, in __getitem__
    return self.get(name, None)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 166, in get
    namespace = self.guess_namespace(name)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 178, in guess_namespace
    if name.startswith(u'%s:' % self.site.namespaces[ns].replace(' ', '_')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

$ wp2git.py Bear
Connected to http://ru.wikipedia.org/w/
Traceback (most recent call last):
  File "/home/imz/bin/wp2git.py", line 110, in <module>
    main()
  File "/home/imz/bin/wp2git.py", line 65, in main
    p.error('Page %s does not exist' % s)
NameError: global name 's' is not defined

$ wp2git.py Медведь
Connected to http://ru.wikipedia.org/w/
Traceback (most recent call last):
  File "/home/imz/bin/wp2git.py", line 110, in <module>
    main()
  File "/home/imz/bin/wp2git.py", line 63, in main
    page = site.pages[args.article_name]
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 156, in __getitem__
    return self.get(name, None)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 166, in get
    namespace = self.guess_namespace(name)
  File "/usr/lib64/python2.7/site-packages/mwclient/listing.py", line 178, in guess_namespace
    if name.startswith(u'%s:' % self.site.namespaces[ns].replace(' ', '_')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

$ wp2git.py --lang en Bear
Connected to http://en.wikipedia.org/w/
Initialized empty Git repository in /home/imz/tests/test-wp2git/Bear/
>> Revision 239584 by TimShell at Wed Oct 10 21:50:27 2001: *
>> Revision 346214979 by Alan Millar at Wed Oct 10 22:43:35 2001: Fixing panda back to giant panda
>> Revision 50758 by Conversion script at Mon Feb 25 15:43:11 2002: Automated conversion
>> Revision 87603 by Mirwin at Thu Apr 11 20:33:54 2002: Added grizzly bear to list
>> Revision 88030 by 24.53.240.203 at Fri Jun 7 12:00:50 2002: *
>> Revision 112194 by Stephen Gilbert at Fri Jun 7 17:39:15 2002: removing dictionary.com link
>> Revision 132079 by PierreAbbat at Sun Jul 7 08:40:44 2002: restore accidentally deleted end of sentence
>> Revision 192849 by Andre Engels at Wed Jul 31 07:41:08 2002: de-orphanizing an image
>> Revision 227848 by Montrealais at Tue Sep 3 11:17:32 2002:
>> Revision 227861 by 203.48.160.12 at Wed Sep 18 23:44:43 2002:
>> Revision 398194 by Mav at Wed Sep 18 23:56:46 2002: REVERT from VANDALISM by 203.48.160.12
>> Revision 398212 by Fred Bauder at Fri Nov 1 17:01:16 2002: further reading
>> Revision 590676 by Stormwriter at Fri Nov 1 17:07:50 2002:
>> Revision 590687 by Karen Johnson at Thu Jan 16 11:07:36 2003: I'm not sure which type of bear this is, but uploading a pic I took
>> Revision 590917 by MartinHarper at Thu Jan 16 11:24:04 2003: link [[bear market]]
>> Revision 626017 by Robert Merkel at Thu Jan 16 13:28:02 2003: link to koala (mention it's *not* a bear
>> Revision 626559 by Sannse at Tue Jan 28 12:39:35 2003: [[American]] -> [[United States|American]]
>> Revision 629029 by 207.213.160.63 at Tue Jan 28 18:52:45 2003:
>> Revision 629038 by Bronco~enwiki at Wed Jan 29 18:50:37 2003: Our fifth graders have finished for the time being.
>> Revision 629079 by Bronco~enwiki at Wed Jan 29 18:53:21 2003: Done?
>> Revision 659547 by Fred Bauder at Wed Jan 29 19:14:46 2003: removed information about authors of the article
>> Revision 660956 by Alan Peakall at Tue Feb 11 12:53:34 2003: Copy edit and rationalised links to the Panda articles
>> Revision 735916 by Ahoerstemeier at Tue Feb 11 22:09:12 2003: cave bear
>> Revision 748674 by Montrealais at Mon Mar 10 01:37:20 2003:
>> Revision 769708 by Kricxjo at Sat Mar 15 09:44:21 2003: eo:
>> Revision 816500 by Fred Bauder at Sun Mar 23 12:31:46 2003: re use
>> Revision 930991 by ArnoLagrange at Thu Apr 10 08:00:09 2003: de
>> Revision 931028 by Tannin at Sat May 17 19:38:29 2003:
>> Revision 988458 by Tannin at Sat May 17 19:48:11 2003:
>> Revision 988462 by Eclecticology at Mon Jun 2 04:11:48 2003: fixing capitalization
>> Revision 988465 by Eclecticology at Mon Jun 2 04:12:28 2003:
>> Revision 988504 by Tannin at Mon Jun 2 04:12:56 2003: revert to correct case
>> Revision 988507 by Eclecticology at Mon Jun 2 04:25:27 2003: revert to correct capitalization
>> Revision 988932 by Tannin at Mon Jun 2 04:26:22 2003: revert
>> Revision 988936 by Eclecticology at Mon Jun 2 08:13:45 2003: revert
>> Revision 1015549 by Tannin at Mon Jun 2 08:14:53 2003: revert to correct version
>> Revision 1122344 by ²¹² at Mon Jun 9 12:42:36 2003:
>> Revision 1122374 by TeunSpaans at Mon Jul 7 12:34:28 2003: +nl
>> Revision 1122394 by Andre Engels at Mon Jul 7 12:46:16 2003: merged Ursidae in here
>> Revision 1122415 by Andre Engels at Mon Jul 7 12:55:18 2003:
>> Revision 1122422 by Jimfbleak at Mon Jul 7 13:11:09 2003: treid to make text more grown-up
>> Revision 1152613 by Rmhermen at Mon Jul 7 13:16:15 2003: typos
>> Revision 1152615 by Andre Engels at Tue Jul 15 17:32:46 2003: made images wrap-around
>> Revision 1160907 by Andre Engels at Tue Jul 15 17:33:18 2003:
>> Revision 1320504 by Baldhur at Thu Jul 17 17:16:36 2003: + taxobox, standardising classification
>> Revision 1320681 by 81.203.98.109 at Wed Aug 20 20:46:26 2003:
>> Revision 1406742 by Rmhermen at Wed Aug 20 21:24:43 2003:
>> Revision 1406748 by 62.64.204.83 at Sun Sep 7 20:43:36 2003:
>> Revision 1411109 by 62.64.204.83 at Sun Sep 7 20:44:49 2003:
>> Revision 1411169 by Rmhermen at Mon Sep 8 18:51:45 2003:
$

That error happens with python-module-mwclient-0.6.5-alt1.1 from t7. Sisyphus has a newer version. I shall try that one.

No, the same error happens with python-module-mwclient-0.7-alt1.dev.git20140622
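
A possible reading of the UnicodeDecodeError, offered as a guess rather than a diagnosis: under Python 2.7 the article name arrives from the command line as a byte string, and comparing it with a unicode prefix in `name.startswith(u'...')` makes Python decode it implicitly as ASCII. Byte 0xd0 is the first UTF-8 byte of both "Г" and "М", which would match "position 0" in every failing run. The sketch below only reproduces that decode failure in isolation and shows an explicit decode as a possible workaround. `to_text` is a hypothetical helper, not part of wp2git or mwclient, and UTF-8 is assumed from the `ru_RU.utf8` locale in the transcript.

```python
# -*- coding: utf-8 -*-
import locale


def to_text(arg, encoding=None):
    """Hypothetical helper: return arg as text, decoding byte strings explicitly."""
    if isinstance(arg, bytes):
        # Assumption: fall back to the locale's preferred encoding when none is given.
        return arg.decode(encoding or locale.getpreferredencoding(False) or "utf-8")
    return arg


# The article name as the raw UTF-8 bytes a ru_RU.utf8 shell would pass.
raw_name = u"Медведь".encode("utf-8")

# What the implicit conversion amounts to: decoding those bytes as ASCII.
try:
    raw_name.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding explicitly gives a text string that can be compared with unicode prefixes.
fixed_name = to_text(raw_name, "utf-8")

print(ascii_error)
print(fixed_name.startswith(u"Мед"))
```

If this guess is right, decoding `args.article_name` in this way before it reaches `site.pages[...]` might avoid the error; that has not been tried against wp2git.py itself.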