532 Commits

Author SHA1 Message Date
Frej Drejhammar
b0d5e56c8d Merge branch 'PR/247' v201029 2020-10-29 19:01:04 +01:00
Frej Drejhammar
787e8559b9 Fix typo in README 2020-10-29 19:00:30 +01:00
Henrik Tunedal
ab500a24a7 Add plugin for dropping commits from output 2020-10-29 12:04:27 +01:00
Frej Drejhammar
ead75895b0 Enable code analysis
Merge github generated workflow into master
2020-10-10 16:26:53 +02:00
Frej Drejhammar
bf5f14ddab Create codeql-analysis.yml 2020-10-10 13:15:54 +00:00
Frej Drejhammar
7057ce2c2b Allow plugins to modify the committer
Since they were introduced, plugins have been able to modify the author
of a commit, but not the committer. This patch adds the necessary
support for allowing them to also modify the committer.
2020-09-30 17:47:33 +02:00
Frej Drejhammar
2b6f735b8c Update section about submitting patches in README
Try to cover the most common reasons for requesting changes in PRs.
2020-09-09 14:08:00 +02:00
Frej Drejhammar
71acb42a09 Merge branch 'PR/236-v2' into master
Implement a plugin converting unnamed heads to branches
2020-07-31 17:08:04 +02:00
Ondrej Stanek
a7955bc49b Update head2branch plugin to accept hg commit hash
The revision number isn't a unique identifier of commits across
repository clones and forks, while the hg hash is guaranteed to be stable.
2020-07-31 10:50:57 +02:00
Ondrej Stanek
9c6dea9fd4 Pass original hg commit hash to plugins 2020-07-31 10:50:51 +02:00
Ethan Furman
21827a53f7 Add head2branch plugin
Support converting unnamed heads to named branches during mercurial
conversions.

Co-Authored-By: ostan89@gmail.com
2020-07-31 10:49:08 +02:00
Ethan Furman
5c1cbf82b0 Add revision to commit_data for commit plugins
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:48:33 +02:00
Ondrej Stanek
50631c4b34 Add option --ignore-unnamed-heads
This option allows the user to ignore only unnamed heads (compared to --force
which ignores all non-fatal issues). The intended use is for a future plugin
converting unnamed heads to named branches.
2020-07-31 10:30:53 +02:00
Ethan Furman
2a9dd53d14 Show all unnamed heads at once
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:27:07 +02:00
Frej Drejhammar
597093eaf1 Merge branch 'fix-233'
Closes #233
2020-07-10 16:52:17 +02:00
Frej Drejhammar
3910044a97 Avoid crash during rev-parse when the default encoding is ascii
In some locales the default encoding is ascii in which case
subprocess.check_output() will fail if it is given a non-ascii ref as
one of the arguments. By forcing the ref to be utf8 we will avoid a
crash while still behaving correctly when the default encoding is
utf8.

The credits for this fix go to Nikita Bazhinov for discovering the fix
and Chris J Billington for explaining it.

Co-Authored-By: Nikita Bazhinov <nbazhinov@syntellect.ru>
Co-Authored-By: Chris J Billington <chrisjbillington@gmail.com>
2020-07-10 16:41:38 +02:00
Frej Drejhammar
44c50d0fae Merge branch 'PR/226' 2020-05-07 20:10:24 +02:00
chrisjbillington
d29d30363b Fix backward incompatible change for hg < 5.1
The port to Python 3 in b961f146 changed `repo.branchmap().iteritems()`
to use `.items()` instead. However, the object returned by mercurial
isn't a dictionary and its `.items()` method was only introduced (as an
alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's
keep using it for now to retain compatibility with hg < 5.1.
2020-05-06 11:59:49 -04:00
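A minimal sketch of the compatibility concern in d29d30363b, assuming a branch map object that offers `iteritems()` but no `items()`. The `OldStyleBranchMap` class and the `iter_branchmap` helper are hypothetical stand-ins, not Mercurial or hg-fast-export code, and the fallback to `items()` is an addition for illustration; the commit itself simply keeps calling `iteritems()`.

```python
def iter_branchmap(branchmap):
    """Return an iterable of (name, heads) pairs from a branch-map-like object."""
    # Prefer iteritems(), which per the commit message exists on both old
    # and new Mercurial; fall back to items() for plain Python 3 dicts.
    iterate = getattr(branchmap, "iteritems", None) or branchmap.items
    return iterate()


class OldStyleBranchMap(object):
    """Hypothetical stand-in for an hg < 5.1 branch map: no items() method."""

    def __init__(self, data):
        self._data = data

    def iteritems(self):
        return iter(self._data.items())


old = OldStyleBranchMap({b"default": [b"abc"]})
pairs_old = list(iter_branchmap(old))
pairs_dict = list(iter_branchmap({b"default": [b"abc"]}))
```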
Frej Drejhammar
f102d2a69f Merge branch 'PR/223'
Closes #223
2020-05-06 16:31:13 +02:00
Ondrej Stanek
cf0e5837b6 Allow converting a repository with git and hg subrepos
In the verification phase, fast-export wrongly expects both hg
and git subrepositories to have a corresponding line in the
subrepo-map file. In fact, only hg subrepos need a line in
subrepo-map that references a converted subrepo; git
subrepositories do not.
2020-05-06 16:30:05 +02:00
Frej Drejhammar
61d22307af Merge branch 'PR/217'
Closes: #215
2020-03-26 20:17:20 +01:00
chrisjbillington
3b3f86b71e Allow utf8 in mappings
We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode('unicode_escape')`
assumes latin-1 encoding if it encounters non-ascii
bytes: https://bugs.python.org/issue21331. So this gave incorrect
results if non-ascii utf8 data was present in the mapping.

To fix this, we now add an extra layer of
`.decode('utf8').encode('unicode-escape')` in order to convert any non-ascii characters into
their backslash escape sequences. Then the subsequent
`.decode('unicode_escape')` only encounters ascii characters and gives
correct results.
2020-03-25 12:33:42 -04:00
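The latin-1 behaviour described in 3b3f86b71e can be reproduced on Python 3 with a short sketch. The sample value and variable names are made up for illustration, and the example only exercises non-ascii bytes; it makes no claim about how backslash sequences in the input are treated.

```python
# UTF-8 encoded mapping entry containing a non-ascii character.
raw = u"caf\u00e9".encode("utf8")  # b'caf\xc3\xa9'

# Previous approach: the two UTF-8 bytes of the accented letter are each
# interpreted as a latin-1 character, so re-encoding yields four bytes.
naive = raw.decode("unicode_escape").encode("utf8")

# Approach from the commit message: turn non-ascii characters into escape
# sequences first, so that unicode_escape only ever sees ascii input.
fixed = (
    raw.decode("utf8")
    .encode("unicode_escape")
    .decode("unicode_escape")
    .encode("utf8")
)
```

Here `naive` differs from `raw`, while `fixed` round-trips to the original bytes.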
Frej Drejhammar
e51844cd65 Merge branch 'PR/214'
Closes: #213
2020-03-25 16:09:01 +01:00
Toni Sissala
90eeef2ff4 Fix TypeError when using -M command line argument
hg-fast-export.sanitize_name expects branch name to be a bytes
object. Command line parser gives out str objects. Convert
possible str object to bytes in hg2git.set_default_branch().
2020-03-25 11:19:25 +02:00
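A sketch of the kind of conversion 90eeef2ff4 describes: accept either `str` or `bytes` and hand bytes to code that expects bytestrings. The helper name `to_bytes` and the choice of UTF-8 are assumptions for illustration, not the code in `hg2git.set_default_branch()`.

```python
def to_bytes(name, encoding="utf8"):
    """Return `name` as bytes, encoding it only if it is a text string."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding)


from_cli = to_bytes(u"main")      # text, as an argument parser would return it
already_bytes = to_bytes(b"main")  # bytes pass through unchanged
```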
Frej Drejhammar
7f4d9c3ad4 Merge branch 'PR/211' 2020-03-10 17:51:47 +01:00
Pi Delport
b37420f404 Fix link markup for hg-export-tool 2020-03-09 16:41:26 +02:00
Frej Drejhammar
f2aa47fdf7 Merge branch 'PR/210'
Closes #210.
2020-03-08 19:43:23 +01:00
chrisjbillington
6361b44c33 Fix bug in ignoring .git files/folders on Windows
Mercurial internally stores (most) filepaths using forward slashes, and
returns them as such from its Python API, even on Windows.

So the splitting up of filepaths with `os.path.sep` was incorrect,
resulting in `.git` files (those within a subdirectory, anyway)
not being ignored on Windows as intended. Splitting on `b'/'` regardless
of OS fixes this.
2020-03-08 19:40:50 +01:00
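The difference described in 6361b44c33 can be illustrated with a small sketch. The function names are hypothetical; the separator is passed in explicitly so that the Windows behaviour can be shown on any platform.

```python
def has_dot_git_component(filename):
    """Split on b'/' regardless of OS, as the commit message describes."""
    return b".git" in filename.split(b"/")


def has_dot_git_component_with_sep(filename, sep):
    """Earlier behaviour: split on the platform separator passed in as `sep`."""
    return b".git" in filename.split(sep)


path = b"sub/.git/config"  # forward slashes, as returned by Mercurial

found = has_dot_git_component(path)
found_with_windows_sep = has_dot_git_component_with_sep(path, b"\\")
not_confused_by_gitignore = has_dot_git_component(b"sub/.gitignore")
```

With a backslash separator the path is never split, so the `.git` component in a subdirectory goes unnoticed.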
Frej Drejhammar
afeb58ae95 Merge branch 'PR/209' 2020-03-06 17:30:52 +01:00
chrisjbillington
48508ee299 Fix failure to print error message in verify_heads
On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads,
an error message prints the sha1 of a git commit, but that sha1
can be None.

This commit instead prints `b'<None>'` if sha1 is None.
2020-03-06 11:02:38 -05:00
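A minimal sketch of the failure and workaround from 48508ee299, for Python 3. The helper name `format_sha1` is hypothetical and is not the code in `verify_heads`.

```python
def format_sha1(sha1):
    """Format a possibly missing sha1 as bytes without raising on None."""
    return b"%s" % (sha1 if sha1 is not None else b"<None>")


# On Python 3, bytes formatting with %s rejects None.
try:
    b"%s" % None
    raised_type_error = False
except TypeError:
    raised_type_error = True

present = format_sha1(b"b0d5e56c8d")
missing = format_sha1(None)
```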
Frej Drejhammar
56da62847a Merge branch 'PR/208'
Closes #207.
2020-03-01 14:34:38 +01:00
Max Fuqua
750fe6d3e1 Resolve type error resulting from passing an int to b'%s' in python3 2020-02-29 14:55:15 -05:00
Frej Drejhammar
e4d6d433ec Merge branch 'PR/206' 2020-02-29 14:48:46 +01:00
Steven Peters
058c791b75 Check python's mercurial version for compatibility
When checking that python has the mercurial package in hg-fast-export.sh,
use the same import statement that is used in hg-fast-export.py.

hg-fast-export.py imports revsymbol from mercurial.scmutil,
which was introduced in mercurial 4.6, but Ubuntu 18.04 only has
mercurial 4.5.3 using python2, so an incompatible python version may be
chosen without this change.
2020-02-28 15:41:24 -08:00
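The check in 058c791b75 lives in the shell script, but the idea can be sketched in Python: test for the specific name the exporter needs rather than for the package alone. The helper `can_import` is hypothetical, and the demonstration uses standard-library modules so that it runs without Mercurial installed.

```python
import importlib


def can_import(module_name, attribute):
    """Return True if `attribute` is available in the module `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# The check described above would correspond to:
#   can_import("mercurial.scmutil", "revsymbol")
stdlib_ok = can_import("os.path", "join")
stdlib_missing_name = can_import("os.path", "no_such_name")
missing_module = can_import("no_such_module_xyz", "anything")
```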
Frej Drejhammar
13010f7a25 Merge branch 'PR/204'
Closes #203.
2020-02-21 16:34:03 +01:00
chrisjbillington
4071f720b0 Fix issue #203: Resolve stderr encoding issues
In Python 3, `sys.stderr.write()` requires unicode strings, and all
output on standard streams is UTF8 encoded. Therefore in the port to
Python 3, we `.decode()`d all strings that are used in `%` formatting of
strings to be printed to stderr.

However, in Python 2, `sys.stderr` accepts either bytestrings or unicode
strings, and:

- `%s` formatting of a bytestring with a unicode string, i.e. `"%s" %
  u"foo"`, results in a unicode string.
- Writing a unicode string to stderr/stdout uses that stream's encoding
- When the output of the process is being piped somewhere other than a
  terminal (as it is when called with pipes and shell redirection from
  hg-fast-export.sh), that encoding is None, which implies ASCII.
- This raises UnicodeEncodeError if the unicode strings passed to
  `stderr.write()` have non-ascii characters.

We cannot fix this problem simply by encoding UTF8 again before writing
to stderr on Python 2. This is because the *decoding* of filenames with
the UTF8 codec may fail - filenames may not even be valid UTF8 despite
this being the declared filesystem encoding.

We could `fsdecode()` filenames on Python 3, which would use the
`surrogateescape` error handler, but stderr does not use this error
handler for output, meaning we would just have to encode again (with the
same error handler) anyway. And Python 2 lacks the `surrogateescape`
error handler in any case - we would need to reimplement it just to do a
round-trip decode and encode for no reason.

This commit leaves filenames and other repository data as bytestrings,
and simply writes them to `sys.stderr.buffer` on Python 3 or
`sys.stderr` on Python 2 as-is, after `%` formatting with bytestring
literals. This avoids encoding issues of filenames altogether.

Other writing to stderr that does not involve repository data has been
left with "native" strings, i.e.
`sys.stderr.write("a string literal %s" % a_command_line_arg)`. These
will still fail on Python 3 if the user passes a non-UTF filename as a
command line argument or similar. This is acceptable IMHO - although
`hg-fast-export` may encounter invalid UTF8 in mercurial repositories,
it is not too much to impose that the user name their branch mapping
files etc with valid UTF8!
2020-02-19 12:18:00 -05:00
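A sketch of the approach described in 4071f720b0, assuming a hypothetical `warn` helper and message text: pick the binary stream behind a text stream when there is one, and write bytestrings to it unchanged. An in-memory stream with an ASCII encoding stands in for `sys.stderr`.

```python
import io


def binary_stream(stream):
    """Return the underlying binary stream on Python 3, or the stream itself."""
    # Python 3 text streams expose .buffer; Python 2 file objects do not.
    return getattr(stream, "buffer", stream)


def warn(stream, filename):
    """Write a warning containing a raw bytestring filename, undecoded."""
    out = binary_stream(stream)
    out.write(b"Warning: skipping %s\n" % filename)
    out.flush()


# Stand-in for a piped stderr whose encoding cannot represent the filename.
fake_stderr = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
warn(fake_stderr, b"caf\xff.txt")  # not valid UTF-8, still written as-is
written = fake_stderr.buffer.getvalue()
```

Because the filename is never decoded, neither an invalid byte sequence nor an ASCII-only stream encoding can cause an error.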
Frej Drejhammar
160aa3c9ef Add a reference to hg-export-tool in the documentation
Add pointers to hg-export-tool as a way to batch convert multiple
Mercurial repos, and deal with duplicate heads.
2020-02-14 17:16:18 +01:00
Frej Drejhammar
883474184d Merge branch 'PR/201'
Closes 201
2020-02-14 17:01:35 +01:00
chrisjbillington
b961f146df Support Python 3
Port hg-fast-export to Python 2/3 polyglot code.

Since mercurial accepts and returns bytestrings for all repository data,
the approach I've taken here is to use bytestrings throughout the
hg-fast-export code. All strings pertaining to repository data are
bytestrings. This means the code is using the same string datatype for
this data on Python 3 as it did (and still does) on Python 2.

Repository data coming from subprocess calls to git, or read from files,
is also left as the bytestrings either returned from
subprocess.check_output or as read from the file in 'rb' mode.

Regexes and string literals that are used with repository data have
all had a b'' prefix added.

When repository data is used in error/warning messages, it is decoded
with the UTF8 codec for printing.

With this patch, hg-fast-export.py writes binary output to
sys.stdout.buffer on Python 3 - on Python 2 this doesn't exist and it
still uses sys.stdout.

The only strings that are left as "native" strings and not coerced to
bytestrings are filepaths passed in on the command line, and dictionary
keys for internal data structures used by hg-fast-export.py that do
not originate in repository data.

Mapping files are read in 'rb' mode, and thus bytestrings are read from
them. When an encoding is given, their contents are decoded with that
encoding, but then immediately encoded again with UTF8 and they are
returned as the resulting bytestrings.

Other necessary changes were:

 - indexing bytestrings with a single index returns an integer on Python 3.
   These indexing operations have been replaced with a one-element
   slice: x[0] -> x[0:1] or x[-1] -> x[-1:] so as to return a bytestring.

 - raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash)

 - str(integer) -> b'%d' % integer

 - 'string_escape' codec replaced with 'unicode_escape' (which was
    backported to python 2.7). Strings decoded with this codec were then
    immediately re-encoded with UTF8.

 - Calls to map() intended to execute their contents immediately were
   unwrapped or converted to list comprehensions, since map() is an
   iterator and does not execute until iterated over.

hg-fast-export.sh has been modified to not require Python 2. Instead, if
PYTHON has not been defined, it checks python2, python, then python3,
and uses the first one that exists and can import the mercurial module.
2020-02-13 14:35:19 -05:00
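Several of the changes listed in b961f146df can be seen in a few lines on Python 3. The values below are made up for illustration and none of this is taken from the hg-fast-export source.

```python
import binascii

data = b"abc"

# Indexing yields an int on Python 3; a one-element slice yields bytes.
first_index = data[0]
first_slice = data[0:1]
last_slice = data[-1:]

# Replacement for raw_hash.encode('hex_codec').
hex_digest = binascii.hexlify(b"\x01\xab")

# Replacement for str(integer) when a bytestring is needed.
number = b"%d" % 42

# map() is lazy on Python 3: nothing is appended until it is iterated.
seen = []
map(seen.append, [1, 2, 3])
lazy_count = len(seen)

# An explicit loop runs immediately.
for value in [1, 2, 3]:
    seen.append(value)
```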
Frej Drejhammar
595587b245 Merge branch 'PR/197'
Closes #197, #185, #196
v200213
2020-02-09 19:39:21 +01:00
Matthijs van der Burgh
0b6b83c3de Adapt to status becoming an object in Mercurial 5.3
Status has always been a tuple, but since 5.3, commit:
https://www.mercurial-scm.org/repo/hg/rev/c5548b0b6847, it is an object.
Therefore the __getitem__ of the tuple isn't available anymore.

This fix is compatible with mercurial>=4.6, as the old status tuple
still has the same properties.
2020-02-08 17:23:30 +01:00
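A sketch of why attribute access works for both forms of status mentioned in 0b6b83c3de. `StatusTuple` and `StatusObject` are hypothetical stand-ins for the old tuple-like status and the newer object, reduced to three fields; they are not Mercurial classes.

```python
import collections

# Stand-in for the old status: a tuple that also has named properties.
StatusTuple = collections.namedtuple("StatusTuple", "modified added removed")


class StatusObject(object):
    """Stand-in for the newer status: named properties, no indexing."""

    def __init__(self, modified, added, removed):
        self.modified = modified
        self.added = added
        self.removed = removed


def changed_files(status):
    """Use named properties, which both stand-ins provide."""
    return status.modified, status.added, status.removed


old_style = changed_files(StatusTuple([b"a"], [b"b"], [b"c"]))
new_style = changed_files(StatusObject([b"a"], [b"b"], [b"c"]))
```

Indexing such as `status[0]` would work only on the tuple form, which is the incompatibility the commit message describes.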
Frej Drejhammar
29a457eccf Merge branch 'PR/198'
Closes 198
2020-02-08 16:08:56 +01:00
Frej Drejhammar
4bc6dec5eb Merge branch 'PR/199'
Closes #199
2020-02-08 16:05:01 +01:00
Frej Drejhammar
fa8ebd994d Add link to what's expected for commit messages to the README 2020-02-08 15:50:17 +01:00
Frej Drejhammar
e83501d30d Make README issue tracker link a Markdown link 2020-02-08 15:43:10 +01:00
chrisjbillington
8efbb57822 Add additional options to branch_name_in_commit plugin
- Allow skipping writing the branch name if the branch is 'master'.

- Allow writing the branch name on the same line as the first line of
  the commit message separated by a colon, instead of it having its own
  line.
2020-02-07 20:48:49 -05:00
chrisjbillington
8d135fe700 Ignore files and directories called .git
Git cannot track these files. Print a warning if encountering one.

Fixes #166
2020-02-07 17:52:57 -05:00
Frej Drejhammar
ed36227c62 Merge branch 'PR/192'
Closes #192
2020-01-31 17:12:30 +01:00
Frej Drejhammar
507c17cc1b Revert "Handle --force option correctly in any position"
This reverts commit 0c5617bf8d.

The changes turned out to require bash. Traditionally we have tried to
stay compatible with plain old sh, so this is a revert.

Closes #195.
2020-01-31 17:01:04 +01:00
James Douglass
1841ba4be9 Add a plugin to prefix an issue number with a user-defined string. 2020-01-29 14:18:17 -08:00