When Plugins are used in a repository that contains largefiles,
the following exception is thrown as soon as the first largefile
is converted:
```
Traceback (most recent call last):
File "fast-export/hg-fast-export.py", line 728, in <module>
sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
File "fast-export/hg-fast-export.py", line 581, in hg2git
c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
File "fast-export/hg-fast-export.py", line 366, in export_commit
export_file_contents(ctx,man,modified,hgtags,fn_encoding,plugins)
File "fast-export/hg-fast-export.py", line 222, in export_file_contents
file_data = {'filename':filename,'file_ctx':file_ctx,'data':d}
UnboundLocalError: local variable 'file_ctx' referenced before assignment
```
This commit fixes the error by:
* initializing the file_ctx before the largefile handling takes place
* Providing a new `is_largefile` value for plugins so they can detect
if largefile handling was applied (and therefore the file_ctx
object may no longer be in sync with the git version of the file)
The `file_data_filter` method should be called when files are deleted.
In this case the `data` and `file_ctx` keys map to None. This is so
that a filter which modifies file names can apply the same name
transformations before files are deleted.
Python 3.12 has removed imp and it's recommended to use importlib
instead. Python 2.7 doesn't have importlib, so Python 2.7 support is
ceased (not a big deal since it's been more than 3 years since it was
EOLed) as a part of this change.
The `git init` command can create the directory, and HEAD doesn't need
to be specified in `git checkout` (it's the default).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Try to make it clear that sloppy, throw it over the fence, patches
won't be accepted without revision and try to make sure a potential
contributor sees the warning while creating a pull request.
Plugins have since they were introduced been able to modify the author
of a commit, but not the committer. This patch adds the necessary
support for allowing them to also modify the committer.
We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode
('unicode_escape')` assumes latin-1 encoding if it encounters non-ascii
bytes: https://bugs.python.org/issue21331. So this gave incorrect
results if non-ascii utf8 data was present in the mapping.
To fix this, we now add an extra layer of `.decode('utf8').encode
('unicode-escape')` in order to convert any non-ascii characters into
their backslash escape sequences. Then the subsequent
`.decode('unicode_escape')` only encounters ascii characters and gives
correct results.
Port hg-fast-import to Python 2/3 polyglot code.
Since mercurial accepts and returns bytestrings for all repository data,
the approach I've taken here is to use bytestrings throughout the
hg-fast-import code. All strings pertaining to repository data are
bytestrings. This means the code is using the same string datatype for
this data on Python 3 as it did (and still does) on Python 2.
Repository data coming from subprocess calls to git, or read from files,
is also left as the bytestrings either returned from
subprocess.check_output or as read from the file in 'rb' mode.
Regexes and string literals that are used with repository data have
all had a b'' prefix added.
When repository data is used in error/warning messages, it is decoded
with the UTF8 codec for printing.
With this patch, hg-fast-export.py writes binary output to
sys.stdout.buffer on Python 3 - on Python 2 this doesn't exist and it
still uses sys.stdout.
The only strings that are left as "native" strings and not coerced to
bytestrings are filepaths passed in on the command line, and dictionary
keys for internal data structures used by hg-fast-import.py, that do
not originate in repository data.
Mapping files are read in 'rb' mode, and thus bytestrings are read from
them. When an encoding is given, their contents are decoded with that
encoding, but then immediately encoded again with UTF8 and they are
returned as the resulting bytestrings
Other necessary changes were:
- indexing byestrings with a single index returns an integer on Python.
These indexing operations have been replaced with a one-element
slice: x[0] -> x[0:1] or x[-1] -> [-1:] so at to return a bytestring.
- raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash)
- str(integer) -> b'%d' % integer
- 'string_escape' codec replaced with 'unicode_escape' (which was
backported to python 2.7). Strings decoded with this codec were then
immediately re-encoded with UTF8.
- Calls to map() intended to execute their contents immediately were
unwrapped or converted to list comprehensions, since map() is an
iterator and does not execute until iterated over.
hg-fast-export.sh has been modified to not require Python 2. Instead, if
PYTHON has not been defined, it checks python2, python, then python3,
and uses the first one that exists and can import the mercurial module.
Document the default behavior of renaming the `default` hg branch to `master`
on git, and how to override from the command line when this causes problems.
See also: #182
Make it possible to completely disable the name sanitizer by the
--no-auto-sanitize flag. Previously the sanitizer was run on user
remapped names. As the sanitizer rewrites perfectly legal git
names (such as __.*) this is probably not what the user wants.
Closes#155.
This adds a new command line option (--subrepo-map) that will
map mercurial subrepos to git submodules.
The --subrepo-map takes a mapping file as an argument that will
be used to map a subrepo folder to a git submodule.
For more information see the README-SUBMODULES.md.
This commit is inspired by the changes made by daolis in PR#38
that was never merged.
Closes: #51Closes: #147
Change <repo> to <local-repo> so that it's clear that we invoke from a local repository;
Add 'git checkout HEAD' command as we need to run it as the final step.
Thanks